Private preview: This feature is available to select accounts. Contact your Immuta representative for details.
GCP Private Service Connect provides private connectivity from the Immuta SaaS platform to Databricks accounts hosted on Google Cloud Platform (GCP). It ensures that all traffic to the configured endpoints only traverses private networks over the Immuta private cloud exchange. This front-end Private Service Connect connection allows users to connect to the Databricks web application, REST API, and Databricks Connect API over a VPC endpoint.
Requirements
Ensure that your accounts meet the following requirements:
This process requires the ability to configure a service account in GCP and administrative access to the Databricks account on GCP. For details about Databricks authentication with Google Identity, see the Databricks documentation on Google ID authentication.
Configure Databricks with GCP Private Service Connect
Follow these steps to establish private connectivity between Immuta and your Databricks environment:
Create a service account in GCP. Ensure that a principal (either a user or a different service account) has the roles/iam.serviceAccountTokenCreator role attached for this newly created service account. For more information, refer to the GCP documentation on service account impersonation.
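The step above can be sketched with gcloud as follows. The project ID, service account name, and principal are placeholders, not values from this guide:

```shell
# Sketch only -- substitute your own project, service account name, and principal.
create_psc_service_account() {
  local project="$1" sa_name="$2" principal="$3"
  local sa_email="${sa_name}@${project}.iam.gserviceaccount.com"

  # Create the service account that will be shared with Immuta.
  gcloud iam service-accounts create "${sa_name}" --project="${project}"

  # Allow the principal to impersonate (mint tokens for) the new service account.
  gcloud iam service-accounts add-iam-policy-binding "${sa_email}" \
    --member="${principal}" \
    --role="roles/iam.serviceAccountTokenCreator"

  echo "${sa_email}"
}
```

For example, `create_psc_service_account my-gcp-project immuta-psc "user:admin@example.com"` (hypothetical values) would print the service account email to share with Immuta.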
Add the newly created service account's email address to your Databricks account and grant it admin rights so that it can add network endpoints. For guidance, see the Databricks documentation on adding user accounts.
Immuta will create the Private Service Connect (PSC) endpoints in each region that contains your workspaces and attach a role to the provided service account that allows it to view the created VPC endpoints. Immuta will then provide you with the following details:
VPC endpoint ID and region
Immuta project ID
Run the script below (or make the equivalent Databricks API calls manually) to connect the Immuta-created PSC endpoints to your Databricks account, using the information provided by the Immuta support team.
To run the script, you must have gcloud, curl, and jq installed and be logged in with a principal that can impersonate the service account that was provided to Immuta.
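As a quick preflight, the dependency check can be sketched like this (the script below performs the same checks itself):

```shell
# Preflight sketch: report any missing tools before running the script.
missing=""
for tool in gcloud curl jq; do
  command -v "${tool}" >/dev/null 2>&1 || missing="${missing} ${tool}"
done
if [ -n "${missing}" ]; then
  echo "Missing dependencies:${missing}"
else
  echo "All dependencies found"
fi
```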
#!/bin/bash
# Script to connect a GCP PSC endpoint to a Databricks account by
# impersonating a service account and retrieving tokens.
#
# Usage:
# ./accept-databricks-psc.sh -s SERVICE_ACCOUNT_EMAIL \
# -d DATABRICKS_ACCOUNT_ID \
# -e ENDPOINT_NAME \
# -r ENDPOINT_REGION \
# -p PROJECT_ID \
# [OPTIONS]
#
# Example:
# ./accept-databricks-psc.sh -s [email protected] \
# -d 12345678-90ab-cdef-1234-567890abcdef \
# -e dbx-company-project-region \
# -r us-east4 \
# -p my-gcp-project \
# -j # (with JSON logging)
set -euo pipefail
# Default values
LOG_FORMAT="text"
# Function to display usage
usage() {
cat << EOF
Usage: $0 -s SERVICE_ACCOUNT_EMAIL -d DATABRICKS_ACCOUNT_ID -e ENDPOINT_NAME -r ENDPOINT_REGION -p PROJECT_ID [OPTIONS]
Connect an Immuta-created PSC endpoint to a Databricks account by impersonating the provided service account and calling the Databricks Accounts API.
Required:
-s SERVICE_ACCOUNT Service account email to impersonate
-d DATABRICKS_ACCOUNT_ID Databricks account ID (UUID format)
-e ENDPOINT_NAME Name of the VPC endpoint to attach (e.g., "dbx-company-project-region")
-r ENDPOINT_REGION Region where the VPC endpoint is located (e.g., "us-east4")
-p PROJECT_ID GCP project ID where the VPC endpoint is located (Immuta's SaaS project ID)
Options:
-h Show this help message
-j Enable JSON logging format
-D DATABRICKS_ENDPOINT_NAME Databricks Endpoint Name (defaults to "immuta-<region>-psc-endpoint")
Example:
# Basic usage
$0 -s [email protected] \
-d 12345678-90ab-cdef-1234-567890abcdef \
-e dbx-company-project-region \
-r us-east4 \
-p immuta-gcp-project
EOF
}
DATABRICKS_ENDPOINT_NAME=""
# Parse command line arguments
while getopts ":s:d:e:r:p:D:jh" opt; do
case ${opt} in
s)
SERVICE_ACCOUNT="${OPTARG}"
;;
d)
DATABRICKS_ACCOUNT_ID="${OPTARG}"
;;
e)
ENDPOINT_NAME="${OPTARG}"
;;
r)
ENDPOINT_REGION="${OPTARG}"
;;
p)
PROJECT_ID="${OPTARG}"
;;
D)
DATABRICKS_ENDPOINT_NAME="${OPTARG}"
;;
j)
LOG_FORMAT="json"
;;
h)
usage
exit 0
;;
\?)
echo "Error: Invalid option -${OPTARG}" >&2
echo "Run '$0 -h' for usage information"
exit 1
;;
:)
echo "Error: Option -${OPTARG} requires an argument" >&2
exit 1
;;
esac
done
# Check if service account was provided
if [[ -z "${SERVICE_ACCOUNT:-}" ]]; then
echo "Error: Service account email is required"
usage
exit 1
fi
# Check if Databricks account ID was provided
if [[ -z "${DATABRICKS_ACCOUNT_ID:-}" ]]; then
echo "Error: Databricks account ID is required"
usage
exit 1
fi
# Check if VPC endpoint name was provided
if [[ -z "${ENDPOINT_NAME:-}" ]]; then
echo "Error: VPC endpoint name is required"
usage
exit 1
fi
# Check if VPC endpoint region was provided
if [[ -z "${ENDPOINT_REGION:-}" ]]; then
echo "Error: VPC endpoint region is required"
usage
exit 1
fi
# Check if GCP project ID was provided
if [[ -z "${PROJECT_ID:-}" ]]; then
echo "Error: GCP project ID is required"
usage
exit 1
fi
# Check if Databricks endpoint name was provided, if not set default name
if [[ -z "${DATABRICKS_ENDPOINT_NAME:-}" ]]; then
DATABRICKS_ENDPOINT_NAME="immuta-${ENDPOINT_REGION}-psc-endpoint"
fi
# Function to log with simplified or JSON format
log() {
local level="${1:-INFO}"
# Shift the first argument (log level) so that $* does not include the level
shift || true
local message="$*"
local timestamp
# Use portable date format (macOS date doesn't support %N)
if date --version &>/dev/null; then
# GNU date
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
else
# BSD date (macOS)
timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
fi
if [[ "${LOG_FORMAT}" == "json" ]]; then
# JSON format: timestamp, level, and message (properly escaped)
jq -nc --arg ts "${timestamp}" --arg lvl "${level}" --arg msg "${message}" \
'{timestamp: $ts, level: $lvl, message: $msg}'
else
# Color codes for different log levels
local color_reset='\033[0m'
local color_level=""
case "${level}" in
INFO)
color_level='\033[0;36m' # Cyan
;;
WARN)
color_level='\033[0;33m' # Yellow
;;
ERROR)
color_level='\033[0;31m' # Red
;;
DEBUG)
color_level='\033[0;90m' # Gray
;;
*)
color_level='\033[0m' # No color
;;
esac
# Simple format with color: [timestamp] [level] message
printf "[%s] [${color_level}%s${color_reset}] %s\n" "${timestamp}" "${level}" "${message}"
fi
}
if [[ "${LOG_FORMAT}" == "json" ]]; then
log INFO "Starting Immuta Databricks GCP PSC Acceptance Script"
else
log INFO "================================================"
log INFO "Immuta Databricks GCP PSC Acceptance Script"
log INFO "================================================"
fi
log INFO "Service Account: ${SERVICE_ACCOUNT}"
log INFO "Databricks Account ID: ${DATABRICKS_ACCOUNT_ID}"
log INFO "VPC Endpoint Name: ${ENDPOINT_NAME}"
log INFO "VPC Endpoint Region: ${ENDPOINT_REGION}"
log INFO "Databricks Endpoint Name: ${DATABRICKS_ENDPOINT_NAME}"
# Step 1: Check if gcloud, curl, and jq are installed
log INFO "[1/6] Checking if dependencies are installed..."
if ! command -v gcloud &> /dev/null; then
log ERROR "❌ ERROR: gcloud CLI is not installed"
log ERROR " Please install it from: https://cloud.google.com/sdk/docs/install"
exit 1
fi
if ! command -v curl &> /dev/null; then
log ERROR "❌ ERROR: curl is not installed"
log ERROR " Please install it using your package manager (e.g., apt, yum, brew)"
exit 1
fi
if ! command -v jq &> /dev/null; then
log ERROR "❌ ERROR: jq is not installed"
log ERROR " Please install it using your package manager (e.g., apt, yum, brew)"
exit 1
fi
GCLOUD_VERSION=$(gcloud version --format="value(core)" 2>/dev/null || echo "unknown")
log INFO "✅ gcloud is installed (version: ${GCLOUD_VERSION})"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi
# Step 2: Check if user is logged in
log INFO "[2/6] Checking if you are logged in to gcloud..."
CURRENT_ACCOUNT=$(gcloud config get-value account 2>/dev/null || echo "")
if [[ -z "${CURRENT_ACCOUNT}" ]]; then
log ERROR "❌ ERROR: Not logged in to gcloud"
log ERROR " Run: gcloud auth login"
exit 1
fi
log INFO "✅ Logged in as: ${CURRENT_ACCOUNT}"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi
# Step 3: Verify impersonation by getting an access token
log INFO "[3/6] Verifying service account impersonation (access token)..."
set +e # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-access-token \
--impersonate-service-account="${SERVICE_ACCOUNT}" \
--verbosity=error 2>&1)
GCLOUD_EXIT=$?
set -e # Re-enable exit on error
if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
log ERROR "Failed to impersonate service account for access token"
while IFS= read -r line; do
[[ -n "${line}" ]] && log ERROR "${line}"
done <<< "${GCLOUD_OUTPUT}"
exit 1
fi
ACCESS_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained access token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi
# Step 4: Get an ID token
log INFO "[4/6] Getting ID token for service account..."
set +e # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-identity-token \
--impersonate-service-account="${SERVICE_ACCOUNT}" \
--include-email \
--verbosity=error \
--audiences="https://accounts.gcp.databricks.com" 2>&1)
GCLOUD_EXIT=$?
set -e # Re-enable exit on error
if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
log ERROR "Failed to get ID token"
while IFS= read -r line; do
[[ -n "${line}" ]] && log ERROR "${line}"
done <<< "${GCLOUD_OUTPUT}"
exit 1
fi
ID_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained ID token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi
# Step 5: List existing VPC endpoints
log INFO "[5/6] Listing existing VPC endpoints..."
if ! RESULT=$(curl -s -XGET \
--header "Authorization: Bearer ${ID_TOKEN}" \
"https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints"); then
log ERROR "❌ ERROR: Failed to call Databricks API to list VPC endpoints"
log ERROR " ${RESULT}"
exit 1
fi
echo "${RESULT}" | jq -er --arg name "${ENDPOINT_NAME}" '.[] | select(.gcp_vpc_endpoint_info.psc_endpoint_name == $name)' > /dev/null && {
log ERROR "❌ Existing VPC endpoint found with name '${ENDPOINT_NAME}'"
log ERROR " If this is unexpected, please delete the existing endpoint in the Databricks console and try again"
exit 1
}
log INFO "✅ No existing VPC endpoint with name '${ENDPOINT_NAME}' found"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi
log INFO "[6/6] Creating VPC endpoint attachment..."
REQUEST=$(cat <<EOF
{
"gcp_vpc_endpoint_info": {
"endpoint_region": "${ENDPOINT_REGION}",
"project_id": "${PROJECT_ID}",
"psc_endpoint_name": "${ENDPOINT_NAME}"
},
"vpc_endpoint_name": "${DATABRICKS_ENDPOINT_NAME}"
}
EOF
)
RESULT=$(curl -s -XPOST \
-d "${REQUEST}" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${ID_TOKEN}" \
--header "X-Databricks-GCP-SA-Access-Token: ${ACCESS_TOKEN}" \
"https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints")
if [[ "$(echo "${RESULT}" | jq -r '.error_code // empty')" != "" ]]; then
log ERROR "❌ ERROR: Failed to create VPC endpoint attachment"
log ERROR " ${RESULT}"
exit 1
fi
# Output based on mode
if [[ "${LOG_FORMAT}" == "json" ]]; then
log INFO "✅ VPC Endpoint Attachment Created"
else
# Display token information
log INFO "================================================"
log INFO "✅ SUCCESS - VPC Endpoint Attachment Created"
log INFO "================================================"
log INFO "The Endpoint has been attached to your account in Databricks."
fi