# GCP Private Service Connect for Databricks

{% hint style="info" %}
**Private preview**: This feature is available to select accounts. Contact your Immuta representative for details.
{% endhint %}

[GCP Private Service Connect](https://docs.cloud.google.com/vpc/docs/private-service-connect) provides private connectivity from the Immuta SaaS platform to Databricks accounts hosted on Google Cloud Platform (GCP). It ensures that all traffic to the configured endpoints traverses only private networks over the Immuta private cloud exchange. This front-end Private Service Connect connection allows users to reach the Databricks web application, REST API, and Databricks Connect API over a VPC endpoint.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2FcCkKFwbDUMK7zISayrEq%2FDocumentation%20Private%20Link%20Diagrams%20-%20GCP%20Databricks.png?alt=media&#x26;token=29099365-0b1d-4918-a310-9e1f6aa06326" alt=""><figcaption></figcaption></figure>

### Requirements

Ensure that your accounts meet the following requirements:

* You have an Immuta SaaS tenant.
* Your Databricks workspace is hosted on Google Cloud Platform (GCP) and has [been created with private access configured](https://docs.databricks.com/gcp/en/security/network/front-end/front-end-private-connect#requirements-and-limitations).
* You have your Databricks Enterprise account ID.

This process requires configuring a service account in GCP and administrative access to your Databricks account in GCP. For details about Databricks authentication with Google Identity, see the [Databricks documentation on Google ID authentication](https://docs.databricks.com/gcp/en/dev-tools/auth/authentication-google-id).
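As a sketch of that setup, the service account and the impersonation grant can be created with `gcloud`. The project, service account, and user names below are placeholders, not values from your environment or from Immuta:

```shell
# Placeholder names -- substitute your own project, account, and principal.
PROJECT_ID="my-gcp-project"
SA_NAME="psc-admin"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account that will be shared with Immuta.
gcloud iam service-accounts create "${SA_NAME}" \
  --project="${PROJECT_ID}" \
  --display-name="Immuta PSC admin"

# Allow your own principal to impersonate it, i.e., mint access and ID
# tokens on its behalf. This is the roles/iam.serviceAccountTokenCreator
# role referenced in step 1 of the configuration below.
gcloud iam service-accounts add-iam-policy-binding "${SA_EMAIL}" \
  --project="${PROJECT_ID}" \
  --member="user:you@example.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```

Granting the role on the service account itself (rather than project-wide) scopes impersonation to just this account.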

### Configure Databricks with GCP Private Service Connect

Follow these steps to establish private connectivity between Immuta and your Databricks environment:

1. Create a service account in GCP. Ensure that a principal (either a user or a different service account) has the `roles/iam.serviceAccountTokenCreator` role attached for this newly created service account. For more information, refer to the [GCP documentation on service account impersonation](https://docs.cloud.google.com/iam/docs/service-account-permissions#impersonate).
2. Add the newly created service account email to your Databricks account with admin rights to be able to add network endpoints. For guidance, see the [Databricks documentation on adding user accounts](https://docs.databricks.com/gcp/en/admin/users-groups/users#add-user-account).
3. Open an [Immuta support ticket](https://support.immuta.com/) and provide the following information:
   * Service account email
   * Databricks account ID (See [Locate your Databricks account ID](https://docs.databricks.com/aws/en/admin/account-settings/#locate-your-account-id))
   * GCP region(s) and workspace URLs in each region
4. Immuta will create the Private Service Connect (PSC) endpoints in each region that contains your workspaces and attach a role to the provided service account that allows it to view the created VPC endpoints. Immuta will then provide you with the following details:
   * VPC endpoint ID and region
   * Immuta project ID
5. Run the script below (or manually make the [necessary API calls to Databricks](https://docs.databricks.com/api/gcp/account/privateaccess/create)) to connect the Immuta-created PSC endpoints to your Databricks account, using the information provided by the Immuta support team.

   To run the script, you will need to have `gcloud`, `curl`, and `jq` installed and be logged in with a principal that can impersonate the service account that was provided to Immuta.

{% code lineNumbers="true" expandable="true" %}

```bash
#!/bin/bash

# Script to connect a GCP PSC endpoint to a Databricks account by
# impersonating a service account and retrieving tokens.
#
# Usage:
#   ./accept-databricks-psc.sh -s SERVICE_ACCOUNT_EMAIL \
#       -d DATABRICKS_ACCOUNT_ID \
#       -e ENDPOINT_NAME \
#       -r ENDPOINT_REGION \
#       -p PROJECT_ID \
#       [OPTIONS]
#
# Example:
#   ./accept-databricks-psc.sh -s psc-admin@my-project.iam.gserviceaccount.com \
#       -d 12345678-90ab-cdef-1234-567890abcdef \
#       -e dbx-company-project-region \
#       -r us-east4 \
#       -p my-gcp-project \
#       -j # (with JSON logging)

set -euo pipefail

# Default values
LOG_FORMAT="text"

# Function to display usage
usage() {
    cat << EOF
Usage: $0 -s SERVICE_ACCOUNT_EMAIL -d DATABRICKS_ACCOUNT_ID -e ENDPOINT_NAME -r ENDPOINT_REGION -p PROJECT_ID [OPTIONS]

Connect an Immuta-created PSC endpoint to a Databricks account by impersonating a service account and retrieving access and ID tokens.

Required:
    -s SERVICE_ACCOUNT        Service account email to impersonate
    -d DATABRICKS_ACCOUNT_ID  Databricks account ID (UUID format)
    -e ENDPOINT_NAME          Name of the VPC endpoint to attach (e.g., "dbx-company-project-region")
    -r ENDPOINT_REGION        Region where the VPC endpoint is located (e.g., "us-east4")
    -p PROJECT_ID             GCP project ID where the VPC endpoint is located (Immuta's SaaS project ID)
Options:
    -h                              Show this help message
    -j                              Enable JSON logging format
    -D DATABRICKS_ENDPOINT_NAME     Databricks Endpoint Name (defaults to "immuta-<region>-psc-endpoint")

Example:
    # Basic usage
    $0 -s psc-admin@my-gcp-project.iam.gserviceaccount.com \
       -d 12345678-90ab-cdef-1234-567890abcdef \
       -e dbx-company-project-region \
       -r us-east4 \
       -p immuta-gcp-project

EOF
}

DATABRICKS_ENDPOINT_NAME=""

# Parse command line arguments
while getopts "s:d:e:r:p:D:jh" opt; do
  case ${opt} in
    s)
      SERVICE_ACCOUNT="${OPTARG}"
      ;;
    d)
      DATABRICKS_ACCOUNT_ID="${OPTARG}"
      ;;
    e)
      ENDPOINT_NAME="${OPTARG}"
      ;;
    r)
      ENDPOINT_REGION="${OPTARG}"
      ;;
    p)
      PROJECT_ID="${OPTARG}"
      ;;
    D)
      DATABRICKS_ENDPOINT_NAME="${OPTARG}"
      ;;
    j)
      LOG_FORMAT="json"
      ;;
    h)
      usage
      exit 0
      ;;
    \?)
      echo "Error: Invalid option -${OPTARG}" >&2
      echo "Run '$0 -h' for usage information"
      exit 1
      ;;
    :)
      echo "Error: Option -${OPTARG} requires an argument" >&2
      exit 1
      ;;
  esac
done

# Check if service account was provided
if [[ -z "${SERVICE_ACCOUNT:-}" ]]; then
  echo "Error: Service account email is required"
  usage
  exit 1
fi

# Check if Databricks account ID was provided
if [[ -z "${DATABRICKS_ACCOUNT_ID:-}" ]]; then
  echo "Error: Databricks account ID is required"
  usage
  exit 1
fi

# Check if VPC endpoint name was provided
if [[ -z "${ENDPOINT_NAME:-}" ]]; then
  echo "Error: VPC endpoint name is required"
  usage
  exit 1
fi

# Check if VPC endpoint region was provided
if [[ -z "${ENDPOINT_REGION:-}" ]]; then
  echo "Error: VPC endpoint region is required"
  usage
  exit 1
fi

# Check if GCP project ID was provided
if [[ -z "${PROJECT_ID:-}" ]]; then
  echo "Error: GCP project ID is required"
  usage
  exit 1
fi

# Check if Databricks endpoint name was provided, if not set default name
if [[ -z "${DATABRICKS_ENDPOINT_NAME:-}" ]]; then
  DATABRICKS_ENDPOINT_NAME="immuta-${ENDPOINT_REGION}-psc-endpoint"
fi

# Function to log with simplified or JSON format
log() {
  local level="${1:-INFO}"
  # Shift the first argument (log level) so that $* does not include the level
  shift || true
  local message="$*"
  local timestamp
  # Use portable date format (macOS date doesn't support %N)
  if date --version &>/dev/null; then
    # GNU date
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
  else
    # BSD date (macOS)
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  fi

  if [[ "${LOG_FORMAT}" == "json" ]]; then
    # JSON format: timestamp, level, and message (properly escaped)
    jq -nc --arg ts "${timestamp}" --arg lvl "${level}" --arg msg "${message}" \
      '{timestamp: $ts, level: $lvl, message: $msg}'
  else
    # Color codes for different log levels
    local color_reset='\033[0m'
    local color_level=""

    case "${level}" in
      INFO)
        color_level='\033[0;36m'  # Cyan
        ;;
      WARN)
        color_level='\033[0;33m'  # Yellow
        ;;
      ERROR)
        color_level='\033[0;31m'  # Red
        ;;
      DEBUG)
        color_level='\033[0;90m'  # Gray
        ;;
      *)
        color_level='\033[0m'     # No color
        ;;
    esac

    # Simple format with color: [timestamp] [level] message
    printf "[%s] [${color_level}%s${color_reset}] %s\n" "${timestamp}" "${level}" "${message}"
  fi
}

if [[ "${LOG_FORMAT}" == "json" ]]; then
  log INFO "Starting Immuta Databricks GCP PSC Acceptance Script"
else
  log INFO "================================================"
  log INFO "Immuta Databricks GCP PSC Acceptance Script"
  log INFO "================================================"
fi
log INFO "Service Account: ${SERVICE_ACCOUNT}"
log INFO "Databricks Account ID: ${DATABRICKS_ACCOUNT_ID}"
log INFO "VPC Endpoint Name: ${ENDPOINT_NAME}"
log INFO "VPC Endpoint Region: ${ENDPOINT_REGION}"
log INFO "Databricks Endpoint Name: ${DATABRICKS_ENDPOINT_NAME}"

# Step 1: Check if gcloud, curl, and jq are installed
log INFO "[1/6] Checking if dependencies are installed..."
if ! command -v gcloud &> /dev/null; then
  log ERROR "❌ ERROR: gcloud CLI is not installed"
  log ERROR "   Please install it from: https://cloud.google.com/sdk/docs/install"
  exit 1
fi
if ! command -v curl &> /dev/null; then
  log ERROR "❌ ERROR: curl is not installed"
  log ERROR "   Please install it using your package manager (e.g., apt, yum, brew)"
  exit 1
fi
if ! command -v jq &> /dev/null; then
  log ERROR "❌ ERROR: jq is not installed"
  log ERROR "   Please install it using your package manager (e.g., apt, yum, brew)"
  exit 1
fi
GCLOUD_VERSION=$(gcloud version --format="value(core)" 2>/dev/null || echo "unknown")
log INFO "✅ gcloud is installed (version: ${GCLOUD_VERSION})"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 2: Check if user is logged in
log INFO "[2/6] Checking if you are logged in to gcloud..."
CURRENT_ACCOUNT=$(gcloud config get-value account 2>/dev/null || echo "")
if [[ -z "${CURRENT_ACCOUNT}" ]]; then
  log ERROR "❌ ERROR: Not logged in to gcloud"
  log ERROR "   Run: gcloud auth login"
  exit 1
fi
log INFO "✅ Logged in as: ${CURRENT_ACCOUNT}"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 3: Verify impersonation by getting an access token
log INFO "[3/6] Verifying service account impersonation (access token)..."
set +e  # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-access-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --verbosity=error 2>&1)
GCLOUD_EXIT=$?
set -e  # Re-enable exit on error

if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
  log ERROR "Failed to impersonate service account for access token"
  while IFS= read -r line; do
    [[ -n "${line}" ]] && log ERROR "${line}"
  done <<< "${GCLOUD_OUTPUT}"
  exit 1
fi
ACCESS_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained access token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 4: Get an ID token
log INFO "[4/6] Getting ID token for service account..."
set +e  # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-identity-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --include-email \
  --verbosity=error \
  --audiences="https://accounts.gcp.databricks.com" 2>&1)
GCLOUD_EXIT=$?
set -e  # Re-enable exit on error

if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
  log ERROR "Failed to get ID token"
  while IFS= read -r line; do
    [[ -n "${line}" ]] && log ERROR "${line}"
  done <<< "${GCLOUD_OUTPUT}"
  exit 1
fi
ID_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained ID token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 5: List existing VPC endpoints
log INFO "[5/6] Listing existing VPC endpoints..."
if ! RESULT=$(curl -s -XGET \
  --header "Authorization: Bearer ${ID_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints"); then
  log ERROR "❌ ERROR: Failed to call Databricks API to list VPC endpoints"
  log ERROR "   ${RESULT}"
  exit 1
fi
echo "${RESULT}" | jq -e --arg name "${ENDPOINT_NAME}" '.[] | select(.gcp_vpc_endpoint_info.psc_endpoint_name == $name)' > /dev/null && {
  log ERROR "❌ Existing VPC endpoint found with name '${ENDPOINT_NAME}'"
  log ERROR "    If this is unexpected, please delete the existing endpoint in the Databricks console and try again"
  exit 1
}
log INFO "✅ No existing VPC endpoint with name '${ENDPOINT_NAME}' found"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

log INFO "[6/6] Creating VPC endpoint attachment..."
REQUEST=$(cat <<EOF
{
  "gcp_vpc_endpoint_info": {
    "endpoint_region": "${ENDPOINT_REGION}",
    "project_id": "${PROJECT_ID}",
    "psc_endpoint_name": "${ENDPOINT_NAME}"
  },
  "vpc_endpoint_name": "${DATABRICKS_ENDPOINT_NAME}"
}
EOF
)
RESULT=$(curl -s -XPOST \
  -d "${REQUEST}" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${ID_TOKEN}" \
  --header "X-Databricks-GCP-SA-Access-Token: ${ACCESS_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints")
if [[ "$(echo "${RESULT}" | jq -r '.error_code // empty')" != "" ]]; then
  log ERROR "❌ ERROR: Failed to create VPC endpoint attachment"
  log ERROR "   ${RESULT}"
  exit 1
fi


# Output based on mode
if [[ "${LOG_FORMAT}" == "json" ]]; then
  log INFO "✅ VPC Endpoint Attachment Created"
else
  # Display token information
  log INFO "================================================"
  log INFO "✅ SUCCESS - VPC Endpoint Attachment Created"
  log INFO "================================================"
  log INFO "The Endpoint has been attached to your account in Databricks."
fi
```

{% endcode %}

6. Validate that any private access settings attached to workspaces that need connectivity include the newly created endpoints (either by accepting endpoints at the **account level** or by [adding the specific endpoints to the private access settings object](https://docs.databricks.com/gcp/en/security/network/front-end/front-end-private-connect#step-4-create-a-databricks-private-access-settings-object)).

After these steps, you should be able to connect your Immuta tenant to Databricks workspaces in GCP under the connected account.
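To spot-check the result, you can list the account's VPC endpoints using the same kind of ID token the script retrieves. This is a sketch; the account ID and service account email below are placeholders for the values you used above:

```shell
# Placeholder values -- use your own account ID and service account email.
DATABRICKS_ACCOUNT_ID="12345678-90ab-cdef-1234-567890abcdef"
SERVICE_ACCOUNT="psc-admin@my-gcp-project.iam.gserviceaccount.com"

# Mint an ID token for the Databricks accounts API audience.
ID_TOKEN=$(gcloud auth print-identity-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --include-email \
  --audiences="https://accounts.gcp.databricks.com")

# List the account's VPC endpoints and print each name and region.
curl -s --header "Authorization: Bearer ${ID_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints" \
  | jq -r '.[] | "\(.vpc_endpoint_name)  \(.gcp_vpc_endpoint_info.endpoint_region)"'
```

The endpoint created by the script should appear in the output under the name passed with `-D` (or the `immuta-<region>-psc-endpoint` default).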
