# GCP Private Service Connect for Databricks

{% hint style="info" %}
**Private preview**: This feature is available to select accounts. Contact your Immuta representative for details.
{% endhint %}

[GCP Private Service Connect](https://docs.cloud.google.com/vpc/docs/private-service-connect) provides private connectivity from the Immuta SaaS platform to Databricks accounts hosted on Google Cloud Platform (GCP). It ensures that all traffic to the configured endpoints traverses only private networks over the Immuta private cloud exchange. This front-end Private Service Connect connection allows users to reach the Databricks web application, REST API, and Databricks Connect API over a VPC endpoint.

<figure><img src="https://1751699907-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FlWBda5Pt4s8apEhzXGl7%2Fuploads%2FcCkKFwbDUMK7zISayrEq%2FDocumentation%20Private%20Link%20Diagrams%20-%20GCP%20Databricks.png?alt=media&#x26;token=29099365-0b1d-4918-a310-9e1f6aa06326" alt=""><figcaption></figcaption></figure>

### Requirements

Ensure that your accounts meet the following requirements:

* You have an Immuta SaaS tenant.
* Your Databricks workspace is hosted on Google Cloud Platform (GCP) and has [been created with private access configured](https://docs.databricks.com/gcp/en/security/network/front-end/front-end-private-connect#requirements-and-limitations).
* You have your Databricks Enterprise account ID.

This process requires configuring a service account in GCP and administrative access to your Databricks account in GCP. For details about Databricks authentication with Google Identity, see the [Databricks documentation on Google ID authentication](https://docs.databricks.com/gcp/en/dev-tools/auth/authentication-google-id).
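As a sketch of that setup, the service account and the impersonation grant can be created with `gcloud`. The project, service account, and user names below are placeholders, not values from your environment or from Immuta:

```shell
# Placeholder names -- substitute your own project, account, and principal.
PROJECT_ID="my-gcp-project"
SA_NAME="psc-admin"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account that will be shared with Immuta.
gcloud iam service-accounts create "${SA_NAME}" \
  --project="${PROJECT_ID}" \
  --display-name="Immuta PSC admin"

# Allow your own principal to impersonate it, i.e., mint access and ID
# tokens on its behalf. This is the roles/iam.serviceAccountTokenCreator
# role referenced in step 1 of the configuration below.
gcloud iam service-accounts add-iam-policy-binding "${SA_EMAIL}" \
  --project="${PROJECT_ID}" \
  --member="user:you@example.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```

Granting the role on the service account itself (rather than project-wide) scopes impersonation to just this account.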

### Configure Databricks with GCP Private Service Connect

Follow these steps to establish private connectivity between Immuta and your Databricks environment:

1. Create a service account in GCP. Ensure that a principal (either a user or a different service account) has the `roles/iam.serviceAccountTokenCreator` role attached for this newly created service account. For more information, refer to the [GCP documentation on service account impersonation](https://docs.cloud.google.com/iam/docs/service-account-permissions#impersonate).
2. Add the newly created service account email to your Databricks account with admin rights to be able to add network endpoints. For guidance, see the [Databricks documentation on adding user accounts](https://docs.databricks.com/gcp/en/admin/users-groups/users#add-user-account).
3. Open an [Immuta support ticket](https://support.immuta.com/) and provide the following information:
   * Service account email
   * Databricks account ID (See [Locate your Databricks account ID](https://docs.databricks.com/aws/en/admin/account-settings/#locate-your-account-id))
   * GCP region(s) and workspace URLs in each region
4. Immuta will create the Private Service Connect (PSC) endpoints in each region that contains your workspaces and attach a role to the provided service account that allows it to view the created VPC endpoints. Immuta will then provide you with the following details:
   * VPC endpoint ID and region
   * Immuta project ID
5. Run the script below (or manually make the [necessary API calls to Databricks](https://docs.databricks.com/api/gcp/account/privateaccess/create)) to connect the Immuta-created PSC endpoints to your Databricks account, using the information provided by the Immuta support team.

   To run the script, you will need to have `gcloud`, `curl`, and `jq` installed and be logged in with a principal that can impersonate the service account that was provided to Immuta.

{% code lineNumbers="true" expandable="true" %}

```bash
#!/bin/bash

# Script to connect a GCP PSC endpoint to a Databricks account by
# impersonating a service account and retrieving tokens.
#
# Usage:
#   ./accept-databricks-psc.sh -s SERVICE_ACCOUNT_EMAIL \
#       -d DATABRICKS_ACCOUNT_ID \
#       -e ENDPOINT_NAME \
#       -r ENDPOINT_REGION \
#       -p PROJECT_ID \
#       [OPTIONS]
#
# Example:
#   ./accept-databricks-psc.sh -s psc-admin@my-project.iam.gserviceaccount.com \
#       -d 12345678-90ab-cdef-1234-567890abcdef \
#       -e dbx-company-project-region \
#       -r us-east4 \
#       -p my-gcp-project \
#       -j # (with JSON logging)

set -euo pipefail

# Default values
LOG_FORMAT="text"

# Function to display usage
usage() {
    cat << EOF
Usage: $0 -s SERVICE_ACCOUNT_EMAIL -d DATABRICKS_ACCOUNT_ID -e ENDPOINT_NAME -r ENDPOINT_REGION -p PROJECT_ID [OPTIONS]

Connect an Immuta-created PSC endpoint to a Databricks account by impersonating a service account and retrieving access and ID tokens.

Required:
    -s SERVICE_ACCOUNT        Service account email to impersonate
    -d DATABRICKS_ACCOUNT_ID  Databricks account ID (UUID format)
    -e ENDPOINT_NAME          Name of the VPC endpoint to attach (e.g., "dbx-company-project-region")
    -r ENDPOINT_REGION        Region where the VPC endpoint is located (e.g., "us-east4")
    -p PROJECT_ID             GCP project ID where the VPC endpoint is located (Immuta's SaaS project ID)
Options:
    -h                              Show this help message
    -j                              Enable JSON logging format
    -D DATABRICKS_ENDPOINT_NAME     Databricks Endpoint Name (defaults to "immuta-<region>-psc-endpoint")

Example:
    # Basic usage
    $0 -s psc-admin@my-gcp-project.iam.gserviceaccount.com \
       -d 12345678-90ab-cdef-1234-567890abcdef \
       -e dbx-company-project-region \
       -r us-east4 \
       -p immuta-gcp-project

EOF
}

DATABRICKS_ENDPOINT_NAME=""

# Parse command line arguments
while getopts "s:d:e:r:p:D:jh" opt; do
  case ${opt} in
    s)
      SERVICE_ACCOUNT="${OPTARG}"
      ;;
    d)
      DATABRICKS_ACCOUNT_ID="${OPTARG}"
      ;;
    e)
      ENDPOINT_NAME="${OPTARG}"
      ;;
    r)
      ENDPOINT_REGION="${OPTARG}"
      ;;
    p)
      PROJECT_ID="${OPTARG}"
      ;;
    D)
      DATABRICKS_ENDPOINT_NAME="${OPTARG}"
      ;;
    j)
      LOG_FORMAT="json"
      ;;
    h)
      usage
      exit 0
      ;;
    \?)
      echo "Error: Invalid option -${OPTARG}" >&2
      echo "Run '$0 -h' for usage information"
      exit 1
      ;;
    :)
      echo "Error: Option -${OPTARG} requires an argument" >&2
      exit 1
      ;;
  esac
done

# Check if service account was provided
if [[ -z "${SERVICE_ACCOUNT:-}" ]]; then
  echo "Error: Service account email is required"
  usage
  exit 1
fi

# Check if Databricks account ID was provided
if [[ -z "${DATABRICKS_ACCOUNT_ID:-}" ]]; then
  echo "Error: Databricks account ID is required"
  usage
  exit 1
fi

# Check if VPC endpoint name was provided
if [[ -z "${ENDPOINT_NAME:-}" ]]; then
  echo "Error: VPC endpoint name is required"
  usage
  exit 1
fi

# Check if VPC endpoint region was provided
if [[ -z "${ENDPOINT_REGION:-}" ]]; then
  echo "Error: VPC endpoint region is required"
  usage
  exit 1
fi

# Check if GCP project ID was provided
if [[ -z "${PROJECT_ID:-}" ]]; then
  echo "Error: GCP project ID is required"
  usage
  exit 1
fi

# Check if Databricks endpoint name was provided, if not set default name
if [[ -z "${DATABRICKS_ENDPOINT_NAME:-}" ]]; then
  DATABRICKS_ENDPOINT_NAME="immuta-${ENDPOINT_REGION}-psc-endpoint"
fi

# Function to log with simplified or JSON format
log() {
  local level="${1:-INFO}"
  # Shift the first argument (log level) so that $* does not include the level
  shift || true
  local message="$*"
  local timestamp
  # Use portable date format (macOS date doesn't support %N)
  if date --version &>/dev/null; then
    # GNU date
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
  else
    # BSD date (macOS)
    timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  fi

  if [[ "${LOG_FORMAT}" == "json" ]]; then
    # JSON format: timestamp, level, and message (properly escaped)
    jq -nc --arg ts "${timestamp}" --arg lvl "${level}" --arg msg "${message}" \
      '{timestamp: $ts, level: $lvl, message: $msg}'
  else
    # Color codes for different log levels
    local color_reset='\033[0m'
    local color_level=""

    case "${level}" in
      INFO)
        color_level='\033[0;36m'  # Cyan
        ;;
      WARN)
        color_level='\033[0;33m'  # Yellow
        ;;
      ERROR)
        color_level='\033[0;31m'  # Red
        ;;
      DEBUG)
        color_level='\033[0;90m'  # Gray
        ;;
      *)
        color_level='\033[0m'     # No color
        ;;
    esac

    # Simple format with color: [timestamp] [level] message
    printf "[%s] [${color_level}%s${color_reset}] %s\n" "${timestamp}" "${level}" "${message}"
  fi
}

if [[ "${LOG_FORMAT}" == "json" ]]; then
  log INFO "Starting Immuta Databricks GCP PSC Acceptance Script"
else
  log INFO "================================================"
  log INFO "Immuta Databricks GCP PSC Acceptance Script"
  log INFO "================================================"
fi
log INFO "Service Account: ${SERVICE_ACCOUNT}"
log INFO "Databricks Account ID: ${DATABRICKS_ACCOUNT_ID}"
log INFO "VPC Endpoint Name: ${ENDPOINT_NAME}"
log INFO "VPC Endpoint Region: ${ENDPOINT_REGION}"
log INFO "Databricks Endpoint Name: ${DATABRICKS_ENDPOINT_NAME}"

# Step 1: Check if gcloud, curl, and jq are installed
log INFO "[1/6] Checking if dependencies are installed..."
if ! command -v gcloud &> /dev/null; then
  log ERROR "❌ ERROR: gcloud CLI is not installed"
  log ERROR "   Please install it from: https://cloud.google.com/sdk/docs/install"
  exit 1
fi
if ! command -v curl &> /dev/null; then
  log ERROR "❌ ERROR: curl is not installed"
  log ERROR "   Please install it using your package manager (e.g., apt, yum, brew)"
  exit 1
fi
if ! command -v jq &> /dev/null; then
  log ERROR "❌ ERROR: jq is not installed"
  log ERROR "   Please install it using your package manager (e.g., apt, yum, brew)"
  exit 1
fi
GCLOUD_VERSION=$(gcloud version --format="value(core)" 2>/dev/null || echo "unknown")
log INFO "✅ gcloud is installed (version: ${GCLOUD_VERSION})"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 2: Check if user is logged in
log INFO "[2/6] Checking if you are logged in to gcloud..."
CURRENT_ACCOUNT=$(gcloud config get-value account 2>/dev/null || echo "")
if [[ -z "${CURRENT_ACCOUNT}" ]]; then
  log ERROR "❌ ERROR: Not logged in to gcloud"
  log ERROR "   Run: gcloud auth login"
  exit 1
fi
log INFO "✅ Logged in as: ${CURRENT_ACCOUNT}"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 3: Verify impersonation by getting an access token
log INFO "[3/6] Verifying service account impersonation (access token)..."
set +e  # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-access-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --verbosity=error 2>&1)
GCLOUD_EXIT=$?
set -e  # Re-enable exit on error

if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
  log ERROR "Failed to impersonate service account for access token"
  while IFS= read -r line; do
    [[ -n "${line}" ]] && log ERROR "${line}"
  done <<< "${GCLOUD_OUTPUT}"
  exit 1
fi
ACCESS_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained access token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 4: Get an ID token
log INFO "[4/6] Getting ID token for service account..."
set +e  # Temporarily disable exit on error
GCLOUD_OUTPUT=$(gcloud auth print-identity-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --include-email \
  --verbosity=error \
  --audiences="https://accounts.gcp.databricks.com" 2>&1)
GCLOUD_EXIT=$?
set -e  # Re-enable exit on error

if [[ ${GCLOUD_EXIT} -ne 0 ]]; then
  log ERROR "Failed to get ID token"
  while IFS= read -r line; do
    [[ -n "${line}" ]] && log ERROR "${line}"
  done <<< "${GCLOUD_OUTPUT}"
  exit 1
fi
ID_TOKEN="${GCLOUD_OUTPUT}"
log INFO "✅ Successfully obtained ID token"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

# Step 5: List existing VPC endpoints
log INFO "[5/6] Listing existing VPC endpoints..."
if ! RESULT=$(curl -s -XGET \
  --header "Authorization: Bearer ${ID_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints"); then
  log ERROR "❌ ERROR: Failed to call Databricks API to list VPC endpoints"
  log ERROR "   ${RESULT}"
  exit 1
fi
echo "${RESULT}" | jq -e --arg name "${ENDPOINT_NAME}" '.[] | select(.gcp_vpc_endpoint_info.psc_endpoint_name == $name)' > /dev/null && {
  log ERROR "❌ Existing VPC endpoint found with name '${ENDPOINT_NAME}'"
  log ERROR "    If this is unexpected, please delete the existing endpoint in the Databricks console and try again"
  exit 1
}
log INFO "✅ No existing VPC endpoint with name '${ENDPOINT_NAME}' found"
if [[ $LOG_FORMAT != "json" ]]; then log INFO ""; fi

log INFO "[6/6] Creating VPC endpoint attachment..."
REQUEST=$(cat <<EOF
{
  "gcp_vpc_endpoint_info": {
    "endpoint_region": "${ENDPOINT_REGION}",
    "project_id": "${PROJECT_ID}",
    "psc_endpoint_name": "${ENDPOINT_NAME}"
  },
  "vpc_endpoint_name": "${DATABRICKS_ENDPOINT_NAME}"
}
EOF
)
RESULT=$(curl -s -XPOST \
  -d "${REQUEST}" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer ${ID_TOKEN}" \
  --header "X-Databricks-GCP-SA-Access-Token: ${ACCESS_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints")
if [[ "$(echo "${RESULT}" | jq -r '.error_code // empty')" != "" ]]; then
  log ERROR "❌ ERROR: Failed to create VPC endpoint attachment"
  log ERROR "   ${RESULT}"
  exit 1
fi


# Output based on mode
if [[ "${LOG_FORMAT}" == "json" ]]; then
  log INFO "✅ VPC Endpoint Attachment Created"
else
  # Display token information
  log INFO "================================================"
  log INFO "✅ SUCCESS - VPC Endpoint Attachment Created"
  log INFO "================================================"
  log INFO "The Endpoint has been attached to your account in Databricks."
fi
```

{% endcode %}

6. Validate that any private access settings attached to workspaces that need connectivity include the newly created endpoints (either by accepting endpoints at the **account level** or by [adding the specific endpoints to the private access settings object](https://docs.databricks.com/gcp/en/security/network/front-end/front-end-private-connect#step-4-create-a-databricks-private-access-settings-object)).

After these steps, you should be able to connect your Immuta tenant to Databricks workspaces in GCP under the connected account.
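To spot-check the result, you can list the account's VPC endpoints using the same kind of ID token the script retrieves. This is a sketch; the account ID and service account email below are placeholders for the values you used above:

```shell
# Placeholder values -- use your own account ID and service account email.
DATABRICKS_ACCOUNT_ID="12345678-90ab-cdef-1234-567890abcdef"
SERVICE_ACCOUNT="psc-admin@my-gcp-project.iam.gserviceaccount.com"

# Mint an ID token for the Databricks accounts API audience.
ID_TOKEN=$(gcloud auth print-identity-token \
  --impersonate-service-account="${SERVICE_ACCOUNT}" \
  --include-email \
  --audiences="https://accounts.gcp.databricks.com")

# List the account's VPC endpoints and print each name and region.
curl -s --header "Authorization: Bearer ${ID_TOKEN}" \
  "https://accounts.gcp.databricks.com/api/2.0/accounts/${DATABRICKS_ACCOUNT_ID}/vpc-endpoints" \
  | jq -r '.[] | "\(.vpc_endpoint_name)  \(.gcp_vpc_endpoint_info.endpoint_region)"'
```

The endpoint created by the script should appear in the output under the name passed with `-D` (or the `immuta-<region>-psc-endpoint` default).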
