Register a Databricks Unity Catalog Connection
The connection API is a REST API that allows users to register a Databricks Unity Catalog with Immuta using a single set of credentials, rather than configuring an integration and creating data sources separately. Immuta can then manage and enforce access controls on your data through that connection. To manage your connection, see the Manage a connection reference guide.
Requirements
The following permissions and personas are used in the registration process.
- An Immuta user with the `CREATE_DATA_SOURCE` Immuta permission must register the Databricks Unity Catalog connection.
- A Databricks user authorized to create a Databricks service principal must create one for Immuta. This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:
  - `USE CATALOG` and `MANAGE` on all catalogs containing securables registered as Immuta data sources, and `USE SCHEMA` on all schemas containing securables registered as Immuta data sources.
  - `MODIFY` and `SELECT` on all securables registered as Immuta data sources.

  `MANAGE` and `MODIFY` are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have `SELECT` on the securable as well as `USE CATALOG` on its parent catalog and `USE SCHEMA` on its parent schema. Since privileges are inherited, you can grant the service principal the `MODIFY` and `SELECT` privileges on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the `MODIFY` and `SELECT` privileges on all current and future securables in the catalog or schema. The service principal also inherits `MANAGE` from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied. Example grant statements are sketched after this section.

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
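For illustration, grants like these are typically issued as SQL by a privileged Databricks user. The following is a minimal sketch using the `databricks-sql-connector` Python package; the catalog, schema, table, and principal names are hypothetical placeholders, not values this guide prescribes.

```python
# Minimal sketch: granting the Immuta service principal the privileges listed
# above. All object and principal names are illustrative placeholders.
from databricks import sql  # pip install databricks-sql-connector

GRANTS = [
    "GRANT USE CATALOG, MANAGE ON CATALOG analytics TO `immuta-service-principal`",
    "GRANT USE SCHEMA ON SCHEMA analytics.sales TO `immuta-service-principal`",
    "GRANT SELECT, MODIFY ON TABLE analytics.sales.orders TO `immuta-service-principal`",
]

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<pat-of-a-user-authorized-to-grant>",
) as connection:
    with connection.cursor() as cursor:
        for statement in GRANTS:
            cursor.execute(statement)
```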
Optionally, to include audit, the service principal needs the following additional privileges:
- `USE CATALOG` on the `system` catalog
- `USE SCHEMA` on the `system.access` schema
- `SELECT` on the `system.access.audit` table
- `SELECT` on the `system.access.table_lineage` table
- `SELECT` on the `system.access.column_lineage` table
Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant `USE` and `SELECT` permissions on the system schemas to the service principal. See Manage privileges in Unity Catalog. The `system.access` schema must also be enabled on the metastore before it can be used.
Prerequisites
- Unity Catalog metastore created and attached to a Databricks workspace. See the Databricks Unity Catalog reference guide for information on workspaces and catalog isolation support with Immuta.
- Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
Complete the following steps to register a Databricks Unity Catalog connection:
1. Create a service principal in Databricks Unity Catalog with the proper Databricks privileges for Immuta to use to manage policies in Unity Catalog.
2. Set up Unity Catalog system tables for native query audit.
3. Use the `/integrations/scripts/create` endpoint to receive a script.
4. Run the script in Databricks Unity Catalog.
5. Use the `/data/connection` endpoint to finish registering your connection in Immuta.
Step 1: Create your service principal
Create a Databricks service principal with the Databricks privileges outlined above and set it up with personal access token (PAT) authentication.

The Immuta service principal you create requires specific Databricks privileges to connect to Databricks so it can create the integration catalog, configure the necessary procedures and functions, and maintain state between Databricks and Immuta. A scripted sketch of this setup follows.
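If you prefer to script this step, the sketch below creates a service principal through the workspace-level SCIM API and then mints a PAT on its behalf through the token management API; the workspace URL, admin token, display name, and token lifetime are assumptions you should adapt.

```python
# Sketch: create a service principal and a PAT on its behalf via the
# Databricks REST API. Requires workspace-admin credentials; all values in
# angle brackets are placeholders.
import requests

WORKSPACE = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <workspace-admin-pat>"}

# Create the service principal (workspace-level SCIM API).
sp = requests.post(
    f"{WORKSPACE}/api/2.0/preview/scim/v2/ServicePrincipals",
    json={"displayName": "immuta-service-principal"},
    headers=HEADERS,
)
sp.raise_for_status()
application_id = sp.json()["applicationId"]

# Mint a PAT on the service principal's behalf (token management API).
token = requests.post(
    f"{WORKSPACE}/api/2.0/token-management/on-behalf-of/tokens",
    json={
        "application_id": application_id,
        "comment": "Immuta Unity Catalog integration",
        "lifetime_seconds": 7776000,  # 90 days; rotate per your policy
    },
    headers=HEADERS,
)
token.raise_for_status()
databricks_pat = token.json()["token_value"]  # supply this PAT in later steps
```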
Step 2: Set up native query audit
Enable native query audit by completing these steps in Unity Catalog:
Grant the service principal from step 1 access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access (example grants are sketched after this list):

- `USE CATALOG` on the `system` catalog
- `USE SCHEMA` on the `system.access` schema
- `SELECT` on the following system tables: `system.access.audit`, `system.access.table_lineage`, and `system.access.column_lineage`
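These grants follow the same pattern as the earlier sketch; run them as a user who is both a metastore admin and an account admin. The principal name is again a placeholder.

```python
# Sketch: system-table grants for native query audit. The principal name is
# an illustrative placeholder.
from databricks import sql

AUDIT_GRANTS = [
    "GRANT USE CATALOG ON CATALOG system TO `immuta-service-principal`",
    "GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.audit TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-service-principal`",
]

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<metastore-and-account-admin-pat>",
) as connection:
    with connection.cursor() as cursor:
        for statement in AUDIT_GRANTS:
            cursor.execute(statement)
```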
Step 3: Generate the script
POST `/integrations/scripts/create`

1. Using the example request, update the `<placeholder_values>` with your connection details.
2. Copy the `config` object to use later in the setup process.
3. Run the request.
4. Copy the returned script and use it in the next step.
Find descriptions of the editable attributes in the table below and of the full payload in the Integration configuration payload reference guide.
Payload parameters
| Attribute | Description | Required |
|---|---|---|
| `config.workspaceUrl` *string* | Your Databricks workspace URL. | Yes |
| `config.httpPath` *string* | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `config.token` *string* | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `config.catalog` *string* | The name of the Databricks catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number. | Yes |
| `config.audit` *object* | This object enables Databricks Unity Catalog query audit. | No |
| `config.audit.enabled` *boolean* | If `true`, Databricks Unity Catalog query audit is enabled. Set to `true` for the recommended configuration. | No |
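For illustration, a request might look like the sketch below. Only the attributes documented above are shown; the Immuta tenant URL, authentication header, and the exact payload envelope around `config` are assumptions, so consult the Integration configuration payload reference guide for the authoritative shape.

```python
# Sketch: requesting the setup script. The tenant URL, auth header, and the
# payload envelope around `config` are assumptions; see the Integration
# configuration payload reference guide for the exact shape.
import requests

IMMUTA = "https://<your-immuta-tenant>"
HEADERS = {
    "Authorization": "Bearer <immuta-api-token>",
    "Content-Type": "application/json",
}

config = {
    "workspaceUrl": "<your-workspace>.cloud.databricks.com",
    "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
    "token": "<service-principal-pat>",
    "catalog": "immuta_system",      # illustrative catalog name
    "audit": {"enabled": True},      # recommended configuration
}

response = requests.post(
    f"{IMMUTA}/integrations/scripts/create",
    json={"config": config},
    headers=HEADERS,
)
response.raise_for_status()
script = response.text  # run this script in Databricks in the next step
# Keep `config`; its values are reused when creating the connection in step 5.
```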
Step 4: Run the script in Databricks Unity Catalog
The previous step returns a script. Copy the script and run it in your Databricks Unity Catalog environment as a user with the privileges listed in the requirements section.

The script uses the service principal, which authenticates with the personal access token (PAT) you specified. The script also creates the catalog you specified.
Step 5: Create the connection in Immuta
POST `/data/connection`

Copy the request and update the `<placeholder_values>` with your connection details. Note that the connection details here must match those used when generating the script. Then submit the request.

Find descriptions of the editable attributes in the table below and of the full payload in the Databricks Unity Catalog connection payload table. All values should be included, and those you should not edit are noted. An example request is sketched after the payload parameters table.
Test run
Opt to test and validate the create connection payload using a dry run:
POST `/data/connection/test`
Payload parameters
| Attribute | Description | Required |
|---|---|---|
| `connectionKey` *string* | A unique name for the connection. | Yes |
| `connection` *object* | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| `connection.technology` *string* | The technology backing the new connection. | Yes |
| `connection.hostname` *string* | Your Databricks workspace URL. This is the same as `host` and `workspaceUrl`. | Yes |
| `connection.port` *integer* | The port to use when connecting to your Databricks account. Defaults to `443`. | Yes |
| `connection.httpPath` *string* | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `connection.authenticationType` *string* | The authentication type to register the connection. Make sure this auth type is the same one used when requesting the script. | Yes |
| `connection.token` *string* | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `settings` *array* | Specifications of the connection's settings, including active status. | No |
| `settings.isActive` *boolean* | When `false`, data objects will be inactive by default when created in Immuta. Set to `false` for the recommended configuration. | No |
| `options` *array* | Specification of the connection's default behavior for object crawls. | No |
| `options.forceRecursiveCrawl` *boolean* | If `false`, only active objects will be crawled. If `true`, both active and inactive data objects will be crawled; any child objects of inactive objects will be set as inactive. Set to `true` for the recommended configuration. | No |
| `nativeIntegration` *object* | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| `nativeIntegration.type` *string* | Same as `connection.technology`. | Yes |
| `nativeIntegration.autoBootstrap` *boolean* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.unityCatalog` *boolean* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.authenticationType` *string* | Same as `connection.authenticationType`. | Yes |
| `nativeIntegration.config.token` *string* | Same as `connection.token`. | Yes |
| `nativeIntegration.config.host` *string* | Same as `connection.hostname`. | Yes |
| `nativeIntegration.config.port` *integer* | Same as `connection.port`. | Yes |
| `nativeIntegration.config.catalog` *string* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.audit` *object* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.workspaceIds` *array* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.enableNativeQueryParsing` *boolean* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.groupPattern` *object* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.workspaceDirectoryPath` *string* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.jobClusterId` *string* | Use the same setting as the script generation. | No |
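For illustration, a request built from the table above might look like the following sketch. The tenant URL, authentication header, and all placeholder values are assumptions, and the exact shapes of `settings` and `options` should be confirmed against the Databricks Unity Catalog connection payload table.

```python
# Sketch: registering the connection. Values must match those used when the
# script was generated; the tenant URL, auth header, placeholder values, and
# the exact shapes of `settings` and `options` are assumptions.
import requests

IMMUTA = "https://<your-immuta-tenant>"
HEADERS = {
    "Authorization": "Bearer <immuta-api-token>",
    "Content-Type": "application/json",
}

payload = {
    "connectionKey": "databricks-prod",  # illustrative unique name
    "connection": {
        "technology": "Databricks",
        "hostname": "<your-workspace>.cloud.databricks.com",
        "port": 443,
        "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
        "authenticationType": "token",   # assumed value; match the script request
        "token": "<service-principal-pat>",
    },
    "settings": {"isActive": False},           # recommended configuration
    "options": {"forceRecursiveCrawl": True},  # recommended configuration
    "nativeIntegration": {
        "type": "Databricks",            # same as connection.technology
        "autoBootstrap": False,          # match the script generation
        "unityCatalog": True,            # match the script generation
        "config": {
            "authenticationType": "token",
            "token": "<service-principal-pat>",
            "host": "<your-workspace>.cloud.databricks.com",
            "port": 443,
            "catalog": "immuta_system",  # match the script generation
            "audit": {"enabled": True},  # match the script generation
        },
    },
}

# Optional dry run first: POST the same payload to f"{IMMUTA}/data/connection/test".
response = requests.post(f"{IMMUTA}/data/connection", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json())  # contains objectPath and bulkId (see response schema)
```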
Response schema
| Attribute | Description |
|---|---|
| `objectPath` *string* | The list of names that uniquely identify the path to a data object in the remote platform's hierarchy. The first element should be the associated `connectionKey`. |
| `bulkId` *string* | A bulk ID that can be used to search for the status of background jobs triggered by this request. |
Example response
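Based on the response schema above, a successful response has roughly this shape; both values below are illustrative placeholders.

```json
{
  "objectPath": ["databricks-prod"],
  "bulkId": "4f8b9c2e-1a2b-4c5d-8e9f-0123456789ab"
}
```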