Register a Databricks Unity Catalog Connection
The connection API is a REST API that allows users to register a Databricks Unity Catalog with Immuta using a single set of credentials, rather than configuring an integration and creating data sources separately. Immuta can then manage and enforce access controls on your data through that connection. To manage your connection, see the Manage a connection reference guide.
Requirements
The following permissions and personas are used in the registration process.
- An Immuta user with the `CREATE_DATA_SOURCE` Immuta permission must register the Databricks Unity Catalog connection.
- A Databricks user authorized to create a Databricks service principal must create one for Immuta. This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:
  - `USE CATALOG` and `MANAGE` on all catalogs containing securables registered as Immuta data sources, and `USE SCHEMA` on all schemas containing securables registered as Immuta data sources.
  - `MODIFY` and `SELECT` on all securables registered as Immuta data sources.

  `MANAGE` and `MODIFY` are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have `SELECT` on the securable as well as `USE CATALOG` on its parent catalog and `USE SCHEMA` on its parent schema. Since privileges are inherited, you can grant the service principal the `MODIFY` and `SELECT` privileges on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the `MODIFY` and `SELECT` privileges on all current and future securables in the catalog or schema. The service principal also inherits `MANAGE` from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied. Example grant statements are sketched after this section.

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
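For illustration, grants like these are typically issued as SQL by a privileged Databricks user. The following is a minimal sketch using the `databricks-sql-connector` Python package; the catalog, schema, table, and principal names are hypothetical placeholders, not values this guide prescribes.

```python
# Minimal sketch: granting the Immuta service principal the privileges listed
# above. All object and principal names are illustrative placeholders.
from databricks import sql  # pip install databricks-sql-connector

GRANTS = [
    "GRANT USE CATALOG, MANAGE ON CATALOG analytics TO `immuta-service-principal`",
    "GRANT USE SCHEMA ON SCHEMA analytics.sales TO `immuta-service-principal`",
    "GRANT SELECT, MODIFY ON TABLE analytics.sales.orders TO `immuta-service-principal`",
]

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<pat-of-a-user-authorized-to-grant>",
) as connection:
    with connection.cursor() as cursor:
        for statement in GRANTS:
            cursor.execute(statement)
```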
Optionally, to include audit, the service principal needs the following additional privileges:
- `USE CATALOG` on the `system` catalog
- `USE SCHEMA` on the `system.access` schema
- `SELECT` on the `system.access.audit` table
- `SELECT` on the `system.access.table_lineage` table
- `SELECT` on the `system.access.column_lineage` table
Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant `USE` and `SELECT` permissions on the system schemas to the service principal. See Manage privileges in Unity Catalog. The `system.access` schema must also be enabled on the metastore before it can be used.
Prerequisites
- Unity Catalog metastore created and attached to a Databricks workspace. See the Databricks Unity Catalog reference guide for information on workspaces and catalog isolation support with Immuta.
- Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
Complete the following steps to register a Databricks Unity Catalog connection:
1. Create a service principal in Databricks Unity Catalog with the proper Databricks privileges for Immuta to use to manage policies in Unity Catalog.
2. Set up Unity Catalog system tables for native query audit.
3. Use the `/integrations/scripts/create` endpoint to receive a script.
4. Run the script in Databricks Unity Catalog.
5. Use the `/data/connection` endpoint to finish registering your connection in Immuta.
Step 1: Create your service principal
Create a Databricks service principal with the Databricks privileges outlined above and set it up with personal access token (PAT) authentication.

The Immuta service principal you create requires specific Databricks privileges to connect to Databricks so it can create the integration catalog, configure the necessary procedures and functions, and maintain state between Databricks and Immuta. A scripted sketch of this setup follows.
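If you prefer to script this step, the sketch below creates a service principal through the workspace-level SCIM API and then mints a PAT on its behalf through the token management API; the workspace URL, admin token, display name, and token lifetime are assumptions you should adapt.

```python
# Sketch: create a service principal and a PAT on its behalf via the
# Databricks REST API. Requires workspace-admin credentials; all values in
# angle brackets are placeholders.
import requests

WORKSPACE = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <workspace-admin-pat>"}

# Create the service principal (workspace-level SCIM API).
sp = requests.post(
    f"{WORKSPACE}/api/2.0/preview/scim/v2/ServicePrincipals",
    json={"displayName": "immuta-service-principal"},
    headers=HEADERS,
)
sp.raise_for_status()
application_id = sp.json()["applicationId"]

# Mint a PAT on the service principal's behalf (token management API).
token = requests.post(
    f"{WORKSPACE}/api/2.0/token-management/on-behalf-of/tokens",
    json={
        "application_id": application_id,
        "comment": "Immuta Unity Catalog integration",
        "lifetime_seconds": 7776000,  # 90 days; rotate per your policy
    },
    headers=HEADERS,
)
token.raise_for_status()
databricks_pat = token.json()["token_value"]  # supply this PAT in later steps
```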
Step 2: Set up native query audit
Enable native query audit by completing these steps in Unity Catalog:
Grant the service principal from step 1 access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access (example grants are sketched after this list):

- `USE CATALOG` on the `system` catalog
- `USE SCHEMA` on the `system.access` schema
- `SELECT` on the following system tables: `system.access.audit`, `system.access.table_lineage`, and `system.access.column_lineage`
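These grants follow the same pattern as the earlier sketch; run them as a user who is both a metastore admin and an account admin. The principal name is again a placeholder.

```python
# Sketch: system-table grants for native query audit. The principal name is
# an illustrative placeholder.
from databricks import sql

AUDIT_GRANTS = [
    "GRANT USE CATALOG ON CATALOG system TO `immuta-service-principal`",
    "GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.audit TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-service-principal`",
    "GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-service-principal`",
]

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<metastore-and-account-admin-pat>",
) as connection:
    with connection.cursor() as cursor:
        for statement in AUDIT_GRANTS:
            cursor.execute(statement)
```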
Step 3: Generate the script
POST `/integrations/scripts/create`

1. Using the example request, update the `<placeholder_values>` with your connection details.
2. Copy the `config` object to use later in the setup process.
3. Run the request.
4. Copy the returned script and use it in the next step.
Find descriptions of the editable attributes in the table below and of the full payload in the Integration configuration payload reference guide.
Payload parameters
| Attribute | Description | Required |
|---|---|---|
| `config.workspaceUrl` *string* | Your Databricks workspace URL. | Yes |
| `config.httpPath` *string* | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `config.token` *string* | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `config.catalog` *string* | The name of the Databricks catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number. | Yes |
| `config.audit` *object* | This object enables Databricks Unity Catalog query audit. | No |
| `config.audit.enabled` *boolean* | If `true`, Databricks Unity Catalog query audit is enabled. Set to `true` for the recommended configuration. | No |
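For illustration, a request might look like the sketch below. Only the attributes documented above are shown; the Immuta tenant URL, authentication header, and the exact payload envelope around `config` are assumptions, so consult the Integration configuration payload reference guide for the authoritative shape.

```python
# Sketch: requesting the setup script. The tenant URL, auth header, and the
# payload envelope around `config` are assumptions; see the Integration
# configuration payload reference guide for the exact shape.
import requests

IMMUTA = "https://<your-immuta-tenant>"
HEADERS = {
    "Authorization": "Bearer <immuta-api-token>",
    "Content-Type": "application/json",
}

config = {
    "workspaceUrl": "<your-workspace>.cloud.databricks.com",
    "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
    "token": "<service-principal-pat>",
    "catalog": "immuta_system",      # illustrative catalog name
    "audit": {"enabled": True},      # recommended configuration
}

response = requests.post(
    f"{IMMUTA}/integrations/scripts/create",
    json={"config": config},
    headers=HEADERS,
)
response.raise_for_status()
script = response.text  # run this script in Databricks in the next step
# Keep `config`; its values are reused when creating the connection in step 5.
```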
Step 4: Run the script in Databricks Unity Catalog
The previous step returns a script. Copy the script and run it in your Databricks Unity Catalog environment as a user with the privileges listed in the requirements section.

The script uses the service principal, which authenticates with the personal access token (PAT) you specified. The script also creates the catalog you specified.
Step 5: Create the connection in Immuta
POST `/data/connection`

Copy the request and update the `<placeholder_values>` with your connection details. Note that the connection details here must match those used when generating the script. Then submit the request.

Find descriptions of the editable attributes in the table below and of the full payload in the Databricks Unity Catalog connection payload table. All values should be included, and those you should not edit are noted. An example request is sketched after the payload parameters table.
Test run
Opt to test and validate the create connection payload using a dry run:
POST `/data/connection/test`
Payload parameters
| Attribute | Description | Required |
|---|---|---|
| `connectionKey` *string* | A unique name for the connection. | Yes |
| `connection` *object* | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| `connection.technology` *string* | The technology backing the new connection. | Yes |
| `connection.hostname` *string* | Your Databricks workspace URL. This is the same as `host` and `workspaceUrl`. | Yes |
| `connection.port` *integer* | The port to use when connecting to your Databricks account. Defaults to `443`. | Yes |
| `connection.httpPath` *string* | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `connection.authenticationType` *string* | The authentication type to register the connection. Make sure this auth type is the same one used when requesting the script. | Yes |
| `connection.token` *string* | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `settings` *array* | Specifications of the connection's settings, including active status. | No |
| `settings.isActive` *boolean* | When `false`, data objects will be inactive by default when created in Immuta. Set to `false` for the recommended configuration. | No |
| `options` *array* | Specification of the connection's default behavior for object crawls. | No |
| `options.forceRecursiveCrawl` *boolean* | If `false`, only active objects will be crawled. If `true`, both active and inactive data objects will be crawled; any child objects of inactive objects will be set as inactive. Set to `true` for the recommended configuration. | No |
| `nativeIntegration` *object* | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| `nativeIntegration.type` *string* | Same as `connection.technology`. | Yes |
| `nativeIntegration.autoBootstrap` *boolean* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.unityCatalog` *boolean* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.authenticationType` *string* | Same as `connection.authenticationType`. | Yes |
| `nativeIntegration.config.token` *string* | Same as `connection.token`. | Yes |
| `nativeIntegration.config.host` *string* | Same as `connection.hostname`. | Yes |
| `nativeIntegration.config.port` *integer* | Same as `connection.port`. | Yes |
| `nativeIntegration.config.catalog` *string* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.audit` *object* | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.workspaceIds` *array* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.enableNativeQueryParsing` *boolean* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.groupPattern` *object* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.workspaceDirectoryPath` *string* | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.jobClusterId` *string* | Use the same setting as the script generation. | No |
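For illustration, a request built from the table above might look like the following sketch. The tenant URL, authentication header, and all placeholder values are assumptions, and the exact shapes of `settings` and `options` should be confirmed against the Databricks Unity Catalog connection payload table.

```python
# Sketch: registering the connection. Values must match those used when the
# script was generated; the tenant URL, auth header, placeholder values, and
# the exact shapes of `settings` and `options` are assumptions.
import requests

IMMUTA = "https://<your-immuta-tenant>"
HEADERS = {
    "Authorization": "Bearer <immuta-api-token>",
    "Content-Type": "application/json",
}

payload = {
    "connectionKey": "databricks-prod",  # illustrative unique name
    "connection": {
        "technology": "Databricks",
        "hostname": "<your-workspace>.cloud.databricks.com",
        "port": 443,
        "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
        "authenticationType": "token",   # assumed value; match the script request
        "token": "<service-principal-pat>",
    },
    "settings": {"isActive": False},           # recommended configuration
    "options": {"forceRecursiveCrawl": True},  # recommended configuration
    "nativeIntegration": {
        "type": "Databricks",            # same as connection.technology
        "autoBootstrap": False,          # match the script generation
        "unityCatalog": True,            # match the script generation
        "config": {
            "authenticationType": "token",
            "token": "<service-principal-pat>",
            "host": "<your-workspace>.cloud.databricks.com",
            "port": 443,
            "catalog": "immuta_system",  # match the script generation
            "audit": {"enabled": True},  # match the script generation
        },
    },
}

# Optional dry run first: POST the same payload to f"{IMMUTA}/data/connection/test".
response = requests.post(f"{IMMUTA}/data/connection", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json())  # contains objectPath and bulkId (see response schema)
```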
Response schema
| Attribute | Description |
|---|---|
| `objectPath` *string* | The list of names that uniquely identify the path to a data object in the remote platform's hierarchy. The first element should be the associated `connectionKey`. |
| `bulkId` *string* | A bulk ID that can be used to search for the status of background jobs triggered by this request. |
Example response
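Based on the response schema above, a successful response has roughly this shape; both values below are illustrative placeholders.

```json
{
  "objectPath": ["databricks-prod"],
  "bulkId": "4f8b9c2e-1a2b-4c5d-8e9f-0123456789ab"
}
```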