Register a Databricks Unity Catalog Host

The enhanced onboarding API is a REST API that lets users register a Databricks Unity Catalog host with Immuta using a single set of credentials, rather than configuring an integration and creating data sources separately. Immuta can then manage and enforce access controls on your data through that host. To manage your host, see the Manage a host reference guide.

Requirements

The following permissions and personas are used in the registration process:

  • Immuta permission: CREATE_DATA_SOURCE

  • Databricks privileges for the user registering the host and running the script:

    • Account or workspace admin

    • CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables

  • Databricks privileges for the service principal you create:

    • OWNER privilege on the Immuta catalog you configure.

    • OWNER privilege on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. To allow for multiple owners, grant OWNER on a catalog to a Databricks group that includes the Immuta service principal. If the OWNER privilege cannot be applied at the catalog or schema level, the Immuta service principal must be granted the OWNER privilege individually on each table registered as an Immuta data source.

    • USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can interact with those tables.

    • SELECT and MODIFY on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.

    • USE CATALOG on the system catalog for native query audit.

    • USE SCHEMA on the system.access schema for native query audit.

    • SELECT on the following system tables for native query audit:

      • system.access.audit

      • system.access.table_lineage

      • system.access.column_lineage

Prerequisites

  • Unity Catalog metastore created and attached to a Databricks workspace. See the Databricks Unity Catalog reference guide for information on workspaces and catalog isolation support with Immuta.

  • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.

Complete the following steps to register a Databricks Unity Catalog host:

  1. Create a service principal in Databricks Unity Catalog with the Databricks privileges Immuta needs to manage policies in Unity Catalog.

  2. Enable Databricks Unity Catalog in Immuta.

  3. Set up Unity Catalog system tables for native query audit.

  4. Use the /integrations/scripts/create endpoint to receive a script.

  5. Run the script in Databricks Unity Catalog.

  6. Use the /data/connection endpoint to finish registering your host in Immuta.

Step 1: Create your service principal

Create a Databricks service principal with the Databricks privileges outlined above, and set it up to authenticate with a personal access token (PAT).

Immuta uses this service principal to connect to Databricks, create the integration catalog, configure the necessary procedures and functions, and maintain state between Databricks and Immuta, which is why it requires the specific privileges listed above.

Step 2: Enable Unity Catalog in Immuta

Enable Databricks Unity Catalog on the Immuta app settings page:

  1. Click the App Settings icon in the left sidebar.

  2. Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.

Step 3: Set up native query audit

Enable native query audit by granting the service principal from step 1 access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:

    • USE CATALOG on the system catalog

    • USE SCHEMA on the system.access schema

    • SELECT on the following system tables:

      • system.access.audit

      • system.access.table_lineage

      • system.access.column_lineage
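
The minimum access listed above can be sketched as Databricks SQL GRANT statements. The Python helper below only builds the statement strings; it does not run them. The function name and the principal identifier are hypothetical placeholders, and the statements would still need to be run in Databricks by a user who is allowed to grant access on the system catalog.

```python
# System tables Immuta reads for native query audit (from the list above).
AUDIT_TABLES = (
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
)


def audit_grant_statements(principal: str) -> list[str]:
    """Build GRANT statements for the minimum audit access.

    `principal` is whatever identifier your workspace uses for the
    service principal (an assumption here); it is quoted with backticks.
    """
    quoted = f"`{principal}`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted};",
    ]
    # One SELECT grant per system table.
    statements += [f"GRANT SELECT ON TABLE {table} TO {quoted};" for table in AUDIT_TABLES]
    return statements


if __name__ == "__main__":
    for statement in audit_grant_statements("<immuta-service-principal>"):
        print(statement)
```

Generating the statements from one list keeps the three table grants consistent with the catalog and schema grants they depend on.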

Step 4: Generate the script

POST /integrations/scripts/create

  1. Using the example request, update the <placeholder_values> with your connection details.

  2. Copy the config object to use later in the setup process.

  3. Run the request.

  4. Copy the returned script and use it in the next step.

Find descriptions of the editable attributes in the table below and of the full payload in the Integration configuration payload reference guide.

curl -X 'POST' \
    'https://<your-immuta-url>/integrations/scripts/create' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: <your-bearer-token>' \
    -d '{
    "type": "Databricks",
    "autoBootstrap": false,
    "config": {
      "workspaceUrl": "<www.your-workspace.cloud.databricks.com>",
      "httpPath": "<sql/protocolv1/o/0/your-path>",
      "authenticationType": "token",
      "token": "<service-principal-pat>",
      "catalog": "<new-catalog>",
      "audit": {"enabled": true}
    }
    }'

Payload parameters

| Attribute | Description | Required |
| --- | --- | --- |
| config.workspaceUrl string | Your Databricks workspace URL. | Yes |
| config.httpPath string | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| config.token string | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| config.catalog string | The name of the Databricks catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number. | Yes |
| config.audit object | This object enables Databricks Unity Catalog query audit. | No |
| config.audit.enabled boolean | If true, Databricks Unity Catalog query audit is enabled. Set to true for the recommended configuration. | No |
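
As a rough illustration of the payload described in the table, the Python sketch below assembles the request body for /integrations/scripts/create and checks the catalog name rule before anything is sent. The helper names are hypothetical. Two assumptions are made: "letters" is read as ASCII letters, and the `authenticationType` key name is taken from the step 6 payload rather than from the table above.

```python
import json
import re

# Assumption: "letters" means ASCII letters. The first character may be a
# letter or underscore, but not a digit.
CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_catalog_name(name: str) -> bool:
    """Check the rule: letters, numbers, underscores; no leading number."""
    return CATALOG_NAME.fullmatch(name) is not None


def build_script_request(workspace_url, http_path, token, catalog, audit_enabled=True):
    """Return the JSON-serializable body for POST /integrations/scripts/create."""
    if not is_valid_catalog_name(catalog):
        raise ValueError(f"invalid catalog name: {catalog!r}")
    return {
        "type": "Databricks",
        "autoBootstrap": False,
        "config": {
            "workspaceUrl": workspace_url,
            "httpPath": http_path,
            "authenticationType": "token",  # assumed key name, see lead-in
            "token": token,
            "catalog": catalog,
            "audit": {"enabled": audit_enabled},
        },
    }


if __name__ == "__main__":
    body = build_script_request(
        "<www.your-workspace.cloud.databricks.com>",
        "<sql/protocolv1/o/0/your-path>",
        "<service-principal-pat>",
        "immuta_catalog",
    )
    print(json.dumps(body, indent=2))
```

Keeping the returned dictionary around is one way to follow the instruction to copy the config object for later in the setup process.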

Step 5: Run the script in Databricks Unity Catalog

Step four returns a script. Copy the script and run it in your Databricks Unity Catalog environment as a user with the privileges listed in the requirements section.

The script uses the service principal, which authenticates with the personal access token (PAT) that you specified in step four. The script also creates the catalog you specified in step four.

Step 6: Create the host in Immuta

POST /data/connection

Copy the example request and update the <placeholder_values> with your connection details. The connection details here should match the ones used in step four. Then submit the request.

Find descriptions of the editable attributes in the table below and of the full payload in the Databricks Unity Catalog host payload table. Include all values; those you should not edit are noted.

Test run

Optionally, test and validate the create connection payload with a dry run before creating the host:

POST /data/connection/test

curl -X 'POST' \
    'https://<your-immuta-url>/data/connection' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: <your-bearer-token>' \
    -d '{
      "connectionKey": "<your-connection-key-name>",
      "connection": {
        "technology": "Databricks",
        "hostname": "<www.your-workspace.cloud.databricks.com>",
        "port": <your-Databricks-port>,
        "httpPath": "<your-Databricks-warehouse-path>",
        "authenticationType": "token",
        "token": "<your-service-principal-pat>"
      },
      "settings": {
        "isActive": false
      },
      "options": {
        "forceRecursiveCrawl": true
      },
      "nativeIntegration": {
        "type": "Databricks",
        "autoBootstrap": false,
        "unityCatalog": true,
        "config": {
          "authenticationType": "token",
          "token": "<your-service-principal-pat>",
          "host": "<www.your-workspace.cloud.databricks.com>",
          "port": <your-Databricks-port>,
          "catalog": "<your-immuta-catalog>",
          "audit": { "enabled": true },
          "workspaceIds": ["<your-workspace>", "<another-workspace>"],
          "enableNativeQueryParsing": false,
          "groupPattern": { "deny": "<your-exemption-group>" },
          "jobConfig": {
            "workspaceDirectoryPath": "/Workspace/ImmutaArtifacts"
          }
        }
      }
    }'

The optional nativeIntegration.config.jobConfig.jobClusterId attribute is not set in this example.
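
Because the dry run validates the create connection payload, one way to script the two calls is to send the same body to /data/connection/test first and to /data/connection only if the first call succeeds. The Python sketch below uses only the standard library. The function names are hypothetical, and it assumes that a failed dry run is reported as an HTTP error status (which `urllib.request.urlopen` raises as an exception) and that both endpoints return a JSON body.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, bearer_token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request with the headers used above."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": bearer_token,
        },
        method="POST",
    )


def dry_run_then_create(base_url, bearer_token, payload, send=urllib.request.urlopen):
    """Send the payload to the test endpoint, then to the create endpoint.

    `send` is injectable so the flow can be exercised without a network.
    An exception raised by the dry run stops the loop before the create call.
    """
    result = None
    for path in ("/data/connection/test", "/data/connection"):
        with send(build_request(base_url, path, bearer_token, payload)) as response:
            result = json.loads(response.read())
    # The value returned is the parsed body of the final (create) call.
    return result
```

A call such as `dry_run_then_create("https://<your-immuta-url>", "<your-bearer-token>", payload)` would then return the parsed body of the create request.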
    

Payload parameters

| Attribute | Description | Required |
| --- | --- | --- |
| connectionKey string | A unique name for the host connection. | Yes |
| connection object | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| connection.technology string | The technology backing the new host. | Yes |
| connection.hostname string | Your Databricks workspace URL. This is the same as host and workspaceUrl. | Yes |
| connection.port integer | The port to use when connecting to your Databricks account host. Defaults to 443. | Yes |
| connection.httpPath string | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| connection.authenticationType string | The authentication type to connect to the host. Make sure this authentication type is the same one used when requesting the script. | Yes |
| connection.token string | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| settings object | Specifications of the host's settings, including active status. | No |
| settings.isActive boolean | When false, data objects will be inactive by default when created in Immuta. Set to false for the recommended configuration. | No |
| options object | Specification of the host's default behavior for object crawls. | No |
| options.forceRecursiveCrawl boolean | If false, only active objects will be crawled. If true, both active and inactive data objects will be crawled; any child objects from inactive objects will be set as inactive. Set to true for the recommended configuration. | No |
| nativeIntegration object | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| nativeIntegration.type string | Same as connection.technology. | Yes |
| nativeIntegration.autoBootstrap boolean | Use the same setting as the script generation. | Yes |
| nativeIntegration.unityCatalog boolean | Use the same setting as the script generation. | Yes |
| nativeIntegration.config.authenticationType string | Same as connection.authenticationType. | Yes |
| nativeIntegration.config.token string | Same as connection.token. | Yes |
| nativeIntegration.config.host string | Same as connection.hostname. | Yes |
| nativeIntegration.config.port integer | Same as connection.port. | Yes |
| nativeIntegration.config.catalog string | Use the same setting as the script generation. | Yes |
| nativeIntegration.config.audit object | Use the same setting as the script generation. | Yes |
| nativeIntegration.config.workspaceIds array | Use the same setting as the script generation. | No |
| nativeIntegration.config.enableNativeQueryParsing boolean | Use the same setting as the script generation. | No |
| nativeIntegration.config.groupPattern object | Use the same setting as the script generation. | No |
| nativeIntegration.config.jobConfig.workspaceDirectoryPath string | Use the same setting as the script generation. | No |
| nativeIntegration.config.jobConfig.jobClusterId string | Use the same setting as the script generation. | No |
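
Several nativeIntegration attributes must repeat the connection values, so it can help to fill them from a single set of inputs and check them before sending. The Python sketch below is one way to do that under stated assumptions: the function names are hypothetical, only the required attributes and the recommended settings from the table are included, and the fixed values (autoBootstrap false, unityCatalog true, audit enabled) mirror the example request rather than any API default.

```python
def build_connection_payload(connection_key, hostname, http_path, token, catalog, port=443):
    """Assemble a /data/connection body, filling mirrored values once."""
    return {
        "connectionKey": connection_key,
        "connection": {
            "technology": "Databricks",
            "hostname": hostname,
            "port": port,
            "httpPath": http_path,
            "authenticationType": "token",
            "token": token,
        },
        "settings": {"isActive": False},
        "options": {"forceRecursiveCrawl": True},
        "nativeIntegration": {
            "type": "Databricks",
            "autoBootstrap": False,
            "unityCatalog": True,
            "config": {
                "authenticationType": "token",
                "token": token,
                "host": hostname,
                "port": port,
                "catalog": catalog,
                "audit": {"enabled": True},
            },
        },
    }


def mirrored_field_mismatches(payload: dict) -> list[str]:
    """List nativeIntegration attributes that differ from their connection counterpart."""
    connection = payload["connection"]
    native = payload["nativeIntegration"]
    config = native["config"]
    pairs = {
        "nativeIntegration.type": (native["type"], connection["technology"]),
        "nativeIntegration.config.authenticationType": (
            config["authenticationType"],
            connection["authenticationType"],
        ),
        "nativeIntegration.config.token": (config["token"], connection["token"]),
        "nativeIntegration.config.host": (config["host"], connection["hostname"]),
        "nativeIntegration.config.port": (config["port"], connection["port"]),
    }
    return [name for name, (mirror, source) in pairs.items() if mirror != source]
```

An empty list from `mirrored_field_mismatches` means the attributes the table marks as "Same as ..." agree; it does not check the values against the script generation request.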

Response schema

| Attribute | Description |
| --- | --- |
| objectPath string | The list of names that uniquely identify the path to a data object in the remote platform's hierarchy. The first element should be the associated connectionKey. |
| bulkId string | A bulk ID that can be used to search for the status of background jobs triggered by this request. |

Example response

{
  "objectPath": ["<your-connection-key-name>"],
  "bulkId": "a-new-uuid"
}
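
A client will usually want to keep the bulkId from this response so it can look up the background jobs later. The following Python sketch parses a response of the documented shape. The class and function names are hypothetical, and it assumes the response body is valid JSON containing both documented attributes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionCreated:
    """The two documented attributes of the create response."""
    object_path: list
    bulk_id: str


def parse_create_response(body: str) -> ConnectionCreated:
    """Parse the JSON response body; raises KeyError if an attribute is missing."""
    data = json.loads(body)
    return ConnectionCreated(object_path=list(data["objectPath"]), bulk_id=data["bulkId"])


if __name__ == "__main__":
    example = '{"objectPath": ["<your-connection-key-name>"], "bulkId": "a-new-uuid"}'
    created = parse_create_response(example)
    # The first element of objectPath should be the connectionKey.
    print(created.object_path[0], created.bulk_id)
```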


Copyright © 2014-2024 Immuta Inc. All rights reserved.
