Register a Databricks Unity Catalog Host
The enhanced onboarding API is a REST API that allows users to register a Databricks Unity Catalog with Immuta using a single set of credentials, rather than configuring an integration and creating data sources separately. Immuta can then manage and enforce access controls on your data through that host. To manage your host after registration, see the host management documentation.
The following permissions and personas are used in the registration process:
- Immuta permission: `CREATE_DATA_SOURCE`
- Databricks privileges for the user registering the host and running the script:
  - Account or workspace admin
  - `CREATE CATALOG` privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables
- Databricks privileges for the service principal you create (a grant sketch follows this list):
  - `OWNER` privilege on the Immuta catalog you configure.
  - `OWNER` privilege on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. This privilege can be applied by granting `OWNER` on a catalog to a Databricks group that includes the Immuta service principal to allow for multiple owners. If the `OWNER` privilege cannot be applied at the catalog or schema level, each table registered as an Immuta data source must individually have the `OWNER` privilege granted to the Immuta service principal.
  - `USE CATALOG` and `USE SCHEMA` on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can interact with those tables.
  - `SELECT` and `MODIFY` on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
  - `USE CATALOG` on the `system` catalog for native query audit.
  - `USE SCHEMA` on the `system.access` schema for native query audit.
  - `SELECT` on the following system tables for native query audit:
    - `system.access.audit`
    - `system.access.table_lineage`
    - `system.access.column_lineage`
- Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
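For illustration, a minimal sketch of how some of these grants might be applied follows. It uses the Databricks SQL Statement Execution API; the workspace URL, tokens, warehouse ID, service principal application ID, and the catalog, schema, and table names are all placeholder assumptions. You can equally run the GRANT statements directly in a Databricks SQL editor.

```bash
# Illustrative sketch only: applies a subset of the privileges above to the
# Immuta service principal via the Databricks SQL Statement Execution API.
# Every <placeholder> value is an assumption to replace with your own.
DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
ADMIN_TOKEN="<admin-PAT>"                          # token of the registering admin
WAREHOUSE_ID="<warehouse-id>"                      # any running SQL warehouse
PRINCIPAL="<service-principal-application-id>"     # Immuta service principal

for stmt in \
  "GRANT USE CATALOG ON CATALOG <data_catalog> TO \`$PRINCIPAL\`" \
  "GRANT USE SCHEMA ON SCHEMA <data_catalog>.<schema> TO \`$PRINCIPAL\`" \
  "GRANT SELECT, MODIFY ON TABLE <data_catalog>.<schema>.<table> TO \`$PRINCIPAL\`"
do
  curl -s -X POST "$DATABRICKS_HOST/api/2.0/sql/statements" \
    -H "Authorization: Bearer $ADMIN_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"warehouse_id\": \"$WAREHOUSE_ID\", \"statement\": \"$stmt\"}"
done
```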
Complete the following steps to register a Databricks Unity Catalog host:
1. Create a service principal in Databricks Unity Catalog with the proper Databricks privileges for Immuta to use to manage policies in Unity Catalog.
2. Set up Unity Catalog system tables for native query audit.
3. Use the `/integrations/scripts/create` endpoint to receive a script.
4. Run the script in Databricks Unity Catalog.
5. Use the `/data/connection` endpoint to finish registering your host in Immuta.
Enable native query audit by granting the Immuta service principal the following access in Unity Catalog (a grant sketch follows this list):

- `USE CATALOG` on the `system` catalog
- `USE SCHEMA` on the `system.access` schema
- `SELECT` on the following system tables:
  - `system.access.audit`
  - `system.access.table_lineage`
  - `system.access.column_lineage`
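These audit grants can be applied with the same pattern; a minimal sketch, reusing the placeholder variables from the grant sketch above:

```bash
# System-table grants for native query audit. Assumes DATABRICKS_HOST,
# ADMIN_TOKEN, WAREHOUSE_ID, and PRINCIPAL from the earlier grant sketch.
for stmt in \
  "GRANT USE CATALOG ON CATALOG system TO \`$PRINCIPAL\`" \
  "GRANT USE SCHEMA ON SCHEMA system.access TO \`$PRINCIPAL\`" \
  "GRANT SELECT ON TABLE system.access.audit TO \`$PRINCIPAL\`" \
  "GRANT SELECT ON TABLE system.access.table_lineage TO \`$PRINCIPAL\`" \
  "GRANT SELECT ON TABLE system.access.column_lineage TO \`$PRINCIPAL\`"
do
  curl -s -X POST "$DATABRICKS_HOST/api/2.0/sql/statements" \
    -H "Authorization: Bearer $ADMIN_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"warehouse_id\": \"$WAREHOUSE_ID\", \"statement\": \"$stmt\"}"
done
```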
`POST /integrations/scripts/create`
1. Using the example request (a sketch is provided below), update the `<placeholder_values>` with your connection details.
2. Copy the `config` object to use later in the setup process.
3. Run the request.
4. Copy the returned script and use it in the next step.
Payload parameters
The script will use the service principal, authenticating with the personal access token (PAT) you specified. Additionally, the script will create the catalog you specified.
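The example request referenced above is not reproduced here; the following is a minimal sketch of what it might look like, assuming token authentication and an Immuta API token. The top-level `type`, `autoBootstrap`, and `unityCatalog` fields are assumptions that mirror the `nativeIntegration` attributes described later in this guide; consult the API reference for the authoritative payload shape.

```bash
# Illustrative sketch of the script request. <immuta-host>, <api-token>, and the
# payload values are placeholder assumptions; match them to your environment.
curl -X POST "https://<immuta-host>/integrations/scripts/create" \
  -H "Authorization: Bearer <api-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Databricks",
    "autoBootstrap": false,
    "unityCatalog": true,
    "config": {
      "workspaceUrl": "<your-workspace>.cloud.databricks.com",
      "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
      "authenticationType": "token",
      "token": "<service-principal-PAT>",
      "catalog": "immuta_system",
      "audit": { "enabled": true }
    }
  }'
```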
`POST /data/connection`
1. Copy the request (a sketch is provided below) and update the `<placeholder_values>` with your connection details. Note that the connection details here must match the ones used when generating the script.
2. Submit the request.
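A minimal sketch of the registration request under the same placeholder assumptions (the optional `settings` and `options` attributes are omitted):

```bash
# Illustrative sketch of the host registration request. Values must match those
# used when generating the script; all <placeholder> values are assumptions.
curl -X POST "https://<immuta-host>/data/connection" \
  -H "Authorization: Bearer <api-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "connectionKey": "databricks-uc-host",
    "connection": {
      "technology": "Databricks",
      "hostname": "<your-workspace>.cloud.databricks.com",
      "port": 443,
      "httpPath": "/sql/1.0/warehouses/<warehouse-id>",
      "authenticationType": "token",
      "token": "<service-principal-PAT>"
    },
    "nativeIntegration": {
      "type": "Databricks",
      "autoBootstrap": false,
      "unityCatalog": true,
      "config": {
        "authenticationType": "token",
        "token": "<service-principal-PAT>",
        "host": "<your-workspace>.cloud.databricks.com",
        "port": 443,
        "catalog": "immuta_system",
        "audit": { "enabled": true }
      }
    }
  }'
```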
Test run

Opt to test and validate the create connection payload using a dry run:

`POST /data/connection/test`
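The dry run takes the same body as the registration request; a sketch, assuming the payload above is saved to a local file named connection-payload.json:

```bash
# Validates the payload without registering the host; same placeholder assumptions.
curl -X POST "https://<immuta-host>/data/connection/test" \
  -H "Authorization: Bearer <api-token>" \
  -H "Content-Type: application/json" \
  -d @connection-payload.json
```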
Requirements

- A Unity Catalog metastore created and attached to a Databricks workspace. See the Databricks documentation for information on workspaces and catalog isolation support with Immuta.
- A Databricks service principal with the Databricks privileges outlined above, set up with personal access token (PAT) authentication (a creation sketch follows this list). The Immuta service principal you create requires these privileges to connect to Databricks to create the integration catalog, configure the necessary procedures and functions, and maintain state between Databricks and Immuta.
- For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the access listed in the native query audit section above.
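As an illustration of this prerequisite, a Databricks service principal can be created with the workspace SCIM API, and a PAT issued on its behalf with the token management API. This is a sketch under placeholder assumptions, not Immuta-prescribed tooling:

```bash
# Sketch: create a service principal, then issue a PAT on its behalf.
# <your-workspace>, <admin-PAT>, and <application-id> are placeholders.
DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"

# 1. Create the service principal (note its applicationId in the response).
curl -X POST "$DATABRICKS_HOST/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer <admin-PAT>" \
  -H "Content-Type: application/scim+json" \
  -d '{"displayName": "immuta-integration"}'

# 2. Create a PAT on behalf of the service principal (workspace admin only).
curl -X POST "$DATABRICKS_HOST/api/2.0/token-management/on-behalf-of/tokens" \
  -H "Authorization: Bearer <admin-PAT>" \
  -H "Content-Type: application/json" \
  -d '{"application_id": "<application-id>", "comment": "Immuta", "lifetime_seconds": 7776000}'
```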
Find descriptions of the editable attributes for the script request payload in the table below and of the full payload in the API reference.

| Attribute | Description | Required |
|---|---|---|
| `config.workspaceUrl` | Your Databricks workspace URL. | Yes |
| `config.httpPath` | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `config.token` | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `config.catalog` | The name of the Databricks catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number. | Yes |
| `config.audit` | This object enables Databricks Unity Catalog query audit. | No |
| `config.audit.enabled` | If true, Databricks Unity Catalog query audit is enabled. | No |

The previous step returns a script. Copy the script and run it in your Databricks Unity Catalog environment as a user with the privileges listed in the requirements above.

Find descriptions of the editable attributes for the connection request payload in the table below and of the full payload in the API reference. All values should be included; attributes you should not edit are noted.

| Attribute | Description | Required |
|---|---|---|
| `connectionKey` | A unique name for the host connection. | Yes |
| `connection` | Configuration attributes that should match the values used when generating the script. | Yes |
| `connection.technology` | The technology backing the new host. | Yes |
| `connection.hostname` | Your Databricks workspace URL. This is the same value as `config.workspaceUrl` in the script request. | Yes |
| `connection.port` | The port to use when connecting to your Databricks account host. Defaults to `443`. | Yes |
| `connection.httpPath` | The HTTP path of your Databricks cluster or SQL warehouse. | Yes |
| `connection.authenticationType` | The authentication type to connect to the host. Make sure this auth type is the same one used when requesting the script. | Yes |
| `connection.token` | The Databricks personal access token for the service principal created in step one for Immuta. | Yes |
| `settings` | Specifications of the host's settings, including active status. | No |
| `settings.isActive` | When false, the host is registered in an inactive state. | No |
| `options` | Specification of the host's default behavior for object crawls. | No |
| `options.forceRecursiveCrawl` | If true, Immuta recursively crawls the host and registers all child objects. | No |
| `nativeIntegration` | Configuration attributes that should match the values used when getting the script from the integration endpoint. | Yes |
| `nativeIntegration.type` | Same as the value used when requesting the script. | Yes |
| `nativeIntegration.autoBootstrap` | Use the same setting as the script generation. | Yes |
| `nativeIntegration.unityCatalog` | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.authenticationType` | Same as the value used when requesting the script. | Yes |
| `nativeIntegration.config.token` | Same as the value used when requesting the script. | Yes |
| `nativeIntegration.config.host` | Same as the value used when requesting the script. | Yes |
| `nativeIntegration.config.port` | Same as the value used when requesting the script. | Yes |
| `nativeIntegration.config.catalog` | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.audit` | Use the same setting as the script generation. | Yes |
| `nativeIntegration.config.workspaceIds` | Use the same setting as the script generation. | No |
| `nativeIntegration.config.enableNativeQueryParsing` | Use the same setting as the script generation. | No |
| `nativeIntegration.config.groupPattern` | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.workspaceDirectoryPath` | Use the same setting as the script generation. | No |
| `nativeIntegration.config.jobConfig.jobClusterId` | Use the same setting as the script generation. | No |

The response returns the following attributes:

| Attribute | Description |
|---|---|
| `objectPath` | The list of names that uniquely identify the path to a data object in the remote platform's hierarchy. The first element is the associated `connectionKey`. |
| `bulkId` | A bulk ID that can be used to search for the status of background jobs triggered by this request. |
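For example, under the assumptions above, you could pull the bulk ID out of a saved response with jq to track the background jobs later:

```bash
# Assumes the /data/connection response was saved to response.json; the field
# name follows the response table above.
jq -r '.bulkId' response.json
```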