Configure a Databricks Unity Catalog Integration
Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.
This page details how to configure the integration. To configure the Databricks Unity Catalog integration and register data sources using the simplified workflow, see this how-to guide.
Permissions
Several different accounts are used to set up and maintain the Databricks Unity Catalog integration. The permissions required for each are outlined below.
Immuta account (required): This user configures the integration on the app settings page in Immuta. To access the app settings page, this user needs the following permission:
APPLICATION_ADMIN Immuta permission
Databricks service principal (required): This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. In the automatic setup option, Immuta also uses this service principal to create the Immuta-managed catalog. This service principal needs the following Databricks privileges:
CREATE CATALOG privilege on the Unity Catalog metastore. This is only required if you have Immuta automatically configure the integration in Databricks for you. If a separate user will run the Immuta script in Databricks to manually configure the integration, that Databricks user account needs this privilege instead.
OWNER permission on the Immuta catalog you configure.
OWNER privilege on one of the securables below so that Immuta can administer Unity Catalog row-level and column-level security controls:
On catalogs with schemas and tables registered as Immuta data sources. This permission could also be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta service principal to allow for multiple owners.
On schemas with tables registered as Immuta data sources.
On all tables registered as Immuta data sources, if the OWNER permission cannot be applied at the catalog or schema level. In this case, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta service principal.
USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can SELECT and MODIFY securables within the parent catalog and schema.
SELECT and MODIFY on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
For native query audit (optional):
USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
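The data source privileges listed above (USE CATALOG, USE SCHEMA, SELECT, and MODIFY) can be expressed as Databricks SQL GRANT statements. The following is a minimal sketch that only generates the statement text; the function names and the example principal, catalog, schema, and table names are hypothetical and not part of Immuta or Databricks.

```python
from __future__ import annotations


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def data_source_grants(
    principal: str, catalog: str, schema: str, tables: list[str]
) -> list[str]:
    """Build GRANT statements for tables registered as Immuta data sources.

    All arguments are illustrative placeholders; substitute your own names.
    """
    p = quote(principal)
    statements = [
        # Access to the parent catalog and schema of the registered tables.
        f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {quote(catalog)}.{quote(schema)} TO {p};",
    ]
    for table in tables:
        full_name = ".".join(quote(part) for part in (catalog, schema, table))
        # SELECT and MODIFY on each registered table.
        statements.append(f"GRANT SELECT, MODIFY ON TABLE {full_name} TO {p};")
    return statements


if __name__ == "__main__":
    for statement in data_source_grants("immuta-sp", "sales", "raw", ["orders"]):
        print(statement)
```

The sketch leaves out OWNER because, in Databricks SQL, ownership is generally transferred with an ALTER ... OWNER TO statement rather than a GRANT. Review the generated statements before running them in a SQL warehouse.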
Databricks account (recommended): This user account can manually configure the integration in Databricks to create the Immuta-managed catalog. To do so, this account requires the following Databricks privileges:
CREATE CATALOG on the Unity Catalog metastore
ACCOUNT ADMIN on the Unity Catalog metastore for native query audit (optional)
Requirements
Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:
Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
Unity Catalog best practices
Ensure your integration with Unity Catalog goes smoothly by following these guidelines:
Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog native integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
Migrate data to Unity Catalog
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
Create the Databricks service principal
In Databricks, create a service principal with the privileges listed above.
Opt to enable native query audit for Unity Catalog
If you will configure the integration using the manual setup option, the Immuta script you will use includes the SQL statements for granting required privileges to the service principal, so you can skip this step and continue to the manual setup section. Otherwise, manually grant the Immuta service principal access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, the service principal must have the following access at minimum:
USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
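If you grant the audit access manually, the minimum access listed above maps onto a handful of Databricks SQL GRANT statements. The sketch below only builds the statement text. The function name and the example principal name are hypothetical, while the catalog, schema, and table names come from the list above.

```python
from __future__ import annotations

# System tables named in the list above.
SYSTEM_TABLES = (
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
)


def audit_grants(principal: str) -> list[str]:
    """Build the GRANT statements that give a principal audit access."""
    # Backtick-quote the principal so names with hyphens are accepted.
    p = "`" + principal.replace("`", "``") + "`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {p};",
    ]
    statements += [f"GRANT SELECT ON TABLE {table} TO {p};" for table in SYSTEM_TABLES]
    return statements


if __name__ == "__main__":
    print("\n".join(audit_grants("immuta-sp")))
```

Run the resulting statements as a user who is allowed to grant access on the system catalog. If you use the manual setup option, the Immuta script already covers these grants.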
Configure the Databricks Unity Catalog integration
Existing data source migration: If you have existing Databricks data sources, complete these migration steps before proceeding.
You have two options for configuring your Databricks Unity Catalog integration:
Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured service principal.
Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs. The user running the script must have the Databricks privileges listed above.
Automatic setup
Required permissions: When performing an automatic setup, the Immuta service principal must have the permissions listed above.
Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Before configuring the integration in Immuta, create this account-level group for privileged users and service accounts that require an unmasked view of data.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.
Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M):
AWS Databricks:
Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Follow Databricks documentation to create a service principal within Azure and then add it to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth client secret for the service principal. This completes your Databricks-based service principal setup.
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.
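Several of the fields in the steps above have stated constraints: the Immuta Catalog name may only contain letters, numbers, and underscores and cannot start with a number; the workspace IDs are a comma-separated list; and the audit frequency is an integer between 1 and 24. The following sketch shows one way to check values before entering them. The function names are hypothetical, and treating "letters" as ASCII letters is an assumption rather than a documented Immuta rule.

```python
from __future__ import annotations

import re

# Assumption: "letters" means ASCII letters. The first character may be a
# letter or underscore; the rest may also include digits.
CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_catalog_name(name: str) -> bool:
    """Check the catalog naming rule described in the steps above."""
    return CATALOG_NAME.fullmatch(name) is not None


def parse_workspace_ids(raw: str) -> list[str]:
    """Split a comma-separated list of workspace IDs, dropping blanks.

    An empty result corresponds to leaving the field empty.
    """
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_valid_audit_frequency(hours: int) -> bool:
    """Check that the sync frequency is an integer between 1 and 24."""
    return isinstance(hours, int) and not isinstance(hours, bool) and 1 <= hours <= 24


if __name__ == "__main__":
    print(is_valid_catalog_name("immuta_catalog"))
    print(parse_workspace_ids("1234567890, 9876543210"))
    print(is_valid_audit_frequency(6))
```

These checks run locally and do not contact Immuta or Databricks; the app settings page remains the authority on which values are accepted.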
Manual setup
Required permissions: When performing a manual setup, a service principal and a Databricks account must have the permissions listed above.
Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Before configuring the integration in Immuta, create this account-level group for privileged users and service accounts that require an unmasked view of data.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.
Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M):
AWS Databricks:
Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Follow Databricks documentation to create a service principal within Azure and then add it to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth client secret for the service principal. This completes your Databricks-based service principal setup.
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.
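The OAuth M2M fields described above (Token Endpoint, Client ID, Scope, and Client Secret) correspond to a standard OAuth 2.0 client credentials request. To check the values before entering them in Immuta, you could build such a request yourself. The sketch below only constructs the request and does not send it. The function name is hypothetical, and the default scope of all-apis is an assumption based on common Databricks usage, not a value taken from this page.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_endpoint: str,
    client_id: str,
    client_secret: str,
    scope: str = "all-apis",  # assumed default; use the scope you configured
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    # The client ID and secret are sent with HTTP Basic authentication.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_token_request(
        "https://example.cloud.databricks.com/oidc/v1/token",  # placeholder host
        "my-client-id",
        "my-client-secret",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; a successful response is expected to be JSON containing an access token. Keep the client secret out of source control and shell history.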
Map Databricks users to Immuta
If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to the corresponding Immuta user account using one of the methods linked below so that Immuta enforces policies properly:
If the Databricks user doesn't exist in Databricks when you configure the integration, manually link their Immuta username to Databricks after they are created in Databricks. Otherwise, policies will not be enforced correctly for them in Databricks. Databricks user identities for Immuta users are automatically marked as invalid when the user is not found during policy application, preventing them from being affected by Databricks policy until their Immuta user identity is manually mapped to their Databricks identity.
Opt to enable Databricks Unity Catalog tag ingestion
Design partner preview
This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Requirement: A Databricks Unity Catalog integration must be configured for tags to be ingested.
To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.