Configure a Databricks Unity Catalog Integration
Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta's Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.
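Under the hood, Unity Catalog enforces these policies with native row filters and column masks, which the integration applies on your behalf. As a minimal sketch of the Databricks SQL primitives involved (the function, table, and group names below are hypothetical examples, not objects Immuta creates):

-- Hypothetical names throughout; Immuta manages equivalent objects for you.
-- A row filter function returns TRUE for rows the querying user may see.
CREATE OR REPLACE FUNCTION us_region_filter(region STRING)
RETURN is_account_group_member('us_analysts') OR region = 'US';

ALTER TABLE sales.orders SET ROW FILTER us_region_filter ON (region);

-- A column mask function rewrites values for unauthorized users.
CREATE OR REPLACE FUNCTION ssn_mask(ssn STRING)
RETURN CASE WHEN is_account_group_member('privileged_users') THEN ssn ELSE '***-**-****' END;

ALTER TABLE sales.orders ALTER COLUMN ssn SET MASK ssn_mask;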
The following permissions and personas are used in the registration process.
Immuta user: An Immuta user with the APPLICATION_ADMIN Immuta permission must configure the Databricks Unity Catalog integration.
Databricks user: The Databricks user must have the following privileges:
Account admin
CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables (only required if enabling query audit)
Databricks service principal: The service principal used by the integration must have the following privileges:
USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources, and USE SCHEMA on all schemas containing securables registered as Immuta data sources.
MODIFY and SELECT on all securables registered as Immuta data sources. MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privileges on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privileges on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
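As a sketch of these grants, assuming a hypothetical service principal named immuta-sp and Immuta data sources under a single catalog named analytics:

-- Hypothetical principal and catalog names; adjust for your environment.
-- MANAGE must be granted directly on the parent catalog.
GRANT USE CATALOG, MANAGE ON CATALOG analytics TO `immuta-sp`;
-- USE SCHEMA, SELECT, and MODIFY granted at the catalog level are inherited
-- by all current and future schemas and securables in the catalog.
GRANT USE SCHEMA ON CATALOG analytics TO `immuta-sp`;
GRANT SELECT, MODIFY ON CATALOG analytics TO `immuta-sp`;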
Optionally, to include audit, the service principal needs the following additional privileges:
USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the system.access.audit table
SELECT on the system.access.table_lineage table
SELECT on the system.access.column_lineage table
Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal. See the Databricks documentation for details. The system.access schema must also be enabled on the metastore before it can be used.
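As a sketch, a user who is both a metastore admin and an account admin could grant the audit privileges to the hypothetical immuta-sp principal like this:

-- Grants required for query audit; the principal name is hypothetical.
GRANT USE CATALOG ON CATALOG system TO `immuta-sp`;
GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.audit TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-sp`;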
Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
If you select single user access mode for your cluster, you must enable serverless compute for your workspace.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
If you plan to enable query audit, the Immuta service principal must have the following access at minimum:
USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
You have two options for configuring your Databricks Unity Catalog integration: automatic setup, where Immuta creates the required objects using the integration's configured service principal, or manual setup, where you run the Immuta script in Databricks yourself. The automatic setup steps are below; the manual setup steps follow them.

Automatic setup
Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied; this field must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs for which you want Immuta to ingest audit records. If left empty, Immuta will audit all tables and users in Unity Catalog.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
OAuth machine-to-machine (M2M):
AWS Databricks:
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.
Manual setup
Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied; this field must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs for which you want Immuta to ingest audit records. If left empty, Immuta will audit all tables and users in Unity Catalog.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
OAuth machine-to-machine (M2M):
AWS Databricks:
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs (a sketch follows these steps).
Run the script in Databricks.
Click Save.
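For reference, a storage-location customization in the script might look like the following sketch. The catalog name, storage path, and principal name are hypothetical, and the actual Immuta script contains additional statements; this only illustrates the kind of Databricks SQL involved:

-- Hypothetical catalog name and cloud storage path.
CREATE CATALOG IF NOT EXISTS immuta_catalog
  MANAGED LOCATION 's3://my-bucket/immuta';
-- The catalog should be readable only by the Immuta service principal.
GRANT ALL PRIVILEGES ON CATALOG immuta_catalog TO `immuta-sp`;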
If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to each Immuta user account to ensure Immuta properly enforces policies, using one of the supported user mapping methods.
Requirement: A Databricks Unity Catalog integration must be configured for tags to be ingested.
To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.
Additional requirements and configuration details

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
A Unity Catalog metastore must be created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Clusters using single user access mode must use Databricks Runtime 15.4 LTS and above. Unity Catalog row- and column-level security controls are unsupported for single user access mode on Databricks Runtime 15.3 and below. See the Databricks documentation for details.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, Immuta will protect your existing data sources throughout the migration process.
In Databricks, create a service principal with the privileges listed above.
If you will configure the integration using the manual setup option, the Immuta script you will use includes the SQL statements for granting required privileges to the service principal, so you can skip this step and continue to the manual setup section. Otherwise, grant the service principal the privileges listed above. For Databricks Unity Catalog audit to work, the service principal must have, at minimum, the system table access listed in the requirements above.
Existing data source migration: If you have existing Databricks data sources, complete the migration steps before proceeding.
Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured service principal.
Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs. The user running the script must have the required Databricks privileges described above.
Required permissions: When performing an automatic setup, the credentials provided must have the required Databricks privileges described above.
Query audit is enabled by default; you can disable it by clicking the Enable Query Audit checkbox.
Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
Follow the Databricks documentation to create an OAuth secret for the Immuta service principal, and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier, and is the client ID of the Databricks service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the Databricks OAuth documentation for details about scopes.
Follow the Azure documentation to create a service principal within Azure, and then populate that service principal to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth secret for the service principal. This completes your Databricks-based service principal setup.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier, and is the client ID of the Databricks service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the Databricks OAuth documentation for details about scopes.
Required permissions: When performing a manual setup, the Databricks user running the script must have the required Databricks privileges described above.
Query audit is enabled by default; you can disable it by clicking the Enable Query Audit checkbox.
Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
Follow the Databricks documentation to create an OAuth secret for the Immuta service principal, and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier, and is the client ID of the Databricks service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the Databricks OAuth documentation for details about scopes.
Follow the Azure documentation to create a service principal within Azure, and then populate that service principal to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth secret for the service principal. This completes your Databricks-based service principal setup.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier, and is the client ID of the Databricks service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the Databricks OAuth documentation for details about scopes.
If the Databricks user doesn't exist in Databricks when you configure the integration, map the Immuta user to the Databricks user after they are created in Databricks. Otherwise, policies will not be enforced correctly for them in Databricks. Databricks user identities for Immuta users are automatically marked as invalid when the user is not found during policy application, preventing them from being affected by Databricks policy until their Immuta user identity is manually mapped to their Databricks identity.