Configure a Databricks Unity Catalog Integration

Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

Permissions

The permissions outlined in this section are the Databricks privileges required for a basic configuration. See the Databricks reference guide for a list of privileges necessary for additional features and settings.

  • APPLICATION_ADMIN Immuta permission

  • The Databricks user running the installation script must have the following privileges:

    • Account admin

    • CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables

    • Metastore admin (only required if enabling query audit)

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

  • Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.

  • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.

  • If you select single user access mode for your cluster, you must

    • use Databricks Runtime 15.4 LTS or above. Unity Catalog row- and column-level security controls are unsupported for single user access mode on Databricks Runtime 15.3 and below. See the Databricks documentation for details.

    • enable serverless compute for your workspace.
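Before continuing, you can verify from the compute you plan to use that Unity Catalog is actually enabled. This is a quick sanity check, not a required step:

```sql
-- Returns the ID of the Unity Catalog metastore attached to the current compute;
-- an error here means the cluster or warehouse is not Unity Catalog-enabled.
SELECT CURRENT_METASTORE();

-- Unity Catalog catalogs should appear here alongside the legacy hive_metastore.
SHOW CATALOGS;
```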

Unity Catalog best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

  • Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.

  • Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Once Unity Catalog support is enabled in Immuta, the default catalog is the hive_metastore, which the Unity Catalog integration does not support; data sources in the Hive metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

  1. Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

  2. Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
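To gauge what still needs to move, you can list what remains in the legacy Hive metastore. The schema name `default` below is just an example:

```sql
-- Schemas still living in the legacy Hive metastore.
SHOW SCHEMAS IN hive_metastore;

-- Tables within one of those schemas that still need to migrate to Unity Catalog.
SHOW TABLES IN hive_metastore.default;
```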

Create the Databricks service principal

In Databricks, create a service principal with the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.

  • USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources.

  • USE SCHEMA on all schemas containing securables registered as Immuta data sources.

  • MODIFY and SELECT on all securables you want registered as Immuta data sources. The MODIFY privilege is not required for materialized views registered as Immuta data sources, since MODIFY is not a supported privilege on that object type in Databricks.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Because privileges are inherited, you can grant the service principal MODIFY and SELECT on all catalogs or schemas containing Immuta data sources, which automatically applies those privileges to all current and future securables within them. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
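As a sketch, the grants above can be issued in SQL. The principal name `immuta-sp` and catalog name `my_catalog` are placeholders; substitute your own service principal (typically referenced by its application ID) and catalogs:

```sql
-- Catalog-level visibility and management for the Immuta service principal.
GRANT USE CATALOG, MANAGE ON CATALOG my_catalog TO `immuta-sp`;

-- Granting USE SCHEMA, MODIFY, and SELECT at the catalog level relies on privilege
-- inheritance to cover all current and future schemas and securables beneath it.
GRANT USE SCHEMA, MODIFY, SELECT ON CATALOG my_catalog TO `immuta-sp`;
```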

Opt to enable query audit for Unity Catalog

If you configure the integration using the manual setup option, the Immuta script you will use includes the SQL statements for granting required privileges to the service principal, so you can skip this step and continue to the manual setup section. Otherwise, manually grant the Immuta service principal access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, the service principal must have the following access at minimum:

  • USE CATALOG on the system catalog

  • USE SCHEMA on the system.access and system.query schemas

  • SELECT on the following system tables:

    • system.access.table_lineage

    • system.access.column_lineage

    • system.access.audit

    • system.query.history

Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user who is both a metastore admin and an account admin must grant USE SCHEMA and SELECT permissions on the system schemas to the service principal. See Manage privileges in Unity Catalog for more details.
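Using the same placeholder principal `immuta-sp`, the minimum audit grants can be expressed as:

```sql
GRANT USE CATALOG ON CATALOG system TO `immuta-sp`;
GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-sp`;
GRANT USE SCHEMA ON SCHEMA system.query TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.audit TO `immuta-sp`;
GRANT SELECT ON TABLE system.query.history TO `immuta-sp`;
```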

Configure the Databricks Unity Catalog integration

Existing data source migration: If you have existing Databricks data sources, complete these migration steps before proceeding.

You have two options for configuring your Databricks Unity Catalog integration:

  • Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured service principal.

  • Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs. The user running the script must have the Databricks privileges listed above.

Automatic setup

Required permissions: When performing an automatic setup, the credentials provided must have the permissions listed above.

  1. Click the App Settings icon in the navigation menu.

  2. Click the Integrations tab.

  3. Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.

  4. Complete the following fields:

    • Server Hostname is the hostname of your Databricks workspace.

    • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.

    • Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.

  5. If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.

  6. Opt to fill out the Exemption Group field with the name of an account-level group in Databricks that must be exempt from having data policies applied. This group is created and managed in Databricks and should only include privileged users and service accounts that require an unmasked view of data. Create this group in Databricks before configuring the integration in Immuta.

Exemption group cannot be changed after configuration is saved

The exemption group field cannot be edited after you save the integration configuration. If you need to change the group name, you can choose one of the following options:

  • Update the group name in Databricks to match what you have configured here.

  • Delete the integration in Immuta and create a new Databricks Unity Catalog integration with the new exemption group name.

For details about policy exemption groups, see the Databricks Unity Catalog reference guide.

  7. Unity Catalog query audit is enabled by default. Ensure you have enabled system tables in Unity Catalog and granted the required access to the Immuta service principal.

    1. Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs: a comma-separated list of the workspace IDs you want Immuta to ingest audit records for. If left empty, Immuta audits all tables and users in Unity Catalog.

    2. Configure the audit frequency: scroll to Integrations Settings and find the Unity Catalog Audit Sync Schedule section.

    3. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.

    4. Continue with your integration configuration.

  8. Select your authentication method from the dropdown.

  9. Click Save.

Manual setup

Required permissions: When performing a manual setup, the Databricks user running the script must have the permissions listed above.

  1. Click the App Settings icon in the navigation menu.

  2. Click the Integrations tab.

  3. Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.

  4. Complete the following fields:

    • Server Hostname is the hostname of your Databricks workspace.

    • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.

    • Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.

  5. If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.

  6. Opt to fill out the Exemption Group field with the name of an account-level group in Databricks that must be exempt from having data policies applied. This group is created and managed in Databricks and should only include privileged users and service accounts that require an unmasked view of data. Create this group in Databricks before configuring the integration in Immuta.

Exemption group cannot be changed after configuration is saved

The exemption group field cannot be edited after you save the integration configuration. If you need to change the group name, you can choose one of the following options:

  • Update the group name in Databricks to match what you have configured here.

  • Delete the integration in Immuta and create a new Databricks Unity Catalog integration with the new exemption group name.

For details about policy exemption groups, see the Databricks Unity Catalog reference guide.

  7. Unity Catalog query audit is enabled by default. Ensure you have enabled system tables in Unity Catalog and granted the required access to the Immuta service principal.

    1. Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs: a comma-separated list of the workspace IDs you want Immuta to ingest audit records for. If left empty, Immuta audits all tables and users in Unity Catalog.

    2. Configure the audit frequency: scroll to Integrations Settings and find the Unity Catalog Audit Sync Schedule section.

    3. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.

    4. Continue with your integration configuration.

  8. Select your authentication method from the dropdown.

  9. Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.

  10. Run the script in Databricks.

  11. Click Save.

Map Databricks users to Immuta

If the usernames in Immuta do not match those in Databricks, map each Databricks username to the corresponding Immuta user account, using one of the supported user mapping methods, to ensure Immuta enforces policies correctly.

If a user doesn't exist in Databricks when you configure the integration, manually link their Immuta username to their Databricks identity after the user is created in Databricks; otherwise, policies will not be enforced correctly for them. Databricks user identities for Immuta users are automatically marked as invalid when the user is not found during policy application, which prevents them from being affected by Databricks policy until their Immuta user identity is manually mapped to their Databricks identity.

Opt to enable Databricks Unity Catalog tag ingestion

Private preview: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.

Requirements:

  • A configured Databricks Unity Catalog integration or connection

  • Fewer than 10,000 Databricks Unity Catalog data sources registered in Immuta

To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.

  1. Navigate to the App Settings page.

  2. Scroll to the External Catalogs section, and click Add Catalog.

  3. Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.

  4. Click Save and confirm your changes.

Register data

Register Databricks securables in Immuta.
