Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.
This page details how to configure the integration. To configure the Databricks Unity Catalog integration and register data sources using the connections, see this how-to guide.
Several different accounts are used to set up and maintain the Databricks Unity Catalog integration. The permissions required for each are outlined below.
Immuta account (required): This user configures the integration on the app settings page in Immuta. To access the app settings page, this user needs the following permission:
APPLICATION_ADMIN
Immuta permission
Databricks service principal (required): This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. In the automatic setup option, Immuta also uses this service principal to create the Immuta-managed catalog. This service principal needs the following Databricks privileges:
CREATE CATALOG
privilege on the Unity Catalog metastore. This is only required if you have Immuta automatically configure the integration in Databricks for you. If a separate user will run the Immuta script in Databricks to manually configure the integration, that Databricks user account needs this privilege instead.
OWNER
permission on the Immuta catalog you configure.
OWNER
privilege on one of the securables below so that Immuta can administer Unity Catalog row-level and column-level security controls.
on catalogs with schemas and tables registered as Immuta data sources. This permission could also be applied by granting OWNER
on a catalog to a Databricks group that includes the Immuta service principal to allow for multiple owners.
on schemas with tables registered as Immuta data sources.
on all tables registered as Immuta data sources - if the OWNER
permission cannot be applied at the catalog- or schema-level. In this case, each table registered as an Immuta data source must individually have the OWNER
permission granted to the Immuta service principal.
USE CATALOG
and USE SCHEMA
on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can SELECT
and MODIFY
securables within the parent catalog and schema.
SELECT
and MODIFY
on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
For native query audit (optional)
USE CATALOG
on the system
catalog
USE SCHEMA
on the system.access
schema
SELECT
on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
Databricks account (recommended): This user account can manually configure the integration in Databricks to create the Immuta-managed catalog. To do so, this account requires the following Databricks privileges:
CREATE CATALOG
on the Unity Catalog metastore
ACCOUNT ADMIN
on the Unity Catalog metastore for native query audit (optional)
Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:
Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
Unity Catalog best practices
Ensure your integration with Unity Catalog goes smoothly by following these guidelines:
Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore
, which is not supported by the Unity Catalog native integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
In Databricks, create a service principal with the privileges listed above.
If you will configure the integration using the manual setup option, the Immuta script you will use includes the SQL statements for granting required privileges to the service principal, so you can skip this step and continue to the manual setup section. Otherwise, manually grant the Immuta service principal access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, the service principal must have the following access at minimum:
USE CATALOG
on the system
catalog
USE SCHEMA
on the system.access
schema
SELECT
on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
Existing data source migration: If you have existing Databricks data sources, complete these migration steps before proceeding.
You have two options for configuring your Databricks Unity Catalog integration:
Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured service principal.
Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs. The user running the script must have the Databricks privileges listed above.
Required permissions: When performing an automatic setup, the Immuta service principal must have the permissions listed above.
Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.
Configure the audit frequency by scrolling to Integrations Settings and find the Unity Catalog Audit Sync Schedule section.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M):
AWS Databricks:
Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token
.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Follow Databricks documentation to create a service principal within Azure and then populate to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth client secret for the service principal. This completes your Databricks-based service principal setup.
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token
.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.
Required permissions: When performing a manual setup, a service principal and a Databricks account must have the permissions listed above.
Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
Server Hostname is the hostname of your Databricks workspace.
HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.
Configure the audit frequency by scrolling to Integrations Settings and find the Unity Catalog Audit Sync Schedule section.
Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
Continue with your integration configuration.
Select your authentication method from the dropdown:
Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M):
AWS Databricks:
Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token
.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Follow Databricks documentation to create a service principal within Azure and then populate to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Within Databricks, create an OAuth client secret for the service principal. This completes your Databricks-based service principal setup.
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token
.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.
If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to each Immuta user account to ensure Immuta properly enforces policies using one of the methods linked below:
If the Databricks user doesn't exist in Databricks when you configure the integration, manually link their Immuta username to Databricks after they are created in Databricks. Otherwise, policies will not be enforced correctly for them in Databricks. Databricks user identities for Immuta users are automatically marked as invalid when the user is not found during policy application, preventing them from being affected by Databricks policy until their Immuta user identity is manually mapped to their Databricks identity.
Design partner preview
This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Requirement: A Databricks Unity Catalog integration must be configured for tags to be ingested.
To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.
The how-to guides linked on this page illustrate how to integrate Databricks Unity Catalog with Immuta.
Requirements:
Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore.
These guides provide information on the recommended features to enable with Databricks Unity Catalog.
Configure your Unity Catalog integration with the following feature enabled: Native query audit (enabled by default)
Select None as your default subscription policy.
These guides provide instructions for organizing your Databricks Unity Catalog data to align with your governance structure.
These guides provide instructions for auditing and detecting your users' activity, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.
These guides provide instructions for discovering, classifying, and tagging your data.
Register a subset of your tables to configure and validate SDD.
Configure SDD to discover entities of interest for your policy needs.
Register your remaining tables at the schema level with schema monitoring turned on.
These guides provide instructions for configuring and securing your data with governance policies, or see the Secure use cases for a comprehensive guide on creating policies to fit your organization's use case.
Validate the policies. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.
Once all Immuta policies are in place, remove or alter old permissions and revoke access to the ungoverned tables.
When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore
catalog to account for Unity Catalog's three-level hierarchy. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.
Because the hive_metastore
catalog is not managed by Unity Catalog, existing data sources in the hive_metastore
cannot have Unity Catalog access controls applied to them. Data sources in the Hive Metastore must be managed by the Databricks Spark integration.
To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.
This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.
Databricks Unity Catalog configuration: Configure the Databricks Unity Catalog integration.
Migrate to Databricks Unity Catalog: Migrate from the legacy Databricks Spark integrations to the Databricks Unity Catalog integration.
Databricks Unity Catalog integration reference guide: This guide describes the design and components of the integration.
Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:
Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.
: Immuta data policies enforce row- and column-level security.
Unity Catalog uses the following hierarchy of data objects:
Metastore: Created at the account level and is attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas
Schema: Organizes tables and views
Table-etc: Table (managed or external tables), view, volume, model, and function
For details about the Unity Catalog object model, see the .
The Databricks Unity Catalog integration supports
:
applying column masks and row filters on specific securable objects
applying subscription polices on tables and views
enforcing Unity Catalog access controls, even if Immuta becomes disconnected
allowing non-Immuta reads and writes
using Photon
using a proxy server
Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the native integration, Immuta will create a catalog that contains these schemas:
immuta_system
: Contains internal Immuta data.
immuta_policies_n
: Contains policy UDFs.
When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system
schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n
schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.
Typical use cases for binding a catalog to specific workspaces include
Ensuring users can only access production data from a production workspace environment.
For example, you may have production data in a prod_catalog
, as well as a production workspace you are introducing to your organization. Binding the prod_catalog
to the prod_workspace
ensures that workspace admins and users can only access prod_catalog
from the prod_workspace
environment.
Ensuring users can only process sensitive data from a specific workspace. Limiting the environments from which users can access sensitive data helps better secure your organization’s data. Limiting access to one workspace also simplifies any monitoring, auditing, and understanding of which users are accessing specific data. This would entail a similar setup as the example above.
Giving users read-only access to production data from a developer workspace.
This enables your organization to effectively conduct development and testing, while minimizing risk to production data. All user access to this catalog from this workspace can be specified as read-only, ensuring developers can access the data they need for testing without risk of any unwanted updates.
Limitations
Each additional workspace connection must be in the same metastore as the primary workspace used to set up the integration.
No two additional workspace connections can be responsible for the same catalog.
Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.
Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
The Unity Catalog integration supports the following policy types:
Conditional masking
Constant
Custom masking
Hashing
Null
Rounding (date and numeric rounding)
Matching (only show rows where)
Custom WHERE
Never
Where user
Where value in column
Minimization
Time-based restrictions
If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:
The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).
Some users may need to be exempt from masking and row-level policy enforcement. When you add user accounts to the configured exemption group in Databricks, Immuta will not enforce policies for those users. Exemption groups are created when the Unity Catalog integration is configured, and no policies will apply to these users' queries, despite any policies enforced on the tables they query.
The principal used to register data sources in Immuta will be automatically added to this exemption group for that Databricks table. Consequently, users added to this list and used to register data sources in Immuta should be limited to service accounts.
hive_metastore
When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore
catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore
in any Databricks Unity Catalog integration, and policies will not be applied to tables there.
The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:
Access requirements
For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.
USE CATALOG
on the system
catalog
USE SCHEMA
on the system.access
schema
SELECT
on the following system tables:
system.access.audit
system.access.table_lineage
system.access.column_lineage
Design partner preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.
You can enable tag ingestion to allow Immuta to ingest Databricks Unity Catalog table and column tags so that you can use them in Immuta policies to enforce access controls. When you enable this feature, Immuta uses the credentials and connection information from the Databricks Unity Catalog integration to pull tags from Databricks and apply them to data sources as they are registered in Immuta. If Databricks data sources preexist the Databricks Unity Catalog tag ingestion enablement, those data sources will automatically sync to the catalog and tags will apply. Immuta checks for changes to tags in Databricks and syncs Immuta data sources to those changes every 24 hours.
When syncing data sources to Databricks Unity Catalog tags, Immuta pulls the following information:
Table tags: These tags apply to the table and appear on the data source overview tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a .
delimiter. For example, the Databricks Unity Catalog tag Location: US
would be represented as Location.US
in Immuta.
Column tags: These tags are applied to data source columns and appear on the columns listed in the data dictionary tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a .
delimiter. For example, the Databricks Unity Catalog tag Location: US
would be represented as Location.US
in Immuta.
Table comments field: This content appears as the data source description on the data source overview tab.
Column comments field: This content appears as dictionary column descriptions on the data dictionary tab.
Only tags that apply to Databricks data sources in Immuta are available to build policies in Immuta. Immuta will not pull tags in from Databricks Unity Catalog unless those tags apply to registered data sources.
Cost implications: Tag ingestion in Databricks Unity Catalog requires compute resources. Therefore, having many Databricks data sources or frequently manually syncing data sources to Databricks Unity Catalog may incur additional costs.
Databricks Unity Catalog tag ingestion only supports tenants with fewer than 2,500 data sources registered.
The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.
Legend:
Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
You must use the global regex flag (g
) when creating a regex masking policy in this integration, and you cannot use the case insensitive regex flag (i
) when creating a regex masking policy in this integration. See the examples below for guidance:
regex with a global flag (supported): /^ssn|social ?security$/g
regex without a global flag (unsupported): /^ssn|social ?security$/
regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi
regex without a case insensitive flag (supported): /^ssn|social ?security$/g
If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.
Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.
The following features are currently unsupported:
Databricks change data feed support
Immuta projects
Multiple IAMs on a single cluster
Column masking policies on views
Mixing masking policies on the same column
Row-redaction policies on views
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies (except for masking with NULL) on ARRAY, MAP, or STRUCT type columns
Snippets for Databricks data sources may be empty in the Immuta UI.
Unity Catalog supports managing permissions account-wide in Databricks through controls applied directly to objects in the metastore. To establish a connection with Databricks and apply controls to securable objects within the metastore, Immuta requires a service principal with permissions to manage all data protected by Immuta. (OAuth M2M) or a personal access token (PAT) can be provided for Immuta to authenticate as the service principal. (See the for a list of specific Databricks privileges.)
Workspace-catalog binding allows users to leverage Databricks’ catalog isolation mode to limit catalog access to specific Databricks workspaces. The default isolation mode is OPEN, meaning all workspaces can access the catalog (with the exception of the automatically-created ), provided they are in the metastore attached to the catalog. Setting this mode to ISOLATED allows the catalog owner to specify a workspace-catalog binding, which means the owner can dictate which workspaces are authorized to access the catalog. This prevents other workspaces from accessing the specified catalogs. To bind a catalog to a specific workspace in Databricks Unity Catalog, see the .
Immuta’s Databricks Unity Catalog integration allows users to configure additional workspace connections to support using Databricks' feature. Users can configure additional workspace connections in their Immuta integrations to be consistent with the workspace-catalog bindings that are set up in Databricks. Immuta will use each additional workspace connection to govern the catalog(s) that workspace is bound to in Databricks. If desired, each set of bound catalogs can also be configured to run on its own compute.
To use this feature, you should first . Once that is configured, you can use Immuta's Integrations API to configure an additional workspace connection. This can be added when you or by .
Additional workspace connections in Databricks Unity Catalog are not currently supported in Immuta's .
Table-level security: Immuta manages and privileges on securable objects in Databricks through subscription policies. When you create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks for every user affected by that subscription policy.
Regex: You must use the global regex flag (g
) when creating a regex masking policy in this integration. You cannot use the case insensitive regex flag (i
) when creating a regex masking policy in this integration. See the for examples.
Project-scoped purpose exceptions for Databricks Unity Catalog integrations allow you to apply to Databricks data sources in a project. As a result, users can only access that data when they are working within that specific project.
This feature allows masked columns to be joined across data sources that belong to the same project. When data sources do not belong to a project, Immuta uses a unique salt per data source for hashing to prevent masked values from being joined. (See the guide for an explanation of that behavior.) However, once you add Databricks Unity Catalog data sources to a project and enable masked joins, Immuta uses a consistent salt across all the data sources in that project to allow the join.
For more information about masked joins and enabling them for your project, see the of documentation.
However, with you can use hive_metastore
and enforce subscription and data policies with the .
Personal access token (PAT): This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M): Immuta uses the to integrate with , which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the for more details.
The Unity Catalog data object model introduces a 3-tiered namespace, as . Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
External data connectors and query-federated tables are preview features in Databricks. See the for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.
The Databricks Unity Catalog integration audits user queries run in clusters or SQL warehouses for deployments configured with the Databricks Unity Catalog integration. The audit ingest is set when and the audit logs can be scoped to only ingest specific workspaces if needed.
See the for details about manually prompting ingest of audit logs and the contents of the logs.
Once external tags are applied to Databricks data sources, those tags can be used to create and .
To enable Databricks Unity Catalog tag ingestion, see the page.
After making changes to tags in Databricks, you can so that the changes immediately apply to the data sources in Immuta. Otherwise, tag changes will automatically sync within 24 hours.
for a list of requirements.
The feature or integration is enabled.
The feature or integration is disabled.
Unity Catalog row- and column-level security controls are unsupported for single-user clusters. See the for details about this limitation.
.
Cluster 1
9.1
Unavailable
Unavailable
Cluster 2
10.4
Unavailable
Unavailable
Cluster 3
11.3
Unavailable
Cluster 4
11.3
Cluster 5
11.3
/
The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error .
The definitions for each status and the state of configured data platform integrations is available in the response schema of the integrations API. However, the UI consolidates these error statuses and provides detail in the error messages.