Unity Catalog Integration Reference

Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces and introduces fine-grained access controls in Databricks.

Immuta’s integration with Unity Catalog allows you to manage multiple Databricks workspaces through Unity Catalog while protecting your data with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and enforce Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

  • Subscription policies: Immuta subscription policies automatically grant and revoke access to Databricks tables.

  • Data policies: Immuta data policies enforce row- and column-level security without creating views, so users can query tables as they always have without their workflows being disrupted.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

  • Metastore: The top-level container, created at the account level and attached to one or more Databricks workspaces. The metastore contains the metadata for all catalogs, schemas, and tables available to query. All clusters in a workspace use that workspace's configured metastore, and all workspaces configured to use the same metastore share its objects.

  • Catalog: A catalog sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas.

  • Schema: Organizes tables and views.

  • Table: A managed or external table.

For details about the Unity Catalog object model, see the Databricks Unity Catalog documentation.

Feature support

The Databricks Unity Catalog integration supports

Architecture

Unity Catalog supports managing permissions at the Databricks account level through controls applied directly to objects in the metastore. To interact with the metastore and apply controls to any table, Immuta requires a personal access token (PAT) for an Immuta system account user with permissions to manage all data protected by Immuta. See the permissions requirements section for a list of specific Databricks privileges.

Immuta uses this Immuta system account user to run the queries that set up the tables, user-defined functions (UDFs), and other data necessary for policy enforcement. When you enable the integration, Immuta creates a catalog named after the workspaceName you provide, containing two schemas:

  • immuta_system: Contains internal Immuta data.

  • immuta_policies: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the new policy information. If necessary, Immuta pushes new UDFs to the immuta_policies schema to replace out-of-date policies and updates any row filters or column masks to point at the new UDFs. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for policy updates to succeed.
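The following Python sketch illustrates the UDF-replacement part of this cycle by generating Databricks SQL of the kind it could involve: creating or replacing a row-filter UDF and a column-mask UDF in the immuta_policies schema, then pointing a table's row filter and column mask at them. The PolicyUpdate class, the UDF names and bodies, and the example catalog name are hypothetical; the immuta_system table updates are omitted, and this is not the SQL Immuta actually issues.

```python
from dataclasses import dataclass
from typing import List, Tuple


def quote(identifier: str) -> str:
    """Backtick-quote a single identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def fqn(*parts: str) -> str:
    """Build a quoted, fully qualified name such as `cat`.`schema`.`obj`."""
    return ".".join(quote(p) for p in parts)


@dataclass
class PolicyUpdate:
    """Hypothetical description of one policy change for one table."""
    workspace_catalog: str          # catalog named after workspaceName
    table: Tuple[str, str, str]     # (catalog, schema, table) being protected
    filter_column: str              # column passed to the row-filter UDF
    row_filter_fn: str              # hypothetical row-filter UDF name
    row_filter_body: str            # boolean SQL expression over `val`
    masked_column: str              # column that requires masking
    mask_fn: str                    # hypothetical column-mask UDF name
    mask_body: str                  # SQL expression over `val` returning the masked value


def policy_push_statements(u: PolicyUpdate) -> List[str]:
    """Return SQL that (re)creates the policy UDFs and points the table at them."""
    filter_fn = fqn(u.workspace_catalog, "immuta_policies", u.row_filter_fn)
    mask_fn = fqn(u.workspace_catalog, "immuta_policies", u.mask_fn)
    table = fqn(*u.table)
    return [
        # Create or replace the policy UDFs in the immuta_policies schema.
        f"CREATE OR REPLACE FUNCTION {filter_fn}(val STRING) "
        f"RETURNS BOOLEAN RETURN {u.row_filter_body}",
        f"CREATE OR REPLACE FUNCTION {mask_fn}(val STRING) "
        f"RETURNS STRING RETURN {u.mask_body}",
        # Point the table's row filter and column mask at those UDFs.
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({quote(u.filter_column)})",
        f"ALTER TABLE {table} ALTER COLUMN {quote(u.masked_column)} SET MASK {mask_fn}",
    ]


if __name__ == "__main__":
    update = PolicyUpdate(
        workspace_catalog="my_workspace",
        table=("sales", "crm", "customers"),
        filter_column="region",
        row_filter_fn="customers_region_filter",
        row_filter_body="val = 'US'",
        masked_column="email",
        mask_fn="customers_email_mask",
        mask_body="sha2(val, 256)",
    )
    for stmt in policy_push_statements(update):
        print(stmt + ";")
```

Each statement needs compute to run, which is why an available cluster or SQL warehouse matters for policy updates.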

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

  • Table-level security: Immuta manages privileges on securable objects in Databricks through subscription policies. When you create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANT or REVOKE statements against the catalog, schema, or table in Databricks for every user affected by that policy.

  • Row-level security: Immuta applies row-filter SQL UDFs to tables to restrict which rows querying users can access.

  • Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
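A minimal Python sketch of how a subscription-policy change could translate into table-level GRANT and REVOKE statements, assuming the set of users who currently hold access and the set of users the policy allows are already known. The function name and inputs are hypothetical. The sketch handles only SELECT on a table and omits the USE CATALOG and USE SCHEMA privileges that Unity Catalog also requires on the parent objects.

```python
from typing import List, Set


def quote(principal: str) -> str:
    """Backtick-quote a principal name such as an email address."""
    return "`" + principal.replace("`", "``") + "`"


def subscription_sync(table: str, current: Set[str], desired: Set[str]) -> List[str]:
    """Diff current and desired subscribers of a table and emit GRANT/REVOKE SQL.

    `table` is an already-qualified name such as "sales.crm.customers".
    """
    statements = []
    # Users the subscription policy now allows but who lack access: grant SELECT.
    for user in sorted(desired - current):
        statements.append(f"GRANT SELECT ON TABLE {table} TO {quote(user)}")
    # Users who have access but are no longer allowed: revoke SELECT.
    for user in sorted(current - desired):
        statements.append(f"REVOKE SELECT ON TABLE {table} FROM {quote(user)}")
    return statements


if __name__ == "__main__":
    print(subscription_sync(
        "sales.crm.customers",
        current={"alice@example.com", "bob@example.com"},
        desired={"bob@example.com", "carol@example.com"},
    ))
    # ['GRANT SELECT ON TABLE sales.crm.customers TO `carol@example.com`',
    #  'REVOKE SELECT ON TABLE sales.crm.customers FROM `alice@example.com`']
```

Computing a diff rather than regranting everyone keeps the statement count proportional to the change, and users whose entitlement is unchanged (bob in the example) are left alone.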

The Unity Catalog integration supports the following policy types:

  • Select masking policies

    • Conditional masking

    • Constant

    • Custom masking

    • Hashing

    • Null

    • Regex: You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case-insensitive regex flag (i). See the limitations section for examples.

    • Rounding (date and numeric rounding)

  • Row-level policies

    • Matching (only show rows where)

      • Custom WHERE

      • Never

      • Where user

      • Where value in column

    • Minimization

    • Time-based restrictions
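To make the semantics of several of these policy types concrete, the following Python sketch applies them to single values and rows in memory. It is a conceptual illustration only: the function names, the SHA-256 hash, the rounding granularity, and the placeholder constant are assumptions, and in this integration the equivalent logic runs as SQL UDFs inside Databricks, not in Python.

```python
import hashlib
import math
from datetime import date
from typing import Any, Callable


def mask_constant(value: Any, constant: str = "REDACTED") -> str:
    """Constant: replace every value with the same fixed value."""
    return constant


def mask_null(value: Any) -> None:
    """Null: replace every value with NULL (None)."""
    return None


def mask_hash(value: str) -> str:
    """Hashing: replace the value with a digest (SHA-256 is an assumption here)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_round_numeric(value: float, bucket: int) -> float:
    """Numeric rounding: round down to a multiple of `bucket`."""
    return math.floor(value / bucket) * bucket


def mask_round_date(value: date) -> date:
    """Date rounding: truncate to the first day of the month (granularity assumed)."""
    return value.replace(day=1)


def conditional_mask(value: Any, condition: bool, mask: Callable[[Any], Any]) -> Any:
    """Conditional masking: apply `mask` only when `condition` holds for the row."""
    return mask(value) if condition else value


def where_value_in_column(row: dict, column: str, user_values: set) -> bool:
    """Row-level 'where value in column': keep rows whose value matches the user's."""
    return row[column] in user_values


if __name__ == "__main__":
    print(mask_round_numeric(87250, 1000))       # 87000
    print(mask_round_date(date(2024, 5, 17)))    # 2024-05-01
```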

Policy exemption groups

Some users may need to be exempt from masking and row-level policy enforcement. The exemption group is created when the Unity Catalog integration is configured. When you add user accounts to this group in Databricks, Immuta does not enforce policies on those users' queries, regardless of the policies applied to the tables they query.

The principal used to register data sources in Immuta is automatically added to the exemption group for those Databricks tables. Consequently, the accounts in this group, including those used to register data sources, should be limited to service accounts.
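One way a column-mask UDF could honor an exemption group is to check the querying user's group membership before masking. The Python sketch below builds such a SQL expression using the Databricks SQL function is_account_group_member; the helper and the group name are hypothetical, and this is not necessarily how Immuta implements exemptions.

```python
def exempt_aware_mask(masked_expr: str, exemption_group: str) -> str:
    """Wrap a column-mask expression so exemption-group members see raw values.

    `val` is the masked column's value inside the UDF; `masked_expr` is the SQL
    expression that produces the masked value for everyone else.
    """
    if "'" in exemption_group:
        # Keep the sketch simple: refuse names that would need SQL escaping.
        raise ValueError("group name must not contain a single quote")
    return (
        f"CASE WHEN is_account_group_member('{exemption_group}') "
        f"THEN val ELSE {masked_expr} END"
    )


if __name__ == "__main__":
    print(exempt_aware_mask("sha2(val, 256)", "immuta_exemptions"))
```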

Policy support with hive_metastore

When you enable Unity Catalog support in Immuta, the catalog for all Databricks data sources is updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, every Databricks Unity Catalog integration ignores data sources in the hive_metastore catalog, and policies are not applied to tables there.

However, with Databricks metastore magic, you can use hive_metastore and enforce subscription and data policies with the Databricks Spark integration.
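The rule that hive_metastore tables fall outside the Unity Catalog integration can be expressed as a simple filter. In this Python sketch, the DataSource record and the function name are hypothetical; it only illustrates which registered tables the integration could protect.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record for a registered Databricks data source."""
    catalog: str
    schema: str
    table: str


def unity_catalog_enforceable(sources: Iterable[DataSource]) -> List[DataSource]:
    """Keep only data sources the Unity Catalog integration can protect.

    Tables in hive_metastore are skipped because that catalog is a proxy to the
    legacy workspace Hive metastore and supports no Unity Catalog policies.
    """
    return [s for s in sources if s.catalog.lower() != "hive_metastore"]
```

Data sources filtered out here would need the Databricks Spark integration (with metastore magic) for policy enforcement.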

Authentication method

The Databricks Unity Catalog integration supports access token authentication to configure the integration and register data sources in Immuta. Provide the access token for the Immuta service principal; this service principal must have the metastore privileges listed in the permissions section for the metastore associated with the Databricks workspace. If the token is configured to expire, update it in Immuta before it expires so that the integration continues to function.

Immuta data sources in Unity Catalog

The Unity Catalog data object model introduces a 3-tiered namespace, as outlined above. Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
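The three-level name can be handled as a small value type. The following Python sketch is a hypothetical helper, not part of any Immuta or Databricks API, that splits a catalog.schema.table name and renders it with the backtick quoting used in Databricks SQL.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """A three-level Unity Catalog name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        """Return the backtick-quoted name, e.g. `main`.`sales`.`orders`."""
        return ".".join("`" + part.replace("`", "``") + "`" for part in self)


def parse_table_ref(name: str) -> TableRef:
    """Split an unquoted catalog.schema.table name into its three parts.

    Quoted identifiers that themselves contain dots are not handled.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)


if __name__ == "__main__":
    ref = parse_table_ref("main.sales.orders")
    print(ref.schema)        # sales
    print(ref.qualified())   # `main`.`sales`.`orders`
```

A two-part name such as sales.orders, which would have been enough before Unity Catalog, is rejected, since every data source must reference all three levels.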

External data connectors and query-federated tables

External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

Native query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:

  • USE CATALOG on the system catalog

  • USE SCHEMA on the system.access schema

  • SELECT on the following system tables:

    • system.access.audit

    • system.access.table_lineage

    • system.access.column_lineage
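The following Python sketch generates Databricks SQL GRANT statements for these minimum privileges. The principal name is a placeholder, the helper is hypothetical, and the statements must be run by a user allowed to grant privileges on the system catalog (typically a metastore admin).

```python
from typing import List

# System tables listed in the access requirements above.
AUDIT_TABLES = (
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
)


def audit_grants(principal: str) -> List[str]:
    """Return the minimum grants needed for audit ingest by `principal`."""
    p = "`" + principal.replace("`", "``") + "`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {p}",
    ]
    statements += [f"GRANT SELECT ON TABLE {t} TO {p}" for t in AUDIT_TABLES]
    return statements


if __name__ == "__main__":
    for stmt in audit_grants("immuta-service-principal"):
        print(stmt + ";")
```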

The Databricks Unity Catalog integration audits user queries run on clusters or SQL warehouses in deployments where the integration is configured. Audit ingestion is set up when you configure the integration, and you can scope it to ingest audit logs from specific workspaces only.
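Conceptually, scoping audit ingest to specific workspaces amounts to filtering the audit system table by workspace. The following Python sketch builds such a query; it assumes the workspace_id and event_time columns of system.access.audit, the helper is hypothetical, and it is not the query Immuta runs.

```python
from datetime import datetime
from typing import Sequence


def audit_query(workspace_ids: Sequence[str], since: datetime) -> str:
    """Build a query over system.access.audit scoped to specific workspaces.

    Workspace IDs are numeric strings; they are validated rather than escaped.
    """
    if not workspace_ids:
        raise ValueError("at least one workspace ID is required")
    for wid in workspace_ids:
        if not wid.isdigit():
            raise ValueError(f"invalid workspace ID: {wid!r}")
    ids = ", ".join(f"'{wid}'" for wid in workspace_ids)
    return (
        "SELECT * FROM system.access.audit "
        f"WHERE workspace_id IN ({ids}) "
        f"AND event_time >= '{since.isoformat(sep=' ')}'"
    )


if __name__ == "__main__":
    print(audit_query(["1234567890"], datetime(2024, 1, 1)))
```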

See the Unity Catalog native audit page for details about manually triggering audit log ingestion and about the contents of the logs.

Configuration requirements

See the Enable Unity Catalog guide for a list of requirements.

Supported Databricks cluster configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Spark with Unity Catalog support | Databricks Unity Catalog integration

Cluster 1

9.1

Unavailable

Unavailable

Cluster 2

10.4

Unavailable

Unavailable

Cluster 3

11.3

/

/

Unavailable

Cluster 4

11.3

Cluster 5

11.3

Legend:

  • The feature or integration is enabled.

  • The feature or integration is disabled.

Unity Catalog caveats

  • Unity Catalog row- and column-level security controls are unsupported for single-user clusters. See the Databricks documentation for details about this limitation.

  • Row access policies that reference more than 1023 columns are unsupported; this is an underlying limitation of UDFs in Databricks. Because Immuta creates row access policies that reference only the minimum number of columns, the limit applies to the number of columns referenced in the policy, not the total number of columns in the table.

  • If you disable table grants, Immuta revokes the grants. Therefore, users who had access to a table before Immuta was enabled will lose that access.

  • You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case-insensitive regex flag (i). See the examples below for guidance:

    • regex with a global flag (supported): /^ssn|social ?security$/g

    • regex without a global flag (unsupported): /^ssn|social ?security$/

    • regex with a case-insensitive flag (unsupported): /^ssn|social ?security$/gi

    • regex without a case-insensitive flag (supported): /^ssn|social ?security$/g
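The regex-flag rule and the row-access-policy column limit above can be checked before a policy is created. The following Python sketch is hypothetical (these functions are not part of Immuta); it assumes regex policies are written in /pattern/flags form, as in the examples.

```python
import re
from typing import Iterable, List

MAX_ROW_POLICY_COLUMNS = 1023  # Databricks UDF limit described above


def check_regex_policy(pattern: str) -> List[str]:
    """Return problems with a /pattern/flags regex for this integration."""
    m = re.fullmatch(r"/(.*)/([a-z]*)", pattern, flags=re.DOTALL)
    if m is None:
        return ["not written as /pattern/flags"]
    flags = m.group(2)
    problems = []
    if "g" not in flags:
        problems.append("missing required global flag (g)")
    if "i" in flags:
        problems.append("case-insensitive flag (i) is not supported")
    return problems


def check_row_policy_columns(referenced_columns: Iterable[str]) -> List[str]:
    """Flag row access policies that reference too many distinct columns."""
    n = len(set(referenced_columns))
    if n > MAX_ROW_POLICY_COLUMNS:
        return [f"policy references {n} columns; limit is {MAX_ROW_POLICY_COLUMNS}"]
    return []


if __name__ == "__main__":
    for p in ("/^ssn|social ?security$/g", "/^ssn|social ?security$/gi"):
        print(p, check_regex_policy(p) or "ok")
```

The column check counts only the columns a policy references, matching the caveat that the limit does not depend on the table's total column count.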

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.
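A Python sketch for finding affected tables, assuming you have already collected each table's owner (for example, from the table_owner column of information_schema.tables) and the set of Databricks group names; both inputs and the function names are hypothetical. It also generates ALTER TABLE ... OWNER TO statements to reassign ownership to an individual user or service principal.

```python
from typing import Dict, List, Set


def group_owned_tables(table_owners: Dict[str, str], groups: Set[str]) -> List[str]:
    """Return tables whose owner is a Databricks group (masking cannot be applied)."""
    return sorted(table for table, owner in table_owners.items() if owner in groups)


def ownership_fixes(tables: List[str], new_owner: str) -> List[str]:
    """Generate statements that reassign each table to a user or service principal."""
    owner = "`" + new_owner.replace("`", "``") + "`"
    return [f"ALTER TABLE {t} OWNER TO {owner}" for t in tables]


if __name__ == "__main__":
    owners = {
        "main.sales.orders": "data-engineers",           # a group: blocks masking
        "main.sales.customers": "svc-etl@example.com",   # an individual principal
    }
    affected = group_owned_tables(owners, groups={"data-engineers"})
    for stmt in ownership_fixes(affected, "svc-etl@example.com"):
        print(stmt + ";")
```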

Feature limitations

The following features are currently unsupported:

  • Databricks change data feed support

  • Immuta projects

  • Multiple IAMs on a single cluster

  • Column masking policies on views

  • Mixing masking policies on the same column

  • Row-redaction policies on views

  • R and Scala cluster support

  • Scratch paths

  • User impersonation

  • Policy enforcement on raw Spark reads

  • Python UDFs for advanced masking functions

  • Direct file-to-SQL reads

  • Data policies on ARRAY, MAP, or STRUCT type columns

Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.



Copyright © 2014-2024 Immuta Inc. All rights reserved.