
# Databricks Unity Catalog Integration Reference Guide

Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

* **Subscription policies**: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.
* [**Data policies**](#policy-enforcement): Immuta data policies enforce row- and column-level security.

## Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

* **Metastore**: Created at the account level and attached to one or more Databricks workspaces. The metastore contains the metadata of all the catalogs, schemas, and tables available to query. All clusters in an attached workspace use the configured metastore, and all workspaces that are configured to use a single metastore share those objects.
* **Catalog**: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas.
* **Schema**: Organizes tables and views.
* **Table and other objects**: Tables (managed or external), views, volumes, models, and functions.

For details about the Unity Catalog object model, see the [Databricks Unity Catalog documentation](https://docs.databricks.com/data-governance/unity-catalog/index.html).

## Feature support

The Databricks Unity Catalog integration supports:

* [managing and accessing data across multiple Databricks workspaces](#architecture)
* [enforcing Unity Catalog row-, column-, and table-level access controls on Databricks clusters and SQL warehouses](#policy-enforcement):
  * applying column masks and row filters on specific securable objects
  * applying subscription policies on tables and views
* enforcing Unity Catalog access controls, even if Immuta becomes disconnected
* [auditing activity of both Immuta users and non-Immuta users](#query-audit)
* allowing non-Immuta reads and writes
* using Photon
* using a proxy server

## Architecture

Unity Catalog supports managing permissions account-wide in Databricks through controls applied directly to objects in the metastore. To establish a connection with Databricks and apply controls to securable objects within the metastore, Immuta requires [a service principal](#user-content-fn-1)[^1] with privileges to manage all data protected by Immuta. [Databricks OAuth for service principals](https://docs.databricks.com/en/dev-tools/auth/index.html) (OAuth M2M) or a personal access token (PAT) can be provided for Immuta to authenticate as the service principal. See the [permissions requirements section](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md#permissions) for a list of specific Databricks privileges.

Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta will create a catalog that contains these schemas:

* `immuta_system`: Contains internal Immuta data.
* `immuta_policies_n`: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the `immuta_system` schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the `immuta_policies_n` schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

## Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

* **Table-level security**: Immuta manages [REVOKE](https://docs.databricks.com/sql/language-manual/security-revoke.html) and [GRANT](https://docs.databricks.com/sql/language-manual/security-grant.html) privileges on Databricks securable objects that have been registered as Immuta data sources. When you register a data source in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks [for every user registered in Immuta](#user-content-fn-2)[^2].
* **Row-level security**: Immuta applies SQL UDFs to restrict access to rows for querying users.
* **Column-level security**: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
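
As an illustration of the native controls listed above, the sketch below builds the kind of Databricks SQL statements that correspond to each level of security. It is not the set of statements Immuta actually issues. The helper names, the example table, and the UDF names are hypothetical and chosen only for this example.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Backtick-quote a single identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def table_name(catalog: str, schema: str, table: str) -> str:
    """Build a quoted three-level Unity Catalog table name."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def grant_select(table: str, principal: str) -> str:
    """Table-level security: give a principal read access to a table."""
    return f"GRANT SELECT ON TABLE {table} TO {quote_ident(principal)}"


def revoke_select(table: str, principal: str) -> str:
    """Table-level security: remove a principal's read access to a table."""
    return f"REVOKE SELECT ON TABLE {table} FROM {quote_ident(principal)}"


def set_row_filter(table: str, udf: str, columns: Sequence[str]) -> str:
    """Row-level security: attach a SQL UDF as the table's row filter."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"ALTER TABLE {table} SET ROW FILTER {udf} ON ({cols})"


def set_column_mask(table: str, column: str, udf: str) -> str:
    """Column-level security: attach a SQL UDF as a column mask."""
    return f"ALTER TABLE {table} ALTER COLUMN {quote_ident(column)} SET MASK {udf}"


if __name__ == "__main__":
    # Hypothetical table, user, and UDF names, for illustration only.
    orders = table_name("main", "sales", "orders")
    print(grant_select(orders, "analyst@example.com"))
    print(set_row_filter(orders, "immuta.policies.region_filter", ["region"]))
    print(set_column_mask(orders, "email", "immuta.policies.mask_email"))
```

Generating statements as strings keeps the sketch runnable without a Databricks connection. In practice Immuta manages these controls for you, so you would not normally write them by hand.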

{% hint style="warning" %}
**Policy behavior**

If you enable a Databricks Unity Catalog object in Immuta and it has no subscription policy set on it, Immuta will REVOKE access to that object in Databricks for all Immuta users, even if they had been directly granted access to that table outside of Immuta.

If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users, regardless of whether they were set in Immuta or in Unity Catalog directly.

If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.
{% endhint %}
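
The behavior described in the warning above can be summarized as a small decision function. This is a simplified model for reasoning about outcomes, not Immuta's implementation. The function name, the `GrantAction` enum, and the three boolean inputs are assumptions made for this sketch.

```python
from enum import Enum


class GrantAction(Enum):
    GRANT = "grant"
    REVOKE = "revoke"
    NO_CHANGE = "no_change"


def grant_action(registered_in_immuta: bool,
                 data_source_enabled: bool,
                 subscribed: bool) -> GrantAction:
    """Model the effect on one user's access to one Unity Catalog object.

    `subscribed` is False when no subscription policy is set on the object,
    because no user can meet a policy that does not exist.
    """
    if not registered_in_immuta:
        # Immuta has no effect on users who are not registered in Immuta.
        return GrantAction.NO_CHANGE
    if not data_source_enabled:
        # Disabling a data source removes existing grants for Immuta users.
        return GrantAction.REVOKE
    return GrantAction.GRANT if subscribed else GrantAction.REVOKE


if __name__ == "__main__":
    # Registered user, enabled data source, no subscription policy set.
    print(grant_action(True, True, False))
```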

### Supported policies

The Unity Catalog integration supports the following policy types:

* [Subscription policies](/2024.2/secure-your-data/authoring-policies-in-secure/section-contents/reference-guides/subscription-policies.md)
* [Select masking policies](/2024.2/secure-your-data/authoring-policies-in-secure/data-policies/reference-guides/data-policies.md#masking-policies)
  * Conditional masking
  * Constant
  * Custom masking
  * Hashing
  * Null
  * Regex: You must use the global regex flag (`g`) and cannot use the case-insensitive regex flag (`i`) when creating a regex masking policy in this integration. See the [limitations section](#unity-catalog-caveats) for examples.
  * Rounding (date and numeric rounding)
* [Row-level policies](/2024.2/secure-your-data/authoring-policies-in-secure/data-policies/reference-guides/data-policies.md#row-level-security-policies)
  * Matching (only show rows where)
    * Custom WHERE
    * Never
    * Where user
    * Where value in column
  * Minimization
  * Time-based restrictions

### Project-scoped purpose exceptions for Databricks Unity Catalog

{% hint style="info" %}
**Public preview:** This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.
{% endhint %}

Project-scoped purpose exceptions for Databricks Unity Catalog integrations allow you to apply [purpose-based policies](/2024.2/secure-your-data/authoring-policies-in-secure/data-policies/reference-guides/data-policies.md#limit-to-purpose-policies) to Databricks data sources in a project. As a result, users can only access that data when they are working within that specific project.

#### **Databricks Unity Catalog views**

If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

* **The view and underlying table are registered as Immuta data sources and added to a project**: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
* **Only the underlying table is registered as an Immuta data source and added to a project**: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).
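
One way to read the two conditions above is as the following predicate. It is a sketch of the rules as stated on this page, with a hypothetical function name and parameters, and it is not derived from Immuta's code.

```python
def purpose_exception_applies_to_view(table_registered: bool,
                                      table_in_project: bool,
                                      view_registered: bool,
                                      view_in_project: bool) -> bool:
    """Return True if a project-scoped purpose exception reaches the view."""
    # In both supported cases the underlying table must be registered as an
    # Immuta data source and added to the project.
    if not (table_registered and table_in_project):
        return False
    # If the view is also registered, it must be added to the project too.
    if view_registered:
        return view_in_project
    # Only the table is registered: the exception also covers the view.
    return True


if __name__ == "__main__":
    # View and table both registered, but only the view is in the project.
    print(purpose_exception_applies_to_view(True, False, True, True))
```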

### Policy exemption group

The Databricks group configured as the policy exemption group in Immuta will be exempt from Immuta data policy enforcement. This account-level group is created and managed in Databricks, not in Immuta. This group does not need to be assigned to a Databricks workspace.

If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to an account-level group in Databricks and include this group name in the Databricks Unity Catalog configuration in Immuta. Then, group members will be excluded from having data policies applied to them when they query Immuta-protected tables in Databricks.

Typically, service or system accounts that perform the following actions are added to an exemption group in Databricks:

* Automated queries
* ETL
* Report generation

If you have multiple groups that must be exempt from data policies, add each group to a single group in Databricks that you then set as the policy exemption group in Immuta.

The service principal used to register data sources in Immuta will be automatically added to the exemption group for the Databricks securables it registers. Consequently, accounts added to the exemption group and used to register data sources in Immuta should be limited to service accounts.

For guidance on configuring a policy exemption group on the Immuta app settings page, see the [Configure a Databricks Unity Catalog integration guide](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md). Alternatively, this group can be configured via the [integrations API](/2024.2/developer-guides/api-intro/integrations-api/how-to-guides/databricks-uc-api.md) using the `groupPattern` object.

### Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default `hive_metastore` catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the `hive_metastore` in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

However, with [Databricks metastore magic](/2024.2/data-and-integrations/databricks-spark/reference-guides/metastore-magic.md) you can use `hive_metastore` and enforce subscription and data policies with the [Databricks Spark integration](/2024.2/data-and-integrations/databricks-spark/how-to-guides/configuration/simplified.md).

## Authentication methods

The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

* **Personal access token (PAT)**: This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the [permissions](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md#permissions) section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
* **OAuth machine-to-machine (M2M)**: Immuta uses the [Client Credentials Flow](https://auth0.com/docs/get-started/authentication-and-authorization-flow/client-credentials-flow) to integrate with [Databricks OAuth machine-to-machine authentication](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html), which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the [Databricks OAuth machine-to-machine (M2M) authentication page](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) for more details.
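
For orientation, the sketch below shows what a client credentials token request and an expiry check might look like. It only constructs the request and does not send it. The `/oidc/v1/token` workspace endpoint and the `all-apis` scope are taken from the Databricks OAuth M2M documentation as an assumption to verify for your deployment. The function names and the 60-second refresh margin are hypothetical and do not describe how Immuta itself is implemented.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host: str,
                        client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The client ID and secret are sent as HTTP Basic credentials.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=f"https://{workspace_host}/oidc/v1/token",  # assumed endpoint path
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def needs_refresh(issued_at: float, expires_in: float, now: float,
                  margin: float = 60.0) -> bool:
    """Return True once the token is within `margin` seconds of expiring."""
    return now >= issued_at + expires_in - margin


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    req = build_token_request("example.cloud.databricks.com", "my-id", "my-secret")
    print(req.get_method(), req.full_url)
    # A one-hour token issued at t=0 is still usable at t=3500 seconds.
    print(needs_refresh(issued_at=0, expires_in=3600, now=3500))
```

Refreshing slightly before the one-hour expiry avoids sending a request with a token that expires in flight.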

## Immuta data sources in Unity Catalog

The Unity Catalog data object model introduces a 3-tiered namespace, as [outlined above](#unity-catalog-object-model). Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
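
As a minimal sketch of the three-level namespace, the snippet below splits a `catalog.schema.table` name into its parts. `TableRef` and `parse_table_ref` are hypothetical names for this example, and the parser assumes plain identifiers (it does not handle backtick-quoted names that contain dots).

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        """Return the dotted catalog.schema.table name."""
        return f"{self.catalog}.{self.schema}.{self.table}"

    def in_hive_metastore(self) -> bool:
        """True for tables in the legacy `hive_metastore` catalog."""
        return self.catalog == "hive_metastore"


def parse_table_ref(name: str) -> TableRef:
    """Split a three-part name; reject names with missing or extra parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)


if __name__ == "__main__":
    ref = parse_table_ref("main.sales.orders")
    print(ref.catalog, ref.schema, ref.table)
    print(parse_table_ref("hive_metastore.default.events").in_hive_metastore())
```

The `in_hive_metastore` check reflects the note earlier on this page that the integration ignores data sources in `hive_metastore`.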

### External data connectors and query-federated tables

External data connectors and query-federated tables are preview features in Databricks. See the [Databricks documentation](https://docs.databricks.com/query-federation/index.html) for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

## Query audit

{% hint style="info" %}
**Access requirements**

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

* `USE CATALOG` on the `system` catalog
* `USE SCHEMA` on the `system.access` schema
* `SELECT` on the following system tables:
  * `system.access.audit`
  * `system.access.table_lineage`
  * `system.access.column_lineage`
{% endhint %}
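
The access requirements above map onto a handful of Databricks `GRANT` statements. The sketch below generates them for a given principal. The function name and the example principal are hypothetical, and the statements should be reviewed against your own environment before they are run.

```python
from typing import List

AUDIT_TABLES = ("audit", "table_lineage", "column_lineage")


def audit_grants(principal: str) -> List[str]:
    """Build the GRANT statements for the minimum audit access listed above."""
    quoted = "`" + principal.replace("`", "``") + "`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted}",
    ]
    statements += [
        f"GRANT SELECT ON TABLE system.access.{table} TO {quoted}"
        for table in AUDIT_TABLES
    ]
    return statements


if __name__ == "__main__":
    # Hypothetical service principal name, for illustration only.
    for statement in audit_grants("immuta-service-principal"):
        print(statement + ";")
```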

The Databricks Unity Catalog integration audits all user queries run in the integration's clusters or SQL warehouses. See the [Databricks Unity Catalog audit page](/2024.2/detect-your-activity/audit/reference-guides/databricks-uc.md) for details about the contents of the logs.

The audit ingest is set up when [configuring the integration](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md) and can be scoped to ingest only specific workspaces if needed. The default ingest frequency is every hour, but you can change it on the [Immuta app settings page](/2024.2/application-settings/how-to-guides/config-builder-guide.md#databricks-unity-catalog-configuration). Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. A manual request only searches for queries run since the most recently audited query. The job runs in the background, so the new queries will not be immediately available.

## Configuration requirements

[See the Enable Unity Catalog guide](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md#requirements) for a list of requirements.

## Supported Databricks cluster configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

| Example cluster | Databricks Runtime | Unity Catalog in Databricks |    Databricks Spark integration    | Databricks Unity Catalog integration |
| :-------------: | :----------------: | :-------------------------: | :--------------------------------: | :----------------------------------: |
|    Cluster 1    |         9.1        |         Unavailable         |        :white\_check\_mark:        |              Unavailable             |
|    Cluster 2    |        10.4        |         Unavailable         |        :white\_check\_mark:        |              Unavailable             |
|    Cluster 3    |        11.3        |         :no\_entry:         | :white\_check\_mark: / :no\_entry: |              Unavailable             |
|    Cluster 4    |        11.3        |     :white\_check\_mark:    |             :no\_entry:            |              :no\_entry:             |
|    Cluster 5    |        11.3        |     :white\_check\_mark:    |        :white\_check\_mark:        |         :white\_check\_mark:         |

**Legend**:

* :white\_check\_mark: The feature or integration is enabled.
* :no\_entry: The feature or integration is disabled.

## Unity Catalog caveats

* Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
* If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
* If multiple Immuta tenants are connected to your Databricks environment, you must create a separate Immuta catalog for each of those tenants during configuration. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.
* You must use the global regex flag (`g`) and cannot use the case-insensitive regex flag (`i`) when creating a regex masking policy in this integration. See the examples below for guidance:
  * regex with the global flag only (supported): `/^ssn|social ?security$/g`
  * regex without the global flag (unsupported): `/^ssn|social ?security$/`
  * regex with the case-insensitive flag (unsupported): `/^ssn|social ?security$/gi`
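
The regex flag rules in the list above can be checked before a policy is authored. The sketch below is a hypothetical pre-flight check, not an Immuta API. It assumes the policy regex is written as a `/pattern/flags` literal, as in the examples.

```python
import re
from typing import List

# Matches a /pattern/flags literal; the greedy pattern group stops at the last slash.
_LITERAL = re.compile(r"^/(?P<pattern>.*)/(?P<flags>[a-z]*)$", re.DOTALL)


def check_masking_regex(literal: str) -> List[str]:
    """Return a list of problems; an empty list means the flags are acceptable."""
    match = _LITERAL.match(literal)
    if match is None:
        return ["not a /pattern/flags literal"]
    flags = match.group("flags")
    problems = []
    if "g" not in flags:
        problems.append("missing global flag (g)")
    if "i" in flags:
        problems.append("case-insensitive flag (i) is not supported")
    return problems


if __name__ == "__main__":
    for literal in ("/^ssn|social ?security$/g",
                    "/^ssn|social ?security$/",
                    "/^ssn|social ?security$/gi"):
        print(literal, check_masking_regex(literal))
```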

### Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

### Feature limitations

The following features are currently unsupported:

* Immuta projects (*Enable the* [*project-scoped purpose exceptions feature*](#project-scoped-purpose-exceptions-for-databricks-unity-catalog) *to allow you to apply purpose-based policies to Databricks data sources in a project.*)
* Multiple IAMs on a single cluster
* Column masking policies on views
* Mixing masking policies on the same column
* Row-redaction policies on views
* R and Scala cluster support
* Scratch paths
* User impersonation
* Policy enforcement on raw Spark reads
* Python UDFs for advanced masking functions
* Direct file-to-SQL reads
* Data policies on ARRAY, MAP, or STRUCT type columns
* Shallow clones

### Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.

## Next

[Configure the Databricks Unity Catalog integration](/2024.2/data-and-integrations/databricks-unity-catalog/how-to-guides/configure.md).

[^1]: Immuta only manages grants for securables and [users who are registered in Immuta](/2024.2/people/immuta-users.md). Therefore, to avoid having this service principal's access to data revoked in Databricks, do not register this service principal as an Immuta user.

[^2]: Immuta only manages grants for [users who are registered in Immuta](/2024.2/people/immuta-users.md). If a table has been registered as an Immuta data source, users who are registered in Immuta and who are not subscribed to the data source will have their access REVOKED, even if they had been directly granted access to the table in Unity Catalog. If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

