Immuta Documentation - SaaS

One platform to optimize how you access and control data.

Immuta gives everyone fast, governed access to data with the built-in controls, collaboration workflows, automated provisioning, and continuous monitoring you need to keep risk low and compliance high.

Configure

Connect Data Platforms

Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data.

This section includes guidance for connecting your data platform and keeping it synced with Immuta.

Data platforms overview

This reference guide outlines the features, policies, and audit capabilities of each data platform Immuta supports.

Integrations and connections

The guides in these sections include information about how to connect your data platform to Immuta:

Amazon Redshift
Amazon S3
AWS Lake Formation
Azure Synapse Analytics
Databricks: This section includes guides for Databricks Spark and Databricks Unity Catalog integrations.
Google BigQuery
MariaDB
Oracle
PostgreSQL
Snowflake
SQL Server
Starburst (Trino)
Teradata

Queries Immuta runs in remote platforms

This reference guide outlines the actions and features that trigger Immuta queries in your remote platform that may incur cost.

Connect your data

Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data. This section includes concept, reference, and how-to guides for registering and managing data sources.

Amazon Redshift

In this integration, Immuta generates policy-enforced views in your configured Redshift schema for tables registered as Immuta data sources.

Getting started

This guide outlines how to integrate Redshift with Immuta.

How-to guides

Redshift integration configuration: Configure the integration in Immuta.
Redshift Spectrum configuration: Configure Redshift Spectrum in Immuta.

Reference guide

Redshift integration reference guide: This guide describes the design and components of the integration.

Getting Started with Redshift

The how-to guides linked on this page illustrate how to integrate Redshift with Immuta. See the reference guide for information about the Redshift integration.

Requirement: A Redshift cluster with an RA3 node is required for the multi-database integration. For other instance types, you may configure a single-database integration using one of the Redshift Spectrum options.

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Configure your Redshift integration: Configure a Redshift integration with Immuta so that Immuta can create policy protected views for your users to query.
Register Redshift data sources: This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, and identification.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs from Redshift to Immuta: Ensure the user IDs in Immuta, Redshift, and your IAM are aligned so that the right policies impact the right users.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Request access to a data product: Users must then request access to your data products in Marketplace.
Respond to an access request: To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags, you can create proactive policies, knowing that automated tagging will apply them to data sources as they are added to Immuta.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

How-to Guides

Configure Redshift Integration

This page illustrates how to configure the Redshift integration on the Immuta app settings page. To configure this integration via the Immuta API, see the Integrations API getting started guide.

For instructions on configuring Redshift Spectrum, see the Redshift Spectrum guide.

Requirements

A Redshift cluster with an RA3 node is required for the multi-database integration because Immuta requires cross-database views, which are only supported on RA3 instance types. For other instance types, you may configure a single-database integration using one of the Redshift Spectrum options.
For automated installations, the credentials provided must be a Superuser or have the ability to create databases and users and modify grants.
The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.
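
The RA3 requirement above can be read as a simple decision rule. The following is a minimal sketch under stated assumptions: the function name is hypothetical and not part of any Immuta tooling, and node types are assumed to follow the usual AWS naming pattern (for example, ra3.4xlarge or dc2.large).

```python
def supported_integration_types(node_type):
    """Return the Immuta Redshift integration types a node type can host.

    Hypothetical helper for illustration only. Assumes node types look
    like "ra3.4xlarge" or "dc2.large".
    """
    family = node_type.strip().lower().split(".")[0]
    if family == "ra3":
        # RA3 supports cross-database views, so both types are possible.
        return ["multi-database", "single-database"]
    # Other node types lack cross-database views: single-database only.
    return ["single-database"]


print(supported_integration_types("ra3.4xlarge"))
print(supported_integration_types("dc2.large"))
```

A check like this could run before choosing between the standard configuration and one of the Redshift Spectrum options.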

Add a Redshift integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Redshift from the dropdown menu.
Complete the Host and Port fields.
Enter an Immuta Database. This is a new database where all secure schemas and Immuta created views will be stored.
Optionally, check the Enable Impersonation box and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.

Select your configuration method

You have two options for configuring your Redshift environment:

Automatic setup: Grant Immuta one-time use of credentials to automatically configure your Redshift environment and the integration.
Manual setup: Run the Immuta script in your Redshift environment yourself to configure your environment and the integration.

Automatic setup

Immuta requires temporary, one-time use of credentials with specific privileges

When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following privileges:

CREATE DATABASE
CREATE USER
REVOKE ALL PRIVILEGES ON DATABASE
GRANT TEMP ON DATABASE
MANAGE GRANTS ON ACCOUNT

These privileges will be used to create and configure a new IMMUTA database within the specified Redshift instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

You can create a new account for Immuta to use that has these privileges, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate privileges is a Superuser. If you create a new account, it can be deleted after initial setup is complete.

Alternatively, you can create the IMMUTA database within the specified Redshift instance without giving Immuta user credentials for a Superuser using the manual setup option.

Select Automatic.
Enter an Initial Database from your Redshift integration for Immuta to use to connect.
Use the dropdown menu to select your Authentication Method.
1. Username and Password: Enter the Username and Password of the privileged user.
2. AWS Access Key: Enter the Database User, Access Key ID, and Secret Key. Optionally, enter the Session Token.
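
The two authentication methods above ask for different values. The sketch below shows one way to check that everything needed for a method has been collected before starting setup. The dictionary keys (username, access_key_id, and so on) are hypothetical labels for the UI fields, not Immuta API parameter names.

```python
# Field names below are hypothetical labels for the values the UI asks for.
REQUIRED_FIELDS = {
    "Username and Password": ("username", "password"),
    "AWS Access Key": ("database_user", "access_key_id", "secret_key"),
}
# The session token is optional for the AWS Access Key method.
OPTIONAL_FIELDS = {"AWS Access Key": ("session_token",)}


def missing_fields(method, values):
    """Return the required fields that are absent or empty for a method."""
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported authentication method: {method}")
    return [name for name in REQUIRED_FIELDS[method] if not values.get(name)]


print(missing_fields("AWS Access Key", {"database_user": "immuta_setup"}))
```

An empty list means every required value for the chosen method is present.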

Manual setup

Required privileges

The specified role used to run the bootstrap needs to have the following privileges:

CREATE DATABASE
CREATE USER
REVOKE ALL PRIVILEGES ON DATABASE
GRANT TEMP ON DATABASE
MANAGE GRANTS ON ACCOUNT

Select Manual and download both of the bootstrap scripts.
Run the bootstrap script (initial database) in the Redshift initial database.
Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.
Choose your authentication method, and enter the information of the newly created account.

Save the configuration

Click Save.

Register data

Edit a Redshift integration

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Redshift Integration.
Edit the field you want to change. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.
Enter Username and Password.
Click Save.

Required privileges

When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the following permissions:

Create databases
Create users
Modify grants

Alternatively, you can download the Edit Script and run it in Redshift.

Remove a Redshift integration

Disabling Redshift Spectrum

Disabling the Redshift integration is not supported when you set the fields nativeWorkspaceName, nativeViewName, and nativeSchemaName to create Redshift Spectrum data sources. Disabling the integration when these fields are used in metadata ingestion causes undefined behavior.

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Redshift Integration.
Click the checkbox to disable the integration.
Enter the username and password that were used to initially configure the integration.
Click Save.

Configure Redshift Spectrum

Allow Immuta to create secure views of your external tables through one of these methods:

Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.
Configure the integration by creating a new immuta database and re-creating all of your external tables in that database.

For an overview of the integration, see the Redshift overview documentation.

Requirements

A Redshift cluster with an AWS row-level security patch applied. Contact Immuta for guidance.
An AWS IAM role for Redshift that is associated with your Redshift cluster.
The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.
The Redshift role used to run the Immuta bootstrap script must have the following privileges when configuring the integration to:
- Use an existing database:
  - ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.
  - CREATE USER
  - GRANT TEMP ON DATABASE
- Create a new database:
  - CREATE DATABASE
  - CREATE USER
  - GRANT TEMP ON DATABASE
  - REVOKE ALL PRIVILEGES ON DATABASE
A Redshift database that contains an external schema and external tables.
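
The privileges listed above differ depending on whether you use an existing database or create a new one. As a rough sketch, the two lists can be kept in a lookup table and compared against what a role has been granted. The mode names and the helper function are hypothetical; the privilege strings are copied from the requirements above.

```python
# Privileges required of the role that runs the bootstrap script,
# keyed by a hypothetical name for each configuration mode.
REQUIRED_PRIVILEGES = {
    "existing_database": {
        "ALL PRIVILEGES ON DATABASE",
        "CREATE USER",
        "GRANT TEMP ON DATABASE",
    },
    "new_database": {
        "CREATE DATABASE",
        "CREATE USER",
        "GRANT TEMP ON DATABASE",
        "REVOKE ALL PRIVILEGES ON DATABASE",
    },
}


def missing_privileges(mode, granted):
    """Return the required privileges the role still lacks, sorted."""
    return sorted(REQUIRED_PRIVILEGES[mode] - set(granted))


print(missing_privileges("new_database", ["CREATE USER", "CREATE DATABASE"]))
```

This only compares names; it does not query Redshift, so the granted list must be gathered separately.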

Use an existing database

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Redshift from the dropdown menu.
Complete the Host and Port fields.
Enter the name of the database you created the external schema in as the Immuta Database. This database will store all secure schemas and Immuta-created views.
Optionally, check the Enable Impersonation box and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.
Select Manual and download both of the bootstrap scripts from the Setup section. The specified role used to run the bootstrap needs to have the following privileges:
- ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.
- CREATE USER
- GRANT TEMP ON DATABASE
Run the bootstrap script (Immuta database) in the Redshift database that contains the external schema.
Choose your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.
Click Save.

Register data

Create a new Immuta database

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Redshift from the dropdown menu.
Complete the Host and Port fields.
Enter an Immuta Database. This is a new database where all secure schemas and Immuta created views will be stored.
Optionally, check the Enable Impersonation box and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.
Select Manual and download both of the bootstrap scripts from the Setup section. The specified role used to run the bootstrap needs to have the following privileges:
- ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.
- CREATE DATABASE
- CREATE USER
- GRANT TEMP ON DATABASE
Run the bootstrap script (initial database) in the Redshift initial database.
Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.
Choose your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.
Click Save.

Then, add your external tables to the Immuta database.

Register data

Reference Guides

Redshift Overview

This page provides an overview of the Redshift integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide.

Overview

Redshift is a policy push integration that allows Immuta to apply policies directly in Redshift. This allows data analysts to query Redshift views directly instead of going through a proxy and have per-user policies dynamically applied at query time.

Architecture

The Redshift integration creates views from the tables within the database specified during configuration. The user can then choose the name of the schema where all the Immuta-generated views will reside. Immuta also creates the schemas immuta_system, immuta_functions, and immuta_procedures to contain the tables, views, UDFs, and stored procedures that support the integration. Immuta then creates a system account and gives that account the following privileges:

ALL PRIVILEGES ON DATABASE IMMUTA_DB
ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB
USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
USAGE ON LANGUAGE PLPYTHONU

Additionally, the PUBLIC role will be granted the following privileges:

USAGE ON DATABASE IMMUTA_DB
TEMP ON DATABASE IMMUTA_DB
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS
USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM
SELECT ON TABLES TO public

Integration type

Immuta supports the Redshift integration as both multi-database and single-database integrations. In either integration type, Immuta supports a single integration with secure views in a single database per cluster.

Multi-database integration

If using a multi-database integration, you must use a Redshift cluster with an RA3 node because Immuta requires cross-database views.

Single-database integration

If using a single-database integration, all Redshift cluster types are supported. However, because cross-database queries are not supported in any types other than RA3, Immuta's views must exist in the same database as the raw tables. Consequently, the steps for configuring the integration for Redshift clusters with external tables differ slightly from those that don't have external tables. Allow Immuta to create secure views of your external tables through one of these methods:

Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.
Configure the integration by creating a new immuta database and re-creating all of your external tables in that database.

Policy enforcement

SQL statements are used to create all views, including a join to the secure view: immuta_system.user_profile. This secure view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. The immuta_system.user_profile view is readable by all users, but will only display the data that corresponds to the user executing the query.
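
To make the row filter concrete, the sketch below mimics the behavior of the immuta_system.user_profile view in plain Python. It is a simulation, not Immuta's implementation: apart from immuta__userid, the column names and sample rows are invented for illustration.

```python
# Stand-in for the immuta_system.profile table. Only immuta__userid is a
# name taken from the text; the other keys and all rows are hypothetical.
PROFILE = [
    {"immuta__userid": "alice", "groups": ["analysts"], "tables": ["sales"]},
    {"immuta__userid": "bob", "groups": ["research"], "tables": []},
]


def user_profile(current_user):
    """Mimic the user_profile view: return only the caller's profile row."""
    return [row for row in PROFILE if row["immuta__userid"] == current_user]


print(user_profile("alice"))
print(user_profile("carol"))
```

Because every policy-enforced view joins to this filtered result, a user with no matching profile row has no entitlements to evaluate.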

The Redshift integration uses webhooks to keep views up-to-date with Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the dynamic view. The immuta_system.profile table is updated through webhooks when a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.
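
The webhook behavior described above amounts to a mapping from Immuta events to actions in Redshift. The sketch below models that mapping only. The event names are hypothetical labels chosen for this example and are not Immuta's actual webhook payloads.

```python
# Data source or policy events and the resulting action on the dynamic view.
VIEW_ACTIONS = {"created": "create", "updated": "modify", "disabled": "delete"}

# User events that cause the immuta_system.profile table to be updated.
PROFILE_EVENTS = {
    "groups_changed",
    "attributes_changed",
    "project_switched",
    "purpose_acknowledged",
    "access_approved",
    "access_revoked",
}


def handle_event(event):
    """Describe what the integration does in Redshift for a given event."""
    if event in VIEW_ACTIONS:
        return f"{VIEW_ACTIONS[event]} view"
    if event in PROFILE_EVENTS:
        return "update immuta_system.profile"
    raise ValueError(f"Unknown event: {event}")


print(handle_event("disabled"))
print(handle_event("project_switched"))
```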

Integration health status

The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error.

The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API. However, the UI consolidates these error statuses and provides detail in the error messages.

Data flow

An Immuta Application Administrator configures the Redshift integration and registers Redshift warehouse and databases with Immuta.
Immuta creates a database inside the configured Redshift ecosystem that contains Immuta policy definitions and user entitlements.
A Data Owner registers Redshift tables in Immuta as data sources.
A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.
A Redshift user who is subscribed to the data source in Immuta queries the corresponding table directly in Redshift through the immuta database and sees policy-enforced data.

Redshift Spectrum

Redshift Spectrum (Redshift external tables) allows Redshift users to query external data directly from files on Amazon S3. Because cross-database queries are not supported in Redshift Spectrum, Immuta's views must exist in the same database as the raw tables. Consequently, the steps for configuring the integration for Redshift clusters with external tables differ slightly from those that don't have external tables. Allow Immuta to create secure views of your external tables through one of these methods:

Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.
Configure the integration by creating a new immuta database and re-creating all of your external tables in that database.

Once the integration is configured, Data Owners must register Redshift Spectrum data sources using the Immuta CLI or V2 API.

Redshift Pre-Configuration Details

This page describes the Redshift integration, configuration options, and features. For a tutorial to enable this integration, see the installation guide.

Feature Availability

Prerequisites

For automated installations, the credentials provided must be a Superuser or have the ability to create databases and users and modify grants.

Supported Features

Redshift datashares
Redshift Serverless
For configuration and data source registration instructions, see the Redshift how-to guides.

Authentication Methods

The Redshift integration supports the following authentication methods to configure the integration and create data sources:

Username and Password: Users can authenticate with their Redshift username and password.
AWS Access Key: Users can authenticate with an AWS access key.

Tag Ingestion

Immuta cannot ingest tags from Redshift, but you can connect an external catalog to work with your integration.

User Impersonation

Required Redshift privileges

Setup User:

OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE
CREATE GROUP

Immuta System Account:

GRANT EXECUTE ON PROCEDURE grant_impersonation
GRANT EXECUTE ON PROCEDURE revoke_impersonation

Impersonation allows users to query data as another Immuta user in Redshift. To enable user impersonation, see the Configure Redshift Integration page.

Multiple Integrations

Users can enable multiple Redshift integrations with a single Immuta tenant.

Redshift Limitations

The host of the data source must match the host of the connection for the view to be created.
When using multiple Redshift integrations, a user has to have the same user account across all hosts.
Case sensitivity of database, table, and column identifiers is not supported. The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster to configure the integration and register data sources.

Python UDF Specific Limitations

For most policy types in Redshift, Immuta uses SQL clauses to implement enforcement logic; however, Immuta uses Python UDFs in the Redshift integration to implement the following masking policies:

Masking using a regular expression
Reversible masking
Format-preserving masking
Randomized response

The number of Python UDFs that can run concurrently per Redshift cluster is limited to one-fourth of the total concurrency level for the cluster. For example, if the Redshift cluster is configured with a concurrency of 15, a maximum of three Python UDFs can run concurrently. After the limit is reached, Python UDFs are queued for execution within workload management queues.
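
The one-fourth rule can be expressed as integer division. The sketch below assumes the limit is rounded down, which matches the example above (a concurrency of 15 yields three); the function name is hypothetical.

```python
def max_concurrent_python_udfs(concurrency_level):
    """Return the Python UDF concurrency limit for a cluster.

    Assumes the limit is one-fourth of the total concurrency level,
    rounded down, as in the example of 15 yielding 3.
    """
    if concurrency_level < 0:
        raise ValueError("concurrency_level must be non-negative")
    return concurrency_level // 4


print(max_concurrent_python_udfs(15))  # prints 3
```

Any Python UDF calls beyond this number wait in the workload management queues, so the result is a useful first estimate when sizing concurrency for masking-heavy workloads.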

The SVL_QUERY_QUEUE_INFO view in Redshift, which is visible to a Redshift superuser, summarizes details for queries that spent time in a workload management (WLM) query queue. Queries must be completed in order to appear as results in the SVL_QUERY_QUEUE_INFO view.

If you find that queries on Immuta-built views are spending time in the workload management (WLM) query queue, you should either edit your Redshift cluster configuration to increase concurrency or use fewer of the masking policies that rely on Python UDFs. For more information on increasing concurrency, see the Redshift docs on implementing workload management.

AWS Lake Formation

Public preview: This connection is available to all accounts.

In the Lake Formation connection, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:

Amazon Athena
Amazon EMR Spark

This getting started guide outlines how to connect AWS Lake Formation with Immuta.

How-to guide

Reference guides

AWS Lake Formation reference guide: This guide describes the design and components of the connection.
Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
Protecting data: This guide provides an overview of how to protect AWS securables with Immuta policies.
Accessing data: This guide provides an overview of how AWS users access data registered in Immuta.

Getting Started with AWS Lake Formation

Public preview: This connection is available to all accounts.

The how-to guides linked on this page illustrate how to use AWS Lake Formation with Immuta. See the reference guide for information about the AWS Lake Formation connection.

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Register your AWS Lake Formation connection: Using a single setup process, connect AWS Lake Formation to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs from AWS to Immuta: Ensure the user IDs in Immuta, AWS, and your IAM are aligned so that the right policies impact the right users.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Request access to a data product: Users must then request access to your data products in Marketplace.
Respond to an access request: To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

Reference Guides

Security and Compliance

Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

Security

Data processing and encryption

See the data processing and encryption guides for more information about the transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

Authentication

Registering the connection

The Lake Formation connection supports the following authentication methods to register a connection:

Access using AWS IAM role (recommended): Immuta will assume this role when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection.
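
For the IAM role option, the trust relationship is expressed as a trust policy attached to the role in your AWS account. The sketch below builds a standard AWS trust policy document that lets another role call sts:AssumeRole. The ARN is a placeholder, not Immuta's actual role, and Immuta may require additional conditions that are not shown here; treat the registration how-to guide as the authority.

```python
import json


def build_trust_policy(trusted_role_arn):
    """Build an IAM trust policy allowing the given principal to assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder ARN for illustration only; replace it with the role ARN
# that Immuta provides during registration.
PLACEHOLDER_ARN = "arn:aws:iam::111122223333:role/example-immuta-role"

print(json.dumps(build_trust_policy(PLACEHOLDER_ARN), indent=2))
```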

Identity providers for user authentication

The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

See the identity provider documentation for a list of supported providers and details.

See the Register your users guides for details about user provisioning and mapping AWS user accounts to Immuta.

Auditing and compliance

Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

See the governance reports page for a list of report types and guidance.

Protecting Data

In the AWS Lake Formation connection, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

See the AWS Lake Formation documentation for more details about Lake Formation access controls.

Registering a connection

AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

Once the Lake Formation connection is registered, you can author policies in Immuta to orchestrate Lake Formation access controls.

See the how-to guide for more details about registering a connection.

Protecting data

After Glue Data Catalog views and tables are registered in Immuta, you can author subscription policies in Immuta to orchestrate Lake Formation access controls. Once a subscription policy is applied, users can be subscribed to data sources in the following ways:

Manually subscribed: If a data owner manually adds a user to a data source, Immuta issues a grant directly to the data object in AWS.
Automatically subscribed through policy logic: When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed. Then, Immuta generates a Lake Formation tag and applies it to the corresponding data object in AWS and grants subscribers access to that tag, which in turn grants them access to the data. See the reference guide for details about this process.

Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta generates a Lake Formation (LF) tag that is applied to the Glue Data Catalog yellow-table and permissions on that tag are granted to all AWS users (registered in Immuta) that are part of the analysts group.

In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.
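
The tag-based flow in this example can be simulated in a few lines. The sketch below is a toy model and does not call AWS or Immuta: the user names and the tag naming scheme are invented, and only the group names and yellow-table come from the example above.

```python
# Hypothetical Immuta-registered AWS users and their groups.
USERS = {"ana": {"analysts"}, "raj": {"research"}}


def apply_subscription_policy(table, allowed_group, users):
    """Simulate tagging a table and granting the tag to group members."""
    tag = f"immuta_{table}"  # hypothetical tag naming scheme
    table_tags = {table: tag}
    tag_grants = {
        tag: {user for user, groups in users.items() if allowed_group in groups}
    }
    return table_tags, tag_grants


def can_access(user, table, table_tags, tag_grants):
    """A user can read a table only if granted the tag applied to it."""
    tag = table_tags.get(table)
    return tag is not None and user in tag_grants.get(tag, set())


tags, grants = apply_subscription_policy("yellow-table", "analysts", USERS)
print(can_access("ana", "yellow-table", tags, grants))
print(can_access("raj", "yellow-table", tags, grants))
```

The indirection through the tag is the point of the design: access follows from holding permissions on the tag rather than from a per-user grant on the table.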

See the Author a global subscription policy guide for guidance on applying a subscription policy to a data source. See the reference guide for details about the subscription policy types supported and the permissions Immuta grants on securables registered as Immuta data sources.

Accessing data

Once data is registered through the AWS Lake Formation connection, you will access your data in one of these AWS analytic engines as you normally would:

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

If you are subscribed to the data source, Immuta either directly grants you access to the resource through Lake Formation or generates and assigns a Lake Formation tag to that resource to grant you access. See the for details about how policies are enforced.

When you submit a query, the analytic engine requests metadata from Glue Data Catalog, which then queries Lake Formation to determine what data you are allowed to see. Then, the analytic engine requests temporary access from Lake Formation, retrieves the data from S3, and filters the data to return policy-enforced data to you.

The diagram below illustrates how the analytic engine interacts with Glue Data Catalog and Lake Formation to access data.

Azure Synapse Analytics

In this integration, Immuta generates policy-enforced views in a schema in your configured Azure Synapse Analytics Dedicated SQL pool for tables registered as Immuta data sources.

Getting started

This guide outlines how to integrate Azure Synapse Analytics with Immuta.

How-to guide

Azure Synapse Analytics configuration: Configure the integration in Immuta.

Reference guide

Azure Synapse Analytics integration reference guide: This guide describes the design and components of the integration.

Getting Started with Azure Synapse Analytics

The how-to guides linked on this page illustrate how to integrate Azure Synapse Analytics with Immuta. See the reference guide for information about the Azure Synapse Analytics integration.

Requirement: A running Dedicated SQL pool

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Configure your Azure Synapse Analytics integration: Configure an Azure Synapse Analytics integration with Immuta so that Immuta can create policy protected views for your users to query.
Register Azure Synapse Analytics data sources: This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace and policies.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs from Azure Synapse Analytics to Immuta: Ensure the user IDs in Immuta, Azure Synapse Analytics, and your IAM are aligned so that the right policies impact the right users.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Request access to a data product: Users must then request access to your data products in Marketplace.
Respond to an access request: To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

Configure Azure Synapse Analytics Integration

This page provides a tutorial for enabling the Azure Synapse Analytics integration on the Immuta app settings page. To configure this integration via the Immuta API, see the .

For an overview of the integration, see the Azure Synapse Analytics overview documentation.

Requirement

A running Dedicated SQL pool is required.

Add an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Azure Synapse Analytics from the dropdown menu.
Complete the Host, Port, Immuta Database, and Immuta Schema fields.
Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user.
Opt to update the User Profile Delimiters. This will be necessary if any of the provided symbols are used in user profile information.

Select your configuration method

You have two options for configuring your Azure Synapse Analytics environment:

Automatic setup: Grant Immuta one-time use of credentials to automatically configure your environment and the integration.
Manual setup: Run the Immuta script in your Azure Synapse Analytics environment yourself to configure the integration.

Automatic setup

Enter the username and password in the Privileged User Credentials section.

Manual setup

Select Manual.
Download, fill out the appropriate fields, and run the bootstrap master script and bootstrap script linked in the Setup section.
Enter the username and password in the Immuta System Account Credentials section. The username and password provided must be the credentials that were set in the bootstrap master script when you created the user.

Save the configuration

Click Save.

Register data

Edit an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Edit the field you want to change. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.
Enter Username and Password.
Click Save.

Immuta requires temporary, one-time use of credentials with specific permissions

When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the Manage GRANTS permission.

Alternatively, you can download the Edit Script from your Azure Synapse Analytics configuration on the Immuta app settings page and run it in Azure Synapse Analytics.

Remove an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Click the checkbox to disable the integration.
Enter the username and password that were used to initially configure the integration.
Click Save.

Reference Guides

Azure Synapse Analytics Overview

This page describes the Azure Synapse Analytics integration, through which Immuta applies policies directly in Azure Synapse Analytics. For a tutorial on configuring Azure Synapse Analytics, see the Azure Synapse Analytics configuration guide.

Overview

The Azure Synapse Analytics integration is a policy push integration that allows Immuta to apply policies directly in Azure Synapse Analytics Dedicated SQL pools without the need for users to go through a proxy. Instead, users can work within their existing Synapse Studio and have per-user policies dynamically applied at query time.

Architecture

This integration works on a per-Dedicated-SQL-pool basis: all of Immuta's policy definitions and user entitlements data need to be in the same pool as the target data sources because Dedicated SQL pools do not support cross-database joins. Immuta creates schemas inside the configured Dedicated SQL pool that contain policy-enforced views that users query.

When the integration is configured, the Application Admin specifies the following:

Immuta Database: This is the pre-existing database Immuta uses. Immuta will create views from the tables contained in this database, and all schemas and views created by Immuta will exist in this database. These include the immuta_system, immuta_functions, and immuta_procedures schemas, which contain the tables, views, UDFs, and stored procedures that support the integration.
Immuta Schema: The schema that Immuta manages. All views generated by Immuta for tables registered as data sources will be created in this schema.
User Profile Delimiters: Since Azure Synapse Analytics dedicated SQL pools do not support array or hash objects, certain user access information is stored as delimited strings; the Application Admin can modify those delimiters to ensure they do not conflict with possible characters in strings.
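To see why a delimiter must not collide with characters in user profile values, consider the sketch below. It is illustrative only: the delimiters, the group names, and the helper functions are hypothetical, and the source does not describe the exact string format Immuta stores.

```python
from typing import List


def encode_values(values: List[str], delimiter: str) -> str:
    """Join values into one string, rejecting values that contain the delimiter."""
    for value in values:
        if delimiter in value:
            raise ValueError(f"{value!r} contains the delimiter {delimiter!r}")
    return delimiter.join(values)


def decode_values(encoded: str, delimiter: str) -> List[str]:
    """Split a delimited string back into its values."""
    return encoded.split(delimiter) if encoded else []


groups = ["analysts", "emea|sales"]  # hypothetical group names

# A naive join with "|" cannot be decoded back to the original two values.
print(decode_values("|".join(groups), "|"))  # ['analysts', 'emea', 'sales']

# A delimiter that does not occur in the data round-trips cleanly.
print(decode_values(encode_values(groups, "~"), "~"))  # ['analysts', 'emea|sales']
```

Choosing a delimiter that never appears in group or attribute names avoids the ambiguous first case.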

For a tutorial on configuring the integration, see the Azure Synapse Analytics configuration guide.

Data source naming convention

Synapse data sources are represented as views and are under one schema instead of a database, so their view names are a combination of their schema and table name, separated by an underscore.

For example, with a configuration that uses IMMUTA as the schema in the database dedicated_pool, the view name for the data source dedicated_pool.tpc.case would be dedicated_pool.IMMUTA.tpc_case.

You can see the view information on the data source details page under Connection Information.
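The naming convention can be expressed as a small helper. This is a sketch of the rule as described on this page, not Immuta's implementation; the function name is hypothetical.

```python
def immuta_view_name(database: str, immuta_schema: str,
                     source_schema: str, source_table: str) -> str:
    """Build the view name: <database>.<Immuta schema>.<source schema>_<source table>."""
    return f"{database}.{immuta_schema}.{source_schema}_{source_table}"


# The data source dedicated_pool.tpc.case with IMMUTA as the Immuta schema.
print(immuta_view_name("dedicated_pool", "IMMUTA", "tpc", "case"))
# dedicated_pool.IMMUTA.tpc_case
```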

Policy enforcement

This integration uses webhooks to keep views up-to-date with the corresponding Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook is called that creates, modifies, or deletes the dynamic view in the Immuta schema. Note that only standard views are available because Azure Synapse Analytics Dedicated SQL pools do not support secure views.

Integration health status

The definitions for each status and the state of configured data platform integrations is available in the . However, the UI consolidates these error statuses and provides detail in the error messages.

Data flow

An Immuta Application Administrator configures the Azure Synapse Analytics integration, registering their initial Synapse Dedicated SQL pool with Immuta.
Immuta creates Immuta schemas inside the configured Synapse Dedicated SQL pool.
A Data Owner registers Synapse tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies and updates data source view definitions as necessary.
A Synapse user who is subscribed to the data source in Immuta queries the corresponding view in Synapse and sees policy-enforced data.

Azure Synapse Analytics Pre-Configuration Details

This page describes the Azure Synapse integration, configuration options, and features. See the Azure Synapse integration page for a tutorial on enabling the integration and these features through the app settings page.

Feature Availability

Project Workspaces: ❌
User Impersonation: ✅
Query Audit: ❌
Multiple Integrations: ✅

Prerequisite

A running Dedicated SQL pool

Authentication Method

Immuta only supports the SQL authentication option for Azure Synapse Analytics to configure the integration and create data sources. The Microsoft Entra authentication option is unsupported. See the SQL Authentication in Azure Synapse Analytics documentation for details.

Tag Ingestion

Immuta cannot ingest tags from Synapse, but you can connect any of these supported external catalogs to work with your integration.

User Impersonation

Impersonation allows users to query data as another Immuta user in Synapse. To enable user impersonation, see the User Impersonation page.

Multiple Integrations

A user can configure multiple integrations of Synapse to a single Immuta tenant.

Limitations

Immuta does not support the following masking types in this integration because of limitations with Dedicated SQL pools (linked below). Any column assigned one of these masking types will be masked to NULL:
- Reversible Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.
- Format Preserving Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.
- Regex: The built in string replace function does not support full regex. See the Synapse Documentation for details.
The delimiters configured when enabling the integration cannot be changed once they are set. To change the delimiters, the integration has to be disabled and re-enabled.
If the generated view name is more than 128 characters, then the view name is shortened to 128 characters. This could cause collisions between view names if the shortened version is the same for two different data sources.
For proper updates, the Dedicated SQL pools have to be running when changes are made to users or data sources in Immuta.
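The view name collision described above can be checked before registering data sources. The sketch below assumes that a long name is shortened by keeping its first 128 characters; the source does not say how the shortening is done, and the helper names and table names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_VIEW_NAME_LENGTH = 128


def truncated_view_name(schema: str, table: str) -> str:
    # Assumption: the name is shortened by keeping its first 128 characters.
    return f"{schema}_{table}"[:MAX_VIEW_NAME_LENGTH]


def find_collisions(
    sources: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (schema, table) pairs by shortened view name and keep the duplicates."""
    by_name = defaultdict(list)
    for schema, table in sources:
        by_name[truncated_view_name(schema, table)].append((schema, table))
    return {name: srcs for name, srcs in by_name.items() if len(srcs) > 1}


long_prefix = "t" * 130
sources = [
    ("sales", long_prefix + "_2023"),
    ("sales", long_prefix + "_2024"),
    ("sales", "orders"),
]
collisions = find_collisions(sources)
print(len(collisions))  # 1
```

The two long table names differ only after character 128, so they map to the same shortened view name, while the short name is unaffected.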

Databricks

Immuta offers two integrations for Databricks:

Databricks Unity Catalog integration: This integration supports working with database objects registered in Unity Catalog.
Databricks Spark integration: This integration supports working with database objects registered in the legacy Hive metastore.

Which integration should you use?

To determine which integration you should use, consider which metastore you use:

Legacy Hive metastore: Databricks recommends that you migrate all data from the legacy Hive metastore to Unity Catalog. However, when this migration is not possible, use the Databricks Spark integration to protect securables registered in the Hive metastore.
Unity Catalog: To protect securables registered in the Unity Catalog metastore, use the Databricks Unity Catalog integration.
Legacy Hive metastore and Unity Catalog: If you need to work with database objects registered in both the legacy Hive metastore and in Unity Catalog, metastore magic allows you to use both integrations.
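The three cases above reduce to a simple decision. The helper below is a hypothetical sketch of that decision, not an Immuta API.

```python
from typing import List


def recommended_integrations(uses_hive_metastore: bool,
                             uses_unity_catalog: bool) -> List[str]:
    """Return the Immuta integrations that match the metastores in use."""
    integrations = []
    if uses_unity_catalog:
        integrations.append("Databricks Unity Catalog")
    if uses_hive_metastore:
        integrations.append("Databricks Spark")
    return integrations


# Both metastores in use: metastore magic allows both integrations together.
print(recommended_integrations(uses_hive_metastore=True, uses_unity_catalog=True))
# ['Databricks Unity Catalog', 'Databricks Spark']
```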

Metastore magic

Databricks metastore magic allows you to migrate your data from the Databricks legacy Hive metastore to the Unity Catalog metastore while protecting data and maintaining your current processes in a single Immuta instance.

Databricks metastore magic is for organizations who intend to use the Databricks Unity Catalog integration, but must still protect tables in the Hive metastore until they can migrate all of their data to Unity Catalog.

Requirement

Unity Catalog support is enabled in Immuta.

Databricks metastores and Immuta policy enforcement

Databricks has two built-in metastores that contain metadata about your tables, views, and storage credentials:

Legacy Hive metastore: Created at the workspace level. This metastore contains metadata of the registered securables in that workspace available to query.
Unity Catalog metastore: Created at the account level and is attached to one or more Databricks workspaces. This metastore contains metadata of the registered securables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those securables.

Databricks allows you to use the legacy Hive metastore and the Unity Catalog metastore simultaneously. However, Unity Catalog does not support controls on the Hive metastore, so you must attach a Unity Catalog metastore to your workspace and move existing databases and tables to the attached Unity Catalog metastore to use the governance capabilities of Unity Catalog.

Immuta's Databricks Spark integration and Unity Catalog integration enforce access controls on the Hive and Unity Catalog metastores, respectively. However, because these metastores have two distinct security models, users were discouraged from using both in a single Immuta instance before metastore magic; the Databricks Spark integration and Unity Catalog integration were unaware of each other, so using both concurrently caused undefined behavior.

Databricks metastore magic solution

Metastore magic reconciles the distinct security models of the legacy Hive metastore and the Unity Catalog metastore, allowing you to use multiple metastores (specifically, the Hive metastore or AWS Glue Data Catalog alongside Unity Catalog metastores) within a Databricks workspace and single Immuta instance and keep policies enforced on all your tables as you migrate them. The diagram below shows Immuta enforcing policies on registered tables across workspaces.

In clusters A and D, Immuta enforces policies on data sources in each workspace's Hive metastore and in the Unity Catalog metastore shared by those workspaces. In clusters B, C, and E (which don't have Unity Catalog enabled in Databricks), Immuta enforces policies on data sources in the Hive metastores for each workspace.

Enforce policies as you migrate

With metastore magic, the Databricks Spark integration enforces policies only on data in the Hive metastore, while the Unity Catalog integration enforces policies on tables in the Unity Catalog metastore. The table below illustrates this policy enforcement.

To enforce plugin-based policies on Hive metastore tables and Unity Catalog native controls on Unity Catalog metastore tables, enable the Databricks Spark integration and the Databricks Unity Catalog integration. Note that some Immuta policies are not supported in the Databricks Unity Catalog integration. See the Databricks Unity Catalog integration reference guide for details.

Enforcing policies on Databricks SQL

Databricks SQL cannot run the Databricks Spark plugin to protect tables, so Hive metastore data sources will not be policy enforced in Databricks SQL.

To enforce policies on data sources in Databricks SQL, use Hive metastore table access controls to manually lock down Hive metastore data sources and the Databricks Unity Catalog integration to protect tables in the Unity Catalog metastore. Table access control is enabled by default on SQL warehouses, and any Databricks cluster without the Immuta plugin must have table access control enabled.

Databricks Spark

This integration enforces policies on Databricks securables registered in the legacy Hive metastore. Once these securables are registered as Immuta data sources, users can query policy-enforced data on Databricks clusters.

The guides in this section outline how to integrate Databricks Spark with Immuta.

Getting started

This getting started guide outlines how to integrate Databricks with Immuta.

How-to guides

Configure a Databricks Spark integration: Configure the Databricks Spark integration.
Manually update your Databricks cluster: Manually update your cluster to reflect changes in the Immuta init script or cluster policies.
Install a trusted library: Register a Databricks library with Immuta as a trusted library to avoid Immuta security manager errors when using third-party libraries.
Project UDFs cache settings: Raise the caching on-cluster and lower the cache timeouts for the Immuta web service to allow use of project UDFs in Spark jobs.
Run R and Scala spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.
DBFS access: Access DBFS in Databricks for non-sensitive data.
Troubleshooting: Resolve errors in the Databricks Spark configuration.

Reference guides

Databricks Spark integration configuration: This guide describes the design and components of the integration.
Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and Databricks clusters and that allow you to prove compliance and monitor for anomalies.
Registering and protecting data: This guide provides an overview of registering Databricks securables and protecting them with Immuta policies.
Accessing data: This guide provides an overview of how Databricks users access data registered in Immuta.

Getting Started with Databricks Spark

The how-to guides linked on this page illustrate how to integrate Databricks Spark with Immuta.

Requirements

If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster.
If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:
1. Navigate to the App Settings page and click Integration Settings.
2. Uncheck the Enable Unity Catalog checkbox.
3. Click Save.

Connect your technology

These guides provide instructions for getting your data set up in Immuta.

Configure your Databricks Spark integration.
Register Databricks securable objects in Immuta as data sources.
Organize your data sources into domains and assign domain permissions to accountable teams (recommended): Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in policies, audit, and identification.

Register your users

These guides provide instructions on setting up your users in Immuta.

Integrate an IAM with Immuta: Connect the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs from Databricks to Immuta: Ensure the user IDs in Immuta, Databricks, and your IAM are aligned so that the right policies impact the right users.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for use in policies.

Connect an external catalog: Connect the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Protect and monitor data access

These guides provide instructions on authoring policies and auditing data access.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

How-to Guides

Manually Update Your Databricks Cluster

If a Databricks cluster needs to be manually updated to reflect changes in the Immuta init script or cluster policies, you can remove and set up your integration again to get the updated policies and init script.

Log in to Immuta as an Application Admin.
Click the App Settings icon in the navigation menu and scroll to the Integration Settings section.
Your existing Databricks Spark integration should be listed here; expand it and note the configuration values. Now select Remove to remove your integration.
Click Add Integration and select Databricks Integration to add a new integration.
Enter your Databricks Spark integration settings again as configured previously.
Click Add Integration to add the integration, and then select Configure Cluster Policies to set up the updated cluster policies and init script.
Select the cluster policies you wish to use for your Immuta-enabled Databricks clusters.
Automatically push cluster policies and the init script (recommended) or manually update your cluster policies.
- Automatically push cluster policies
  1. Select Automatically Push Cluster Policies and enter your privileged Databricks access token. This token must have privileges to write to cluster policies.
  2. Select Apply Policies to push the cluster policies and init script again.
  3. Click Save and Confirm to deploy your changes.
- Manually update cluster policies
  1. Download the init script and the new cluster policies to your local computer.
  2. Click Save and Confirm to save your changes in Immuta.
  3. Log in to your Databricks workspace with your administrator account to set up cluster policies.
  4. Get the path you will upload the init script (immuta_cluster_init_script_proxy.sh) to by opening one of the cluster policy .json files and looking for the defaultValue of the field init_scripts.0.dbfs.destination. This should be a DBFS path in the form of dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh.
  5. Click Data in the left pane to upload your init script to DBFS to the path you found above.
  6. To find your existing cluster policies you need to update, click Compute in the left pane and select the Cluster policies tab.
  7. Edit each of these cluster policies that were configured before and overwrite the contents of the JSON with the new cluster policy JSON you downloaded.
Restart any Databricks clusters using these updated policies for the changes to take effect.
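For the manual path, the init script destination can be read from a downloaded cluster policy file programmatically instead of by eye. This is a minimal sketch: the example policy below is a hypothetical, trimmed stand-in for a real policy file, and the helper name is an assumption.

```python
import json

INIT_SCRIPT_KEY = "init_scripts.0.dbfs.destination"


def init_script_destination(policy_json: str) -> str:
    """Return the defaultValue of the init script destination field."""
    policy = json.loads(policy_json)
    return policy[INIT_SCRIPT_KEY]["defaultValue"]


# Hypothetical, trimmed policy; a real policy file contains many more fields.
example_policy = json.dumps({
    INIT_SCRIPT_KEY: {
        "type": "unlimited",
        "defaultValue": "dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh",
    }
})

print(init_script_destination(example_policy))
# dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh
```

With a downloaded file, the same function could be called on the file's contents, for example `init_script_destination(open(path).read())`, where `path` is wherever you saved the policy.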

Install a Trusted Library

In the Databricks Clusters UI, install your third-party library .jar or Maven artifact with Library Source Upload, DBFS, DBFS/S3, or Maven. Alternatively, use the Databricks libraries API.
In the Databricks Clusters UI, add the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS property as a Spark environment variable and set it to your artifact's URI:

For Maven artifacts, the URI is maven:/<maven_coordinates>, where <maven_coordinates> is the Coordinates field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. Here's an example of an installed artifact:

In this example, you would add the following Spark environment variable:

IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644

For jar artifacts, the URI is the Source field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. For artifacts installed from DBFS or S3, this ends up being the original URI to your artifact. For uploaded artifacts, Databricks will rename your .jar and put it in a directory in DBFS. Here's an example of an installed artifact:

In this example, you would add the following Spark environment variable:

IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/immuta/bstabile/jars/immuta-spark-third-party-lib-test.jar

Once you've finished making your changes, restart the cluster.

Once the cluster is up, execute a command in a notebook. If the trusted library installation is successful, you should see driver log messages like this:

TrustedLibraryUtils: Successfully found all configured Immuta configured trusted libraries in Databricks.
TrustedLibraryUtils: Wrote trusted libs file to [/databricks/immuta/immutaTrustedLibs.json]: true.
TrustedLibraryUtils: Added trusted libs file with 1 entries to spark context.
TrustedLibraryUtils: Trusted library installation complete.
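The URI formats described above for Maven and .jar artifacts can be assembled with a couple of helpers. This is a sketch with hypothetical function names; it only formats strings and does not validate that the library exists in Databricks.

```python
ENV_VAR = "IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS"


def maven_uri(coordinates: str) -> str:
    """Prefix the Coordinates field of an installed Maven artifact with maven:/."""
    return f"maven:/{coordinates}"


def env_line(uri: str) -> str:
    """Format the Spark environment variable line for a trusted library URI."""
    return f"{ENV_VAR}={uri}"


# Maven artifact: use the Coordinates field from the Libraries tab.
print(env_line(maven_uri(
    "com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644"
)))

# .jar artifact: use the Source field from the Libraries tab as-is.
print(env_line("dbfs:/immuta/bstabile/jars/immuta-spark-third-party-lib-test.jar"))
```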

Project UDFs Cache Settings

This page outlines the configuration for setting up project UDFs, which allow users to set their current project in Immuta through Spark. For details about the specific functions available and how to use them, see the .

Use project UDFs in Databricks Spark

Currently, caches are not all invalidated outside of Databricks because Immuta caches information pertaining to a user's current project. Consequently, this feature should only be used in Databricks.

Lower the web service cache timeout in Immuta:
1. Click the App Settings icon and scroll to the HDFS Cache Settings section.
2. Lower the Cache TTL of HDFS user names (ms) to 0.
3. Click Save.
Raise the cache timeout on your Databricks cluster: In the Spark environment variables section, set the IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS and IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS to high values (like 10000).
Note: These caches will be invalidated on cluster when a user calls immuta.set_current_project, so they can effectively be cached permanently on cluster to avoid periodically reaching out to the web service.

DBFS Access

This page outlines how to enable access to DBFS in Databricks for non-sensitive data. Databricks administrators should place the desired configuration in the Spark environment variables.

DBFS FUSE mount

This Databricks feature mounts DBFS to the local cluster filesystem at /dbfs. Although disabled when using process isolation, this feature can safely be enabled if raw, unfiltered data is not stored in DBFS and all users on the cluster are authorized to see each other’s files. When enabled, the entirety of DBFS essentially becomes a scratch path where users can read and write files in /dbfs/path/to/my/file as though they were local files.

DBFS FUSE mount limitation: This feature cannot be used in environments with E2 Private Link enabled.

For example,

In Python,

Note: This solution also works in R and Scala.
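Because the FUSE mount exposes DBFS as a local directory, ordinary file I/O is enough to use it. The sketch below is illustrative: the helper name and the file path in the usage note are hypothetical, and the `mount_root` parameter exists only so the function can be tried outside Databricks.

```python
import os


def write_and_read(relative_path: str, text: str, mount_root: str = "/dbfs") -> str:
    """Write text to a file under the mount root, then read it back."""
    path = os.path.join(mount_root, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)
    with open(path) as f:
        return f.read()
```

On a cluster with the mount enabled, a call such as `write_and_read("my/newfile.txt", "I'm creating a new file in DBFS")` would create the file under /dbfs and return its contents.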

Enable DBFS FUSE mount

To enable the DBFS FUSE mount, set this configuration in the Spark environment variables: IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED=true.

Mounting a bucket

Users can mount additional buckets to DBFS that can also be accessed using the FUSE mount.
Mounting a bucket is a one-time action, and the mount will be available to all clusters in the workspace from that point on.
Mounting must be performed from a non-Immuta cluster.

Scala DBUtils (and %fs magic) with scratch paths

Scratch paths will work when performing arbitrary remote filesystem operations with %fs magic or Scala dbutils.fs functions. For example,

Configure Scala DBUtils (and %fs magic) with scratch paths

To support %fs magic and Scala DBUtils with scratch paths, configure

Configure DBUtils in Python

To use dbutils in Python, set this configuration: immuta.spark.databricks.py4j.strict.enabled=false.

Example workflow

This section illustrates the workflow for getting a file from a remote scratch path, editing it locally with Python, and writing it back to a remote scratch path.

Get the file from remote storage:
Make a copy if you want to explicitly edit localScratchFile, as it will be read-only and owned by root:
Write the new file back to remote storage:
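The three steps above can be sketched as one function. This is a minimal sketch under stated assumptions: the function name, file names, and parameters are hypothetical, `local_dir` must be an absolute path, and the copy operation is passed in as `cp` so that the code can also be exercised outside Databricks.

```python
import os
import shutil
from typing import Callable


def edit_remote_scratch_file(
    cp: Callable[[str, str], object],
    remote_path: str,
    local_dir: str,
    appended_text: str,
) -> str:
    """Fetch a remote scratch file, append text to a local copy, and write it back."""
    local_file = os.path.join(local_dir, "scratch_file.txt")
    local_copy = os.path.join(local_dir, "scratch_file_copy.txt")

    # 1. Get the file from remote storage.
    cp(remote_path, f"file://{local_file}")

    # 2. Copy only the contents, so the copy is writable even though the
    #    fetched file is read-only and owned by root.
    shutil.copyfile(local_file, local_copy)
    with open(local_copy, "a") as f:
        f.write(appended_text)

    # 3. Write the new file back to remote storage.
    cp(f"file://{local_copy}", remote_path)
    return local_copy
```

In a Databricks notebook, `dbutils.fs.cp` could be passed as `cp`, with `remote_path` pointing at a file in a configured scratch path. `shutil.copyfile` is used instead of `shutil.copy` because the latter also copies the read-only permission bits.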

Troubleshooting

This page provides guidelines for troubleshooting issues with the Databricks Spark integration and resolving Py4J security and Databricks trusted library errors.

Debugging the integration

For easier debugging of the Databricks Spark integration, follow the recommendations below.

Enable cluster init script logging:
- In the cluster page in Databricks for the target cluster, navigate to Advanced Options -> Logging.
- Change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be added onto the end of the provided path.
View the Spark UI on your target Databricks cluster: On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

Using the validation and debugging notebook

The validation and debugging notebook is designed to be used by or under the guidance of an Immuta support professional. Reach out to your Immuta representative for assistance.

Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.
Click the arrow next to your name and select Import.
Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.

Py4J security error

Error Message: py4j.security.Py4JSecurityException: Constructor <> is not allowlisted
Explanation: This error indicates you are being blocked by Py4J security rather than the Immuta Security Manager. Py4J security is strict and generally ends up blocking many ML libraries.
Solution: Turn off Py4J security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the environment variables section. Additionally, because there are limitations to the security mechanisms Immuta employs on-cluster when Py4J security is disabled, ensure that all users on the cluster have the same level of access to data, as users could theoretically see (policy-enforced) data that other users have queried.

Databricks trusted library errors

Check the driver logs for details. Some possible causes of failure include

One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.
For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.
Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
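The Maven URI format above can be checked before the URI is added to the cluster configuration. The snippet below is a minimal sketch, not part of Immuta: the function name `parse_trusted_maven_uri` is hypothetical, and the set of characters allowed in each part is an assumption made for illustration.

```python
import re

# Assumed pattern for the documented format maven:/group.id:artifact-id:version.
# The characters permitted in each part are an assumption, not an Immuta rule.
_MAVEN_URI = re.compile(
    r"^maven:/(?P<group>[^:/\s]+):(?P<artifact>[^:/\s]+):(?P<version>[^:/\s]+)$"
)


def parse_trusted_maven_uri(uri):
    """Return (group_id, artifact_id, version), or None if the URI does not match."""
    match = _MAVEN_URI.match(uri)
    if match is None:
        return None
    return match.group("group"), match.group("artifact"), match.group("version")


# Hypothetical coordinates used only to exercise the check.
print(parse_trusted_maven_uri("maven:/com.example:my-lib:1.2.3"))
print(parse_trusted_maven_uri("maven://com.example:my-lib:1.2.3"))
```

A URI with a doubled slash or a missing version returns None, which is a quick way to spot a misconfigured entry before checking the driver logs.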

Reference Guides

Databricks Spark Integration Configuration

The Databricks Spark integration is one of two integrations Immuta offers for Databricks.

In this integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.

The reference guides in this section are written for Databricks administrators who are responsible for setting up the integration, securing Databricks clusters, and setting up users:

This guide includes information about what Immuta creates in your Databricks environment and securing your Databricks clusters.
Consult this guide for information about customizing the Databricks Spark integration settings.
Consult this guide for information about connecting data users and setting up user impersonation.
This guide provides a list of Spark environment variables used to configure the integration.
This guide describes ephemeral overrides and how to configure them to reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up.

Setting Up Users

When the Databricks Spark plugin is running on a Databricks cluster, all Databricks users running jobs or queries are either a privileged user or a non-privileged user:

Privileged users: Privileged users can effectively read from and write to any table or view in the cluster Metastore, or any file path accessible by the cluster, without restriction. Privileged users are either Databricks workspace admins or users specified in IMMUTA_SPARK_ACL_ALLOWLIST. Any user writing queries or jobs impersonating another user is a non-privileged user, even if they are impersonating a privileged user.
Privileged users have effective authority to read from and write to any securable in the cluster metastore or file path, because in almost all cases Databricks clusters running with the Immuta Spark plug-in installed have disabled Hive metastore table access control. However, if Hive metastore table access control is enabled on the cluster, privileged users will have the authority granted to them that is specified by table access control.
Non-privileged users: Non-privileged users are any users who are not privileged users, and all authorization for non-privileged users is determined by Immuta policies.

Whether a user is privileged or non-privileged for a given query or job is cached once it is first determined, based on the IMMUTA_SPARK_ACL_PRIVILEGED_TIMEOUT_SECONDS environment variable. This caching can be disabled entirely by setting the value of that environment variable to 0.
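The classification and caching rules above can be modeled in a few lines. This is an illustrative sketch only, not the plugin's implementation: `is_privileged` and `PrivilegeCache` are hypothetical names, keying the cache by username is a simplifying assumption, and the timeout value stands in for IMMUTA_SPARK_ACL_PRIVILEGED_TIMEOUT_SECONDS.

```python
import time


def is_privileged(username, workspace_admins, acl_allowlist, impersonating=False):
    """Apply the documented rule: impersonation always yields a non-privileged user."""
    if impersonating:
        return False
    return username in workspace_admins or username in acl_allowlist


class PrivilegeCache:
    """Hypothetical time-based cache; a timeout of 0 disables caching."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._entries = {}  # username -> (value, time stored)

    def get(self, username, compute):
        if self.timeout_seconds <= 0:
            # Caching disabled: evaluate on every call.
            return compute()
        now = self._clock()
        cached = self._entries.get(username)
        if cached is not None and now - cached[1] < self.timeout_seconds:
            return cached[0]
        value = compute()
        self._entries[username] = (value, now)
        return value


cache = PrivilegeCache(timeout_seconds=3600)
admins, allowlist = {"admin@example.com"}, {"etl-service"}
print(cache.get("etl-service", lambda: is_privileged("etl-service", admins, allowlist)))
```

One consequence of this kind of cache is that a change to the allowlist or to workspace admin membership may not take effect for a user until the cached entry expires.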

Mapping Databricks users to Immuta

Usernames in Databricks must match the usernames in the connected Immuta tenant. By default, the Immuta Spark plugin checks the Databricks username against the username within Immuta's internal IAM to determine access. However, you can integrate your existing IAM with Immuta and use that instead of the default internal IAM. Ideally, you should use the same identity manager for Immuta that you use for Databricks. See the Immuta support matrix page for a list of supported identity providers and protocols.

It is possible within Immuta to have multiple users share the same username if they exist within different IAMs. In this case, the cluster can be configured to look up users from a specified IAM. To do this, the value of the IMMUTA_USER_MAPPING_IAMID Spark environment variable must be updated to be the targeted IAM ID configured within the Immuta tenant. The targeted IAM ID can be found on the App settings page. Each Databricks cluster can only be mapped to one IAM.

User impersonation

Databricks user impersonation allows a Databricks user to impersonate an Immuta user. With this feature,

the Immuta user who is being impersonated does not have to have a Databricks account, but they must have an Immuta account.
the Databricks user who is impersonating an Immuta user does not have to be associated with Immuta. For example, this could be a service account.

When acting under impersonation, the Databricks user loses their privileged access, so they can only access the tables the Immuta user has access to and only perform DDL commands when that user is acting under an allowed circumstance (such as workspaces, scratch paths, or non-Immuta reads/writes).

Use the IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS Spark environment variable to enable user impersonation.

Scala clusters

Immuta discourages use of this feature with Scala clusters, as the proper security mechanisms were not built to account for user isolation limitations in Scala clusters. Instead, this feature was developed for the BI tool use case in which service accounts connecting to the Databricks cluster need to impersonate Immuta users so that policies can be enforced.

Ephemeral Overrides

In the context of the Databricks Spark integration, Immuta uses the term ephemeral to describe data sources where the associated compute resources can vary over time. This means that the compute bound to these data sources is not fixed and can change. All Databricks data sources in Immuta are ephemeral.

Ephemeral overrides are specific to each data source and user. They effectively bind cluster compute resources to a data source for a given user. Immuta uses these overrides to determine which cluster compute to use when connecting to Databricks for various maintenance operations.

The operations that use the ephemeral overrides include

Visibility checks on the data source for a particular user. These checks assess how to apply row-level policies for specific users.
Stats collection triggered by a specific user.
Validating a custom WHERE clause policy against a data source. When owners or governors create custom WHERE clause policies, Immuta uses compute resources to validate the SQL in the policy. In this case, the ephemeral overrides for the user writing the policy are used to contact a cluster for SQL validation.
High cardinality column detection. Certain advanced policy types (e.g., minimization) in Immuta require a high cardinality column, and that column is computed on data source creation. It can be recomputed on demand and, if so, will use the ephemeral overrides for the user requesting computation.

Triggering an ephemeral override request

An ephemeral override request can be triggered when a user queries the securable corresponding to a data source in a Databricks cluster with the Spark plug-in configured. The actual triggering of this request depends on the cluster's configuration settings.

Ephemeral overrides can also be set for a data source in the Immuta UI by navigating to a data source page, clicking on the data source actions button, and selecting Ephemeral overrides from the dropdown menu.

Ephemeral override requests made from a cluster for data sources and users where ephemeral overrides were set in the UI will not be successful.

If ephemeral overrides are never set (either through the user interface or the cluster configuration), the system will continue to use the connection details directly associated with the data source, which are set during data source registration.
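The lookup order described above (a per-user, per-data-source override first, then the data source's own connection details) can be sketched as follows. This is a conceptual model, not Immuta's implementation; the function name, the dictionary shapes, and the HTTP path values are all hypothetical.

```python
def resolve_connection(data_source, user, overrides, data_source_connections):
    """Prefer the (data source, user) override; otherwise use the data source's connection."""
    override = overrides.get((data_source, user))
    if override is not None:
        return override
    return data_source_connections[data_source]


# Hypothetical values for illustration only.
data_source_connections = {"sales": "http-path/registered-cluster"}
overrides = {("sales", "alice"): "http-path/alice-cluster"}

print(resolve_connection("sales", "alice", overrides, data_source_connections))
print(resolve_connection("sales", "bob", overrides, data_source_connections))
```

In this model, alice's maintenance operations go to the cluster recorded in her override, while bob, who has no override, falls back to the connection registered with the data source.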

Configuring overrides in Immuta-enabled clusters

Ephemeral overrides can be problematic in environments that have a dedicated cluster to handle maintenance activities, since ephemeral overrides can cause these operations to execute on a different cluster than the dedicated one.

To reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up, complete one of the following actions:

Direct all clusters' HTTP paths for overrides to a cluster dedicated for metadata queries using the corresponding Spark environment variable.
Disable ephemeral overrides completely by setting the corresponding Spark environment variable to false.

Ephemeral overrides best practices

Disable ephemeral overrides for clusters when using multiple workspaces and dedicate a single cluster to serve queries from Immuta in a single workspace.
If you use multiple E2 workspaces without disabling ephemeral overrides, avoid applying the where user row-level policy to data sources.

Accessing Data

Once a Databricks securable is registered in Immuta as a data source and you are subscribed to that data source, you must access that data through SQL:

Python

df = spark.sql("select * from immuta.table")

Scala

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
val sqlDF = spark.sql("SELECT * FROM immuta.table")

SQL

%sql
select * from immuta.table

R

library(SparkR)
df <- SparkR::sql("SELECT * from immuta.table")

With R, you must load the SparkR library in a cell before accessing the data.

See the sections below for more guidance on accessing data using Delta Lake, direct file reads in Spark for file paths, and user impersonation.

Delta Lake

When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.

Spark SQL can be used instead to give the same functionality with all of Immuta's data protections. See the Delta API reference guide for a list of corresponding Spark SQL calls to use.

Spark direct file reads

In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.

When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.

Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a where predicate). The examples below illustrate these methods.

Read data from an individual parquet file

To read from an individual file, load a partition file from a sub-directory:

spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01/my_file.parquet")

Read partitioned data from a sub-directory

To read partitioned data from a sub-directory, load a parquet partition from a sub-directory:

spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")

Alternatively, load a parquet partition using a where predicate:

spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")

Limitations

Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.
If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.
In Databricks, multiple input paths are supported as long as they belong to the same data source.
CSV-backed tables are not currently supported.

Loading a delta partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a where predicate:

# Not recommended by Spark and not supported in Immuta
spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table/partition_column=01")

# Recommended by Spark and supported in Immuta
spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")
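If existing workflows pass Delta partition sub-directories around as paths, a small helper can split such a path into the table root and an equivalent where predicate. This is a sketch under assumptions: the helper name `split_partition_path` is hypothetical, it treats every trailing `name=value` path segment as a partition, and it leaves partition values unquoted as in the example above.

```python
def split_partition_path(path):
    """Split a partition sub-directory path into (table_root, where_predicate).

    Trailing segments of the form name=value are treated as partitions.
    The predicate is an empty string when the path has no such segments.
    """
    parts = path.rstrip("/").split("/")
    predicates = []
    while parts and "=" in parts[-1]:
        # Keep partition segments in their original left-to-right order.
        predicates.insert(0, parts.pop())
    return "/".join(parts), " AND ".join(predicates)


root, predicate = split_partition_path(
    "s3://my_bucket/path/to/my_delta_table/partition_column=01"
)
print(root)
print(predicate)
```

The two returned values can then be passed to `load` and `where` respectively, matching the supported form of the Delta read.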

User impersonation

User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the User impersonation page.

Delta Lake API

Because Immuta blocks the Delta Lake API, Spark SQL can be used instead to give the same functionality with all of Immuta's data protections.

Requests

Below is a table of the Delta Lake API with the Spark SQL that may be used instead.

| Delta Lake API | Spark SQL |
| --- | --- |
| DeltaTable.convertToDelta | CONVERT TO DELTA parquet.`/path/to/parquet/` |
| DeltaTable.delete | DELETE FROM [table_identifier \| delta.`/path/to/delta/`] WHERE condition |
| DeltaTable.generate | GENERATE symlink_format_manifest FOR TABLE [table_identifier \| delta.`/path/to/delta`] |
| DeltaTable.history | DESCRIBE HISTORY [table_identifier \| delta.`/path/to/delta`] (LIMIT x) |
| DeltaTable.merge | MERGE INTO |
| DeltaTable.update | UPDATE [table_identifier \| delta.`/path/to/delta/`] SET column = value WHERE (condition) |
| DeltaTable.vacuum | VACUUM [table_identifier \| delta.`/path/to/delta`] |

See the Delta Lake documentation for a complete list of the Delta SQL commands.

Merging tables in workspaces

When a table is created in a project workspace, you can merge a different Immuta data source from that workspace into that table you created.

Create a table in the project workspace.
Create a temporary view of the Immuta data source you want to merge into that table.
Use that temporary view as the data source you add to the project workspace.

Run the following command:

MERGE INTO delta_native.target_native as target
USING immuta_temp_view_data_source as source
ON target.dr_number = source.dr_number
WHEN MATCHED THEN
UPDATE SET target.date_reported = source.date_reported

Databricks Unity Catalog

This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

Getting started

This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.

How-to guides

Databricks Unity Catalog configuration: Configure the Databricks Unity Catalog integration.
Migrate to Databricks Unity Catalog: Migrate from the legacy Databricks Spark integrations to the Databricks Unity Catalog integration.

Reference guide

Databricks Unity Catalog integration reference guide: This guide describes the design and components of the integration.

How-to Guides

Migrating to Unity Catalog

When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's three-level namespace. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.

Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them.

To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, will protect your existing data sources throughout the migration process.

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

MariaDB

Immuta policies will not be automatically enforced in MariaDB

While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.

To use this connection, contact your Immuta representative.

The MariaDB connection registers data from MariaDB in Immuta.

How-to guide

Reference guide

This guide describes the design and components of the connection.

Register a MariaDB Connection

Immuta policies will not be automatically enforced in MariaDB

While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.

To use this connection, contact your Immuta representative.

Requirement

Amazon RDS for MariaDB

Permissions

The user registering the connection must have the permissions below.

APPLICATION_ADMIN Immuta permission
The MariaDB user registering the connection and running the script must be the root user or have the GRANT OPTION MariaDB privilege.

Create a database user account

Create a new database user in MariaDB to serve as the Immuta system account. Immuta will use this system account continuously to crawl the database you register. How you create this user depends on your database authentication method. Follow the instructions linked below to create this user:
1. Password authentication: Follow the MariaDB documentation to create the database user and assign that user a password.
2. IAM database authentication:
Grant this account the following MariaDB privileges. A sample command that provides all these privileges to all databases and views is provided below:
1. SHOW DATABASES on all databases in the server
2. SELECT on all databases, tables, and views in the server
3. SHOW VIEW on all views in the server
```
GRANT SELECT, SHOW DATABASES, SHOW VIEW ON *.* TO '<username>'@'%';
```
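When the system account name is supplied by a script, the GRANT statement can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical account name `immuta_system`; the helper `build_immuta_grant` is not part of Immuta or MariaDB, and it rejects names containing quotes or backslashes instead of attempting to escape them.

```python
def build_immuta_grant(username, host="%"):
    """Build the GRANT statement for the Immuta system account.

    Names containing quotes or backslashes are rejected so the
    generated SQL cannot be altered by the input.
    """
    for value in (username, host):
        if not value or any(ch in value for ch in ("'", '"', "\\", "`")):
            raise ValueError(f"unsupported account name or host: {value!r}")
    return (
        "GRANT SELECT, SHOW DATABASES, SHOW VIEW ON *.* "
        f"TO '{username}'@'{host}';"
    )


# 'immuta_system' is a hypothetical account name used for illustration.
print(build_immuta_grant("immuta_system"))
```

The resulting statement grants the three privileges listed above on all databases, tables, and views in the server.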

Register a MariaDB connection

In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the MariaDB tile.
Select RDS as the deployment method.
Enter the host connection information:
1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.
2. Hostname: URL of your MariaDB instance.
3. Port: Port configured with MariaDB.
4. Region: The region of the AWS account with your MariaDB instance.
Select an authentication method from the dropdown menu.
1. AWS Access Key: Provide the access key ID and secret access key for the database account you created above.
2. AWS Assumed Role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MariaDB database. Before proceeding, contact your Immuta representative and provide your service principal's IAM role. Immuta will allowlist the service principal so that Immuta can successfully assume that role. Your Immuta representative will provide the account to add to your trust relationship. Then, complete the steps below.
  1. Enter the Role ARN of the database account you created above.
  2. Set the external ID provided in a condition on the trust relationship for the role specified above. See the AWS documentation for guidance.
3. Username and Password: Enter the credentials for the MariaDB database user account you created above.
Click Save connection.

Oracle

Immuta policies will not be automatically enforced in Oracle

While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in Oracle using your own process.

To use this connection, contact your Immuta representative.

The Oracle connection registers data from Oracle in Immuta.

How-to guide

Reference guide

This guide describes the design and components of the connection.

Register an Oracle Connection

Immuta policies will not be automatically enforced in Oracle

While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in Oracle using your own process.

To use this connection, contact your Immuta representative.

Requirement

Amazon RDS for Oracle

Permissions

The user registering the connection must have the permissions below.

APPLICATION_ADMIN Immuta permission
The Oracle user running the script must have either of the following system privileges:
- GRANT ANY ROLE
- GRANT ANY PRIVILEGE

Create the database user

Create a new database user in Oracle to serve as the Immuta system account. Immuta will use this system account continuously to crawl the connection.
Grant this account the SELECT Oracle privilege on the system views listed below:
- V$DATABASE
- CDB_PDBS
- SYS.DBA_USERS
- SYS.DBA_TABLES
- SYS.DBA_VIEWS
- SYS.DBA_MVIEWS
- SYS.DBA_TAB_COLUMNS
- SYS.DBA_OBJECTS
- SYS.DBA_CONSTRAINTS
- SYS.DBA_CONS_COLUMNS
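The list of system views above lends itself to generated GRANT statements. The sketch below is illustrative only: `IMMUTA_SYSTEM` is a hypothetical account name and `build_select_grants` is a hypothetical helper. It uses the view names exactly as listed; depending on your Oracle deployment, grants on V$ views may need to target a differently named underlying object or use a platform-specific procedure, so check your platform's documentation before running the output.

```python
# System views listed in this guide, in the same order.
SYSTEM_VIEWS = [
    "V$DATABASE",
    "CDB_PDBS",
    "SYS.DBA_USERS",
    "SYS.DBA_TABLES",
    "SYS.DBA_VIEWS",
    "SYS.DBA_MVIEWS",
    "SYS.DBA_TAB_COLUMNS",
    "SYS.DBA_OBJECTS",
    "SYS.DBA_CONSTRAINTS",
    "SYS.DBA_CONS_COLUMNS",
]


def build_select_grants(username):
    """Return one GRANT SELECT statement per system view for the given account."""
    if not username.replace("_", "").isalnum():
        raise ValueError(f"unsupported account name: {username!r}")
    return [f"GRANT SELECT ON {view} TO {username}" for view in SYSTEM_VIEWS]


# 'IMMUTA_SYSTEM' is a hypothetical account name used for illustration.
for statement in build_select_grants("IMMUTA_SYSTEM"):
    print(statement)
```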

Register an Oracle connection

In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the Oracle tile.
Select RDS as the deployment method.
Enter the host connection information:
1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.
2. Hostname: URL of your Oracle instance.
3. Port: Port configured for Oracle.
4. Database: The Oracle database you want to connect to. All databases in the host will be registered.
5. Region: The region of the AWS account with your Oracle instance.
Enter the username and password of the Oracle database user you created above.
Click Save connection.

PostgreSQL

Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

The PostgreSQL connection registers data from PostgreSQL in Immuta and enforces subscription policies on that data.

This getting started guide outlines how to connect PostgreSQL to Immuta.

How-to guide

Reference guides

This guide describes the design and components of the connection.
This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
This guide provides an overview of how to protect securables with Immuta policies.
This guide provides an overview of how PostgreSQL users access data registered in Immuta.

Getting Started with PostgreSQL

Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

The how-to guides linked on this page illustrate how to use PostgreSQL with Immuta. See the reference guide for information about the PostgreSQL connection.

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Register your PostgreSQL connection: Using a single setup process, connect PostgreSQL to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs to Immuta: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Request access to a data product: Users must then request access to your data products in Marketplace.
Respond to an access request: To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

Reference Guides

Security and Compliance

Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

Security

Data processing and encryption

See Immuta's data processing and encryption guides for information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

Authentication

Registering the connection

The PostgreSQL connection supports the following authentication methods to register a connection:

Amazon Aurora and Amazon RDS deployments
- Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.
- Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the permissions listed in the .
Neon and PostgreSQL deployments
- Username and password: These credentials are used temporarily by Immuta to register the connection. The credentials provided must be for an account with the permissions listed in the .

Identity providers for user authentication

See the Immuta support matrix page for a list of supported providers and details.

See the for details about user provisioning and mapping user accounts to Immuta.

Auditing and compliance

Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

See the governance reports page for a list of report types and guidance.

Protecting Data

In the PostgreSQL connection, Immuta administers PostgreSQL privileges on data registered in Immuta. Then, Immuta users who have been granted access to the tables can query them with policies enforced.

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in PostgreSQL.

Registering a connection

PostgreSQL is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

Once the PostgreSQL connection is registered, you can author subscription policies in Immuta to enforce access controls.

See the PostgreSQL connection reference guide for more details about registering a connection.

Protecting data

After tables are registered in Immuta, you can author subscription policies in Immuta to enforce access controls.

When a policy is applied to a data source, users who meet the conditions of the policy will be subscribed to the data source. Then, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege to users on those tables.

Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege on yellow-table to users (registered in Immuta) that are part of the analysts group.

In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access. See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and PostgreSQL privileges Immuta grants on tables registered as Immuta data sources.
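To make the yellow-table example concrete, the sketch below builds the kind of GRANT statement described above for each member of the analysts group. It is an illustration, not the SQL Immuta actually issues: the usernames and the helper names are hypothetical. Identifiers are double-quoted because an unquoted name containing a hyphen, such as yellow-table, is not a valid PostgreSQL identifier.

```python
def quote_ident(name):
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select_grants(table, usernames):
    """Return one GRANT SELECT statement per user for an unqualified table name."""
    return [
        f"GRANT SELECT ON {quote_ident(table)} TO {quote_ident(user)};"
        for user in usernames
    ]


# Hypothetical members of the analysts group.
analysts = ["alice", "bob"]
for statement in build_select_grants("yellow-table", analysts):
    print(statement)
```

A user in the research group never appears in the generated list, so no privilege is granted and PostgreSQL denies that user's query on the table.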

Accessing Data

Once data is registered through the PostgreSQL connection, you will access your data through your PostgreSQL client as you normally would. If you are subscribed to the data source, Immuta grants you access to the data in PostgreSQL.

When you submit a query, the PostgreSQL client submits the SQL query to the PostgreSQL server, which then processes the query and determines what data your role is allowed to see. Then, the PostgreSQL server queries the database and returns the query results to the PostgreSQL client, which then returns policy-enforced data to you.

The diagram below illustrates how Immuta, the PostgreSQL server, and PostgreSQL client interact to access data.

Snowflake

Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

Getting started

This getting started guide outlines how to integrate your Snowflake account with Immuta.

How-to guides

Configure a Snowflake integration: Configure the Snowflake integration.
Edit or remove an existing integration: Manage integration settings or delete your existing Snowflake integration.
Integration settings:
- Enable Snowflake table grants: Enable Snowflake table grants and configure the Snowflake role prefix.
- Use Snowflake data sharing with Immuta: Use Snowflake data sharing with table grants or project workspaces.
- Snowflake low row access policy mode: Enable Snowflake low row access policy mode.
- Snowflake lineage tag propagation: Configure your Snowflake integration to automatically apply tags added to a Snowflake table to its descendant data source columns in Immuta.

Reference guides

Snowflake integration reference guide: This reference guide describes the design and features of the Snowflake integration.
Snowflake table grants: Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of manually granting users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. This guide describes the components of Snowflake table grants and how they are used in Immuta's Snowflake integration.
Snowflake data sharing with Immuta: Organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This guide describes the components of using Immuta with Snowflake data shares.
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration. To do so, this mode decreases the number of Snowflake row access policies Immuta creates and uses table grants to manage user access. This guide describes the design and requirements of this mode.
Snowflake lineage tag propagation: Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.
Warehouse sizing recommendations: Adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources as cost-effectively as possible.

Explanatory guide

Phased Snowflake onboarding: A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and policies. This guide describes the settings and requirements for implementing this phased approach.

Getting Started with Snowflake

The how-to guides linked on this page illustrate how to integrate Snowflake with Immuta. See the reference guide for information about the Snowflake integration.

Requirements

Snowflake enterprise edition
Access to a Snowflake account that can create a Snowflake user

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Register your Snowflake connection: Using a single setup process, connect Snowflake to Immuta. This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

Connections are available on all tenants created after February 26, 2025. If you do not have connections enabled on your tenant, configure Snowflake and register data sources using the legacy workflow.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Map external user IDs from Snowflake to Immuta: Ensure the user IDs in Immuta, Snowflake, and your IAM are aligned so that the right policies impact the right users.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Request access to a data product: Users must then request access to your data products in Marketplace.
Respond to an access request: To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

How-to Guides

Edit or Remove Your Snowflake Integration

To edit or remove a Snowflake integration, you have two options:

Automatic: Grant Immuta one-time use of credentials with the following privileges to automatically edit or remove the integration:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Manual: Run the Immuta script in your Snowflake environment as a user with the following privileges to edit or remove the integration:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
- APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
- APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

Edit a Snowflake integration

Select one of the following options for editing your integration:

Automatic: Grant Immuta one-time use of credentials to automatically edit the integration.
Manual: Run the Immuta script in your Snowflake environment yourself to edit the integration.

Automatic edit

Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.
From the Select Authentication Method dropdown, select either Username and Password or Key Pair Authentication:
- Username and Password option: Complete the Username, Password, and Role fields.
- Key Pair Authentication option:
  1. Complete the Username field.
  2. Click Key Pair (Required), and upload a Snowflake key pair file.
  3. Complete the Role field.
Click Save.

Manual edit

Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.
Click edit script to download the script, and then run it in Snowflake.
Click Save.

Remove a Snowflake integration

Select one of the following options for deleting your integration:

Automatic: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.
Manual: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.

Automatic removal

Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Enter the Username, Password, and Role that were entered when the integration was configured.
Click Save.

Manual removal

Cleaning up your Snowflake environment: Until you manually run the cleanup script in your Snowflake environment, Immuta-managed roles and Immuta policies will still exist in Snowflake.

Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Click cleanup script to download the script.
Click Save.
Run the cleanup script in Snowflake.

Integration Settings

Snowflake Table Grants Private Preview Migration

To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.

Navigate to the App Settings page.
Scroll to the Global Integrations Settings section.
Uncheck the Snowflake Table Grants checkbox to disable the feature.
Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.
Use the Enable Snowflake table grants tutorial to re-enable the feature.
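As a rough aid for the waiting step above, the "about 1 minute per 1000 users" guidance can be turned into a quick estimate. This is a sketch of the arithmetic only; the function name is hypothetical and the result is an approximation, not a guarantee of when Immuta finishes dropping roles.

```python
def estimated_wait_minutes(user_count: int) -> float:
    """Estimate the wait after disabling table grants: about 1 minute per 1000 users."""
    if user_count < 0:
        raise ValueError("user_count must not be negative")
    return user_count / 1000


# Example: a tenant with 2,500 users should wait roughly two and a half minutes.
print(estimated_wait_minutes(2500))
```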

Enable Snowflake Table Grants

Navigate to the App Settings page.
Scroll to the Global Integrations Settings section.
Ensure the Snowflake Table Grants checkbox is checked. It is enabled by default.
Opt to change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements and be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
Finish configuring your integration by following one of these guidelines:
- New Snowflake integration: Set up a new Snowflake integration by following the Configure a Snowflake integration guide.
- Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have the Snowflake privileges required to run the migration.
- Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.
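The role prefix constraints from the steps above (less than 50 characters, and unique per Immuta account within one Snowflake account) can be checked before saving. The sketch below is a hypothetical pre-flight check, not an Immuta feature; the account labels and prefixes are made up, and comparing prefixes case-insensitively is an assumption.

```python
MAX_PREFIX_LENGTH = 49  # "less than 50 characters"


def validate_role_prefixes(prefix_by_immuta_account: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seen = {}  # upper-cased prefix -> first Immuta account that used it
    for account, prefix in prefix_by_immuta_account.items():
        if not prefix:
            problems.append(f"{account}: prefix is empty")
            continue
        if len(prefix) > MAX_PREFIX_LENGTH:
            problems.append(f"{account}: prefix has {len(prefix)} characters (must be under 50)")
        # Assumption: treat prefixes that differ only by case as colliding.
        key = prefix.upper()
        if key in seen:
            problems.append(f"{account}: prefix collides with {seen[key]}")
        else:
            seen[key] = account
    return problems


# Hypothetical Immuta accounts sharing one Snowflake account.
issues = validate_role_prefixes({"immuta-dev": "IMMUTA_DEV", "immuta-prod": "immuta_dev"})
print(issues)
```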

Snowflake table grants private preview migration

To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the Snowflake table grants private preview migration guide.

Using Snowflake Data Sharing with Immuta

Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time.

Prerequisites:

Create Immuta Policies to Protect the Data

Required Permission: Immuta: GOVERNANCE

Build Immuta data policies to fit your organization's compliance requirements.

It's important to understand that subscription policies are not relevant to Snowflake data shares, because the act of sharing the data is the subscription policy. Data policies can be enforced on the consuming account from the producer account on a share following these instructions.

Register the Snowflake Data Consumer with Immuta

Required Permission: Immuta: USER_ADMIN

To register the Snowflake data consumer in Immuta,

Create a new Immuta user.
Update the Immuta user's Snowflake username to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.
Give the Immuta user the appropriate attributes and groups for your organization's policies.
Subscribe the Immuta user to the data sources.

Required Permission: Snowflake ACCOUNTADMIN

To share the policy-protected data source,

Create a Snowflake Data Share of the Snowflake table that has been registered in Immuta.
Grant reference usage on the Immuta database to the share you created:
```
GRANT REFERENCE_USAGE ON DATABASE "<Immuta database of the provider account>" TO SHARE "<DATA_SHARE>";
```
Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.

Enable Snowflake Low Row Access Policy Mode

If you have Snowflake low row access policy mode enabled in private preview and have impersonation enabled, see the upgrade instructions. Otherwise, query performance will be negatively affected.

Click the App Settings icon in the navigation menu and scroll to the Global Integration Settings section.
Click the Enable Snowflake Low Row Access Policy Mode checkbox to enable the feature.
Confirm to allow Immuta to automatically disable impersonation for the Snowflake integration. If you do not confirm, you will not be able to enable Snowflake low row access policy mode.
Click Save.

Configure your Snowflake integration

If you already have a Snowflake integration configured, you don't need to reconfigure your integration. Your Snowflake policies automatically refresh when you enable Snowflake low row access policy mode.

Configure your Snowflake integration. Note that you will not be able to enable project workspaces or user impersonation with Snowflake low row access policy mode enabled.
Click Save and Confirm your changes.

Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

This upgrade step is necessary if you meet both of the following criteria:

You have the Snowflake low row access policy mode enabled in private preview.
You have user impersonation enabled.

If you do not meet these criteria, follow the instructions in the configuration guide.

Upgrade to Snowflake low row access policy mode

To upgrade to the generally available version of the feature, disable your Snowflake integration on the app settings page and then re-enable it.

Configure Snowflake Lineage Tag Propagation

Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

Contact your Immuta representative to enable this feature in your Immuta tenant.

Configure the Snowflake integration

Navigate to the App Settings page and click the Integrations tab.
Click +Add Integration and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Enable Query Audit.
Enable Lineage and complete the following fields:
- Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.
- Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter the regular expression without the leading and trailing / delimiters. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- Tag Filter: This filter determines which tags to propagate using lineage. Enter the regular expression without the leading and trailing / delimiters. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.
Select Manual or Automatic Setup and finish configuring the integration.

Trigger Snowflake lineage sync job

Prerequisite

Trigger the lineage job

The Snowflake lineage sync endpoint triggers the lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.

Copy the example and replace the Immuta URL and API key with your own.
Change the payload attribute values to your own, where
- tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter the regular expression without the leading and trailing / delimiters. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.
- lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.
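The payload attributes above can be validated locally before the request is sent. The sketch below covers only the three documented attributes; the endpoint path and authentication are deliberately left out because they are not shown here. The helper name and the sample filter are hypothetical, and Python's `re` module may not match the regex dialect Immuta uses.

```python
import json
import re
from datetime import datetime


def build_lineage_payload(table_filter: str, batch_size: int, last_timestamp: str) -> dict:
    """Validate the three documented attributes and return the payload as a dict."""
    # The filter is entered without the surrounding / delimiters.
    if len(table_filter) >= 2 and table_filter.startswith("/") and table_filter.endswith("/"):
        raise ValueError("tableFilter must not be wrapped in / delimiters")
    # Assumption: compiling with Python's re module is only a rough syntax check.
    re.compile(table_filter)
    if not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batchSize must be an integer of at least 1")
    # Raises ValueError if the timestamp is not in a format like
    # 2022-06-29T09:47:06.012-07:00.
    datetime.fromisoformat(last_timestamp)
    return {
        "tableFilter": table_filter,
        "batchSize": batch_size,
        "lastTimestamp": last_timestamp,
    }


# Hypothetical values: ingest lineage only for tables whose names start with ANALYTICS.
payload = build_lineage_payload(r"^ANALYTICS\..*", 1000, "2022-06-29T09:47:06.012-07:00")
body = json.dumps(payload)
print(body)
```

Failing early on a batch size below 1 or a malformed timestamp avoids triggering a job with values the attribute descriptions rule out.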

Next steps

Once the sync job is complete, you can complete the following steps:

Reference Guides

Snowflake Data Sharing with Immuta

Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This integration gives data consumers a live connection to the data and relieves data providers of the legal and technical burden of creating static data copies that leave their Snowflake environment.

Requirements:

Snowflake Enterprise Edition or higher
Immuta's

Configuration

This method requires that the data consumer account is registered as an Immuta user with the Snowflake user name equal to the consuming account.

At that point, the user that represents the account being shared with can be assigned the attributes and groups relevant to the data policies that need to be enforced. Once that user has access to the share in the consuming account (not managed by Immuta), they can query the share with the data policies from the producer account enforced, because Immuta treats that account as if it were a single user in Immuta.

For a tutorial on this workflow, see the Using Snowflake Data Sharing with Immuta guide.

Benefits

Using Immuta with Snowflake Data Sharing allows the sharer to

Share data with only limited knowledge of the context or goals of the existing policies: Because the sharer is not editing or creating policies to share their data, they only need a limited knowledge of how the policies work. Their main responsibility is making sure they properly represent the attributes of the data consumer (the account being shared to).
Leave policies untouched.

Snowflake Low Row Access Policy Mode

Snowflake with low row access policy mode enabled will soon be required

Support for disabling this feature has been deprecated. You must have Snowflake low row access policy mode and table grants enabled for your integration to continue working. Furthermore, project workspaces (which require table grants to be disabled) will be unavailable. See the deprecation documentation for EOL dates.

The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of row access policies Immuta creates and by using table grants to manage user access.

Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query them directly in Snowflake while policies are enforced.

Without Snowflake low row access policy mode enabled, row access policies are created and administered by Immuta in the following scenarios:

Table grants are disabled and a subscription policy that does not automatically subscribe everyone to the data source is applied. Immuta administers Snowflake row access policies that filter out all rows, which restricts access to the entire table when the user doesn't have privileges to query it. However, if table grants are disabled and a subscription policy is applied that grants everyone access to the data source automatically, Immuta does not create a row access policy in Snowflake. See the subscription policy documentation for details about these policy types.
A purpose-based policy is applied to a data source. A row access policy filters out all the rows of the table if users aren't acting under the purpose specified in the policy when they query the table.
A row-level policy is applied to a data source. A row access policy filters out rows that querying users don't have access to.
User impersonation is enabled. A row access policy is created for every Snowflake table registered in Immuta.

Reducing row access policies

Snowflake low row access policy mode is enabled by default to reduce the number of row access policies Immuta creates and improve query performance. Snowflake low row access policy mode requires

table grants to be enabled.
user impersonation to be disabled. User impersonation diminishes the performance of interactive queries because of the number of row access policies Immuta creates when it's enabled.

Requirements

Project-scoped purpose exceptions for Snowflake with low row access policy mode enabled

Project-scoped purpose exceptions for Snowflake integrations allow you to apply purpose-based policies to Snowflake data sources in a project. As a result, users can only access that data when they are working within that specific project.

Masked joins for Snowflake with low row access policy mode enabled

This feature allows masked columns to be joined across data sources that belong to the same project. When data sources do not belong to a project, Immuta uses a unique salt per data source for hashing to prevent masked values from being joined. (See the guide for an explanation of that behavior.) However, once you add Snowflake data sources to a project and enable masked joins, Immuta uses a consistent salt across all the data sources in that project to allow the join.

For more information about masked joins and enabling them for your project, see the masked joins section of the documentation.
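The effect of the salt on joins can be illustrated with a small sketch. It uses SHA-256 with a string salt purely for demonstration; the actual hashing Immuta applies is not described here, and the salts and the customer ID below are hypothetical.

```python
import hashlib


def mask_value(value: str, salt: str) -> str:
    """Return a salted hash of a value (illustrative only, not Immuta's algorithm)."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()


customer_id = "C-1001"  # the same raw value stored in two data sources

# Outside a project: each data source has its own salt, so the masked values differ.
orders_token = mask_value(customer_id, salt="orders-salt")
returns_token = mask_value(customer_id, salt="returns-salt")
joinable_without_project = orders_token == returns_token

# In a project with masked joins enabled: one salt is shared, so the masked values match.
project_salt = "project-salt"
joinable_with_project = (
    mask_value(customer_id, project_salt) == mask_value(customer_id, project_salt)
)

print(joinable_without_project, joinable_with_project)
```

Because a join compares masked values for equality, a per-data-source salt prevents matches while a shared project salt restores them without revealing the underlying value.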

Limitations and considerations

Project workspaces are not compatible with this feature.
Impersonation is not supported when the Snowflake low row access policy mode is enabled.

Warehouse Sizing Recommendations

The warehouse you select when configuring the Snowflake integration uses compute resources to set up the integration, register data sources, orchestrate policies, and run jobs like identification. Snowflake credit charges are based on the size of the warehouse and the amount of time it is active, not the number of queries run.

This document prescribes how and when to adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources as cost-effectively as possible.

In general, increase the size of the warehouse and its number of clusters to handle heavy workloads and multiple queries. Workloads are typically lighter after data sources are onboarded and policies are established in Immuta, so compute resources can be reduced after those workloads complete.

Integration and data source registration warehouse use

The Snowflake integration uses warehouse compute resources to sync policies created in Immuta to the Snowflake objects registered as data sources and, if configured, to run identification and schema monitoring. Follow the guidelines below to adjust the warehouse size and scale according to your needs.

Increase the size of the warehouse and its number of clusters during large policy syncs, updates, and changes.
Enable auto-suspend to optimize resource use in Snowflake. In the Snowflake UI, the lowest auto-suspend time setting is 5 minutes. However, through a SQL query, you can set auto_suspend to 61 seconds (since the minimum uptime for a warehouse is 60 seconds).
Identification uses compute resources for each table it runs on. Consider disabling identification when registering data sources if you have an external catalog or a tagging strategy in place.
Register data before creating global policies. Immuta does not apply a subscription policy on registered data unless an existing global policy applies to it, which allows Immuta to only pull metadata instead of also applying policies when data sources are created. Registering data before policies are created reduces the workload and the Snowflake compute resources needed.
Begin onboarding with a small dataset of tables, and then review and monitor query performance in the Snowflake query history. Adjust the virtual warehouse accordingly to handle heavier loads.
Schema monitoring uses the compute warehouse that was employed during the initial ingestion to periodically monitor the schema for changes. If you expect a low number of new tables or minimal changes to the table structure, consider scaling down the warehouse size.
Resize the warehouse after data sources are registered and policies are established.

For more details and guidance about warehouse sizing, see the Snowflake documentation.
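The auto-suspend and resize guidelines above map to Snowflake `ALTER WAREHOUSE` statements. The sketch below only builds the statements as strings; the warehouse name `IMMUTA_WH` is hypothetical, the size list is a subset chosen for illustration, and the statements still need to be run in Snowflake by a role that can alter the warehouse.

```python
# Illustrative subset of Snowflake warehouse sizes.
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"}


def auto_suspend_sql(warehouse: str, seconds: int = 61) -> str:
    """Build a statement that suspends the warehouse after `seconds` of inactivity."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    return f"ALTER WAREHOUSE {warehouse} SET AUTO_SUSPEND = {seconds};"


def resize_sql(warehouse: str, size: str) -> str:
    """Build a statement that changes the warehouse size."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = {size};"


# Hypothetical warehouse: suspend quickly, then scale down once onboarding is done.
suspend_statement = auto_suspend_sql("IMMUTA_WH")
resize_statement = resize_sql("IMMUTA_WH", "xsmall")
print(suspend_statement)
print(resize_statement)
```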

Identifying bulk jobs and heavy workloads

Even after your integration is configured, data sources are registered, and policies are established, changes to those data sources or policies may initiate heavy workloads. Follow the guidelines below to adjust your warehouse size and scale according to your needs.

Review your Snowflake query history to identify query performance and bottlenecks.
Check how many credits queries have consumed.
After reviewing query performance and cost, implement the guidelines above to adjust your warehouse.
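One way to check credit consumption is to query Snowflake's `SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY` view. The sketch below is an assumption about a suitable query, not the one from the original guide: it totals credits per warehouse over a recent window rather than per individual query, and the 7-day default is arbitrary.

```python
def credits_by_warehouse_sql(days: int = 7) -> str:
    """Build a query that totals credits used per warehouse over the last `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return (
        "SELECT WAREHOUSE_NAME, SUM(CREDITS_USED) AS TOTAL_CREDITS\n"
        "FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY\n"
        f"WHERE START_TIME >= DATEADD(day, -{days}, CURRENT_TIMESTAMP())\n"
        "GROUP BY WAREHOUSE_NAME\n"
        "ORDER BY TOTAL_CREDITS DESC;"
    )


query = credits_by_warehouse_sql(days=7)
print(query)
```

Running the query requires a role with access to the `SNOWFLAKE` database's account usage views.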

Explanatory Guides

Phased Snowflake Onboarding

While you're onboarding Snowflake data sources and designing policies, you don't want to disrupt your Snowflake users' existing workflows. Instead, you want to gradually onboard Immuta through a series of successive changes that will not impact your existing Snowflake users.

A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and configure policies.

Several features allow you to gradually onboard data sources and policies in Immuta:

No subscription policies are applied at the time of data registration: No policy is applied at registration time unless an existing global policy applies to it; the table is registered in Immuta and waits for a policy to be applied, if ever.
There are several benefits to this design:
- All existing roles maintain access to the data and registration of the table or view with Immuta has zero impact on your data platform.
- It gives you time to configure tags on the Immuta registered tables and views, either manually or through automatic means, such as identification, or an external catalog integration to include Snowflake tags.
- It gives you time to assess and validate the sensitive data tags that were applied.
- You can build only row and column controls with Immuta and let your existing roles manage table access instead of using Immuta subscription policies for table access.
Snowflake table grants coupled with Snowflake low row access policy mode: With these features enabled, Immuta manages access to tables (subscription policies) through GRANTs. Immuta creates a unique role for each user and manages all of that user's table access through that single role.
Without these two features enabled, Immuta uses a Snowflake row access policy (RAP) to manage table access. A RAP only allows users to access rows in the table if they were explicitly granted access through an Immuta subscription policy; otherwise, the user sees no rows. This behavior means all existing Snowflake roles lose access to the table contents until explicitly granted access through Immuta subscription policies. Essentially, roles outside of Immuta don't control access anymore.
By using table grants and the low row access policy mode, users and roles outside Immuta continue to work.
There are two benefits to this approach:
- All pre-existing Snowflake roles retain access to the data until you explicitly revoke access (outside Immuta).
- It provides a way to test that Immuta GRANTs are working without impacting production workloads.

Requirements

The following configuration is required for phased Snowflake onboarding:

Impersonation is disabled
Project workspaces are disabled

If either of these capabilities is necessary for your use case, you cannot do phased Snowflake onboarding as described below.

See the Getting started page for step-by-step guidance to implement phased Snowflake onboarding.

SQL Server

Immuta policies will not be automatically enforced in SQL Server

While you can author and apply subscription and data policies on SQL Server data sources within Immuta, these policies will not be enforced natively in the SQL Server platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in SQL Server using your own process.

To use this connection, contact your Immuta representative.

The SQL Server connection registers data from SQL Server in Immuta.

How-to guide

Reference guide

SQL Server connection reference guide: This guide describes the design and components of the connection.

Register a SQL Server Connection

Immuta policies will not be automatically enforced in SQL Server

While you can author and apply subscription and data policies on SQL Server data sources within Immuta, these policies will not be enforced natively in the SQL Server platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in SQL Server using your own process.

To use this connection, contact your Immuta representative.

Requirements

The requirements depend on your deployment type:

Supported Azure SQL Server versions:
- Azure SQL Database
- Azure SQL Managed Instance
- SQL Server on Azure VMs. Immuta supports the following:
  - SQL Server 2025 Preview
  - SQL Server 2022
  - SQL Server 2019
  - SQL Server 2017
  - SQL Server 2016
  - SQL Server 2014
  - SQL Server 2012
Supported SQL Server on Amazon RDS versions:
- SQL Server 2022 (16.0.4185.3)
- SQL Server 2019 (15.0.4430.1)
- SQL Server 2017 (14.0.3485.1)
- SQL Server 2016 (13.0.6455.2)

Permissions

The user registering the connection must have the permissions below.

APPLICATION_ADMIN Immuta permission
The user registering the connection must have the following system privileges depending on your deployment type:
- Azure SQL Server
- SQL Server on AWS RDS
  - master user

Create the system account user

Create a new system account user for Immuta. Immuta will use the credentials of this system user to connect to SQL Server, ingest the data objects, and continually crawl the registered connection. See instructions below based on your deployment method:

Azure SQL Server

Create a database user in your Azure SQL Server for Microsoft SQL DB instance.
Grant this new account any of the privileges listed below to ensure it can access all databases and register them in Immuta:
- ALTER ANY DATABASE or the VIEW ANY DATABASE server-level permission, or CREATE DATABASE permission in the master database to allow the user to

SQL Server on Amazon RDS

Create a database user in your Amazon RDS for Microsoft SQL DB instance.
Grant this new account the privileges listed below to ensure it can access all databases and register them in Immuta:
- ALTER ANY DATABASE or the VIEW ANY DATABASE server-level permission, or CREATE DATABASE permission in the master database to allow the user to

Register a SQL Server connection

In your SQL Server environment, create an Immuta database that Immuta can use to connect to your SQL Server instance to register the connection and maintain state with SQL Server.
Having this separate database for Immuta prevents custom ETL processes or jobs deleting the database you use to register the connection, which would break the connection.
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the SQL Server tile.
Select your deployment method:
1. Azure SQL Server
2. RDS
3. Self-Managed
Enter the host connection information:
1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.
2. Hostname: URL of your SQL Server instance.
3. Port: Port configured for SQL Server.
4. Database: The SQL Server database you created for Immuta. All databases in the host will be registered.
Select an authentication method from the dropdown menu:
1. Username and Password: Enter the credentials of the system account user you created.
2. Azure AD Access Token: Enter the and credentials of the .
Click Save connection.

Starburst (Trino)

In this integration, Immuta policies are translated into Starburst rules and permissions and applied directly to tables within users’ existing catalogs.

Getting started

This guide outlines how to integrate Starburst with Immuta.

How-to guides

Starburst (Trino) integration configuration guide: Configure the integration in Immuta.
Map read and write access policies to Starburst (Trino) privileges: Configure how read and write access subscription policies translate to Starburst (Trino) privileges and apply to Starburst (Trino) data sources.

Reference guide

Starburst (Trino) integration reference guide: This guide describes the design and components of the integration.

Getting Started with Starburst (Trino)

The how-to guides linked on this page illustrate how to integrate Starburst (Trino) with Immuta. See the Starburst (Trino) integration reference guide for information about the Starburst (Trino) integration.

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Install the Immuta Starburst (Trino) plugin in Starburst or Trino so that policies can be applied to data objects.
This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.
Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Bring the IAM your organization already uses and allow Immuta to register your users for you.
Ensure the user IDs in Immuta, Starburst (Trino), and your IAM are aligned so that the right policies impact the right users.

Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

Once you register your tables and users, you can immediately start publishing data products in Marketplace.
Users must then request access to your data products in Marketplace.
To grant access to a data product and its tables, respond to the access request.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Identification allows you to automate data tagging using identifiers that detect certain data patterns.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.
Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

How-to Guides

Customizing the Integration

You can customize the Databricks Spark integration settings using these components Immuta provides:

Cluster policies
Spark environment variables
Hadoop configuration file

Cluster policies

Immuta provides cluster policies that set the Spark environment variables and configuration on your Databricks cluster once you apply that policy to your cluster. These policies generated by Immuta must be applied to your cluster manually. The Configure a Databricks Spark integration guide includes instructions for generating and applying these cluster policies. Each cluster policy is described below.

Python and SQL

This is the most performant policy configuration.

In this configuration, Immuta is able to rely on Databricks-native security controls, reducing overhead. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes. This Immuta cluster configuration relies on Py4J security being enabled. Consequently, the following Databricks features are unsupported:

Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier)
dbutils.fs
Databricks Connect client library

For full details on Databricks’ best practices in configuring clusters, read their governance documentation.

Python, SQL, and R

Additional overhead: Compared to the Python and SQL cluster policy, this configuration trades some additional overhead for added support of the R language.

In this configuration, you are able to rely on the Databricks-native security controls. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes.

Like the Python and SQL configuration, the Python, SQL, and R configuration has Py4J security enabled. However, because R has been added, Immuta also enables the Security Manager to provide more security guarantees. For example, by default all actions in R execute as the root user; among other things, this permits access to the entire filesystem (including sensitive configuration data), and, without iptables restrictions, a user may freely access the cluster’s cloud storage credentials. To address these security issues, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user with limited filesystem and network access and installs the Immuta Security Manager, which prevents users from bypassing policies and protects against the above vulnerabilities from within the JVM.

Consequently, the cost of introducing R is that the Security Manager incurs a small increase in performance overhead; however, average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

The following Databricks features are unsupported when this cluster policy is applied:

Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier)
dbutils.fs
Databricks Connect client library

For full details on Databricks’ best practices in configuring clusters, read their governance documentation.

Python, SQL, and R with library support

Py4J security disabled: In addition to support for Python, SQL, and R, this configuration adds support for additional Python libraries and utilities by disabling Databricks-native Py4J security.

This configuration does not rely on Databricks-native Py4J security to secure the cluster, while process isolation is still enabled to secure filesystem and network access from within Python processes. On an Immuta-enabled cluster, once Py4J security is disabled the Immuta Security Manager is installed to prevent nefarious actions from Python in the JVM. Disabling Py4J security also allows for expanded Python library support, including many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs.

By default, all actions in R will execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data). Without iptables restrictions, a user may also freely access the cluster’s cloud storage credentials. To properly support the use of the R language, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user. This user has limited filesystem and network access. The Immuta Security Manager is also installed to prevent users from bypassing policies and to protect against the above vulnerabilities from within the JVM.

The Security Manager will incur a small increase in performance overhead; average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

A homogeneous cluster is recommended for configurations where Py4J security is disabled. If all users have the same level of authorization, there would not be any data leakage, even if a nefarious action was taken.
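
As a rough illustration of the homogeneity condition described above, the following sketch checks whether every user who can attach to a cluster holds the same groups and authorizations. The `ClusterUser` type and the `is_homogeneous` helper are hypothetical names used only for this example; Immuta does not perform this check itself, so in practice the user data would come from whatever external process manages cluster access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterUser:
    """Hypothetical record of a user who can attach to the cluster."""
    name: str
    groups: frozenset
    authorizations: frozenset


def is_homogeneous(users):
    """Return True when all users share identical groups and authorizations."""
    profiles = {(u.groups, u.authorizations) for u in users}
    # Zero or one distinct profile means nobody holds more access than anyone else.
    return len(profiles) <= 1


analysts = [
    ClusterUser("ana", frozenset({"analysts"}), frozenset({"pii"})),
    ClusterUser("bo", frozenset({"analysts"}), frozenset({"pii"})),
]
mixed = analysts + [ClusterUser("cy", frozenset({"interns"}), frozenset())]

print(is_homogeneous(analysts))
print(is_homogeneous(mixed))
```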

For full details on Databricks’ best practices in configuring clusters, read their governance documentation.

Scala

Scala clusters: This configuration is for Scala-only clusters.

Where Scala language support is needed, this configuration can be used in the Custom access mode.

According to Databricks’ cluster type support documentation, Scala clusters are intended for single users only. However, nothing inherently prevents a Scala cluster from being configured for multiple users. Even with the Immuta Security Manager enabled, there are limitations to user isolation within a Scala job.

For a secure configuration, it is recommended that clusters intended for Scala workloads are limited to Scala jobs only and are made homogeneous through the use of project equalization or externally via convention/cluster ACLs. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

For full details on Databricks’ best practices in configuring clusters, read their governance documentation.

Sparklyr

Single-user clusters recommended: Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.

Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).

Single-User Clusters: Credential Passthrough (required on Databricks) allows a single-user cluster to be created. This setting automatically configures the cluster to assume the role of the attached user when reading from storage. Because Immuta requires that raw data is readable by the cluster, the instance profile associated with the cluster should be used rather than a role assigned to the attached user.
Multi-User Clusters: Because Immuta cannot guarantee user isolation in a multi-user sparklyr cluster, it is not recommended to deploy a multi-user cluster. To force all users to act under the same set of attributes, groups, and purposes with respect to their data access and eliminate the risk of a data leak, all sparklyr multi-user clusters must be equalized either by convention (all users able to attach to the cluster have the same level of data access in Immuta) or by configuration (detailed below).

Single-user cluster configuration

1 - Enable sparklyr

In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:

IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED=true

This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.

2 - Set up a sparklyr connection in Databricks

Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.

Set up a sparklyr connection:

sc <- spark_connect(method = "databricks")

Pass the connection object to execute queries:
```
dbGetQuery(sc, "show tables in immuta")
```

3 - Configure a single-user cluster

Add the following items to the Spark Config section of the cluster:

spark.databricks.passthrough.enabled true

spark.databricks.pyspark.trustedFilesystems com.databricks.s3a.S3AFileSystem,shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,com.databricks.adl.AdlFileSystem,shaded.databricks.V2_1_4.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem,shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem,shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem,org.apache.hadoop.fs.ImmutaSecureFileSystemWrapper

spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.InstanceProfileCredentialsProvider

The spark.databricks.pyspark.trustedFilesystems setting is required to allow Immuta’s wrapper FileSystem (used in conjunction with the Security Manager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster’s instance profile for data access, rather than a role associated with the attached user.
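
The single-user settings above can be gathered in one place, for example when generating a cluster definition from a script. The sketch below is illustrative only: the helper names `single_user_sparklyr_settings` and `render_spark_conf` are hypothetical, the values are copied from the steps above, and the credentials provider shown applies to clusters that use an AWS instance profile.

```python
# Filesystem classes copied from the trustedFilesystems value shown above.
TRUSTED_FILESYSTEMS = [
    "com.databricks.s3a.S3AFileSystem",
    "shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem",
    "shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem",
    "com.databricks.adl.AdlFileSystem",
    "shaded.databricks.V2_1_4.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
    "org.apache.hadoop.fs.ImmutaSecureFileSystemWrapper",
]


def single_user_sparklyr_settings():
    """Return (env_vars, spark_conf) for a single-user sparklyr cluster."""
    env_vars = {"IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED": "true"}
    spark_conf = {
        "spark.databricks.passthrough.enabled": "true",
        "spark.databricks.pyspark.trustedFilesystems": ",".join(TRUSTED_FILESYSTEMS),
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    }
    return env_vars, spark_conf


def render_spark_conf(spark_conf):
    """Render settings as 'key value' lines, one per setting."""
    return "\n".join(f"{key} {value}" for key, value in spark_conf.items())


env_vars, spark_conf = single_user_sparklyr_settings()
print(render_spark_conf(spark_conf))
```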

Multi-user cluster configuration

Avoid deploying multi-user clusters with sparklyr configuration

It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.

The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta native workspaces.

Add the following environment variables to the Environment Variables section of your cluster configuration:

IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED=true

IMMUTA_SPARK_REQUIRE_EQUALIZATION=true

IMMUTA_SPARK_CURRENT_USER_SCIM_FALLBACK=false

Add the following items to the Spark Config section:

immuta.spark.acl.assume.not.privileged true

immuta.api.key=<user’s API key>
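
Because a multi-user sparklyr cluster depends on every one of these settings being present, a quick pre-deployment check can help. The following is a minimal sketch, assuming you hold the cluster's environment variables and Spark configuration as plain dictionaries; the function name `missing_multi_user_settings` is hypothetical and not part of any Immuta or Databricks API.

```python
# Expected values copied from the multi-user configuration above.
REQUIRED_ENV = {
    "IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED": "true",
    "IMMUTA_SPARK_REQUIRE_EQUALIZATION": "true",
    "IMMUTA_SPARK_CURRENT_USER_SCIM_FALLBACK": "false",
}
REQUIRED_SPARK_CONF = {"immuta.spark.acl.assume.not.privileged": "true"}


def missing_multi_user_settings(env_vars, spark_conf):
    """Return a list of settings that are absent or hold an unexpected value."""
    problems = []
    for key, expected in REQUIRED_ENV.items():
        if env_vars.get(key) != expected:
            problems.append(f"environment variable {key} should be {expected}")
    for key, expected in REQUIRED_SPARK_CONF.items():
        if spark_conf.get(key) != expected:
            problems.append(f"Spark config {key} should be {expected}")
    # The API key has no fixed value; it only needs to be present.
    if not spark_conf.get("immuta.api.key"):
        problems.append("Spark config immuta.api.key is not set")
    return problems


example_env = dict(REQUIRED_ENV)
example_conf = {
    "immuta.spark.acl.assume.not.privileged": "true",
    "immuta.api.key": "example-key",  # placeholder, not a real key
}
print(missing_multi_user_settings(example_env, example_conf))
```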

Limitations

Immuta’s integration with sparklyr does not currently support the following:

spark-submit jobs
UDFs

Spark environment variables

The Spark environment variables reference guide lists the various possible settings controlled by these variables that you can set in your cluster policy before attaching it to your cluster.

Additional Hadoop configuration file (optional)

In some cases it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration to allow Spark to read data.

For example, when accessing external tables stored in Azure Data Lake Gen2, Spark must have credentials to access the target containers or filesystems in Azure Data Lake Gen2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access Azure Data Lake Gen2.

To use an additional Hadoop configuration file, set the IMMUTA_INIT_ADDITIONAL_CONF_URI Spark environment variable to be the full URI to this file.
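
For the Azure Data Lake Gen2 case, the additional file might look like the output of the sketch below. This is an assumption-laden illustration: it assumes the file uses the standard Hadoop configuration XML layout and the ABFS account-key property name `fs.azure.account.key.<account>.dfs.core.windows.net`; the storage account name and the key placeholder are hypothetical, and `build_hadoop_conf` is a helper written for this example.

```python
import xml.etree.ElementTree as ET


def build_hadoop_conf(properties):
    """Serialize a name-to-value mapping as Hadoop configuration XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical storage account name; replace the value with the real key.
account = "examplestorageaccount"
xml_text = build_hadoop_conf({
    f"fs.azure.account.key.{account}.dfs.core.windows.net": "REPLACE_WITH_STORAGE_ACCOUNT_KEY",
})
print(xml_text)
```

The resulting file would then be stored somewhere the cluster can read but users cannot, and its full URI set in IMMUTA_INIT_ADDITIONAL_CONF_URI.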

Configurable settings

Data source settings

Protected and unprotected tables

Databricks non-privileged users will only see sources to which they are subscribed in Immuta, and this can present problems if organizations have a data lake full of non-sensitive data and Immuta removes access to all of it. Immuta addresses this challenge by allowing Immuta users to access any tables that are not protected by Immuta (i.e., not registered as a data source or a table in a native workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.

Protected until made available by policy: This setting means that users can only see tables that Immuta has explicitly subscribed them to.

Behavior change

If a table is registered in Immuta and does not have a subscription policy applied to it, that data will be visible to users, even if the Protected until made available by policy setting is enabled.

If you have enabled this setting, author an "Allow individually selected users" global subscription policy that applies to all data sources.

Available until protected by policy: This setting means all tables are open until explicitly registered and protected by Immuta. This setting allows both non-Immuta reads and non-Immuta writes:
- IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS: Immuta users with regular (non-privileged) Databricks roles may SELECT from tables that are not registered in Immuta. This setting does not allow reading data directly with commands like spark.read.format("x"). Users are still required to read data and query tables using Spark SQL. When non-Immuta reads are enabled through the cluster policy, Immuta users will see all databases and tables when they run show databases or show tables. However, this does not mean they will be able to query all of them.
- IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES: Immuta users with regular (non-privileged) Databricks roles can run DDL commands and data-modifying commands against tables or spaces that are not registered in Immuta. With non-Immuta writes enabled through the cluster policy, users on the cluster can mix any policy-enforced data they may have access to via any registered data sources in Immuta with non-Immuta data and write the ensuing result to a non-Immuta write space where it would be visible to others. If this is not a desired possibility, the cluster should instead be configured to only use Immuta’s project workspaces.

The Configure a Databricks Spark integration guide includes instructions for applying these settings to your cluster.

Ephemeral overrides

In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.

Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.

When a user runs a Spark job in Databricks, the Immuta plugin automatically submits ephemeral overrides for that user to Immuta for all applicable data sources to use the current cluster as compute for all subsequent metadata operations for that user against the applicable data sources.

For more details about ephemeral overrides and how to configure or disable them, see the Ephemeral overrides page.

Restricting users' access with Immuta projects

Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is for a single user to organize their data sources or to control an entire schema of data sources through a single projects screen; however, most often this is an Immuta purpose for which the data has been approved to be used and will restrict access to data and streamline team collaboration. Consequently, data owners can restrict access to data for a specified purpose through projects.

When a user is working within the context of a project, data users will only see the data in that project. This helps to prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. Consider adjusting the following project settings to suit your organization's needs:

Project UDFs (web service and on-cluster caches): Immuta caches a mapping of user accounts and users' current projects in the Immuta Web Service and on-cluster. When users change their project with UDFs instead of the Immuta UI, Immuta invalidates all the caches on-cluster (so that everything changes immediately) and the cluster submits a request to change the project context to a web worker. Immediately after that request, another call is made to a web worker to refresh the current project. To allow use of project UDFs in Spark jobs, raise the caching on-cluster and lower the cache timeouts for the Immuta Web Service. Otherwise, caching could cause dissonance among the requests and calls to multiple web workers when users try to change their project contexts.
Preventing users from changing projects within a session: If your compliance requirements restrict users from changing projects within a session, you can block the use of Immuta's project UDFs on a Databricks Spark cluster. To do so, configure the IMMUTA_SPARK_DATABRICKS_DISABLED_UDFS Spark environment variable.

Databricks features

This section describes how Immuta interacts with common Databricks features.

Change data feed

Databricks users can see the change data feed (CDF) on queried tables if they are allowed to read raw data and meet specific qualifications. Immuta does not support applying policies to the changed data, and the CDF cannot be read for data source tables if the user does not have access to the raw data in Databricks or for streaming queries.

The CDF can be read if the querying user is allowed to read the raw data and ONE of the following statements is true:

the table is in the current workspace
the table is in a scratch path
non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting
non-Immuta reads are enabled AND the table is not part of an Immuta data source
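
The rule above combines one mandatory condition with four alternatives, which can be easier to follow as a predicate. The sketch below is only a model of that logic; the `CdfRequest` fields and `can_read_cdf` are hypothetical names, not Immuta APIs, and the streaming-query restriction is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CdfRequest:
    """Hypothetical summary of the facts the conditions above depend on."""
    can_read_raw_data: bool
    in_current_workspace: bool = False
    in_scratch_path: bool = False
    non_immuta_reads_enabled: bool = False
    # True when the table intersects a workspace the user is NOT acting under.
    intersects_inactive_workspace: bool = False
    is_immuta_data_source: bool = False


def can_read_cdf(req):
    """Raw-data access is mandatory; at least one further condition must hold."""
    if not req.can_read_raw_data:
        return False
    return (
        req.in_current_workspace
        or req.in_scratch_path
        or (req.non_immuta_reads_enabled and not req.intersects_inactive_workspace)
        or (req.non_immuta_reads_enabled and not req.is_immuta_data_source)
    )


print(can_read_cdf(CdfRequest(can_read_raw_data=True, in_scratch_path=True)))
print(can_read_cdf(CdfRequest(can_read_raw_data=False, in_scratch_path=True)))
```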

Databricks trusted libraries

Security vulnerability

Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether this risk applies to your environment or use case.

The trusted libraries feature allows Databricks cluster administrators to avoid Immuta Security Manager errors when using third-party libraries. An administrator can specify an installed library as trusted, which will enable that library's code to bypass the Immuta Security Manager. This feature does not impact Immuta's ability to apply policies; trusting a library only allows code through that otherwise would have been blocked by the Security Manager.

The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:

Library source is Upload, DBFS or DBFS/S3 and the Library Type is Jar.
Library source is Maven.

When users install third-party libraries, those libraries will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta. See the Install a trusted library guide to add a trusted library to your configuration.

Limitations

Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.
Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either
- waiting for library installation to complete before running any third-party library commands or
- executing a Spark query. This will force Immuta to wait for any trusted Immuta libraries to complete installation before proceeding.
When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or using the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code paths that include a class from a transitive dependency but do not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency of a file-util library, the user will not be able to directly use the file-util library to read a sensitive file that is normally protected by the Immuta security manager.
In many cases, it is not a problem if dependent libraries aren't trusted because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if the dependent library needs to be trusted, there is a workaround:
1. Add the transitive dependency jar paths to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS Spark environment variable. In the driver log4j logs, Databricks outputs the source jar locations when it installs transitive dependencies. In the cluster driver logs, look for a log message similar to the following:
  INFO LibraryDownloadManager: Downloaded library dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as local file /local_disk0/tmp/addedFile8569165920223626894slf4j_api_1_7_25-784af.jar
2. In the above example, where slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster.
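
The workaround above can be partly automated by scanning the driver log for the download messages. The sketch below is an illustration rather than a supported tool: the helper names are hypothetical, the regular expression is based only on the sample log line shown above, and joining multiple URIs with commas is an assumption about the format of IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS.

```python
import re

# Matches the source location in LibraryDownloadManager download messages.
LOG_PATTERN = re.compile(
    r"LibraryDownloadManager: Downloaded library (\S+) as local file"
)

SAMPLE_LOG = (
    "INFO LibraryDownloadManager: Downloaded library "
    "dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as local file "
    "/local_disk0/tmp/addedFile8569165920223626894slf4j_api_1_7_25-784af.jar"
)


def transitive_jar_uris(log_lines):
    """Collect the distinct source jar URIs named in the driver log lines."""
    uris = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match.group(1) not in uris:
            uris.append(match.group(1))
    return uris


def extend_trusted_lib_uris(current_value, new_uris):
    """Append new URIs to the existing value (comma separator is assumed)."""
    existing = [uri for uri in current_value.split(",") if uri]
    for uri in new_uris:
        if uri not in existing:
            existing.append(uri)
    return ",".join(existing)


found = transitive_jar_uris([SAMPLE_LOG, "INFO SomethingElse: unrelated line"])
print(extend_trusted_lib_uris("", found))
```

After updating the environment variable with the new value, the cluster still needs to be restarted for the change to take effect.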

External catalogs

Connect any of these supported external catalogs to work with your Databricks Spark integration so data owners can tag their data.

External metastores

Immuta supports the use of external metastores in local or remote mode:

Local mode: The metastore client running inside a cluster connects to the underlying metastore database directly via JDBC.
Remote mode: Instead of connecting to the underlying database directly, the metastore client connects to a separate metastore service via the Thrift protocol. The metastore service connects to the underlying database. When running a metastore in remote mode, DBFS is not supported.

For more details about these deployment modes, see how to set up Databricks clusters to connect to an existing external Apache Hive metastore.

Configure external Hive metastore

Download the metastore jars and point to them as specified in Databricks documentation. Metastore jars must end up on the cluster's local disk at this explicit path: /databricks/hive_metastore_jars.

If using DBR 7.x with Hive 2.3.x, either

Set spark.sql.hive.metastore.version to 2.3.7 and spark.sql.hive.metastore.jars to builtin or
Download the metastore jars and set spark.sql.hive.metastore.jars to /databricks/hive_metastore_jars/* as before.
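
The two options for DBR 7.x with Hive 2.3.x can be summarized as the Spark settings each one implies. This is a small sketch using only the values named above; `hive_metastore_conf` is a hypothetical helper, and any further metastore settings your environment needs (such as connection details) are out of scope here.

```python
def hive_metastore_conf(use_builtin_jars):
    """Return the Spark settings for one of the two options described above."""
    if use_builtin_jars:
        # Option 1: rely on the Hive client bundled with the runtime.
        return {
            "spark.sql.hive.metastore.version": "2.3.7",
            "spark.sql.hive.metastore.jars": "builtin",
        }
    # Option 2: use jars downloaded to the cluster's local disk.
    return {"spark.sql.hive.metastore.jars": "/databricks/hive_metastore_jars/*"}


for key, value in hive_metastore_conf(use_builtin_jars=True).items():
    print(f"{key} {value}")
```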

Configure AWS Glue Data Catalog

To use AWS Glue Data Catalog as the metastore for Databricks, see the Databricks documentation.

Notebook-scoped libraries on machine learning clusters

Users on Databricks Runtimes 8+ can manage notebook-scoped libraries with %pip commands.

However, this functionality differs from the support for Databricks trusted libraries, and Python libraries are not supported as trusted libraries. The Immuta Security Manager will deny the code of libraries installed with %pip access to sensitive resources.

Scratch paths

Scratch paths are cluster-specific remote file paths that Databricks users are allowed to directly read from and write to without restriction. The creator of a Databricks cluster specifies the set of remote file paths that are designated as scratch paths on that cluster when they configure a Databricks cluster. Scratch paths are useful for scenarios where non-sensitive data needs to be written out to a specific location using a Databricks cluster protected by Immuta.

To configure a scratch path, use the IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS Spark environment variable.

Configure Starburst (Trino) Integration

The plugin comes pre-installed with Starburst Enterprise, so this page provides separate sets of guidelines for configuration:

Starburst Cluster Configuration: These instructions are specific to Starburst Enterprise clusters.
Trino Cluster Configuration: These instructions are specific to open-source Trino clusters.

Starburst Cluster Configuration

Requirements

A valid Starburst Enterprise license.
The Starburst Cluster must be publicly accessible or have private connectivity configured.

Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control providers such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on the cluster.

1 - Enable the Integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click Add Integration and select Trino from the Integration Type dropdown menu.
Click Save.

2 - Configure the Immuta System Access Control Plugin in Starburst

Default configuration property values

If you use the default property values in the configuration file described in this section,
- you will give users read and write access to tables that are not registered in Immuta and
- results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that have already configured an access control provider. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
The table below describes the properties that can be set during configuration.
| Property | Starburst version | Required or optional | Description |
| --- | --- | --- | --- |
| access-control.name | 392 and newer | Required | This property enables the integration. |
| access-control.config-files | 392 and newer | Optional | Starburst allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. Immuta has tested the Immuta system access control provider alongside the . This approach allows Immuta to work with existing Starburst installations that have already configured an access control provider. Immuta does not manage all permissions in Starburst and will default to allowing access to anything Immuta does not manage so that the Starburst integration complements existing controls. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider. |
| immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | This property defines a comma-separated list of allowed operations for Starburst (Trino) users on tables registered as Immuta data sources: READ, WRITE, and OWN. (See the Granting Starburst (Trino) privileges section for details about the OWN operation.) When set to WRITE, all querying users are allowed read and write operations to data source schemas and tables. By default, this property is set to READ, which blocks write operations on data source tables and schemas. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE by default, so users are allowed read and write operations to data source schemas and tables. |
| immuta.allowed.non.immuta.datasource.operations | 392 and newer | Optional | This property defines a comma-separated list of allowed operations users will have on tables not registered as Immuta data sources: READ, WRITE, CREATE, and OWN. (See the Granting Starburst (Trino) privileges section for details about CREATE and OWN operations.) When set to READ, users are allowed read operations on tables not registered as Immuta data sources. When set to WRITE, users are allowed read and write operations on tables not registered as Immuta data sources. If this property is left empty, users will not get access to any tables outside Immuta. By default, this property is set to READ,WRITE. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE,OWN,CREATE by default. |
| immuta.apikey | 392 and newer | Required | This should be set to the Immuta API key displayed when enabling the integration on the app settings page. To rotate this API key, generate a new API key in Immuta, and then replace the existing immuta.apikey value with the new one. |
| immuta.audit.legacy.enabled | 435 and newer | Optional | This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit. |
| immuta.audit.uam.enabled | 435 and newer | Optional | This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit. |
| immuta.ca-file | 392 and newer | Optional | This property allows you to specify a path to your CA file. |
| immuta.cache.views.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Starburst. By default, the cache expires after 30 seconds. |
| immuta.cache.datasource.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after project or subscription changes. By default, the cache expires after 30 seconds. |
| immuta.endpoint | 392 and newer | Required | The protocol and fully qualified domain name (FQDN) for the Immuta tenant used by Starburst (for example, https://my.immuta.tenant.io). This should be set to the endpoint displayed when enabling the integration on the app settings page. |
| immuta.filter.unallowed.table.metadata | 392 and newer | Optional | When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false. |
| immuta.group.admin | 420 and newer | Required if immuta.user.admin is not set | This property identifies the Starburst group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a \| delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com). |
| immuta.http.timeout.milliseconds | 464 and newer | Optional | The timeout for all HTTP calls made to Immuta in milliseconds. Defaults to 30000 (30 seconds). |
| immuta.user.admin | 392 and newer | Required if immuta.group.admin is not set | This property identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (with a \| delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com). |

Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/starburst/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

The example configuration snippet below uses the default configuration settings for immuta.allowed.immuta.datasource.operations and immuta.allowed.non.immuta.datasource.operations, which allow read access for data registered as Immuta data sources and read and write access on data that is not registered in Immuta. See the Granting Starburst (Trino) privileges section for details about customizing and enforcing read and write access controls in Starburst.

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
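
Before restarting the cluster, it can help to confirm that the access control file contains the properties marked as required in the table above. The following Python sketch is not part of Immuta or Starburst; the function names are hypothetical, and the parser only handles simple key=value lines (it ignores the escape sequences and line continuations that full Java properties files support).

```python
from typing import Dict, List

# Properties the table marks as required.
REQUIRED = ("access-control.name", "immuta.endpoint", "immuta.apikey")
# At least one of these must be set.
ADMIN_KEYS = ("immuta.user.admin", "immuta.group.admin")


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_problems(props: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the basics are present."""
    problems = [f"missing required property: {key}" for key in REQUIRED if not props.get(key)]
    if not any(props.get(key) for key in ADMIN_KEYS):
        problems.append("set immuta.user.admin or immuta.group.admin")
    return problems


# Placeholder values for illustration only.
example = """
access-control.name=immuta
immuta.endpoint=https://my.immuta.tenant.io
immuta.apikey=<your_api_key>
immuta.user.admin=immuta_system_account
"""
print(find_problems(parse_properties(example)))
```

A check like this only confirms that the keys exist; it does not verify that the endpoint is reachable or that the API key is valid.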

3 - Add Starburst Users to Immuta

Configure your external IAM to add users to Immuta.
Map their Starburst usernames to Immuta when configuring your IAM (or map usernames manually).
- All Starburst users must either be mapped to an Immuta user or match the immuta.user.admin regex configured on the cluster. Users can query policy-enforced data only after their Starburst username is mapped to Immuta.
- A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.
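
Because immuta.user.admin is treated as a regex, usernames that contain characters such as . or + need escaping. The sketch below shows one way to build such a value from literal usernames. It is an illustration only: the helper names are hypothetical, Python's re module stands in for the regex engine the plugin uses, and it assumes the whole username must match the pattern.

```python
import re
from typing import Iterable


def build_admin_regex(usernames: Iterable[str]) -> str:
    """Join literal usernames into one alternation, escaping regex metacharacters."""
    return "|".join(re.escape(name) for name in usernames)


def to_properties_value(regex: str) -> str:
    """Double each backslash, because a backslash is an escape character in .properties files."""
    return regex.replace("\\", "\\\\")


admins = ["immuta_system_account", "john.doe+svcacct@immuta.com"]
pattern = build_admin_regex(admins)

# Line to place in immuta-access-control.properties.
print("immuta.user.admin=" + to_properties_value(pattern))

# Quick check that only the intended usernames match.
for candidate in admins + ["johnxdoe+svcacct@immuta.com"]:
    print(candidate, bool(re.fullmatch(pattern, candidate)))
```

Escaping every metacharacter, including +, keeps a username from being interpreted as a quantifier or wildcard and matching unintended accounts.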

4 - Register data

Trino Cluster Configuration

1 - Enable the Integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click Add Integration and select Trino from the dropdown menu.
Click Save.

2 - Configure the Immuta System Access Control Plugin in Trino

Default configuration property values

If you use the default property values in the configuration file described in this section,
- you will give users read and write access to tables that are not registered in Immuta and
- results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst (Trino) integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update the configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Trino installations that have already configured an access control provider. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

The Immuta Trino plugin version matches the version of the corresponding Trino release. For example, the Immuta plugin version supporting Trino version 403 is simply version 403. Navigate to the Immuta GitHub repository for a list of supported Trino versions. Immuta follows Starburst's release cycle, but you can contact your Immuta representative for a specific Trino OSS release.
Download the assets for the release that corresponds to your Trino version.
Enable Immuta on your cluster. Select the tab below that corresponds to your installation method for instructions:

Docker installations

Follow Trino's documentation to install the plugin archive on all nodes in your cluster.
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

immuta-trino Docker image

For Trino versions 414 and newer, an immuta-trino Docker image that includes the Trino plugin jars is available from ocir.immuta.com. Before using this image, consider the following factors:

This image was designed to provide a method for organizations to quickly set up and validate the integration, so it should be used in a development environment. Use the Docker installation method above for production environments.
Immuta only supports the Immuta Trino plugin on the Docker image, not any other software packaged on the image.
If you experience an issue with the image outside of the scope of the Immuta plugin, you must rebuild your own version of the image using the Docker installation method above.

To use this image,

Pull the image and start the container. The example below specifies the Immuta Trino plugin version 414 with the 414 tag, but any supported Trino version newer than 414 can be used:
```
docker run ocir.immuta.com/immuta/immuta-trino:414
```
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

Standalone installations

Follow Trino's documentation to install the plugin archive on all nodes in your cluster.
Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.

Configure the properties described in the table below.
| Property | Trino version | Required or optional | Description |
| --- | --- | --- | --- |
| access-control.name | 392 and newer | Required | This property enables the integration. |
| access-control.config-files | 392 and newer | Optional | Trino allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider. |
| immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | This property defines a comma-separated list of allowed operations for Starburst (Trino) users on tables registered as Immuta data sources: READ, WRITE, and OWN. (See the Granting Starburst (Trino) privileges section for details about the OWN operation.) When set to WRITE, all querying users are allowed read and write operations to data source schemas and tables. By default, this property is set to READ, which blocks write operations on data source tables and schemas. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE by default, so users are allowed read and write operations to data source schemas and tables. |
| immuta.allowed.non.immuta.datasource.operations | 392 and newer | Optional | This property defines a comma-separated list of allowed operations users will have on tables not registered as Immuta data sources: READ, WRITE, CREATE, and OWN. (See the Granting Starburst (Trino) privileges section for details about CREATE and OWN operations.) When set to READ, users are allowed read operations on tables not registered as Immuta data sources. When set to WRITE, users are allowed read and write operations on tables not registered as Immuta data sources. If this property is left empty, users will not get access to any tables outside Immuta. By default, this property is set to READ,WRITE. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE,OWN,CREATE by default. |
| immuta.apikey | 392 and newer | Required | This should be set to the Immuta API key displayed when enabling the integration on the app settings page. To rotate this API key, generate a new API key in Immuta, and then replace the existing immuta.apikey value with the new one. |
| immuta.audit.legacy.enabled | 435 and newer | Optional | This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit. |
| immuta.audit.uam.enabled | 435 and newer | Optional | This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit. |
| immuta.ca-file | 392 and newer | Optional | This property allows you to specify a path to your CA file. |
| immuta.cache.views.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Trino. By default, the cache expires after 30 seconds. |
| immuta.cache.datasource.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after project or subscription changes. By default, the cache expires after 30 seconds. |
| immuta.endpoint | 392 and newer | Required | The protocol and fully qualified domain name (FQDN) for the Immuta tenant used by Trino (for example, https://my.immuta.tenant.io). This should be set to the endpoint displayed when enabling the integration on the app settings page. |
| immuta.filter.unallowed.table.metadata | 392 and newer | Optional | When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false. |
| immuta.group.admin | 420 and newer | Required if immuta.user.admin is not set | This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a \| delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com). |
| immuta.http.timeout.milliseconds | 464 and newer | Optional | The timeout for all HTTP calls made to Immuta in milliseconds. Defaults to 30000 (30 seconds). |
| immuta.user.admin | 392 and newer | Required if immuta.group.admin is not set | This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (with a \| delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com). |

Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
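
To reason about how the two operations properties interact, the following Python sketch models the behavior described in the table: WRITE implies READ, and an empty value allows nothing. It is a simplified model with hypothetical function names, not the plugin's actual logic; in particular, it ignores the per-user Immuta policies that still apply to tables registered as data sources, and it does not model admin users.

```python
from typing import Set


def parse_operations(value: str) -> Set[str]:
    """Turn a CSV value such as "READ,WRITE" into a set; an empty value means no operations."""
    ops = {item.strip().upper() for item in value.split(",") if item.strip()}
    if "WRITE" in ops:
        ops.add("READ")  # per the table, WRITE allows both read and write operations
    return ops


def is_allowed(operation: str, registered_in_immuta: bool,
               immuta_ops: str = "READ", non_immuta_ops: str = "READ,WRITE") -> bool:
    """Check an operation against the property that applies to the table (defaults shown)."""
    allowed = parse_operations(immuta_ops if registered_in_immuta else non_immuta_ops)
    return operation.upper() in allowed


# With the defaults, writes are blocked on Immuta data sources but allowed elsewhere.
print(is_allowed("WRITE", registered_in_immuta=True))
print(is_allowed("WRITE", registered_in_immuta=False))
# An empty value removes all access to tables outside Immuta.
print(is_allowed("READ", registered_in_immuta=False, non_immuta_ops=""))
```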

3 - Add Trino Users to Immuta

Configure your external IAM to add users to Immuta.
Map their Trino usernames to Immuta when configuring your IAM (or map usernames manually).
- All Trino users must either be mapped to an Immuta user or match the immuta.user.admin regex configured on the cluster. Users can query policy-enforced data only after their Trino username is mapped to Immuta.
- A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

4 - Register data

Amazon S3

Private preview: This integration is available to select accounts. Contact your Immuta representative for details.

Getting started

Immuta's Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.

Requirements

No location is registered in your S3 Access Grants instance before configuring the integration in Immuta
Write policies private preview enabled for your account; contact your Immuta representative to get this feature enabled
AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the protect data section for instructions on mapping users from AWS IDC to user accounts in Immuta.

Permissions

APPLICATION_ADMIN Immuta permission to configure the integration
CREATE_S3_DATASOURCE Immuta permission to register S3 prefixes
The AWS account credentials or optional AWS IAM role you provide Immuta to configure the integration must
- have ownership of the buckets Immuta will enforce policies on
- have the permissions to perform the following actions to create locations and issue grants:
  - accessgrantslocation resource:
    s3:CreateAccessGrant
    s3:DeleteAccessGrantsLocation
    s3:GetAccessGrantsLocation
    s3:UpdateAccessGrantsLocation
  - accessgrantsinstance resource:
    s3:CreateAccessGrantsInstance
    s3:CreateAccessGrantsLocation
    s3:DeleteAccessGrantsInstance
    s3:GetAccessGrantsInstance
    s3:GetAccessGrantsInstanceForPrefix
    s3:GetAccessGrantsInstanceResourcePolicy
    s3:ListAccessGrants
    s3:ListAccessGrantsLocations
  - accessgrant resource:
    s3:DeleteAccessGrant
    s3:GetAccessGrant
  - bucket resource: s3:ListBucket
  - role resource:
    iam:GetRole
    iam:PassRole
  - all resources: s3:ListAccessGrantsInstances
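
To compare an existing IAM policy document against the actions listed above, a small script can report what is missing. The Python sketch below is a rough aid with hypothetical function names: it only compares action names in Allow statements and does not evaluate wildcards, resources, conditions, or Deny statements, so it is no substitute for the AWS policy simulator.

```python
from typing import Dict, List, Set

# Actions listed in the permissions section above.
REQUIRED_ACTIONS: Set[str] = {
    "s3:CreateAccessGrant", "s3:DeleteAccessGrantsLocation",
    "s3:GetAccessGrantsLocation", "s3:UpdateAccessGrantsLocation",
    "s3:CreateAccessGrantsInstance", "s3:CreateAccessGrantsLocation",
    "s3:DeleteAccessGrantsInstance", "s3:GetAccessGrantsInstance",
    "s3:GetAccessGrantsInstanceForPrefix", "s3:GetAccessGrantsInstanceResourcePolicy",
    "s3:ListAccessGrants", "s3:ListAccessGrantsLocations",
    "s3:DeleteAccessGrant", "s3:GetAccessGrant", "s3:ListBucket",
    "iam:GetRole", "iam:PassRole", "s3:ListAccessGrantsInstances",
}


def allowed_actions(policy: Dict) -> Set[str]:
    """Collect the action names from every Allow statement in a policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    actions: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        value = statement.get("Action", [])
        actions.update([value] if isinstance(value, str) else value)
    return actions


def missing_actions(policy: Dict) -> List[str]:
    """Return the required actions that the policy does not name explicitly."""
    return sorted(REQUIRED_ACTIONS - allowed_actions(policy))


partial = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["iam:GetRole", "iam:PassRole"], "Resource": "*"}],
}
print(missing_actions(partial))
```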

Set up S3 Access Grants instance

Follow AWS documentation to create an Access Grants instance using the S3 console, AWS CLI, AWS SDKs, or the REST API. AWS supports one Access Grants instance per region per AWS account.
Follow the instructions at the top of the "Register a location" page in AWS documentation to create an AWS IAM role and edit the trust policy to give the S3 Access Grants service principal access to this role in the resource policy file. You will add this role to your integration configuration in Immuta so that Immuta can register this role with your Access Grants location. The policy should include at least the following permissions, but might need additional permissions depending on other local setup factors. An example trust policy is provided below.
- sts:AssumeRole
- sts:SetSourceIdentity

IAM role trust policy example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1234567891011",
      "Effect": "Allow",
      "Principal": {
        "Service":"access-grants.s3.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole", 
        "sts:SetSourceIdentity"
      ]
    }
  ]
}

Follow the instructions at the top of the "Register a location" page in AWS documentation to create an IAM policy with the following permissions, and attach the policy to the IAM role you created. An example policy is provided below.

s3:GetObject
s3:GetObjectVersion
s3:GetObjectAcl
s3:GetObjectVersionAcl
s3:ListMultipartUploadParts
s3:PutObject
s3:PutObjectAcl
s3:PutObjectVersionAcl
s3:DeleteObject
s3:DeleteObjectVersion
s3:AbortMultipartUpload
s3:ListBucket
s3:ListAllMyBuckets

IAM policy example

Replace <bucket_arn> in the example below with the ARN of the bucket scope that contains data you want to grant access to.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectVersionAcl",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "<bucket_arn>"
            ]
        },
        {
            "Sid": "ObjectLevelWritePermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "<bucket_arn>"
            ]
        },
        {
            "Sid": "BucketLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": [
                "<bucket_arn>"
            ]
        }
    ]
}

If you use server-side encryption with AWS Key Management Service (AWS KMS) keys to encrypt your data, the IAM role's policy also requires the following permissions. If you do not use this feature, do not include these permissions in your IAM policy:

kms:Decrypt
kms:GenerateDataKey

Optionally, create an AWS IAM role that Immuta can use to create Access Grants locations and issue grants. This role must have the S3 permissions listed in the permissions section. An example policy is provided below.

IAM policy example

Replace <role_arn> and <access_grants_instance_arn> in the example below with the ARNs of the role you created and your Access Grants instance, respectively. The Access Grants instance resource ARN should be scoped to apply to any future locations that will be created under this Access Grants instance. For example, "Resource": "arn:aws:s3:us-east-2:6********499:access-grants/default*" ensures that the role would have permissions for both of these locations:

arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation1
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation2

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RolePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "<role_arn>"
        },
        {
            "Sid": "AccessGrants",
            "Effect": "Allow",
            "Action": [
                "s3:CreateAccessGrant",
                "s3:DeleteAccessGrantsLocation",
                "s3:GetAccessGrantsLocation",
                "s3:CreateAccessGrantsLocation",
                "s3:GetAccessGrantsInstance",
                "s3:GetAccessGrantsInstanceForPrefix",
                "s3:GetAccessGrantsInstanceResourcePolicy",
                "s3:ListAccessGrants",
                "s3:ListAccessGrantsLocations",
                "s3:ListAccessGrantsInstances",
                "s3:DeleteAccessGrant",
                "s3:GetAccessGrant"
            ],
            "Resource": [
                "<access_grants_instance_arn>"
            ]
        }
    ]
}
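
To illustrate why a trailing wildcard on the Access Grants instance ARN covers locations created later, the Python sketch below approximates IAM's * matching with fnmatch from the standard library. This is only an approximation of how IAM evaluates resource patterns, and the account ID 123456789012 and the location names are placeholders.

```python
from fnmatch import fnmatchcase

# Resource pattern scoped to the default Access Grants instance and anything under it.
scope = "arn:aws:s3:us-east-2:123456789012:access-grants/default*"

candidates = [
    "arn:aws:s3:us-east-2:123456789012:access-grants/default/newlocation1",
    "arn:aws:s3:us-east-2:123456789012:access-grants/default/newlocation2",
    "arn:aws:s3:us-west-2:123456789012:access-grants/default/newlocation1",
]

for arn in candidates:
    # fnmatchcase is case-sensitive, and its * matches any run of characters, including "/".
    print(arn, fnmatchcase(arn, scope))
```

The first two ARNs fall under the pattern, while the third does not because its region differs.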

If you use AWS IAM Identity Center, associate your IAM Identity Center instance with your S3 Access Grants instance. Then add the permissions listed in the sample policy below to your IAM policy, and attach the policy to the IAM role you created to grant the permissions to the role.

IAM policy example

Copy the JSON below and replace the following bracketed placeholder values with your own. For details about the actions and resource values, see the IAM Identity Center API reference documentation.

<iam_identity_center_instance_arn>: The ARN of the instance of IAM Identity Center (InstanceArn) that is configured with the application.
<iam_identity_center_application_arn_for_s3_access_grants>: The ARN of the S3 Access Grants instance (ApplicationArn) configured with IAM Identity Center.
<aws_account>: Your AWS account ID.
<identity_store_id>: The globally unique identifier for the identity store (IdentityStoreId) that is connected to the Identity Center instance. This value is generated when a new identity store is created.

{
  "Sid": "sso",
  "Effect": "Allow",
  "Action": [
    "sso:DescribeInstance",
    "sso:DescribeApplication",
    "sso-directory:DescribeUsers"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}, {
  "Sid": "idc",
  "Effect": "Allow",
  "Action": [
    "identitystore:DescribeUser",
    "identitystore:DescribeGroup"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}

Configure the integration in Immuta

In Immuta, click the App Settings icon in the navigation menu and click the Integrations tab.
Click + Add Integration.
Select Amazon S3 from the dropdown menu and click Continue Configuration.
Complete the connection details fields, where
- Friendly Name is a name for the integration that is unique across all Amazon S3 integrations configured in Immuta.
- AWS Account ID is the ID of your AWS account.
- AWS Region is the AWS region to use.
- S3 Access Grants Location IAM Role ARN is the role the S3 Access Grants service assumes to vend credentials to the grantee. When a grantee accesses S3 data, the Access Grants service attaches session policies and assumes this role in order to vend credentials scoped to a prefix or bucket to the grantee. This role needs full access to all paths under the S3 location prefix.
- S3 Access Grants S3 Location Scope is the base S3 location that Immuta will use for this connection when registering S3 prefixes. This path must be unique across all S3 integrations configured in Immuta. During data source registration, this prefix is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location prefix of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Select your authentication method:
- Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. Contact your Immuta representative before proceeding, and Immuta will
  1. Provide the AWS account to add to your trust policy.
  2. Update the Immuta AWS configuration to allow Immuta to assume the role of your service principal.
  Then, complete the steps below.
  1. Enter the role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
  2. Set the external ID provided in a condition on the trust relationship for the cross-account IAM role specified above. See the AWS documentation for guidance.
- Access using access key and secret access key: Provide your AWS Access Key ID and AWS Secret Access Key.
Click Verify Credentials.
Click Next to review and confirm your connection information, and then click Complete Setup.
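
As a rough illustration of the external ID condition mentioned in the IAM role option above, the sketch below builds a trust policy as a Python dictionary and prints it as JSON. The account ID and external ID are placeholders, and trusting the account root principal is an assumption; use the principal and external ID that Immuta provides.

```python
import json


def build_trust_policy(immuta_aws_account_id: str, external_id: str) -> dict:
    """Return a trust policy that lets one AWS account assume the role,
    but only when the caller supplies the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: the whole account is trusted; a narrower principal may be required.
                "Principal": {"AWS": f"arn:aws:iam::{immuta_aws_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship once the placeholders are replaced.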

Editing an integration

You can edit the following settings for an existing Amazon S3 integration on the app settings page:

friendly name
authentication type and values (access key, secret, and role)

To edit settings for an existing integration via the API, see the Configure an Amazon S3 integration API guide.

Register S3 data

Follow the Create an S3 data source guide to register prefixes in Immuta.
Recommended: Organize your data sources into domains and assign domain permissions to accountable teams.

To create an S3 data source using the API, see the Create and manage an Amazon S3 data source API guide.

Protect data

Requirements: USER_ADMIN Immuta permission and either the GOVERNANCE or CREATE_S3_DATASOURCE Immuta permission

Build read or write subscription policies in Immuta to enforce access controls.
Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies:
1. Click Identities in the navigation menu and select Users.
2. Navigate to the user's page and click the more actions icon next to their username.
3. Select Change S3 User or AWS IAM Role from the dropdown menu.
4. Use the dropdown menu to select the User Type. Then complete the S3 field. User and role names are case-sensitive. See the AWS documentation for details.
  - AWS IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
  - AWS IAM user principals
  - AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address. Ensure that you have added the content to your IAM policy JSON as outlined in the Set up S3 Access Grants instance section above to allow Immuta to use AWS Identity Center.
  - Unset (fallback to Immuta username): When selecting this option, the S3 username is assumed to be the same as the Immuta username.
5. Click Save.
See the Mapping IAM principals in Immuta section for details about supported principals.

Access data

Requirement: User must be subscribed to the data source in Immuta

Request access to Amazon S3 data through S3 Access Grants. If you're accessing S3 data through one of the supported S3 Access Grants integrations (such as Amazon EMR on EC2), that application will make this request on your behalf, so you can skip this step.
Use the temporary credentials you received in the previous step to access the data in S3.
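
If you call S3 Access Grants yourself rather than through a supported service, the request for temporary credentials might look like the sketch below. It assumes a boto3 `s3control` client, whose `get_data_access` method corresponds to the S3 Access Grants GetDataAccess operation. The account ID, region, and target path in the comments are hypothetical.

```python
def request_temporary_credentials(s3control_client, account_id, target,
                                  permission="READ", duration_seconds=3600):
    """Ask S3 Access Grants for temporary credentials scoped to `target`.

    `s3control_client` is expected to behave like boto3.client("s3control").
    """
    response = s3control_client.get_data_access(
        AccountId=account_id,
        Target=target,
        Permission=permission,          # "READ", "WRITE", or "READWRITE"
        DurationSeconds=duration_seconds,
    )
    return response["Credentials"]


# Example wiring (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   s3control = boto3.client("s3control", region_name="us-east-1")
#   creds = request_temporary_credentials(
#       s3control, "123456789012", "s3://research-data/demographics/*")
#   s3 = boto3.client(
#       "s3",
#       aws_access_key_id=creds["AccessKeyId"],
#       aws_secret_access_key=creds["SecretAccessKey"],
#       aws_session_token=creds["SessionToken"],
#   )
```

Passing the client in as an argument keeps the function easy to test with a stand-in object and leaves region and credential handling to the caller.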

S3 integration overview

With this integration, users can avoid

hand-writing AWS IAM policies
managing AWS IAM role limits
manually tracking which users or roles have access to which files in Amazon S3 and verifying that this access is consistent with intent

S3 Access Grants components

To enforce controls on S3 data, Immuta interacts with several S3 Access Grants components:

Access Grants instance: An Access Grants instance is a logical container for individual grants that specify who can access what level of data in S3 in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Location: A location specifies what data the Access Grants instance can grant access to. For example, registering a location with a scope of s3:// allows Access Grants to manage access to all S3 buckets in that AWS account and region, whereas setting the bucket s3://research-data as the scope limits Access Grants to managing access to that single bucket for that location. When you configure the S3 integration in Immuta, you specify a location's scope and IAM assumed role, and Immuta registers the location in your Access Grants instance and associates it with the provided IAM role for you. Each S3 integration you configure in Immuta is associated with one location, and Immuta manages all grants in that location. Therefore, grants cannot be manually created by users in an Access Grants instance location that Immuta has registered and manages. During data source registration, this location scope is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location scope of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Individual grants: Individual permission grants in S3 Access Grants specify the identity that can access the data, the access level, and the location of the S3 data. Immuta creates a grant for each user subscribed to a prefix, bucket, or object by interacting with the Access Grants API. Each grant has its own ID and gives the user or role principal access to the data.
IAM assumed role: This is an IAM role you create in AWS that has full access to all prefixes, buckets, and objects in the Access Grants location registered by Immuta. This IAM role is used to vend temporary credentials to users or applications. When a grantee requests temporary credentials, the S3 Access Grants service assumes this role to vend credentials scoped to the prefix, bucket, or object specified in the grant to the grantee. The grantee then uses these credentials to access S3 data. When configuring the integration in Immuta, you specify this role, and then Immuta associates this role with the registered location in the Access Grants instance.
Temporary credentials: These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant and scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant and vends these temporary credentials to the requester. These credentials have a default timeout of 1 hour, but this duration can be changed by the requester.

The diagram below illustrates how these S3 Access Grants components interact.

For more details about these Access Grants concepts, see the S3 Access Grants documentation.
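
The way a location scope is prepended to a data source prefix, as described in the Location component above, can be sketched as a small helper. The function name and the slash normalization are assumptions for illustration; Immuta's own path handling may differ in edge cases.

```python
def build_grant_path(location_scope: str, data_source_prefix: str) -> str:
    """Prepend an S3 location scope to a data source prefix."""
    scheme = "s3://"
    if not location_scope.startswith(scheme):
        raise ValueError("location scope must start with s3://")
    # Drop the scheme and surrounding slashes so the parts join with one slash.
    base = location_scope[len(scheme):].strip("/")
    suffix = data_source_prefix.lstrip("/")
    parts = [part for part in (base, suffix) if part]
    return scheme + "/".join(parts)


if __name__ == "__main__":
    print(build_grant_path("s3://research-data", "/demographics"))
    print(build_grant_path("s3://", "research-data/*"))
```

With the example values from this section, a scope of `s3://research-data` and a prefix of `/demographics` yield `s3://research-data/demographics`.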

How does the integration work?

After an administrator creates an Access Grants instance and an assumed IAM role in their AWS account, an application administrator configures the Amazon S3 integration in Immuta. During configuration, the administrator provides the following connection information so that Immuta can create and register a location in that Access Grants instance:

AWS account ID and region
ARN for the existing Access Grants instance
ARN for the assumed IAM role

When Immuta registers this location, it associates the assumed IAM role with the location. This allows the IAM role to create temporary credentials with access scoped to a particular S3 prefix, bucket, or object in the location. The IAM role you create for this location must have all the object- and bucket-level permissions listed in the set up S3 Access Grants instance section on all buckets and objects in the location; if it is missing permissions, the IAM role will not be able to grant those missing permissions to users or applications requesting temporary credentials.

In the example below, an application administrator registers the following location prefix and IAM role for their Access Grants instance in AWS account 123456:

Location path: s3://. This path allows a single Amazon S3 integration to manage all objects in S3 in that AWS account and region. Data owners can scope down access further when registering specific S3 prefixes and applying policies.
Location IAM role: The arn:aws:iam::123456:role/access-grants-role IAM role will be used to vend temporary credentials to users and applications.

Immuta registers this location and associated IAM role in the user's Access Grants instance:

After the S3 integration is configured, a data owner can register S3 prefixes and buckets that are in the configured Access Grants location path to enforce access controls on resources. Immuta stores the connection information for the prefix so that the metadata can be used to create and enforce subscription policies on S3 data.

A data owner or governor can apply a subscription policy to a registered prefix, bucket, or object to control who can access objects beginning with that prefix or in that bucket after it is registered in Immuta. Once a subscription policy is created and Immuta users are subscribed to the prefix, bucket, or object, Immuta calls the Access Grants API to create a grant for each subscribed user, specifying the following parameters in the payload so that Access Grants can create and store a grant for each user:

Access Grants location
Access level (READ or READWRITE)
User or role principal
Registered prefix, bucket, or object

In the example below, a data owner registers the s3://research-data/* bucket, and Immuta stores the connection information in the Immuta metadata database. Once the user, Taylor, is subscribed to s3://research-data/*, Immuta calls the Access Grants API to create a grant for that user to allow them to read and write S3 data in that bucket:

Integration health status

Accessing S3 data

To access S3 data registered in Immuta, users must be subscribed to the prefix, bucket, or object in Immuta, and their principals must be mapped to their Immuta user accounts. Once users are subscribed, they request temporary credentials from S3 Access Grants. Access Grants looks up the grant ID associated with the requester. If no matching grant exists, they receive an access denied error. If one exists, Access Grants assumes the IAM role associated with the location and requests temporary credentials that are scoped to the prefix, bucket, or object and permissions specified by the individual grant. Access Grants vends the credentials to the requester, who uses those temporary credentials to access the data in S3.

In the example below, Taylor requests temporary credentials from S3 Access Grants. Access Grants looks up the grant ID (1) for that user, assumes the arn:aws:iam::123456:role/access-grants-role IAM role for the location, and vends temporary credentials to Taylor, who then uses the credentials to access the research-data bucket in S3:

Note that when accessing data through S3 Access Grants, the user or application interacts directly with the Access Grants API to request temporary credentials; Immuta does not act in this process at all. See the diagram below for an illustration of the process for accessing data through S3 Access Grants.

AWS services that support S3 Access Grants will request temporary credentials for users automatically. If users are not using a service that supports S3 Access Grants, they must have the permissions listed in the AWS documentation to call the Access Grants API directly themselves to request temporary credentials to access data through the access grant.

For a list of AWS services that support S3 Access Grants, see the AWS documentation.

Policy enforcement

Immuta's S3 integration allows data owners and governors to apply object-level access controls on data in S3 through subscription policies. When a user is subscribed to a registered prefix, bucket, or object, Immuta calls the Access Grants API to create an individual grant that narrows the scope of access within the location to that registered prefix, bucket, or object. See the diagram below for a visualization of this process.

When a user's entitlements change or a subscription policy is added to, updated, or deleted from a prefix, Immuta performs one of the following processes for each user subscribed to the registered prefix:

User added to the prefix: Immuta specifies a permission (READ or READWRITE) for each user and uses the Access Grants API to create an individual grant for each user.
User updated: Immuta deletes the current grant ID and creates a new one using the Access Grants API.
User deleted: Immuta deletes the grant ID using the Access Grants API.
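
The three cases above amount to reconciling the grants that currently exist with the grants the policies call for. The following is a minimal sketch of that idea, not Immuta's implementation: grants are modeled as a dictionary keyed by a hypothetical `(principal, path)` pair, and the function returns the create and delete operations that would be sent to the Access Grants API.

```python
def plan_grant_changes(current: dict, desired: dict) -> list:
    """Compare two {(principal, path): permission} maps and return the
    operations needed to move from `current` to `desired`.

    Each operation is a tuple: (action, principal, path, permission).
    """
    operations = []
    for key, permission in desired.items():
        if key not in current:
            # User added: create a new grant.
            operations.append(("create", *key, permission))
        elif current[key] != permission:
            # User updated: delete the old grant, then create the new one.
            operations.append(("delete", *key, current[key]))
            operations.append(("create", *key, permission))
    for key, permission in current.items():
        if key not in desired:
            # User deleted: remove the grant.
            operations.append(("delete", *key, permission))
    return operations


if __name__ == "__main__":
    current = {("taylor", "s3://research-data/*"): "READ"}
    desired = {("taylor", "s3://research-data/*"): "READWRITE"}
    for op in plan_grant_changes(current, desired):
        print(op)
```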

Immuta offers two subscription policy access types to manage read and write access to data in S3:

Read access policies manage who can get objects from S3.
Write access policies manage who can modify data in S3.

Data policies, which provide more granular controls by redacting or masking values in a table, are not supported for S3.

Prefix registration

Data owners can register an S3 prefix at any level in the S3 path by creating an Immuta data source. During this process, Immuta stores the connection information for use in subscription policies.

Each prefix added in the data registration workflow is created as a single Immuta data source, and a subscription policy added to a data source applies to any objects in that bucket or beginning with that prefix:

Therefore, data owners should register prefixes or buckets at the lowest level of access control they need for that data. Using the example above, if the data owner needed to allow different users to access s3://yellow-bucket/research-data/* than those who should access s3://yellow-bucket/analyst-data/*, the data owner must register the research-data/* and analyst-data/* prefixes separately and then apply a subscription policy to those prefixes:

Deleting registered prefixes

When an S3 data source is deleted, Immuta deletes all the grants associated with that prefix, bucket, or object in that location.

User provisioning

Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports all three methods for user provisioning in the S3 integration.

However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the protect data section for instructions on mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an IAM role max limit of 5,000 per AWS account.
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.

Mapping IAM principals in Immuta

Names are case-sensitive

The IAM role name and IAM user name are case-sensitive. See the AWS documentation for details.

Immuta supports mapping an Immuta user to AWS in one of the following ways:

AWS IAM Identity Center user IDs
IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
IAM user principals

See the protect data section for instructions on mapping principals to user accounts in Immuta.

Existing S3 integrations

The Amazon S3 integration will not interfere with existing legacy S3 integrations, and multiple S3 integrations can exist in a single Immuta tenant.

Supported AWS services

For a list of AWS services that support S3 Access Grants, see the AWS documentation.

Limitations

During private preview, Immuta supports up to 500 prefixes (data sources) and up to 20 Immuta users that are mapped to S3 identity principals. This is a preview limitation that will be removed in a future phase of the integration.
S3 Access Grants allows 100,000 grants per region per account. Thus, if you have 5 Immuta users with access to 20,000 registered prefixes, you would reach this limit. See AWS documentation for details.
The following Immuta features are not currently supported by the integration in private preview:
- Audit
- Data policies
- Schema monitoring
- Tag ingestion
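
To estimate how close a deployment is to the 100,000-grant limit mentioned above, the arithmetic can be sketched as follows. It assumes one grant per user per registered prefix, which matches the example in this section; the function names are illustrative.

```python
GRANTS_LIMIT = 100_000  # grants per region per account, as stated above


def grants_needed(users: int, prefixes_per_user: int) -> int:
    """Estimate the grant count, assuming one grant per user per prefix."""
    return users * prefixes_per_user


def within_limit(users: int, prefixes_per_user: int, limit: int = GRANTS_LIMIT) -> bool:
    """Return True if the estimated grant count does not exceed the limit."""
    return grants_needed(users, prefixes_per_user) <= limit


if __name__ == "__main__":
    print(grants_needed(5, 20_000), within_limit(5, 20_000))
```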

Databricks Unity Catalog Integration Reference Guide

Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.
Data policies: Immuta data policies enforce row- and column-level security.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

Metastore: Created at the account level and attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters in a workspace use that workspace's configured metastore, and all workspaces configured to use the same metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas
Schema: Organizes tables and views
Table-etc: Table (managed or external tables), view, volume, model, and function

For details about the Unity Catalog object model, see the Databricks Unity Catalog documentation.

Feature support

The Databricks Unity Catalog integration supports

managing and accessing data across multiple Databricks workspaces
enforcing Unity Catalog row-, column-, and table-level access controls on Databricks clusters and SQL warehouses:
- applying column masks and row filters on specific securable objects
- applying subscription policies on tables and views
enforcing Unity Catalog access controls, even if Immuta becomes disconnected
auditing activity of both Immuta users and non-Immuta users
allowing non-Immuta reads and writes
using Photon
using a proxy server

What does Immuta do in my Databricks environment?

Unity Catalog supports managing permissions account-wide in Databricks through controls applied directly to objects in the metastore. To establish a connection with Databricks and apply controls to securable objects within the metastore, Immuta requires a service principal with privileges to manage all data protected by Immuta. Databricks OAuth for service principals (OAuth M2M) or a personal access token (PAT) can be provided for Immuta to authenticate as the service principal. See the Databricks Unity Catalog privileges section for a list of specific Databricks privileges.

Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta will create a catalog that contains these schemas:

immuta_system: Contains internal Immuta data.
immuta_policies_n: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

Workspace-catalog binding

Workspace-catalog binding allows users to leverage Databricks’ catalog isolation mode to limit catalog access to specific Databricks workspaces. The default isolation mode is OPEN, meaning all workspaces can access the catalog (with the exception of the automatically-created workspace catalog), provided they are in the metastore attached to the catalog. Setting this mode to ISOLATED allows the catalog owner to specify a workspace-catalog binding, which means the owner can dictate which workspaces are authorized to access the catalog. This prevents other workspaces from accessing the specified catalogs. To bind a catalog to a specific workspace in Databricks Unity Catalog, see the Databricks documentation.
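
The difference between the OPEN and ISOLATED modes can be pictured with a toy model. This is not the Databricks API: the `Catalog` class and `workspace_can_access` function are hypothetical, and the model ignores the metastore attachment requirement and the workspace catalog exception noted above.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    name: str
    isolation_mode: str = "OPEN"  # "OPEN" or "ISOLATED"
    bound_workspaces: set = field(default_factory=set)


def workspace_can_access(catalog: Catalog, workspace_id: str) -> bool:
    """Return True if the workspace may reach the catalog in this toy model."""
    if catalog.isolation_mode == "OPEN":
        # OPEN: every workspace may access the catalog.
        return True
    # ISOLATED: only explicitly bound workspaces may access it.
    return workspace_id in catalog.bound_workspaces


if __name__ == "__main__":
    prod = Catalog("prod_catalog", "ISOLATED", {"prod_workspace"})
    print(workspace_can_access(prod, "prod_workspace"))
    print(workspace_can_access(prod, "dev_workspace"))
```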

Use cases

Typical use cases for binding a catalog to specific workspaces include

Ensuring users can only access production data from a production workspace environment.
For example, you may have production data in a prod_catalog, as well as a production workspace you are introducing to your organization. Binding the prod_catalog to the prod_workspace ensures that workspace admins and users can only access prod_catalog from the prod_workspace environment.
Ensuring users can only process sensitive data from a specific workspace. Limiting the environments from which users can access sensitive data helps better secure your organization’s data. Limiting access to one workspace also simplifies any monitoring, auditing, and understanding of which users are accessing specific data. This would entail a similar setup as the example above.
Giving users read-only access to production data from a developer workspace.
This enables your organization to effectively conduct development and testing, while minimizing risk to production data. All user access to this catalog from this workspace can be specified as read-only, ensuring developers can access the data they need for testing without risk of any unwanted updates.

Additional workspace connections

Immuta’s Databricks Unity Catalog integration allows users to configure additional workspace connections to support using Databricks' workspace-catalog binding feature. Users can configure additional workspace connections in their Immuta integrations to be consistent with the workspace-catalog bindings that are set up in Databricks. Immuta will use each additional workspace connection to govern the catalog(s) that workspace is bound to in Databricks. If desired, each set of bound catalogs can also be configured to run on its own compute.

To use this feature, you should first set up a workspace-catalog binding in your Databricks account. Once that is configured, you can use Immuta's Integrations API to configure an additional workspace connection. This can be added when you initially set up the integration or by updating your existing integration configuration.

Limitations

Additional workspace connections in Databricks Unity Catalog are not currently supported in Immuta's connections.
Each additional workspace connection must be in the same metastore as the primary workspace used to set up the integration.
No two additional workspace connections can be responsible for the same catalog.

Databricks Unity Catalog privileges

The privileges the Databricks Unity Catalog integration requires align to the least privilege security principle. The table below describes each privilege required in Databricks Unity Catalog for the setup user and the Immuta service principal.

| Databricks Unity Catalog privilege | User requiring the privilege | Explanation |
| --- | --- | --- |
| Account admin | Setup user | This privilege allows the setup user to grant the Immuta service principal the necessary permissions to orchestrate Unity Catalog access controls and maintain state between Immuta and Databricks Unity Catalog. |
| CREATE CATALOG on the Unity Catalog metastore | Setup user | This privilege allows the setup user to create an Immuta-owned catalog and tables. |
| Metastore admin | Setup user | This privilege is required only if enabling query audit, which requires granting access to system tables to the Immuta service principal. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal. See for more details. |
| USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources; USE SCHEMA on all schemas containing securables registered as Immuta data sources | Immuta service principal | These privileges allow the service principal to apply row filters and column masks on the securable. |
| MODIFY and SELECT on all securables registered as Immuta data sources | Immuta service principal | These privileges allow the service principal to apply row filters and column masks on the securable. Additionally, they are required for to run on the securable. |
| OWNER on the Immuta catalog | Immuta service principal | The Immuta service principal must own the catalog Immuta creates during setup that stores the Immuta policy information. The Immuta setup script grants ownership of this catalog to the Immuta service principal when you configure the integration. |
| USE CATALOG on the system catalog; USE SCHEMA on the system.access and system.query schemas; SELECT on the following system tables: system.access.table_lineage, system.access.column_lineage, system.access.audit, system.query.history | Immuta service principal | These privileges allow Immuta to audit user queries in Databricks Unity Catalog. |
| USE CATALOG on the system catalog; USE SCHEMA on the system.access schema; SELECT on the following system table: system.access.audit | Immuta service principal | These privileges allow Immuta to ingest and apply Databricks Unity Catalog to Immuta data sources. |

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

Table-level security: Immuta manages REVOKE and GRANT privileges on Databricks securable objects that have been registered as Immuta data sources. When you register a data source in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks.
Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
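
To make the three control types concrete, the sketch below builds Databricks SQL statements of the kind that express a table grant, a row filter, and a column mask. It is illustrative only: Immuta issues grants through the Unity Catalog API and generates its own UDFs, and all table, function, column, and user names here are hypothetical.

```python
def quote(identifier: str) -> str:
    """Backtick-quote a principal or identifier for Databricks SQL."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_select(table: str, principal: str) -> str:
    """Table-level security: grant read access on a table."""
    return f"GRANT SELECT ON TABLE {table} TO {quote(principal)}"


def revoke_select(table: str, principal: str) -> str:
    """Table-level security: revoke read access on a table."""
    return f"REVOKE SELECT ON TABLE {table} FROM {quote(principal)}"


def set_row_filter(table: str, function: str, columns: list) -> str:
    """Row-level security: attach a SQL UDF as the table's row filter."""
    return f"ALTER TABLE {table} SET ROW FILTER {function} ON ({', '.join(columns)})"


def set_column_mask(table: str, column: str, function: str) -> str:
    """Column-level security: attach a SQL UDF as a column mask."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function}"


if __name__ == "__main__":
    table = "prod_catalog.sales.orders"
    print(grant_select(table, "taylor@example.com"))
    print(set_row_filter(table, "immuta.immuta_policies_1.region_filter", ["region"]))
    print(set_column_mask(table, "ssn", "immuta.immuta_policies_1.mask_ssn"))
```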

User permissions Immuta revokes

On securable objects

If you enable a Databricks Unity Catalog object in Immuta and it has no subscription policy set on it, Immuta will REVOKE access to that object in Databricks for all Immuta users, even if they had been directly granted access to that object outside of Immuta.

If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users. All existing grants and policies will be removed, regardless of whether they were set in Immuta or in Unity Catalog directly.

If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

On schemas and catalogs

By default, Immuta will revoke Immuta users' USE CATALOG and USE SCHEMA privileges in Unity Catalog for users that do not have access to any of the resources within that catalog/schema. This includes any USE CATALOG or USE SCHEMA privileges that were granted outside of Immuta.

If you disable this setting, Immuta will only revoke the permissions granted on the securable objects themselves, and users' USE CATALOG and USE SCHEMA permissions will remain even if the user does not have access to any resource in that catalog/schema.

See the App settings page for instructions on changing this setting.

Supported policies

The Unity Catalog integration supports the following policy types:

Subscription policies
Select masking policies
- Conditional masking
- Constant
- Custom masking
- Hashing
- Null (including on ARRAY, MAP, and STRUCT type columns)
- Regex: You must use the global regex flag (g) when creating a regex masking policy in this integration. You cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the limitations section for examples.
- Rounding (date and numeric rounding)
Row-level policies
- Matching (only show rows where)
  - Custom WHERE
  - Never
  - Where user
  - Where value in column
- Minimization
- Time-based restrictions

Project-scoped purpose exceptions for Databricks Unity Catalog

Project-scoped purpose exceptions for Databricks Unity Catalog integrations allow you to apply purpose-based policies to Databricks data sources in a project. As a result, users can only access that data when they are working within that specific project.

Databricks Unity Catalog views

If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).

Masked joins for Databricks Unity Catalog

This feature allows masked columns to be joined across data sources that belong to the same project. When data sources do not belong to a project, Immuta uses a unique salt per data source for hashing to prevent masked values from being joined. (See the Why use masked joins? guide for an explanation of that behavior.) However, once you add Databricks Unity Catalog data sources to a project and enable masked joins, Immuta uses a consistent salt across all the data sources in that project to allow the join.

For more information about masked joins and enabling them for your project, see the Masked joins section of documentation.
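
To illustrate why the salt determines whether masked values can be joined, the sketch below hashes the same value with per-data-source salts and then with one shared project salt. It assumes SHA-256 and made-up salt strings purely for illustration; the hashing algorithm and salt handling Immuta actually uses are not described here.

```python
import hashlib


def mask(value: str, salt: str) -> str:
    """Hash a value with a salt; a stand-in for a hashing mask (assumed SHA-256)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


email = "alice@example.com"  # hypothetical column value

# Outside a project: each data source gets its own (hypothetical) salt,
# so the same input produces different masked values and cannot be joined.
orders_masked = mask(email, salt="salt-for-orders")
customers_masked = mask(email, salt="salt-for-customers")
joinable_without_project = orders_masked == customers_masked

# Inside a project with masked joins enabled: one salt is shared by all
# data sources in the project, so the masked values line up.
project_salt = "salt-for-project"
joinable_in_project = mask(email, project_salt) == mask(email, project_salt)
```

Here `joinable_without_project` is False and `joinable_in_project` is True, which mirrors the behavior described above.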

Policy exemption group

The Databricks group configured as the policy exemption group in Immuta will be exempt from Immuta data policy enforcement. This account-level group is created and managed in Databricks, not in Immuta.

If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to an account-level group in Databricks and include this group name in the Databricks Unity Catalog configuration in Immuta. Then, group members will be excluded from having data policies applied to them when they query Immuta-protected tables in Databricks.

Typically, service or system accounts that perform the following actions are added to an exemption group in Databricks:

Automated queries
ETL
Report generation

If you have multiple groups that must be exempt from data policies, add each group to a single group in Databricks that you then set as the policy exemption group in Immuta.

The service principal used to register data sources in Immuta will be automatically added to the exemption group for the Databricks securables it registers. Consequently, accounts added to the exemption group and used to register data sources in Immuta should be limited to service accounts.

For guidance on configuring a policy exemption group on the Immuta app settings page, see the Configure a Databricks Unity Catalog integration guide. Alternatively, this group can be configured via the integrations API or the connections API using the groupPattern object.

Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

However, with Databricks metastore magic you can use hive_metastore and enforce subscription and data policies with the Databricks Spark integration.

Authentication methods

The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

Personal access token (PAT): This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the permissions section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.
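
The token lifecycle described for OAuth M2M (request a temporary token, reuse it, request a new one when it expires) can be sketched as a small cache. Immuta performs this internally; the class below is only an illustration. `TokenCache`, `fetch_token`, and `skew_seconds` are hypothetical names, and `fetch_token` stands in for whatever code performs the client credentials request.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical callable: performs the client credentials request and
# returns (access_token, lifetime_in_seconds).
FetchToken = Callable[[], Tuple[str, float]]


class TokenCache:
    """Reuse a temporary token until it is about to expire, then refetch."""

    def __init__(
        self,
        fetch_token: FetchToken,
        clock: Callable[[], float] = time.monotonic,
        skew_seconds: float = 60.0,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._skew = skew_seconds  # refresh slightly early (an assumption)
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

With a one-hour lifetime, repeated calls to `get()` within the hour return the cached token, and the first call near or after expiry triggers a new request.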

Integration health status

Immuta data sources in Unity Catalog

The Unity Catalog data object model introduces a 3-tiered namespace, as outlined above. Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
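
As a small illustration of the 3-tiered namespace, the sketch below splits a fully qualified name into its catalog, schema, and table parts. The helper and the example name `main.sales.orders` are hypothetical, and the sketch deliberately ignores backtick-quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_ref(full_name: str) -> TableRef:
    """Split 'catalog.schema.table' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableRef(*parts)


ref = parse_table_ref("main.sales.orders")
```

A two-part name such as `sales.orders` raises a `ValueError` because it lacks the catalog tier.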

Supported object types

When applying read and write access subscription policies to data sources, the privileges granted by Immuta vary depending on the object type. See an outline of privileges granted by Immuta on the Subscription policy access types page.

Object type

Subscription policy support

Data policy support

Marketplace support

Table

✅

View

✅

❌

✅

Materialized view

✅

Streaming table

✅

External table

✅

Foreign table

✅

(Public preview)

✅

❌

✅

(Public preview)

✅

❌

✅

(Public preview)

✅

❌

✅

Delta Shares

✅

❌

✅

External data connectors and query-federated tables

External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

Query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:

USE CATALOG on the system catalog
USE SCHEMA on the system.access and system.query schemas
SELECT on the following system tables:
- system.access.table_lineage
- system.access.column_lineage
- system.access.audit
- system.query.history
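
The access listed above could be granted with statements like the ones this sketch generates. It is a minimal example that assumes Databricks SQL `GRANT ... ON ... TO` syntax and a hypothetical principal name; review the output before running it in your workspace.

```python
from typing import List

SYSTEM_SCHEMAS = ["system.access", "system.query"]
SYSTEM_TABLES = [
    "system.access.table_lineage",
    "system.access.column_lineage",
    "system.access.audit",
    "system.query.history",
]


def audit_grant_statements(principal: str) -> List[str]:
    """Build the GRANT statements for the audit access listed above."""
    if "`" in principal:
        raise ValueError("principal must not contain backticks")
    grantee = f"`{principal}`"
    statements = [f"GRANT USE CATALOG ON CATALOG system TO {grantee}"]
    statements += [
        f"GRANT USE SCHEMA ON SCHEMA {schema} TO {grantee}"
        for schema in SYSTEM_SCHEMAS
    ]
    statements += [
        f"GRANT SELECT ON TABLE {table} TO {grantee}" for table in SYSTEM_TABLES
    ]
    return statements


# "immuta-service-principal" is a placeholder, not a required name.
statements = audit_grant_statements("immuta-service-principal")
```

The same pattern, restricted to the system.access schema and the system.access.audit table, would cover the narrower access that tag ingestion needs.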

Immuta uses Databricks tables from the system catalog to understand the queries users make and present them in the query audit logs. See the Databricks Unity Catalog audit page for details about the contents of the logs.

Audit ingestion is set up when you configure the integration and can be scoped to specific workspaces if needed. By default, Immuta ingests audit records every hour; you can change this frequency on the Immuta app settings page. You can also request audit ingestion manually at any time from the Immuta audit page. A manual request only searches for queries created since the last audited query. Because the job runs in the background, new queries will not be available immediately.
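
The "only new queries since the last audited query" behavior resembles a watermark. The sketch below illustrates that idea only; the record shape and function names are hypothetical, and Immuta's actual ingestion job is not documented at this level.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Each record is (query_id, start_time); both fields are hypothetical.
QueryRecord = Tuple[str, datetime]


def new_queries_since(
    records: Iterable[QueryRecord], last_audited: Optional[datetime]
) -> List[QueryRecord]:
    """Return records newer than the watermark, oldest first."""
    fresh = [
        record
        for record in records
        if last_audited is None or record[1] > last_audited
    ]
    return sorted(fresh, key=lambda record: record[1])


def next_watermark(
    ingested: List[QueryRecord], last_audited: Optional[datetime]
) -> Optional[datetime]:
    """Advance the watermark to the newest ingested query, if any."""
    return ingested[-1][1] if ingested else last_audited
```

Each run ingests only what `new_queries_since` returns and then stores the result of `next_watermark` for the following run.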

Tag ingestion

Private preview: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.

You can enable tag ingestion to allow Immuta to ingest Databricks Unity Catalog table and column tags so that you can use them in Immuta policies to enforce access controls. When you enable this feature, Immuta uses the credentials and connection information from the Databricks Unity Catalog integration to pull tags from Databricks and apply them to data sources as they are registered in Immuta. If Databricks data sources were registered before tag ingestion was enabled, those data sources will automatically sync to the catalog and tags will apply.

Immuta checks for changes to tags in Databricks and syncs Immuta data sources to those changes every hour by default. To reduce processing time and compute cost, Immuta's tag ingestion process uses delta logic: it identifies only the resources that have had a tag or description change in Databricks Unity Catalog within a given timeframe.

Access requirements for Databricks Unity Catalog tag ingestion (delta logic)

Since the delta logic leverages the system.access.audit table in Databricks, Immuta must have, at minimum, the following access:

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system table:
- system.access.audit

Without these permissions, Immuta will not be able to process any tag changes after the initial onboarding of data sources.

Once external tags are applied to Databricks data sources, those tags can be used to create subscription and data policies.

To enable Databricks Unity Catalog tag ingestion, see the Configure a Databricks Unity Catalog integration page.

Syncing tag changes

After making changes to tags in Databricks, you can manually sync the catalog so that the changes immediately apply to the data sources in Immuta. Otherwise, tag changes will automatically sync within 24 hours. This timeframe may be exceeded when Immuta has to process a large number of tag changes.

When syncing data sources to Databricks Unity Catalog tags, Immuta pulls the following information:

Table tags: These tags apply to the table and appear on the data source details tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Column tags: These tags are applied to data source columns and appear on the columns listed in the data dictionary tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Table comments field: This content appears as the data source description on the data source details tab.
Column comments field: This content appears as dictionary column descriptions on the data dictionary tab.
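
The key and value mapping described for table and column tags can be sketched as follows. The function name is hypothetical, and returning the key alone for a tag without a value is an assumption rather than documented behavior.

```python
def to_immuta_tag(key: str, value: str = "") -> str:
    """Join a Databricks tag key and value with the '.' delimiter."""
    key = key.strip()
    value = value.strip()
    # Assumption: a tag with no value is represented by its key alone.
    return f"{key}.{value}" if value else key


tag = to_immuta_tag("Location", "US")  # the example from the text above
```

Here `tag` holds `Location.US`, matching the example given for both table and column tags.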

Limitations

Only tags that apply to Databricks data sources in Immuta are available to build policies in Immuta. Immuta will not pull tags in from Databricks Unity Catalog unless those tags apply to registered data sources.
Cost implications: Tag ingestion in Databricks Unity Catalog requires compute resources. Therefore, having many Databricks data sources or frequently manually syncing data sources to Databricks Unity Catalog may incur additional costs.
Databricks Unity Catalog tag ingestion only supports tenants with fewer than 10,000 data sources registered.

Configuration requirements

See the Enable Unity Catalog guide for a list of requirements.

Unity Catalog caveats

Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
If multiple Immuta tenants are connected to your Databricks environment, you must create a separate Immuta catalog for each of those tenants during configuration. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.
When creating a regex masking policy in this integration, you must use the global regex flag (g) and you cannot use the case-insensitive regex flag (i). See the examples below for guidance:
- regex with a global flag (supported): /^ssn|social ?security$/g
- regex without a global flag (unsupported): /^ssn|social ?security$/
- regex with a case-insensitive flag (unsupported): /^ssn|social ?security$/gi
- regex without a case-insensitive flag (supported): /^ssn|social ?security$/g
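
The two flag rules can be checked before a policy is created. The helper below is a hypothetical pre-check, not an Immuta function; it assumes the regex is written as a /pattern/flags literal like the examples above.

```python
def is_supported_regex(literal: str) -> bool:
    """Check a /pattern/flags literal against the two flag rules."""
    last_slash = literal.rfind("/")
    if not literal.startswith("/") or last_slash == 0:
        raise ValueError(f"expected /pattern/flags, got {literal!r}")
    flags = literal[last_slash + 1:]
    # The global flag is required; the case-insensitive flag is not allowed.
    return "g" in flags and "i" not in flags


results = {
    literal: is_supported_regex(literal)
    for literal in (
        "/^ssn|social ?security$/g",
        "/^ssn|social ?security$/",
        "/^ssn|social ?security$/gi",
    )
}
```

Only the first literal passes; the other two fail for a missing g flag and a disallowed i flag, respectively.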

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

Feature limitations

The following features are currently unsupported:

Immuta project workspaces
Multiple IAMs on a single cluster
Row filters and column masking policies on the following object types:
- Functions
- Models
- Views
- Volumes
Mixing masking policies on the same column
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies (except for masking with NULL) on ARRAY, MAP, or STRUCT type columns
Shallow clones

Configure the Databricks Unity Catalog integration.