Immuta offers two integrations for Amazon Redshift:
Amazon Redshift integration: In this integration, Immuta uses connections to configure the integration and register data objects in a single step. Once data is registered, Immuta can enforce access controls on that data.
Amazon Redshift Spectrum integration: In this integration, Immuta generates policy-enforced views in your configured Redshift schema for tables registered as Immuta data sources. The integration is configured separately from data source registration.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The Amazon Redshift integration allows you to configure your integration and register data from Amazon Redshift in Immuta in a single step. Once data is registered, Immuta can enforce access controls on that data.
This getting started guide outlines how to connect Amazon Redshift to Immuta.
Amazon Redshift integration: This guide describes the design and components of the integration.
Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
Protecting data: This guide provides an overview of how to protect securables with Immuta policies.
One platform to optimize how you access and control data.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The how-to guides linked on this page illustrate how to use Amazon Redshift with Immuta. See the reference guide for information about the Amazon Redshift integration.
Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data.
This section includes guidance for connecting your data platform and keeping it synced with Immuta.
This reference guide outlines the features, policies, and audit capabilities of each data platform Immuta supports.
Accessing data: This guide provides an overview of how Amazon Redshift users access data registered in Immuta.
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
Register your Amazon Redshift connection: Using a single setup process, connect Amazon Redshift to Immuta. This will register your data objects in Immuta and allow you to start dictating access through access requests or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, and managing audit.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
Set up a request form for your assets: Once you register your data, assets will appear in the Request app where you can attach request forms.
: After you set up request forms, you can put access request links in your catalog for your data consumers to click. This will take them to the Request app to fill out the request form to request access to data.
: To grant access to data, data stewards will respond to the access request.
Add data metadata
This guide provides instructions for getting your data metadata set up in Immuta for the Governance app.
Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that use your tags and apply to your data sources. Subscription policies can be created to dictate access to data sources.
Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. By using catalog tags, you can create proactive policies that will apply to data sources as they are added to Immuta.
Set up audit export: Once you have your data sources, your users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.
The guides in these sections include information about how to connect your data platform to Immuta:
: This section includes guides for , , and integrations.
This reference guide outlines the actions and features that trigger Immuta queries in your remote platform that may incur cost.
Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data. This section includes concept, reference, and how-to guides for registering and managing data sources.


Once data is registered through the Amazon Redshift connection, you will access your data through Amazon Redshift as you normally would. If you are subscribed to the data source, Immuta grants you access to the data in Amazon Redshift.
When you submit a query, the SQL client submits the query to Amazon Redshift, which then processes the query and determines what data your role is allowed to see. Then, Amazon Redshift queries the database and returns the query results to the SQL client, which then returns policy-enforced data to you.
The diagram below illustrates how Immuta, Amazon Redshift, and the SQL client interact when a user queries data registered in Immuta.
Because subscription policies are managed through roles, you must be acting under the role Immuta creates for you (immuta_<username>) to get access to the data sources you are subscribed to.
Once data is registered through the AWS Lake Formation connection, you will access your data in one of these AWS analytic engines as you normally would:
Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
If you are subscribed to the data source, Immuta either directly grants you access to the resource through Lake Formation or generates and assigns a Lake Formation tag to that resource to grant you access. See the reference guide for details about how policies are enforced.
When you submit a query, the analytic engine requests metadata from Glue Data Catalog, which then queries Lake Formation to determine what data you are allowed to see. Then, the analytic engine requests temporary access from Lake Formation, retrieves the data from S3, and filters the data to return policy-enforced data to you.
The diagram below illustrates how the analytic engine interacts with Glue Data Catalog and Lake Formation to access data.
In this integration, Immuta generates policy-enforced views in a schema in your configured Azure Synapse Analytics Dedicated SQL pool for tables registered as Immuta data sources.
This guide outlines how to integrate Azure Synapse Analytics with Immuta.
: Configure the integration in Immuta.
Azure Synapse Analytics integration reference guide: This guide describes the design and components of the integration.
This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.
This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.
: Migrate from the legacy Databricks Spark integrations to the Databricks Unity Catalog integration.
Databricks Unity Catalog integration reference guide: This guide describes the design and components of the integration.
Once data is registered through the Databricks Lakebase connection, you will access your data as you normally would. If you are subscribed to the data source, Immuta grants you access to the data through a PostgreSQL role.
When you submit a query (through PostgreSQL or through the Databricks Lakebase instance), the PostgreSQL client submits the SQL query to the PostgreSQL server, which then processes the query and determines what data your role is allowed to see. Then, the PostgreSQL server queries the database and returns the query results to the PostgreSQL client, which then returns policy-enforced data to you.
The diagram below illustrates how Immuta, the PostgreSQL server, and PostgreSQL client interact to access data.
In the Amazon Redshift integration, Immuta administers Amazon Redshift privileges on data registered in Immuta. Then, Immuta users who have been granted access to the data sources can query them.
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in Amazon Redshift.
The Amazon Redshift integration is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.
Public preview: This feature is available to all accounts.
In the Lake Formation integration, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:
In the Databricks Lakebase integration, Immuta administers PostgreSQL privileges on data registered in Immuta. Then, Immuta users who have been granted access to the tables can query them with policies enforced.
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in PostgreSQL.
Databricks Lakebase is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.
The Databricks Spark integration is one of two integrations Immuta offers for Databricks.
In this integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.
The reference guides in this section are written for Databricks administrators who are responsible for setting up the integration, securing Databricks clusters, and setting up users:
In the context of the Databricks Spark integration, Immuta uses the term ephemeral to describe data sources where the associated compute resources can vary over time. This means that the compute bound to these data sources is not fixed and can change. All Databricks data sources in Immuta are ephemeral.
Ephemeral overrides are specific to each data source and user. They effectively bind cluster compute resources to a data source for a given user. Immuta uses these overrides to determine which cluster compute to use when connecting to Databricks for various maintenance operations.
The operations that use the ephemeral overrides include
Visibility checks on the data source for a particular user. These checks assess how to apply row-level policies for specific users.
Immuta policies will not be automatically enforced in MariaDB
While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.
To use this integration, contact your Immuta representative.
The MariaDB integration allows you to register data from MariaDB in Immuta.
Immuta policies will not be automatically enforced in Oracle
While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use to be notified about changes to user access and make appropriate access updates in Oracle using your own process.
To use this integration, contact your Immuta representative.
The Oracle integration allows you to register data from Oracle in Immuta.
Click the App Settings icon in the navigation menu and scroll to the Global Integration Settings section.
Click the Enable Snowflake Low Row Access Policy Mode checkbox to enable the feature.
Confirm to allow Immuta to automatically disable impersonation for the Snowflake integration. If you do not confirm, you will not be able to enable Snowflake low row access policy mode.
Immuta policies will not be automatically enforced in MySQL
While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use to be notified about changes to user access and make appropriate access updates in MySQL using your own process.
To use this integration, contact your Immuta representative.
The MySQL integration uses connections to register data from MySQL in Immuta.
Once data is registered through the PostgreSQL connection, you will access your data through your PostgreSQL client as you normally would. If you are subscribed to the data source, Immuta grants you access to the data in PostgreSQL.
When you submit a query, the PostgreSQL client submits the SQL query to the PostgreSQL server, which then processes the query and determines what data your role is allowed to see. Then, the PostgreSQL server queries the database and returns the query results to the PostgreSQL client, which then returns policy-enforced data to you.
The diagram below illustrates how Immuta, the PostgreSQL server, and PostgreSQL client interact to access data.
The Lake Formation integration supports the following authentication methods to register a connection:
Access using AWS IAM role (recommended): Immuta will assume this role when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection.
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the AWS Lake Formation reference guide for details about user provisioning and mapping AWS user accounts to Immuta.
Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.
The Databricks Lakebase connection supports OAuth machine-to-machine (M2M) authentication to register a connection.
The Databricks Lakebase connection authenticates as a Databricks identity and generates an OAuth token. Immuta then uses that token as a password when connecting to PostgreSQL. To enable secure, automated machine-to-machine access to the database instance, the connection must obtain an OAuth token using a Databricks service principal. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the Databricks Lakebase integration reference guide for details about user provisioning and mapping user accounts to Immuta.
Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.
Customizing the integration: Consult this guide for information about customizing the Databricks Spark integration settings.
Setting up users: Consult this guide for information about connecting data users and setting up user impersonation.
Spark environment variables: This guide provides a list of Spark environment variables used to configure the integration.
Ephemeral overrides: This guide describes ephemeral overrides and how to configure them to reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up.
Stats collection triggered by a specific user.
Validating a custom WHERE clause policy against a data source. When owners or governors create custom WHERE clause policies, Immuta uses compute resources to validate the SQL in the policy. In this case, the ephemeral overrides for the user writing the policy are used to contact a cluster for SQL validation.
High cardinality column detection. Certain advanced policy types (e.g., minimization) in Immuta require a high cardinality column, and that column is computed on data source creation. It can be recomputed on demand and, if so, will use the ephemeral overrides for the user requesting computation.
An ephemeral override request can be triggered when a user queries the securable corresponding to a data source in a Databricks cluster with the Spark plug-in configured. The actual triggering of this request depends on the configuration settings.
Ephemeral overrides can also be set for a data source in the Immuta UI by navigating to a data source page, clicking on the data source actions button, and selecting Ephemeral overrides from the dropdown menu.
Ephemeral override requests made from a cluster for data sources and users where ephemeral overrides were set in the UI will not be successful.
If ephemeral overrides are never set (either through the user interface or the cluster configuration), the system will continue to use the connection details directly associated with the data source, which are set during data source registration.
Ephemeral overrides can be problematic in environments that have a dedicated cluster to handle maintenance activities, since ephemeral overrides can cause these operations to execute on a different cluster than the dedicated one.
To reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up, complete one of the following actions:
Direct all clusters' HTTP paths for overrides to a cluster dedicated for metadata queries using the IMMUTA_EPHEMERAL_HOST_OVERRIDE_HTTPPATH Spark environment variable.
Disable ephemeral overrides completely by setting the IMMUTA_EPHEMERAL_HOST_OVERRIDE Spark environment variable to false.
Ephemeral overrides best practices
Disable ephemeral overrides for clusters when using multiple workspaces and dedicate a single cluster to serve queries from Immuta in a single workspace.
If you use multiple E2 workspaces without disabling ephemeral overrides, avoid applying the where user row-level policy to data sources.
Click Save.
If you already have a Snowflake integration configured, you don't need to reconfigure your integration. Your Snowflake policies automatically refresh when you enable Snowflake low row access policy mode.
Configure your Snowflake integration. Note that you will not be able to enable project workspaces or user impersonation with Snowflake low row access policy mode enabled.
Click Save and Confirm your changes.
Request
Manage user requests for data access, publish data products, and manage assets.
Configuration
Connect your data, metadata, and users.
Governance
Unify data access control across multiple data platforms.
Developer guides
Interact with Immuta through the Immuta CLI and API.
Amazon Redshift Spectrum integration overview: This guide describes the design of the integration and policy enforcement.
MariaDB integration reference guide: This guide describes the design and components of the integration.
Oracle integration reference guide: This guide describes the design and components of the integration.
MySQL integration reference guide: This guide describes the design and components of the integration.




APPLICATION_ADMIN Immuta permission
The Amazon Redshift user registering the connection must be a superuser or have the following Amazon Redshift privileges:
CREATEDB
CREATE USER
sys:secadmin role
USAGE on all databases and schemas that contain data you want to register
The following privileges WITH GRANT OPTION on objects registered in Immuta:
DELETE
INSERT
For descriptions and explanations of privileges Immuta needs to enforce policies and maintain state in Amazon Redshift, see the Amazon Redshift integration reference guide.
Enable Amazon Redshift masking on the data objects Immuta will protect by using the ALTER TABLE command with the MASKING ON clause.
See the Amazon Redshift documentation for details.
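Based on the description above, the statement takes roughly this shape (the schema and table names are hypothetical; consult the Amazon Redshift documentation for the exact syntax):

```sql
-- Hypothetical object name: enables masking enforcement on this table
ALTER TABLE sales.orders MASKING ON;
```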
Create a new database user in Redshift to serve as the Immuta system account. Immuta will use this system account continuously to crawl the connection.
Grant this account the following Redshift privileges:
USAGE on all databases and schemas that contain data you want to register
CREATE ROLE
sys:secadmin role
The following privileges WITH GRANT OPTION on objects registered in Immuta:
DELETE
INSERT
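As an illustrative sketch only (the account, schema, and table names are hypothetical, and the full privilege list, including CREATE ROLE and database-level USAGE, should follow the requirements above), the system account setup might look like:

```sql
-- Hypothetical names: immuta_system, schema "sales", table "sales.orders"
CREATE USER immuta_system PASSWORD '<strong-password>';

-- Visibility into the schemas Immuta will crawl
GRANT USAGE ON SCHEMA sales TO immuta_system;

-- Required for managing Redshift row-level security and masking policies
GRANT ROLE sys:secadmin TO immuta_system;

-- Object privileges must carry GRANT OPTION so Immuta can re-grant them
GRANT SELECT, INSERT, DELETE ON sales.orders TO immuta_system WITH GRANT OPTION;
```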
Create a new role in Amazon Redshift called immuta_exemption.
Grant any users who should be exempt from Immuta data policies to this role.
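A minimal sketch of the exemption role setup, where "etl_service" stands in for any hypothetical user that should bypass Immuta data policies:

```sql
-- Role name must be immuta_exemption as described above
CREATE ROLE immuta_exemption;

-- Add each exempt user to the role ("etl_service" is a placeholder)
GRANT ROLE immuta_exemption TO etl_service;
```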
In your Amazon Redshift environment, create an Immuta database that Immuta can use to connect to your Amazon Redshift instance, register the connection, and maintain state.
Having a separate database for Immuta prevents custom ETL processes or jobs from deleting the database you use to register the connection, which would break the connection.
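For example, the dedicated database can be created with a single statement (the name "immuta" is a convention used here, not a requirement):

```sql
-- A dedicated database keeps Immuta's connection out of ETL-managed databases
CREATE DATABASE immuta;
```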
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the Amazon Redshift tile.
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or
Enter the username and password of the Immuta system account.
Click Save connection.
Requirement: USER_ADMIN Immuta permission
Map Amazon Redshift usernames to each Immuta user account to ensure Immuta properly enforces policies.
The instructions below illustrate how to do this for individual users, but you can also configure user mapping in your IAM connection on the app settings page.
Click People and select Users in the navigation menu.
Click the user's name to navigate to their page and scroll to the External User Mapping section.
Click Edit in the Redshift User row.
Enter the user's Redshift username.
Click Save.
Once the Amazon Redshift connection is registered, you can author subscription and data policies in Immuta to enforce access controls.
See the Amazon Redshift integration reference guide for more details about registering a connection.
After data is registered in Immuta, you can author subscription and data policies in Immuta to enforce access controls.
When a subscription policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Immuta creates roles for those users (if an Immuta-generated role for them does not already exist) and grants Amazon Redshift privileges to that role. Once a data policy is applied to a data source, Immuta generates a masking or row-level policy in Amazon Redshift and attaches that policy to the data object it applies to.
Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in Amazon Redshift that grants the SELECT privilege on yellow-table to users registered in Immuta that are part of the analysts group.
In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.
See the Subscription policies page or the Data policies page for guidance on applying policies to a data source. See the Amazon Redshift integration page for details about the supported policy types.
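As a rough illustration of the grant described above (all names are hypothetical, and Immuta issues statements like these automatically, so you never run them yourself):

```sql
-- "immuta_alice" is the immuta_<username> role Immuta creates for user alice
GRANT SELECT ON TABLE public.yellow_table TO ROLE immuta_alice;
```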

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
This getting started guide outlines how to integrate AWS Lake Formation with Immuta.
AWS Lake Formation: This guide describes the design and components of the integration.
Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
Protecting data: This guide provides an overview of how to protect AWS securables with Immuta policies.
Accessing data: This guide provides an overview of how AWS users access data registered in Immuta.
This getting started guide outlines how to connect Databricks Lakebase to Immuta.
Databricks Lakebase integration: This guide describes the design and components of the integration.
Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
Protecting data: This guide provides an overview of how to protect securables with Immuta policies.
Accessing data: This guide provides an overview of how Databricks Lakebase users access data registered in Immuta.
Once the Databricks Lakebase connection is registered, you can author subscription policies in Immuta to enforce access controls.
See the Databricks Lakebase connection reference guide for more details about registering a connection.
After tables are registered in Immuta, you can author subscription policies in Immuta to enforce access controls.
When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Then, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege to users on those tables.
Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege on yellow-table to users (registered in Immuta) that are part of the analysts group.
In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.
See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the supported subscription policy types and the PostgreSQL privileges Immuta grants on tables registered as Immuta data sources.
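As a rough illustration of the PostgreSQL grants described above (all names are hypothetical; Immuta manages these grants automatically):

```sql
-- Grant the subscribed user's PostgreSQL role read access to the table
GRANT USAGE ON SCHEMA public TO immuta_alice;
GRANT SELECT ON public.yellow_table TO immuta_alice;
```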

Public preview: This feature is available to all accounts.
The how-to guides linked on this page illustrate how to use AWS Lake Formation with Immuta. See the reference guide for information about the AWS Lake Formation integration.
Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
Register your AWS Lake Formation connection: Using a single setup process, connect AWS Lake Formation to Immuta. This will register your data objects in Immuta and allow you to start dictating access through access requests or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions to publish data products, author policies, view audit, and manage identification.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
Set up a request form for your assets: Once you register your data, assets will appear in the Request app where you can attach request forms.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that use your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
In the AWS Lake Formation integration, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:
Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.
See the AWS Lake Formation integration reference guide for more details about Lake Formation access controls.
AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.
Once the Lake Formation connection is registered, you can author policies in Immuta to orchestrate Lake Formation access controls.
See the AWS Lake Formation integration reference guide for more details about registering a connection.
After Glue Data Catalog views and tables are registered in Immuta, you can author subscription policies in Immuta to orchestrate Lake Formation access controls. Once a subscription policy is applied, users can be subscribed to data sources in the following ways:
Manually subscribed: If a data owner manually subscribes a user to the data source, Immuta issues a grant directly to the data object in AWS.
Automatically subscribed through policy logic: When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Immuta then generates a Lake Formation tag, applies it to the corresponding data object in AWS, and grants subscribers access to that tag, which in turn grants them access to the data. See the reference guide for details about this process.
Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta generates a Lake Formation (LF) tag that is applied to the Glue Data Catalog yellow-table and permissions on that tag are granted to all AWS users (registered in Immuta) that are part of the analysts group.
In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.
See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and the permissions Immuta grants on securables registered as Immuta data sources.
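The group-based check in this example can be sketched in a few lines of Python (an illustrative simplification, not Immuta's implementation; the group and table names come from the example above):

```python
# Illustrative sketch: evaluate a subscription policy that restricts a
# table to members of a required group.

def can_access(user_groups, table, policy):
    """Return True if the user's groups satisfy the policy for the table."""
    required_group = policy.get(table)
    if required_group is None:
        return True  # no subscription policy applies to this table
    return required_group in user_groups

# Policy from the example: only the analysts group may access yellow-table.
policy = {"yellow-table": "analysts"}

print(can_access({"analysts"}, "yellow-table", policy))  # analyst is granted access
print(can_access({"research"}, "yellow-table", policy))  # research user is denied
```

In the real integration, this decision is expressed as a Lake Formation tag and tag permissions rather than an in-process check.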
This page describes the Azure Synapse Analytics integration, through which Immuta applies policies directly in Azure Synapse Analytics. For a tutorial on configuring Azure Synapse Analytics, see the Azure Synapse Integration page.
The Azure Synapse Analytics integration is a policy push integration that allows Immuta to apply policies directly in Azure Synapse Analytics Dedicated SQL pools without requiring users to go through a proxy. Instead, users can work within their existing Synapse Studio and have per-user policies dynamically applied at query time.
This integration works on a per-Dedicated-SQL-pool basis: all of Immuta's policy definitions and user entitlements data need to be in the same pool as the target data sources because Dedicated SQL pools do not support cross-database joins. Immuta creates schemas inside the configured Dedicated SQL pool that contain policy-enforced views that users query.
When the integration is configured, the Application Admin specifies the following:
Immuta Database: The pre-existing database Immuta uses. Immuta will create views from the tables contained in this database, and all schemas and views created by Immuta will exist in this database, including the immuta_system, immuta_functions, and immuta_procedures schemas, which contain the tables, views, UDFs, and stored procedures that support the integration.
Immuta Schema: The schema that Immuta manages. All views generated by Immuta for tables registered as data sources will be created in this schema.
For a tutorial on configuring the integration, see the Azure Synapse Integration page.
Synapse data sources are represented as views and are under one schema instead of a database, so their view names are a combination of their schema and table name, separated by an underscore.
For example, with a configuration that uses IMMUTA as the schema in the database dedicated_pool, the view name for the data source dedicated_pool.tpc.case would be dedicated_pool.IMMUTA.tpc_case.
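The naming rule can be captured in a small helper (a sketch; synapse_view_name is a hypothetical function, not part of Immuta):

```python
def synapse_view_name(database, immuta_schema, source_schema, table):
    """Build the Synapse view name: the source schema and table name are
    joined with an underscore and qualified by the Immuta schema."""
    return "{}.{}.{}_{}".format(database, immuta_schema, source_schema, table)

# The example from above: dedicated_pool.tpc.case with the IMMUTA schema.
print(synapse_view_name("dedicated_pool", "IMMUTA", "tpc", "case"))
# dedicated_pool.IMMUTA.tpc_case
```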
You can see the view information on the data source details page under Connection Information.
This integration uses webhooks to keep views up-to-date with the corresponding Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook is called that creates, modifies, or deletes the dynamic view in the Immuta schema. Note that only standard views are available because Azure Synapse Analytics Dedicated SQL pools do not support secure views.
The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API.
An Immuta Application Administrator configures the Azure Synapse Analytics integration, registering their initial Synapse Dedicated SQL pool with Immuta.
Immuta creates Immuta schemas inside the configured Synapse Dedicated SQL pool.
A Data Owner registers Synapse tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.
When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's three-level hierarchy. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.
Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them. Data sources in the Hive Metastore must be managed by the Databricks Spark integration.
To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
This integration enforces policies on Databricks securables registered in the legacy Hive metastore. Once these securables are registered as Immuta data sources, users can query policy-enforced data on Databricks clusters.
The guides in this section outline how to integrate Databricks Spark with Immuta.
This getting started guide outlines how to integrate Databricks with Immuta.
: Configure the Databricks Spark integration.
: Manually update your cluster to reflect changes in the Immuta init script or cluster policies.
: Register a Databricks library with Immuta as a trusted library to avoid Immuta security manager errors when using third-party libraries.
: This guide describes the design and components of the integration.
: This guide provides an overview of the Immuta features that provide security for your users and Databricks clusters and that allow you to prove compliance and monitor for anomalies.
: This guide provides an overview of registering Databricks securables and protecting them with Immuta policies.
If a Databricks cluster needs to be manually updated to reflect changes in the Immuta init script or cluster policies, you can remove and set up your integration again to get the updated policies and init script.
Log in to Immuta as an Application Admin.
Click the App Settings icon in the navigation menu and scroll to the Integration Settings section.
Your existing Databricks Spark integration should be listed here; expand it and note the configuration values. Now select Remove to remove your integration.
Click Add Integration and select Databricks Integration to add a new integration.
Enter your Databricks Spark integration settings again as configured previously.
Click Add Integration to add the integration, and then select Configure Cluster Policies to set up the updated cluster policies and init script.
Select the cluster policies you wish to use for your Immuta-enabled Databricks clusters.
Automatically push cluster policies and the init script (recommended) or manually update your cluster policies.
Automatically push cluster policies
Select Automatically Push Cluster Policies and enter your privileged Databricks access token. This token must have privileges to write to cluster policies.
Restart any Databricks clusters using these updated policies for the changes to take effect.
In the Databricks Clusters UI, install your third-party library .jar or Maven artifact with Library Source Upload, DBFS, DBFS/S3, or Maven. Alternatively, use the Databricks libraries API.
In the Databricks Clusters UI, add the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS property as a Spark environment variable and set it to your artifact's URI. To specify more than one trusted library, comma delimit the URIs:
For Maven artifacts, the URI is maven:/<maven_coordinates>, where <maven_coordinates> is the Coordinates field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. Here's an example of an installed artifact:
In this example, you would add the following Spark environment variable: IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644
For jar artifacts, the URI is the Source field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. For artifacts installed from DBFS or S3, this ends up being the original URI to your artifact. For uploaded artifacts, Databricks will rename your .jar and put it in a directory in DBFS. Here's an example of an installed artifact:
Once you've finished making your changes, restart the cluster.
Once the cluster is up, execute a command in a notebook. If the trusted library installation is successful, you should see driver log messages like this:
This page outlines the configuration for setting up project UDFs, which allow users to set their current project in Immuta through Spark. For details about the specific functions available and how to use them, see the Use Project UDFs (Databricks) page.
Use project UDFs in Databricks Spark
Because Immuta caches information pertaining to a user's current project, and not all of those caches are invalidated outside of Databricks, this feature should only be used in Databricks.
Lower the web service cache timeout in Immuta:
Click the App Settings icon and scroll to the HDFS Cache Settings section.
Lower the Cache TTL of HDFS user names (ms) to 0.
Raise the cache timeout on your Databricks cluster: In the Spark environment variables section, set the IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS and IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS to high values (like 10000).
Note: These caches will be invalidated on the cluster when a user calls immuta.set_current_project, so they can effectively be cached permanently on the cluster to avoid periodically reaching out to the web service.
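Assuming the variable names from the step above, the Spark environment variables section of the cluster configuration would contain entries like:

```
IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS=10000
IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS=10000
```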
This page provides guidelines for troubleshooting issues with the Databricks Spark integration and resolving Py4J security and Databricks trusted library errors.
For easier debugging of the Databricks Spark integration, follow the recommendations below.
Enable cluster init script logging:
In the cluster page in Databricks for the target cluster, navigate to Advanced Options -> Logging.
Change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be added onto the end of the provided path.
View the Spark UI on your target Databricks cluster: On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.
The validation and debugging notebook is designed to be used by or under the guidance of an Immuta support professional. Reach out to your Immuta representative for assistance.
Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.
Click the arrow next to your name and select Import.
Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.
Error Message: py4j.security.Py4JSecurityException: Constructor <> is not allowlisted
Explanation: This error indicates you are being blocked by Py4J security rather than the Immuta Security Manager. Py4J security is strict and generally ends up blocking many ML libraries.
Solution: Turn off Py4J security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the Spark environment variables for that cluster.
Check the driver logs for details. Some possible causes of failure include:
One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.
For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.
Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time.
Prerequisites:
Required Permission: Immuta: GOVERNANCE
Build your data policies to fit your organization's compliance requirements.
It's important to understand that subscription policies are not relevant to Snowflake data shares, because the act of sharing the data is itself the subscription policy. Data policies can be enforced on the consuming account from the producer account on a share by following these instructions.
Required Permission: Immuta: USER_ADMIN
To register the Snowflake data consumer in Immuta,
Create a new Immuta user account for the data consumer.
Change the new user's Snowflake username to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.
Adjust the user's attributes and groups as needed for your organization's policies.
Required Permission: Snowflake ACCOUNTADMIN
To share the policy-protected data source,
Create a Snowflake share of the Snowflake table that has been registered in Immuta.
Grant reference usage on the Immuta database to the share you created:
Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.
To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.
Navigate to the App Settings page.
Scroll to the Global Integrations Settings section.
Uncheck the Snowflake Table Grants checkbox to disable the feature.
Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.
Use the App Settings page to re-enable the feature.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
In the PostgreSQL integration, Immuta registers data from PostgreSQL and enforces subscription policies on that data.
This getting started guide outlines how to integrate PostgreSQL with Immuta.
: This guide describes the design and components of the integration.
: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.
: This guide provides an overview of how to protect securables with Immuta policies.
The how-to guides linked on this page illustrate how to integrate Databricks Unity Catalog with Immuta. See the reference guide for information about the Databricks Unity Catalog integration.
Requirements:
A Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The how-to guides linked on this page illustrate how to use Databricks Lakebase with Immuta. See the reference guide for information about the Databricks Lakebase integration.
The how-to guides linked on this page illustrate how to integrate Databricks Spark with Immuta.
Requirements
If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster.
If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:
This page outlines how to enable access to DBFS in Databricks for non-sensitive data. Databricks administrators should place the desired configuration in the Spark environment variables.
This Databricks feature mounts DBFS to the local cluster filesystem at /dbfs. Although disabled when using process isolation, this feature can safely be enabled if raw, unfiltered data is not stored in DBFS and all users on the cluster are authorized to see each other’s files. When enabled, the entirety of DBFS essentially becomes a scratch path where users can read and write files in /dbfs/path/to/my/file as though they were local files.
When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.
Spark SQL can be used instead to give the same functionality with all of Immuta's data protections.
Below is a table of the Delta Lake API with the Spark SQL that may be used instead.
In the PostgreSQL integration, Immuta administers PostgreSQL privileges on data registered in Immuta. Then, Immuta users who have been granted access to the tables can query them with policies enforced.
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in PostgreSQL.
PostgreSQL is configured and data is registered through , an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.
Select Apply Policies to push the cluster policies and init script again.
Click Save and Confirm to deploy your changes.
Manually update cluster policies
Download the init script and the new cluster policies to your local computer.
Click Save and Confirm to save your changes in Immuta.
Log in to your Databricks workspace with your administrator account to set up cluster policies.
Find the DBFS path where you will upload the init script (immuta_cluster_init_script_proxy.sh) by opening one of the cluster policy .json files and looking for the defaultValue of the field init_scripts.0.dbfs.destination. This should be a DBFS path in the form of dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh.
Click Data in the left pane to upload your init script to DBFS at the path you found above.
To find your existing cluster policies you need to update, click Compute in the left pane and select the Cluster policies tab.
Edit each of these cluster policies that were configured before and overwrite the contents of the JSON with the new cluster policy JSON you downloaded.
The Amazon Redshift connection supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the Register an Amazon Redshift connection guide.
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the Amazon Redshift integration reference guide for details about mapping user accounts to Immuta.
Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.
SELECT
TRUNCATE
UPDATE
SELECT
TRUNCATE
UPDATE
Hostname: URL of your Amazon Redshift instance.
Port: Port configured for Amazon Redshift.
Database: The Redshift database you created for Immuta. All databases in the host will be registered.
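For example, a filled-in connection might look like this (the hostname below is hypothetical; 5439 is Redshift's default port):

```
Hostname: example-cluster.abc123xyz0.us-east-1.redshift.amazonaws.com
Port: 5439
Database: immuta
```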
Run R and Scala spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.
DBFS access: Access DBFS in Databricks for non-sensitive data.
Troubleshooting: Resolve errors in the Databricks Spark configuration.
Accessing data: This guide provides an overview of how Databricks users access data registered in Immuta.
Accessing data: This guide provides an overview of how PostgreSQL users access data registered in Immuta.


User Profile Delimiters: Since Azure Synapse Analytics dedicated SQL pools do not support array or hash objects, certain user access information is stored as delimited strings; the Application Admin can modify those delimiters to ensure they do not conflict with possible characters in strings.
The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies and updates data source view definitions as necessary.
A Synapse user who is subscribed to the data source in Immuta queries the corresponding data source view in Synapse and sees policy-enforced data.



Respond to an access request: To grant access to data, data stewards will respond to the access request.
DBFS FUSE mount limitation: This feature cannot be used in environments with E2 Private Link enabled.
For example,
In Python,
Note: This solution also works in R and Scala.
To enable the DBFS FUSE mount, set this configuration in the Spark environment variables: IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED=true.
Mounting a bucket
Users can mount additional buckets to DBFS that can also be accessed using the FUSE mount.
Mounting a bucket is a one-time action, and the mount will be available to all clusters in the workspace from that point on.
Mounting must be performed from a non-Immuta cluster.
Scratch paths will work when performing arbitrary remote filesystem operations with fs magic or Scala dbutils.fs functions. For example,
To support %fs magic and Scala DBUtils with scratch paths, configure
To use dbutils in Python, set this configuration: immuta.spark.databricks.py4j.strict.enabled=false.
This section illustrates the workflow for getting a file from a remote scratch path, editing it locally with Python, and writing it back to a remote scratch path.
Get the file from remote storage:
Make a copy if you want to explicitly edit localScratchFile, as it will be read-only and owned by root:
Write the new file back to remote storage:
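The three steps above can be simulated end to end outside Databricks by letting local temporary directories stand in for the remote scratch path and IMMUTA_LOCAL_SCRATCH_DIR (dbutils.fs.cp is Databricks-only, so plain file copies are used here; all paths are illustrative):

```python
import os
import shutil
import tempfile

# Stand-ins for the remote scratch path and IMMUTA_LOCAL_SCRATCH_DIR.
# On a real cluster you would use dbutils.fs.cp against an s3:// path.
remote_dir = tempfile.mkdtemp()
local_dir = tempfile.mkdtemp()

remote_file = os.path.join(remote_dir, "myfile.txt")
with open(remote_file, "w") as f:
    f.write("original content\n")

# 1. Get the file from "remote" storage.
local_file = os.path.join(local_dir, "myfile.txt")
shutil.copy(remote_file, local_file)

# 2. Copy before editing, since the fetched file may be read-only.
local_copy = os.path.join(local_dir, "myfile_copy.txt")
shutil.copy(local_file, local_copy)
with open(local_copy, "a") as f:
    f.write("Some appended file content\n")

# 3. Write the edited file back to "remote" storage.
shutil.copy(local_copy, remote_file)

print(open(remote_file).read())
```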
Grant reference usage on the Immuta database to the share (Snowflake data sharing):

```sql
GRANT REFERENCE_USAGE ON DATABASE "<Immuta database of the provider account>" TO SHARE "<DATA_SHARE>";
```

Scratch path workflow setup:

```python
%python
import os
import shutil

s3ScratchFile = "s3://some-bucket/path/to/scratch/file"
localScratchDir = os.environ.get("IMMUTA_LOCAL_SCRATCH_DIR")
localScratchFile = "{}/myfile.txt".format(localScratchDir)
localScratchFileCopy = "{}/myfile_copy.txt".format(localScratchDir)
```

Get the file from remote storage:

```python
dbutils.fs.cp(s3ScratchFile, "file://{}".format(localScratchFile))
```

Make a copy and edit it:

```python
shutil.copy(localScratchFile, localScratchFileCopy)
with open(localScratchFileCopy, "a") as f:
    f.write("Some appended file content")
```

Write the new file back to remote storage:

```python
dbutils.fs.cp("file://{}".format(localScratchFileCopy), s3ScratchFile)
```

Create a file on the DBFS FUSE mount from a shell cell:

```
%sh echo "I'm creating a new file in DBFS" > /dbfs/my/newfile.txt
```

In Python:

```python
%python
with open("/dbfs/my/newfile.txt", "w") as f:
    f.write("I'm creating a new file in DBFS")
```

Write to a scratch path with %fs magic or Scala DBUtils:

```
%fs put -f s3://my-bucket/my/scratch/path/mynewfile.txt "I'm creating a new file in S3"
%scala dbutils.fs.put("s3://my-bucket/my/scratch/path/mynewfile.txt", "I'm creating a new file in S3")
```

Scratch path configuration:

```xml
<property>
    <name>immuta.spark.databricks.scratch.paths</name>
    <value>s3://my-bucket/my/scratch/path</value>
</property>
```

Trusted library Spark environment variable (Maven example):

```
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644
```

Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
Register your Databricks Unity Catalog connection: Using a single setup process, connect Databricks Unity Catalog to Immuta. This will register your data objects into Immuta and allow you to start dictating access through access requests or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, viewing audit, and managing identification.
Connections are available on all tenants created after February 26, 2025. If you do not have connections enabled on your tenant, configure your integration and register your data using the legacy workflow.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
Set up a request form for your assets: Once you register your data, assets will appear in the Request app where you can attach request forms.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
Register your Databricks Lakebase connection: Using a single setup process, connect Databricks Lakebase to Immuta. This will register your data objects in Immuta and allow you to start dictating access through access requests or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, viewing audit and managing identification.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
Set up a request form for your assets: Once you register your data, assets will appear in the Request app where you can attach request forms.
: After you set up request forms, you can put access request links in your catalog for your data consumers to click. This will take them to the Request app to fill out the request form to request access to data.
Respond to an access request: To grant access to data, data stewards will respond to the access request.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
: Identification allows you to automate data tagging using identifiers that detect certain data patterns.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.
Navigate to the App Settings page and click Integration Settings.
Uncheck the Enable Unity Catalog checkbox.
Click Save.
Connect your technology
These guides provide instructions for getting your data set up in Immuta.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in policies, audit, and identification.
Register your users
These guides provide instructions on setting up your users in Immuta.
Connect an IAM: Connect the IAM your organization already uses and allow Immuta to register your users for you.
: Ensure the user IDs in Immuta, Databricks, and your IAM are aligned so that the right policies impact the right users.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for use in policies.
Connect an external catalog: Connect the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Protect and monitor data access
These guides provide instructions on authoring policies and auditing data access.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
| Delta Lake API | Spark SQL |
| --- | --- |
| DeltaTable.convertToDelta | CONVERT TO DELTA parquet.`/path/to/parquet/` |
| DeltaTable.delete | DELETE FROM [table_identifier delta.`/path/to/delta/`] WHERE condition |
| DeltaTable.generate | GENERATE symlink_format_manifest FOR TABLE [table_identifier delta.`/path/to/delta`] |
| DeltaTable.history | DESCRIBE HISTORY [table_identifier delta.`/path/to/delta`] (LIMIT x) |
| DeltaTable.merge | MERGE INTO |
| DeltaTable.update | UPDATE [table_identifier delta.`/path/to/delta/`] SET column = value WHERE (condition) |
| DeltaTable.vacuum | VACUUM [table_identifier delta.`/path/to/delta`] |
See the Delta Lake documentation for a complete list of the Delta SQL commands.
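As an illustration of the substitution, the blocked DeltaTable.delete call maps to a SQL string you would pass to spark.sql; the helper below only builds that string so it can run anywhere (delta_delete_sql, the path, and the condition are hypothetical):

```python
def delta_delete_sql(path, condition):
    """Build the Spark SQL equivalent of DeltaTable.delete for a path-based table."""
    return "DELETE FROM delta.`{}` WHERE {}".format(path, condition)

sql = delta_delete_sql("/path/to/delta", "age > 90")
print(sql)  # DELETE FROM delta.`/path/to/delta` WHERE age > 90
# In a Databricks notebook you would then run: spark.sql(sql)
```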
When a table is created in a project workspace, you can merge a different Immuta data source from that workspace into that table.
Create a temporary view of the Immuta data source you want to merge into that table.
Use that temporary view as the data source you add to the project workspace.
Run the following command:
Once the PostgreSQL connection is registered, you can author subscription policies in Immuta to enforce access controls.
See the PostgreSQL integration reference guide for more details about registering a connection.
After tables are registered in Immuta, you can author subscription policies in Immuta to enforce access controls.
When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Then, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege to users on those tables.
Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege on yellow-table to users (registered in Immuta) that are part of the analysts group.
In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.
See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and PostgreSQL privileges Immuta grants on tables registered as Immuta data sources.
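The grant flow described above can be sketched in a few lines. This is an illustrative model only, not Immuta's implementation; the user names are made up:

```python
def grants_for_policy(table, allowed_group, users):
    """For each registered user in the allowed group, emit the SQL
    statement Immuta would conceptually issue in PostgreSQL.
    Illustrative sketch only."""
    return [
        f'GRANT SELECT ON "{table}" TO "{name}"'
        for name, groups in users.items()
        if allowed_group in groups
    ]

users = {
    "alice": {"analysts"},   # meets the policy condition
    "bob": {"research"},     # does not; receives no grant
}
stmts = grants_for_policy("yellow-table", "analysts", users)
print(stmts)  # only alice is granted SELECT on yellow-table
```

A user in the research group simply never appears in the generated statements, which matches the denied-access behavior described above.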

The PostgreSQL integration supports the following authentication methods to register a connection:
Amazon Aurora and Amazon RDS deployments
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the permissions listed in the .
Neon and PostgreSQL deployments
Username and password: These credentials are used temporarily by Immuta to register the connection. The credentials provided must be for an account with the permissions listed in the .
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the PostgreSQL integration reference guide for details about user provisioning and mapping user accounts to Immuta.
Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.
The how-to guides linked on this page illustrate how to integrate Amazon Redshift Spectrum with Immuta. See the reference guide for information about the Amazon Redshift Spectrum integration.
Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
: Configure an Amazon Redshift Spectrum integration with Immuta so that Immuta can create policy-protected views for your users to query.
: This will register your data objects into Immuta and allow you to start dictating access through access requests or global policies.
: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, and managing audit.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
: Once you register your data and users, you can immediately start publishing data products.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
The how-to guides linked on this page illustrate how to integrate Azure Synapse Analytics with Immuta. See the reference guide for information about the Azure Synapse Analytics integration.
Requirement: A running Dedicated SQL pool
Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
: Configure an Azure Synapse Analytics integration with Immuta so that Immuta can create policy-protected views for your users to query.
: This will register your data objects into Immuta and allow you to start dictating access through access requests or global policies.
: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, and managing audit.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
: Once you register your data and users, you can immediately start publishing data products.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
Immuta offers three integrations for Databricks:
Databricks Unity Catalog integration: This integration supports working with database objects registered in Unity Catalog.
Databricks Lakebase integration: This connection supports working with Lakebase Postgres database objects within Databricks Lakebase.
: This integration supports working with .
To determine which integration you should use, consider which metastore you use:
Legacy Hive metastore: Databricks recommends that you migrate all data from the legacy Hive metastore to Unity Catalog. However, when this migration is not possible, use the to protect securables registered in the Hive metastore.
Unity Catalog: To protect securables registered in the Unity Catalog metastore, use the .
: To register and protect fully managed PostgreSQL-compatible data objects, use the Databricks Lakebase integration.
Databricks metastore magic allows you to migrate your data from the Databricks legacy Hive metastore to the Unity Catalog metastore while protecting data and maintaining your current processes in a single Immuta instance.
Databricks metastore magic is for organizations who intend to use the , but must still protect tables in the Hive metastore until they can migrate all of their data to Unity Catalog.
Unity Catalog support is enabled in Immuta.
Databricks has two built-in metastores that contain metadata about your tables, views, and storage credentials:
Legacy Hive metastore: Created at the workspace level. This metastore contains metadata of the registered securables in that workspace available to query.
Unity Catalog metastore: Created at the account level and is attached to one or more Databricks workspaces. This metastore contains metadata of the registered securables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those securables.
Databricks allows you to use the legacy Hive metastore and the Unity Catalog metastore simultaneously. However, Unity Catalog does not support controls on the Hive metastore, so you must attach a Unity Catalog metastore to your workspace and move existing databases and tables to the attached Unity Catalog metastore to use the governance capabilities of Unity Catalog.
Immuta's Databricks Spark integration and Unity Catalog integration enforce access controls on the Hive and Unity Catalog metastores, respectively. However, because these metastores have two distinct security models, users were discouraged from using both in a single Immuta instance before metastore magic; the Databricks Spark integration and Unity Catalog integration were unaware of each other, so using both concurrently caused undefined behavior.
Metastore magic reconciles the distinct security models of the legacy Hive metastore and the Unity Catalog metastore, allowing you to use multiple metastores (specifically, the Hive metastore alongside Unity Catalog metastores) within a Databricks workspace and a single Immuta instance, keeping policies enforced on all your tables as you migrate them. The diagram below shows Immuta enforcing policies on registered tables across workspaces.
In clusters A and D, Immuta enforces policies on data sources in each workspace's Hive metastore and in the Unity Catalog metastore shared by those workspaces. In clusters B, C, and E (which don't have Unity Catalog enabled in Databricks), Immuta enforces policies on data sources in the Hive metastores for each workspace.
With metastore magic, the Databricks Spark integration enforces policies only on data in the Hive metastore, while the Unity Catalog integration enforces policies on tables in the Unity Catalog metastore.
To enforce plugin-based policies on Hive metastore tables and Unity Catalog native controls on Unity Catalog metastore tables, enable the Databricks Spark integration and the Databricks Unity Catalog integration. Note that some Immuta policies are not supported in the Databricks Unity Catalog integration. See the for details.
Databricks SQL cannot run the Databricks Spark plugin to protect tables, so Hive metastore data sources will not be policy enforced in Databricks SQL.
To enforce policies on data sources in Databricks SQL, use to manually lock down Hive metastore data sources and the Databricks Unity Catalog integration to protect tables in the Unity Catalog metastore. Table access control is enabled by default on SQL warehouses, and any Databricks cluster without the Immuta plugin must have table access control enabled.
This page provides a tutorial for enabling the Azure Synapse Analytics integration on the Immuta app settings page. To configure this integration via the Immuta API, see the Configure an Azure Synapse Analytics integration API guide.
For an overview of the integration, see the Azure Synapse Analytics overview documentation.
A running dedicated SQL pool is required.
If you are using the OAuth authentication method,
Ensure that Microsoft Entra ID is on the same account as the Azure Synapse Analytics workspace and dedicated SQL pool.
.
Select Accounts in this organizational directory only as the account type.
Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Azure Synapse Analytics from the dropdown menu.
You have two options for configuring your Azure Synapse Analytics environment:
: Grant Immuta one-time use of credentials to automatically configure your environment and the integration.
: Run the Immuta script in your Azure Synapse Analytics environment yourself to configure the integration.
Enter the username and password in the Privileged User Credentials section.
Select Manual.
Download, fill out the appropriate fields, and run the bootstrap master script and bootstrap script linked in the Setup section. Note: The master script is not required if you're using the OAuth authentication method.
Select the authentication method:
Click Save.
.
Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Edit the field you want to change. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.
Immuta requires temporary, one-time use of credentials with specific permissions
When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the Manage GRANTS permission.
Alternatively, you can download the Edit Script from your Azure Synapse Analytics configuration on the Immuta app settings page and run it in Azure Synapse Analytics.
Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Click the checkbox to disable the integration.
When the Databricks Spark plugin is running on a Databricks cluster, all Databricks users running jobs or queries are either a privileged user or a non-privileged user:
Privileged users: Privileged users can effectively read from and write to any table or view in the cluster metastore, or any file path accessible by the cluster, without restriction. Privileged users are either Databricks workspace admins or users specified in IMMUTA_SPARK_ACL_ALLOWLIST. Any user running queries or jobs while impersonating another user is a non-privileged user, even if they are impersonating a privileged user.
Privileged users have effective authority to read from and write to any securable in the cluster metastore or any file path, because in almost all cases Databricks clusters running the Immuta Spark plugin have Hive metastore table access control disabled. However, if Hive metastore table access control is enabled on the cluster, privileged users have only the authority granted to them by table access control.
Non-privileged users: Non-privileged users are any users who are not privileged users, and all authorization for non-privileged users is determined by Immuta policies.
Whether a user is a privileged user or a non-privileged user for a given query or job is cached once it is first determined, based on . This caching can be disabled entirely by setting the value of that environment variable to 0.
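The caching behavior can be sketched as a simple TTL cache. The class and variable names below are illustrative, not the plugin's actual implementation:

```python
import time

class PrivilegeCache:
    """Caches a user's privileged/non-privileged determination for
    ttl_seconds; a TTL of 0 disables caching entirely (illustrative)."""
    def __init__(self, ttl_seconds, lookup):
        self.ttl = ttl_seconds
        self.lookup = lookup      # the expensive determination
        self._cache = {}          # user -> (value, cached_at)

    def is_privileged(self, user):
        if self.ttl > 0 and user in self._cache:
            value, cached_at = self._cache[user]
            if time.time() - cached_at < self.ttl:
                return value      # served from cache
        value = self.lookup(user)
        self._cache[user] = (value, time.time())
        return value

calls = []
cache = PrivilegeCache(60, lambda u: calls.append(u) or u == "admin")
cache.is_privileged("admin")
cache.is_privileged("admin")
print(len(calls))  # 1: the second call is served from cache
```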
Usernames in Databricks must match the usernames in the connected Immuta tenant. By default, the Immuta Spark plugin checks the Databricks username against the username within Immuta's internal IAM to determine access. However, you can integrate your existing IAM with Immuta and use that instead of the default internal IAM. Ideally, you should use the same identity manager for Immuta that you use for Databricks. See the for a list of supported identity providers and protocols.
It is possible within Immuta to have multiple users share the same username if they exist within different IAMs. In this case, the cluster can be configured to look up users from a specified IAM. To do this, the value of the must be updated to be the targeted IAM ID configured within the Immuta tenant. The targeted IAM ID can be found on the . Each Databricks cluster can only be mapped to one IAM.
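Conceptually, the IAM-scoped lookup works like the sketch below; the field names and IAM IDs are hypothetical:

```python
def resolve_user(username, iam_id, immuta_users):
    """Return the single Immuta user matching username within the
    cluster's configured IAM, or None if there is no match.
    Illustrative sketch only."""
    matches = [
        u for u in immuta_users
        if u["username"] == username and u["iam"] == iam_id
    ]
    return matches[0] if matches else None

immuta_users = [
    {"username": "jdoe", "iam": "okta"},
    {"username": "jdoe", "iam": "bim"},  # same username, different IAM
]
print(resolve_user("jdoe", "okta", immuta_users))  # only the okta user
```

Because each cluster is mapped to exactly one IAM ID, a duplicated username in another IAM can never match.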
Databricks user impersonation allows a Databricks user to impersonate an Immuta user. With this feature,
the Immuta user being impersonated does not need a Databricks account, but they must have an Immuta account.
the Databricks user who is impersonating an Immuta user does not have to be associated with Immuta. For example, this could be a service account.
When acting under impersonation, the Databricks user loses their privileged access, so they can only access the tables the Immuta user has access to and only perform DDL commands when that user is acting under an allowed circumstance (such as workspaces, scratch paths, or non-Immuta reads/writes).
Use the to enable user impersonation.
Scala clusters
Immuta discourages use of this feature with Scala clusters, as the proper security mechanisms were not built to account for . Instead, this feature was developed for the BI tool use case in which service accounts connecting to the Databricks cluster need to impersonate Immuta users so that policies can be enforced.
Prevent users from changing impersonation user in a given session
If your BI tool or other service allows users to submit arbitrary SQL or issue SET commands, set IMMUTA_SPARK_DATABRICKS_SINGLE_IMPERSONATION_USER to true to prevent users from changing their impersonation user once it has been set for a given Spark session.
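The effect of IMMUTA_SPARK_DATABRICKS_SINGLE_IMPERSONATION_USER can be illustrated with a small sketch (not the plugin's actual code):

```python
class SessionImpersonation:
    """Illustrative guard: once an impersonation user is set for a
    session, further changes are rejected when single-user mode is on."""
    def __init__(self, single_impersonation_user=True):
        self.single = single_impersonation_user
        self.user = None

    def set_user(self, user):
        if self.single and self.user is not None and user != self.user:
            raise PermissionError(
                "impersonation user already set for this session")
        self.user = user

s = SessionImpersonation(single_impersonation_user=True)
s.set_user("analyst@example.com")
try:
    s.set_user("admin@example.com")   # second SET is rejected
except PermissionError as e:
    print("rejected:", e)
```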
Audited queries include an impersonationUser field, which identifies the Databricks user impersonating the Immuta user:
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The how-to guides linked on this page illustrate how to use PostgreSQL with Immuta. See the reference guide for information about the PostgreSQL integration.
Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
: Using a single setup process, connect PostgreSQL to Immuta. This will register your data objects in Immuta and allow you to start dictating access through access requests or global policies.
: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, viewing audit and managing identification.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
: Bring the IAM your organization already uses and allow Immuta to register your users for you.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
: Once you register your data, assets will appear in the Request app where you can attach request forms.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
This page provides an overview of the Amazon Redshift Spectrum integration in Immuta. For a tutorial detailing how to enable this integration, see the .
The Amazon Redshift Spectrum integration is a policy push integration that allows Immuta to apply policies directly on Immuta-created views in Redshift. This allows data analysts to query Redshift views directly, instead of going through a proxy, with per-user policies dynamically applied at query time.
The Amazon Redshift Spectrum integration creates views from the tables within the database specified during configuration. You can choose the name of the schema where all the Immuta-generated views will reside. Immuta will also create the schemas
Once a Databricks securable is registered in Immuta as a data source and you are subscribed to that data source, you must access that data through SQL:
With R, you must load the SparkR library in a cell before accessing the data.
See the sections below for more guidance on accessing data using , , and .
Immuta policies will not be automatically enforced in MariaDB
While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.
To use this integration, contact your Immuta representative.
Immuta policies will not be automatically enforced in MySQL
While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use to be notified about changes to user access and make appropriate access updates in MySQL using your own process.
To use this integration, contact your Immuta representative.
The how-to guides linked on this page illustrate how to integrate Snowflake with Immuta. See the for information about the Snowflake integration.
Requirements
Snowflake enterprise edition
Access to a Snowflake account that can create a Snowflake user
Immuta policies will not be automatically enforced in Oracle
While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use to be notified about changes to user access and make appropriate access updates in Oracle using your own process.
To use this integration, contact your Immuta representative.
Navigate to the App Settings page.
Scroll to the Global Integrations Settings section.
Ensure the Snowflake Table Grants checkbox is checked. It is enabled by default.
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/my.group.id:my-package-id:1.2.3
TrustedLibraryUtils: Successfully found all configured Immuta trusted libraries in Databricks.
TrustedLibraryUtils: Wrote trusted libs file to [/databricks/immuta/immutaTrustedLibs.json]: true.
TrustedLibraryUtils: Added trusted libs file with 1 entries to spark context.
TrustedLibraryUtils: Trusted library installation complete.
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/immuta/bstabile/jars/immuta-spark-third-party-lib-test.jar
Opt to change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements and be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
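The prefix constraints described above can be checked with a short validator. This is an illustrative sketch; the regular expression assumes unquoted Snowflake identifier rules (a letter or underscore first, followed by letters, digits, underscores, or dollar signs):

```python
import re

def valid_role_prefix(prefix):
    """Illustrative check of the stated constraints: fewer than 50
    characters and a valid unquoted Snowflake identifier."""
    return (
        len(prefix) < 50
        and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", prefix) is not None
    )

print(valid_role_prefix("IMMUTA_PROD"))  # True
print(valid_role_prefix("9roles"))       # False: starts with a digit
print(valid_role_prefix("x" * 50))       # False: 50 characters, too long
```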
Finish configuring your integration by following one of these guidelines:
New Snowflake integration: Set up a new Snowflake integration by following the configuration tutorial.
Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have Snowflake privileges to run these privilege grants.
Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.
Snowflake table grants private preview migration
To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the migration guide.
Legacy Hive metastore and Unity Catalog: If you need to work with database objects registered in both the legacy Hive metastore and in Unity Catalog, metastore magic allows you to use both integrations.



Respond to an access request: To grant access to data, data stewards will respond to the access request.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.
Respond to an access request: To grant access to a data product and its data sources, respond to the access request.
Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.
Immuta supports the following authentication methods to configure the Databricks Spark integration and register data sources:
OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.
Personal access token (PAT): This token gives Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace when configuring the integration or to register securables as Immuta data sources.
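The token lifecycle described for OAuth M2M can be sketched as follows. The fetch_token callable is a stand-in for the real Databricks token endpoint, and the names and expiry handling are illustrative:

```python
import time

class M2MTokenCache:
    """Sketch of the client-credentials flow described above: fetch a
    temporary token, reuse it until it expires (one hour by default),
    then request a new one."""
    def __init__(self, fetch_token, lifetime_seconds=3600):
        self.fetch_token = fetch_token
        self.lifetime = lifetime_seconds
        self._token = None
        self._expires_at = 0.0

    def token(self, now=None):
        now = time.time() if now is None else now
        if self._token is None or now >= self._expires_at:
            self._token = self.fetch_token()   # new temporary token
            self._expires_at = now + self.lifetime
        return self._token

issued = []
cache = M2MTokenCache(lambda: issued.append(1) or f"tok-{len(issued)}")
t0 = cache.token(now=0)
t1 = cache.token(now=10)      # still valid, reused
t2 = cache.token(now=4000)    # past the one-hour lifetime, refreshed
print(t0, t1, t2)  # tok-1 tok-1 tok-2
```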
The built-in Immuta IAM can be used as a complete solution for authentication and fine-grained user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and fine-grained user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the Setting up users guide for details and instructions on mapping Databricks user accounts to Immuta.
Non-administrator users on an Immuta-enabled Databricks cluster must not have access to view or modify Immuta configuration, as this poses a security loophole around Immuta policy enforcement. Databricks secrets allow you to securely apply environment variables to Immuta-enabled clusters.
Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable.
See the Installation and compliance guide for details and instructions on using Databricks secrets.
There are limitations to isolation among users in Scala jobs on a Databricks cluster. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to distinguish which user's data is contained in each file or block. To address this vulnerability, Immuta suggests that you
limit Scala clusters to Scala jobs only and
require equalized projects, which will force all users to act under the same set of attributes, groups, and purposes with respect to their data access. This requirement guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.
See the Installation and compliance guide for more details and configuration instructions.
Immuta provides auditing features and governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
You can view the information in these audit logs on dashboards or export the full audit logs to S3 and ADLS for long-term backup and processing with log data processors and tools. This capability fosters convenient integrations with log monitoring services and data pipelines.
See the Audit documentation for details about these capabilities and how they work with the Databricks Spark integration.
Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing.
To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.
Immuta will audit queries that come from interactive notebooks, notebook jobs, and JDBC connections, but will not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless IMMUTA_SPARK_AUDIT_ALL_QUERIES is set to true.
See the Databricks Spark query audit logs page for examples of saved queries and the resulting audit records. To exclude query text from audit events, see the App settings page.
Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.
See the Installation and compliance guide for details and instructions.
When a query is run by a user impersonating another user, the extra.impersonationUser field in the audit log payload is populated with the Databricks username of the user impersonating another user. The userId field will return the Immuta username of the user being impersonated:
See the Setting up users guide for details about user impersonation.
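A consumer of these audit logs might extract the pair of identities with a helper like the illustrative sketch below; the event values are made up:

```python
import json

def impersonation_info(audit_event):
    """Pull the impersonated Immuta user (userId) and the impersonating
    Databricks user (extra.impersonationUser) out of an audit record.
    Illustrative helper only."""
    record = json.loads(audit_event)
    return record["userId"], record.get("extra", {}).get("impersonationUser")

event = json.dumps({
    "id": "query-a20e-493e-id-c1ada0a23a26",
    "userId": "analyst@example.com",                       # Immuta user
    "extra": {"impersonationUser": "svc-bi@example.com"},  # Databricks user
})
print(impersonation_info(event))
# ('analyst@example.com', 'svc-bi@example.com')
```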
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.
MERGE INTO delta_native.target_native as target
USING immuta_temp_view_data_source as source
ON target.dr_number = source.dr_number
WHEN MATCHED THEN
UPDATE SET target.date_reported = source.date_reported

{
"id": "query-a20e-493e-id-c1ada0a23a26",
"dateTime": "1639684812845",
"month": 1463,
"profileId": 4,
"userId": "[email protected]",
"dataSourceId": 1,
"dataSourceName": "Hr Data",
"count": 1,
"recordType": "spark",
"success": true,
"component": "dataSource",
"accessType": "query",
"query": "Relation[id#2644,first_name#2645,last_name#2646,email#2647,gender#2648,race#2649,ssn#2650,dept#2651,job#2652,skills#2653,salary#2654,type#2655] parquet\n",
"extra": {
"databricksWorkspaceID": "0",
"maskedColumns": {},
"metastoreTables": [
"demo.hr_data"
],
"clusterName": "your-cluster-name",
"pathUris": [
"dbfs:/user/hive/warehouse/demo.db/hr_data"
],
"queryText": "select * from demo.hr_data limit 10;",
"queryLanguage": "sql",
"clusterID": "your-171358-cluster-id",
"impersonationUser": "[email protected]"
},
"dataSourceTableName": "demo_hr_data",
"createdAt": "2021-12-16T20:00:12.850Z",
"updatedAt": "2021-12-16T20:00:12.850Z"
}

{
"id": "query-a20e-493e-id-c1ada0a23a26",
[...]
"userId": "<immuta_username>",
[...]
"extra": {
[...]
"impersonationUser": "<databricks_username>"
}
[...]
}

Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user. Once you finish configuring the integration, you can grant the IMPERSONATE_USER permission to Immuta users. See the Managing users and permissions guide for instructions.
Opt to update the User Profile Delimiters. This will be necessary if any of the provided symbols are used in user profile information.
Username and Password: Enter the username and password in the Immuta System Account Credentials section. The username and password provided must be the credentials that were set in the bootstrap master script when you created the user.
Entra ID OAuth Client Secret: The values below can be found on the overview page of the application you created in Microsoft Entra ID. Before you enter this information, ensure you have completed the prerequisites for OAuth authentication listed above.
Display Name: This must match the name of the OAuth application you registered.
Tenant Id
Client Id
Client Secret: Enter the Value of the secret, not the secret ID.
Use the authentication method and credentials you provided when initially configuring the integration.
Click Save.
Click Save.
immuta_system
immuta_functions
immuta_procedures

ALL PRIVILEGES ON DATABASE IMMUTA_DB
ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB
USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
USAGE ON LANGUAGE PLPYTHONU
Additionally, the PUBLIC role will be granted the following privileges:
USAGE ON DATABASE IMMUTA_DB
TEMP ON DATABASE IMMUTA_DB
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS
USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM
SELECT ON TABLES TO public
Once the integration is configured, data owners must register Redshift Spectrum data sources using the Immuta CLI or V2 API.
An Immuta application administrator creates an immuta database in Amazon Redshift (that will contain Immuta policy definitions and user entitlements), configures the Redshift Spectrum integration, and registers the Redshift warehouse and databases with Immuta.
A data owner registers Redshift tables in Immuta as data sources.
A data owner, data governor, or administrator creates or changes a policy or user in Immuta.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.
A Redshift user who is subscribed to the data source in Immuta queries the data directly in Redshift through the immuta database and sees policy-enforced data.
SQL statements are used to create all views, including a join to the secure view: immuta_system.user_profile. This secure view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. The immuta_system.user_profile view is readable by all users, but will only display the data that corresponds to the user executing the query.
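A simplified sketch of what such a generated secure view might look like (all names and the policy predicate are illustrative; Immuta generates the actual view definitions):

```sql
CREATE VIEW my_schema.my_table AS
SELECT t.*
FROM spectrum_schema.my_table AS t
JOIN immuta_system.user_profile AS p
  ON p.immuta__userid = current_user
-- row- and column-level policy logic is injected here by Immuta
WHERE <policy predicates>;
```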
The Amazon Redshift Spectrum integration uses webhooks to keep views up-to-date with Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the dynamic view. The immuta_system.profile table is updated through webhooks when a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.
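The event-driven flow above can be sketched as follows. This is a hypothetical illustration of webhook-driven view maintenance; the event shape and function names are not Immuta's actual API.

```python
# Hypothetical sketch: a webhook handler that keeps dynamic views in
# sync with data source events. Event shape is illustrative only.
def handle_data_source_event(event, views):
    """Create, replace, or drop the dynamic view for a data source."""
    name = event["dataSource"]
    if event["action"] in ("created", "updated"):
        # Create or replace the policy-enforced view
        views[name] = f"policy-enforced view for {name}"
    elif event["action"] == "disabled":
        # Drop the view when the data source is disabled
        views.pop(name, None)
    return views

views = {}
handle_data_source_event({"dataSource": "hr_data", "action": "created"}, views)
handle_data_source_event({"dataSource": "hr_data", "action": "disabled"}, views)
```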
The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API.
All Redshift cluster types are supported for the Amazon Redshift Spectrum integration, and Immuta's views must exist in the same database as the raw tables. See the Configure an Amazon Redshift Spectrum guide for details about setting up this database for Immuta-managed resources.
Immuta supports a single integration with secure views in a single database per cluster.
The Amazon Redshift Spectrum integration supports username and password authentication to configure the integration and create data sources.
Immuta cannot ingest tags from Amazon Redshift Spectrum, but you can connect any of these supported external catalogs to work with your integration.
Required Redshift privileges
Setup user
OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE
CREATE GROUP
Immuta system account
GRANT EXECUTE ON PROCEDURE grant_impersonation
GRANT EXECUTE ON PROCEDURE revoke_impersonation
Impersonation allows users to query data as another Immuta user in Amazon Redshift Spectrum. To enable user impersonation, see the Configure an Amazon Redshift Spectrum integration guide.
Users can enable multiple Amazon Redshift Spectrum integrations with a single Immuta tenant.
The host of the data source must match the host of the integration for the view to be created.
When using multiple Amazon Redshift Spectrum integrations, a user has to have the same user account across all hosts.
Case sensitivity of database, table, and column identifiers is not supported. The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster to configure the integration and register data sources.
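Before configuring the integration, you can check the current value in Redshift (this assumes your session reflects the cluster's parameter settings):

```sql
-- Expected to return false (the default)
SHOW enable_case_sensitive_identifier;
```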
A running dedicated SQL pool
The Azure Synapse Analytics integration supports the following authentication methods to configure the integration and create data sources:
Username and password: Immuta supports SQL authentication with username and password for Azure Synapse Analytics. See the SQL Authentication in Azure Synapse Analytics documentation for details.
OAuth authentication with Microsoft Entra ID: You can use this authentication method to register data sources or configure the Azure Synapse Analytics integration using the manual setup method. To use this authentication method, OAuth must be set up via Microsoft Entra ID app registration with a client secret. See the Microsoft Entra documentation for details about using OAuth authentication with Microsoft Entra ID.
Immuta cannot ingest tags from Synapse, but you can connect any of these supported external catalogs to work with your integration.
Impersonation allows users to query data as another Immuta user in Azure Synapse Analytics. To enable user impersonation, see the Configure Azure Synapse Analytics integration guide.
A user can configure multiple integrations of Synapse to a single Immuta tenant.
Immuta does not support the following masking types in this integration because of limitations with dedicated SQL pools (linked below). Any column assigned one of these masking types will be masked to NULL:
Reversible Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.
Format Preserving Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.
Regex: The built-in string replace function does not support full regex. See the .
The delimiters configured when enabling the integration cannot be changed once they are set. To change the delimiters, the integration has to be disabled and re-enabled.
If the generated view name is more than 128 characters, then the view name is shortened to 128 characters. This could cause collisions between view names if the shortened version is the same for two different data sources.
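The collision risk above can be illustrated with a short sketch (names and the truncation helper are hypothetical; this is not Immuta's implementation):

```python
# Illustrative only: why truncating generated view names to 128
# characters can collide for two different data sources.
MAX_VIEW_NAME_LEN = 128

def shorten_view_name(name: str) -> str:
    """Truncate a generated view name to the 128-character limit."""
    return name[:MAX_VIEW_NAME_LEN]

# Two hypothetical generated names that differ only after character 128
name_a = "schema_" + "x" * 125 + "_orders"
name_b = "schema_" + "x" * 125 + "_events"
collision = shorten_view_name(name_a) == shorten_view_name(name_b)
```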
For proper updates, the dedicated SQL pools have to be running when changes are made to users or data sources in Immuta.
Project Workspaces
Query Audit
The user registering the connection must have the permissions below.
APPLICATION_ADMIN Immuta permission
The account credentials you provide to register the connection should belong to a Databricks service principal, and that service principal must have these Databricks Lakebase privileges:
databricks_superuser
CREATEROLE
For descriptions and explanations of privileges Immuta needs to enforce policies and maintain state in Databricks Lakebase, see the .
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the Databricks Lakebase tile.
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or
Enter privileged credentials to register the connection using OAuth M2M:
Follow for the Immuta service principal and assign this service principal the for the Databricks Lakebase.
Fill out the Workspace URL (e.g., https://<your workspace name>.cloud.databricks.com).
Click Save Connection.
Requirement: USER_ADMIN Immuta permission
Map PostgreSQL usernames to each Immuta user account to ensure Immuta properly enforces policies when the user queries the Databricks Lakebase objects in PostgreSQL.
The instructions below illustrate how to do this for individual users, but you can also configure user mapping in your IAM connection on the app settings page.
Click People and select Users in the navigation menu.
Click the user's name to navigate to their page and scroll to the External User Mapping section.
Click Edit in the PostgreSQL row.
Select one of the following options from the dropdown:
Select PostgreSQL Username to map the PostgreSQL username to the Immuta user and enter the PostgreSQL username in the field. Username mapping is case insensitive.
Select Unset (fallback to Immuta username) to use the Immuta username as the assumed PostgreSQL username. Use this option if the user's PostgreSQL username exactly matches the user's Immuta username. Username mapping is case insensitive.
Click Save.
When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.
Spark SQL can be used instead to give the same functionality with all of Immuta's data protections. See the Delta API reference guide for a list of corresponding Spark SQL calls to use.
In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.
When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.
Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a where predicate). Expand the blocks below to view examples of reading data using these methods.
To read partitioned data from a sub-directory, load a parquet partition from a sub-directory:
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")

Alternatively, load a parquet partition using a where predicate:

spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")

Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.
If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.
In Databricks, multiple input paths are supported as long as they belong to the same data source.
CSV-backed tables are not currently supported.
Loading a delta partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a where predicate:
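For example, a where-predicate read for Delta might look like the following (the path and partition column are placeholders, following the parquet example above):

```python
# Placeholder path and partition column; adjust to your environment
spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")
```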
User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the Impersonate a user page.
Python:

df = spark.sql("select * from immuta.table")

SQL:

%sql
select * from immuta.table

Scala:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

val sqlDF = spark.sql("SELECT * FROM immuta.table")

R:

library(SparkR)
df <- SparkR::sql("SELECT * from immuta.table")

Amazon RDS for MariaDB
The user registering the connection must have the permissions below.
APPLICATION_ADMIN Immuta permission
The MariaDB user setting up the connection must be the root user or have the GRANT OPTION MariaDB privilege.
Create a new database user in MariaDB to serve as the Immuta system account. Immuta will use this system account continuously to crawl the database you register. How you create this user depends on your database authentication method. Follow the instructions linked below to create this user:
Password authentication: Follow the MariaDB documentation to create the database user and assign that user a password.
Grant the Immuta system account the following privileges. A sample command that provides all these privileges to all databases and views is provided below:
SHOW DATABASES on all databases in the server
SELECT on all databases, tables, and views in the server
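For example, assuming a system account named immuta_system (adjust the account name and host to your environment), a command granting these privileges might look like:

```sql
-- SHOW DATABASES can only be granted globally, so the grant is ON *.*
GRANT SELECT, SHOW DATABASES ON *.* TO 'immuta_system'@'%';
```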
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the MariaDB tile.
Select RDS as the deployment method.
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or
Select an authentication method from the dropdown menu.
AWS Access Key: Provide the access key ID and secret access key for the .
AWS Assumed Role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request
Click Save connection.
Amazon RDS or Amazon Aurora for MySQL
The user registering the connection must have the permissions below.
APPLICATION_ADMIN Immuta permission
The MySQL user registering the connection must be the root user or have the GRANT OPTION MySQL privilege.
Create a new database user in MySQL to serve as the Immuta system account. Immuta will use this system account continuously to crawl the database you register. How you create this user depends on your database authentication method. Follow the instructions linked below to create this user:
Password authentication: Follow the MySQL documentation to create the database user and assign that user a password.
Grant the Immuta system account the following privileges. A sample command that provides all these privileges to all databases and views is provided below:
SHOW DATABASES on all databases in the server
SELECT on all databases, tables, and views in the server
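For example, assuming a system account named immuta_system (adjust the account name and host to your environment), a command granting these privileges might look like:

```sql
-- SHOW DATABASES can only be granted globally, so the grant is ON *.*
GRANT SELECT, SHOW DATABASES ON *.* TO 'immuta_system'@'%';
```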
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the MySQL tile.
Select your deployment type:
Aurora
RDS
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or
Select an authentication method from the dropdown menu.
AWS Access Key: Provide the access key ID and secret access key for the .
AWS Assumed Role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request
Click Save connection.
Connect your technology
These guides provide instructions on getting your data set up in Immuta for the Request and Governance apps.
Register your Snowflake connection: Using a single setup process, connect Snowflake to Immuta. This will register your data objects into Immuta and allow you to start dictating access through access requests or global policies.
Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used to manage permissions for publishing data products, authoring policies, viewing audit and managing identification.
Connections are available on all tenants created after February 26, 2025. If you do not have connections enabled on your tenant, continue using the legacy workflow.
Register your users
These guides provide instructions on getting your users set up in Immuta for the Request and Governance apps.
Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.
: Ensure the user IDs in Immuta, Snowflake, and your IAM are aligned so that the right policies impact the right users.
Start using the Request app
These guides provide instructions on using the Request app for the first time.
Set up a request form for your assets: Once you register your data, assets will appear in the Request app where you can attach request forms.
: After you set up request forms, you can put access request links in your catalog for your data consumers to click. This will take them to the Request app to fill out the request form to request access to data.
: To grant access to data, data stewards will respond to the access request.
Add data metadata
These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.
Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.
: Identification allows you to automate data tagging using identifiers that detect certain data patterns.
Start using the Governance app
These guides provide instructions on using the Governance app for the first time.
Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.
: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.
: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.
The user registering the connection must have the permissions below.
APPLICATION_ADMIN Immuta permission
Either of the following Oracle system privileges:
GRANT ANY ROLE
GRANT ANY PRIVILEGE
Create a new database user in Oracle to serve as the Immuta system account. Immuta will use this system account continuously to crawl the connection.
Grant this account the SELECT Oracle privilege on the system views listed below:
V$DATABASE
CDB_PDBS
SYS.DBA_USERS
SYS.DBA_TABLES
SYS.DBA_VIEWS
SYS.DBA_MVIEWS
SYS.DBA_TAB_COLUMNS
SYS.DBA_OBJECTS
SYS.DBA_CONSTRAINTS
SYS.DBA_CONS_COLUMNS
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the Oracle tile.
Select RDS as the deployment method.
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or
Enter the username and password of the .
Click Save connection.
Navigate to the App Setting page and click the Integration tab.
Click +Add Integration and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Enable Query Audit.
Enable Lineage and complete the following fields:
Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.
Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
Select Manual or Automatic Setup and
The Snowflake lineage sync endpoint triggers the lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.
Copy the example and replace the Immuta URL and API key with your own.
Change the payload attribute values to your own, where
tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.
lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.
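The payload attributes above can be sketched as follows. The values are examples only, and the filter-matching snippet illustrates the regex behavior (no surrounding slashes), not Immuta's internal logic.

```python
import re

# Hypothetical payload for the lineage sync call; values are examples
payload = {
    "tableFilter": "sales_.*",  # regex with no surrounding slashes
    "batchSize": 1000,          # rows per batch, minimum 1
    "lastTimestamp": "2022-06-29T09:47:06.012-07:00",
}

# Conceptually, only tables matching the filter have lineage ingested
tables = ["sales_2021", "hr_data", "sales_eu"]
matched = [t for t in tables if re.fullmatch(payload["tableFilter"], t)]
```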
Once the sync job is complete, you can complete the following steps:
Integration settings:
Enable Snowflake table grants: Enable Snowflake table grants and configure the Snowflake role prefix.
: Use Snowflake data sharing with table grants or project workspaces.
: Enable Snowflake low row access policy mode.
: Configure your Snowflake integration to automatically apply tags added to a Snowflake table to its descendant data source columns in Immuta.
Snowflake integration reference guide: This reference guide describes the design and features of the Snowflake integration.
Snowflake table grants: Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of manually granting users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. This guide describes the components of Snowflake table grants and how they are used in Immuta's Snowflake integration.
Snowflake data sharing with Immuta: Organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This guide describes the components of using Immuta with Snowflake data shares.
: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration. To do so, this mode decreases the number of Snowflake row access policies Immuta creates and uses table grants to manage user access. This guide describes the design and requirements of this mode.
: Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.
: Adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The Databricks Lakebase integration registers data from Databricks Lakebase in Immuta and enforces subscription policies on that data when queried in PostgreSQL. The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries that data.
Databricks Lakebase is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schemas and catalogs individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.
During connection registration, you provide Immuta credentials with the . When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database.
In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the objects in PostgreSQL hosted on Databricks Lakebase after registration is complete:
Lakebase database
Database
Schema
Table
Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the for details about connections and how to manage them. To configure your Databricks Lakebase connection, see the .
Immuta enforces read and write subscription policies on Databricks Lakebase tables by issuing SQL statements in PostgreSQL that grant and revoke access to tables according to the policy.
When a user is subscribed to a table registered in Immuta,
Immuta creates a role for that user in PostgreSQL, if one doesn't already exist.
PostgreSQL stores that role in its internal system catalog.
Immuta issues grants to that user's role in PostgreSQL to enforce policy. The provides an example of this policy enforcement.
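An illustrative sketch of the kind of SQL involved in the steps above (the role and object names are hypothetical; Immuta issues the actual statements):

```sql
-- Create a role for the subscribed user if one doesn't already exist
CREATE ROLE "alice@example.com" WITH LOGIN;
-- Grant access according to the subscription policy
GRANT USAGE ON SCHEMA marketing TO "alice@example.com";
GRANT SELECT ON marketing."marketing-data" TO "alice@example.com";
```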
See the for details about the privileges granted to users when they are subscribed to a data source protected by a subscription policy.
Immuta grants access to Databricks Lakebase through PostgreSQL privileges. See the for details about the privileges granted to users when they are subscribed to a data source protected by a subscription policy.
The privileges that the Databricks Lakebase integration requires align with the principle of least privilege. The table below describes each privilege required by the IMMUTA_SYSTEM_ACCOUNT user.
The following user actions spur various processes in the Databricks Lakebase integration so that Immuta data remains synchronous with data in Databricks Lakebase:
Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.
: When a user account is mapped to Immuta, their metadata is stored in the metadata database.
The database instance must be up and running for state to be maintained and object sync to successfully complete. If the database instance is stopped, object sync will fail.
Databricks Lakebase holds PostgreSQL objects. See the section for details about the PostgreSQL objects and policies that Immuta supports.
Immuta supports Databricks Lakebase policies through PostgreSQL. See the section for details about the policies that Immuta supports.
The Databricks Lakebase integration supports OAuth machine-to-machine (M2M) authentication to register a connection.
The Databricks Lakebase connection authenticates as a Databricks identity and generates an OAuth token. Immuta then uses that token as a password when connecting to PostgreSQL. To enable secure, automated machine-to-machine access to the database instance, the connection must obtain an OAuth token using a Databricks service principal. See the for more details.
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead. Each of the includes a set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
For policies to impact the right users, the user account in Immuta must be mapped to the user account in PostgreSQL. You can ensure these accounts are mapped correctly in the following ways:
: If usernames in PostgreSQL align with usernames in the external IAM and those accounts align with an IAM attribute, you can enter that IAM attribute on the app settings page to automatically map user IDs in Immuta to PostgreSQL.
: You can manually map user IDs for individual users.
For guidance on connecting your IAM to Immuta, see the .
The following Immuta features are unsupported:
Data policies
Impersonation
Tag ingestion
Query audit
APPLICATION_ADMIN Immuta permission
The Snowflake user registering the connection and running the script must have the following privileges:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
No Snowflake integration configured in Immuta. If your Snowflake integration is already configured on the app settings page, follow the guide.
Complete the following actions in Snowflake:
. Immuta will use this system account continuously to orchestrate Snowflake policies and maintain state between Immuta and Snowflake.
with a minimum of the following privileges:
USAGE on all databases and schemas with registered data sources.
To register a Snowflake connection, follow the instructions below.
Click Data and select the Connections tab in the navigation menu.
Click the + Add Connection button.
Select the Snowflake data platform tile.
Deprecation notice
Support for editing or deleting the Snowflake integration using this legacy workflow has been deprecated. Instead, manage your connection settings or deregister your connection.
To edit or remove a Snowflake integration, you have two options:
Automatic: Grant Immuta one-time use of credentials with the following privileges to automatically edit or remove the integration:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
Manual: Run the Immuta script in your Snowflake environment as a user with the following privileges to edit or remove the integration:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
Select one of the following options for editing your integration:
: Grant Immuta one-time use of credentials to automatically edit the integration.
: Run the Immuta script in your Snowflake environment yourself to edit the integration.
Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any grayed-out field is not editable; to change it, the integration must be disabled and re-installed.
Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any grayed-out field is not editable; to change it, the integration must be disabled and re-installed.
Select one of the following options for deleting your integration:
: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.
: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.
Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Cleaning up your Snowflake environment: Until you manually run the cleanup script in your Snowflake environment, Immuta-managed objects and Immuta policies will still exist in Snowflake.
Click the App Settings icon in the navigation menu.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks administrators should place the desired configuration in the Spark environment variables.
If you add additional Hadoop configuration during the integration setup, this variable sets the path to that file.
The additional Hadoop configuration is where sensitive configuration goes for remote filesystems (if you are using a secret key pair to access S3, for example).
Immuta policies will not be automatically enforced in Oracle
While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta notifications to be alerted to changes in user access and then make the appropriate access updates in Oracle using your own process.
To use this integration, contact your Immuta representative.
The Oracle integration allows you to register data from Oracle in Immuta. Immuta supports Oracle on Amazon RDS.
Hostname
Port
Database: This should be the PostgreSQL dbname in the Databricks Lakebase connection details.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Hostname: URL of your Oracle instance.
Port: Port configured for Oracle.
Database: The Oracle database you want to connect to. All databases in the host will be registered.
Region: The region of the AWS account with your Oracle instance.
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
From the Select Authentication Method Dropdown, select either Username and Password or Key Pair Authentication:
Username and Password option: Complete the Username, Password, and Role fields.
Key Pair Authentication option:
Complete the Username field.
When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
Click Key Pair (Required), and upload a Snowflake key pair file.
Complete the Role field.
Click Save.
Click edit script to download the script, and then run it in Snowflake.
Click Save.
Click Save.
Click Save.
Run the cleanup script in Snowflake.


User subscribed to a data source: When a user is added to a data source by a data owner or through a subscription policy, Immuta creates a role for that user (if a role for them does not already exist) and grants PostgreSQL privileges to their role.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and grants or revokes users' privileges on the PostgreSQL table. See the Protecting data page for details about this process.
Subscription policy deleted: Immuta revokes privileges from the affected roles.
User removed from a data source: Immuta revokes privileges from the user's role.
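Under the hood, these lifecycle events reduce to ordinary PostgreSQL role and privilege statements. A simplified sketch, assuming hypothetical role, user, and table names (Immuta's actual naming scheme may differ):

```sql
-- On subscription: create a role for the user (if one does not exist),
-- grant it to the user's login role, and grant table privileges to it.
CREATE ROLE immuta_user_alice;
GRANT immuta_user_alice TO alice;  -- alice is an existing login role
GRANT SELECT ON marketing_data TO immuta_user_alice;

-- On unsubscription or subscription policy removal: revoke the privilege.
REVOKE SELECT ON marketing_data FROM immuta_user_alice;
```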
Subscription policies on partitioned tables
databricks_superuser
This privilege is required so that Immuta can create and grant permissions to PostgreSQL roles.
CREATEROLE
Because privileges are granted to roles, this privilege is required so that Immuta can create PostgreSQL roles and manage role membership to enforce access controls for Databricks Lakebase objects.


Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression that excludes / from the beginning and end to filter tags. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.
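As a sketch, a filter intended to ingest lineage only for tags under a hypothetical Sensitive category could be written as below; note the pattern has no leading or trailing slash. Its behavior is easy to check with an equivalent Python regular expression:

```python
import re

# Hypothetical tag names; the filter pattern excludes "/" at the start and end.
tag_filter = re.compile(r"Sensitive\..*")

tags = ["Sensitive.PII", "Sensitive.Financial", "Marketing.Public"]
matched = [t for t in tags if tag_filter.match(t)]
print(matched)  # ['Sensitive.PII', 'Sensitive.Financial']
```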
SHOW VIEW on all views in the server
Hostname: URL of your MariaDB instance.
Port: Port configured with MariaDB.
Region: The region of the AWS account with your MariaDB instance.
Enter the Role ARN of the database account you created above.
Set the external ID provided in a condition on the trust relationship for the role specified above. See the AWS documentation for guidance.
Username and Password: Enter the credentials for the MariaDB database user account you created above.
SHOW VIEW on all views in the server
Hostname: The URL of your MySQL instance.
Port: The port configured for MySQL.
Region: The region of the AWS account with your MySQL instance.
Enter the Username of the database user account you created above
Enter the Role ARN of the database user account you created above.
Set the External ID provided in a condition on the trust relationship for the role specified above. See the AWS documentation for guidance.
Username and Password: Enter the credentials for the MySQL database user account you created above.
Default value: true
Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.
This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.
Default value: true
When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
A URI that points to a valid calling class file, which is an Immuta artifact you download during the Databricks Spark configuration process.
This is a comma-separated list of Databricks users who can access any table or view in the cluster metastore without restriction.
Default value: 3600
The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is allowlisted in IMMUTA_SPARK_ACL_ALLOWLIST.
Default value: false
Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.
Default value: false
Allows non-privileged users to SELECT from tables that are not protected by Immuta. See the Customizing the integration guide for details about this feature.
Default value: false
Allows non-privileged users to run DDL commands and data-modifying commands against tables or spaces that are not protected by Immuta. See the Customizing the integration guide for details about this feature.
This is a comma-separated list of Databricks users who are allowed to impersonate Immuta users:
Default value: false
Exposes the DBFS FUSE mount located at /dbfs. Granular permissions are not possible, so all users will have read/write access to all objects therein. Note: Raw, unfiltered source data should never be stored in DBFS.
Block one or more Immuta user-defined functions (UDFs) from being used on an Immuta cluster. This should be a Java regular expression that matches the set of UDFs to block by name (excluding the immuta database). For example to block all project UDFs, you may configure this to be ^.*_projects?$. For a list of functions, see the project UDFs page.
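For this simple pattern, Java and Python regular expression syntax agree, so the example pattern can be sanity-checked in Python. The UDF names below are hypothetical illustrations, not Immuta's actual function names:

```python
import re

# The example pattern from above: blocks UDF names ending in "_project" or "_projects".
pattern = re.compile(r"^.*_projects?$")

udfs = ["set_current_project", "list_projects", "is_phi_masked"]
blocked = [name for name in udfs if pattern.match(name)]
print(blocked)  # ['set_current_project', 'list_projects']
```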
Default value: file:///databricks/jars/immuta-spark-hive.jar
The location of immuta-spark-hive.jar on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.
Default value: true
Creates a world-readable or writable scratch directory on local disk to facilitate the use of dbutils and 3rd party libraries that may write to local disk. Its location is non-configurable and is stored in the environment variable IMMUTA_LOCAL_SCRATCH_DIR. Note: Sensitive data should not be stored at this location.
Default value: INFO
The SLF4J log level to apply to Immuta's Spark plugins.
Default value: false
If true, writes logging output to stdout/the console as well as the log4j-active.txt file (default in Databricks).
This configuration is a comma-separated list of additional databases that will appear as scratch databases when running a SHOW DATABASE query. This configuration increases performance by circumventing the Metastore to get the metadata for all the databases to determine what to display for a SHOW DATABASE query; it won't affect access to the scratch databases. Instead, use IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS to control read and write access to the underlying database paths.
Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).
To create a scratch path to a location or a database stored at that location, configure
To create a scratch path to a database created using the default location,
Default value: false
Enables non-privileged users to create or drop scratch databases.
Default value: false
When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.
Default value: true
Denotes whether to run the Spark job that "tags" a Databricks cluster as being associated with Immuta.
A comma-separated list of Databricks trusted library URIs.
Default value: 3600
The number of seconds Immuta caches whether a table has been exposed as a data source in Immuta. This setting only applies when IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES or IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS is enabled.
Default value: false
Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.
Default value: true
Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or allowlisted users can set IMMUTA_SPARK_RESOLVE_RAW_TABLES_ENABLED to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.
Default value: true
Same as the IMMUTA_SPARK_RESOLVE_RAW_TABLES_ENABLED variable, but this is a session property that allows users to toggle this functionality. If users run set immuta.spark.session.resolve.raw.tables.enabled=false, they will see raw data only (not Immuta data policy-enforced data). Note: This property is not set in immuta_conf.xml.
Default value: true
This shows the immuta database in the configured Databricks cluster. When set to false, Immuta will no longer show this database when a SHOW DATABASES query is performed. However, queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether this feature is enabled.
Default value: true
Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to true, if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to false so that the versions don't get logged as incompatible and make the cluster unusable.
Default value: bim
Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (bim) but should be updated to reflect an actual production IAM.
# Not recommended by Spark and not supported in Immuta
spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table/partition_column=01")
# Recommended by Spark and supported in Immuta.
spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")
curl -X 'POST' \
'https://www.organization.immuta.com/lineage/ingest/snowflake' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H 'Authorization: 846e9e43c86a4ct1be14290d95127d13f' \
-d '{
"tableFilter": "MY_DATABASE\\MY_SCHEMA\\..*",
"batchSize": 1,
"lastTimestamp": "2022-06-29T09:47:06.012-07:00"
}'
GRANT SELECT, SHOW DATABASES, SHOW VIEW ON *.* TO '<user>'@'%';
"spark_env_vars.IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS": {
"type": "fixed",
"value": "[email protected],[email protected]"
}
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir,dbfs:/user/hive/warehouse/any_db_name.db
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
REFERENCES on all tables and views registered in Immuta.
SELECT on all tables and views registered in Immuta.
Grant the new Snowflake role to the system account you just created.
Host: The URL of your Snowflake account.
Port: Your Snowflake port.
Warehouse: The warehouse the Immuta system account user will use to run queries and perform Snowflake operations.
Immuta Database: The new, empty database for Immuta to manage. This is where system views, user entitlements, row access policies, column-level policies, procedures, and functions managed by Immuta will be created and stored.
Display Name: The display name is the unique name of your connection and is used as a prefix in the names of all data objects associated with this connection. It also appears as the display name in the UI and is used in all API calls made to update or delete the connection. Avoid the use of periods (.) or other special characters.
Click Next.
Select an authentication method from the dropdown menu and enter the authentication information for the Immuta system account you created. Enter the Role with the listed privileges, then continue to enter the authentication information:
Username and password (Not recommended): Choose one of the following options.
Select Immuta Generated to have Immuta populate the system account name and password.
Select User Provided to enter your own name and password for the Immuta system account.
Snowflake External OAuth:
Fill out the Token Endpoint, which is where the generated token is sent. It is also known as aud (audience) and iss (issuer).
Fill out the Client ID, which is the subject of the generated token. It is also known as sub (subject).
Key pair authentication:
Complete the Username field. This must be the Immuta system account user you created.
If using an encrypted private key, enter the Private Key Password.
Copy the provided script and run it in Snowflake as a user with the privileges listed in the requirements section. Running this script grants the following privileges to the Immuta system account:
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Alternatively, you can grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes.
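In effect, the script issues grants along these lines; IMMUTA_ROLE is a placeholder for the role the script actually creates:

```sql
-- Sketch of the grants made to the Immuta system role by the setup script.
GRANT CREATE ROLE ON ACCOUNT TO ROLE IMMUTA_ROLE WITH GRANT OPTION;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE IMMUTA_ROLE WITH GRANT OPTION;
GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE IMMUTA_ROLE WITH GRANT OPTION;
GRANT MANAGE GRANTS ON ACCOUNT TO ROLE IMMUTA_ROLE WITH GRANT OPTION;
```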
Click Test Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.
A Redshift database that contains an external schema and external tables. You have two options for configuring this database:
Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.
Configure the integration by creating a new immuta database: Create a new database for Immuta that manages all schemas and views created when Redshift data is registered in Immuta, and re-create all of your external tables in that database.
The user configuring the integration must have the permissions below.
APPLICATION_ADMIN Immuta permission
The Redshift role used to run the Immuta bootstrap script must have the following privileges when configuring the integration:
If using an existing database
ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.
CREATE USER
GRANT TEMP ON DATABASE
If creating a new database
CREATE DATABASE
CREATE USER
If enabling user impersonation:
OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE
CREATE GROUP
Allow Immuta to create secure views of your external tables through one of these methods:
Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.
Configure the integration by creating a new immuta database: Create a new database for Immuta that manages all schemas and views created when Redshift data is registered in Immuta, and re-create all of your external tables in that database.
Select a tab below for instructions for either method.
Configure the integration with an existing database
Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Redshift from the dropdown menu.
Complete the Host and Port fields.
Enter the name of the database you created the external schema in as the Immuta Database. This database will store all secure schemas and Immuta-created views.
Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user. Once you finish configuring the integration, you can grant the IMPERSONATE_USER permission to Immuta users. See the user impersonation guide for instructions.
Select Manual and download the second bootstrap script (bootstrap script (Immuta database)) from the Setup section. The role used to run the bootstrap script needs to have the privileges listed above for an existing database.
Run the bootstrap script (Immuta database) in the Redshift database that contains the external schema.
Choose username and password as your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.
Click Save.
Configure the integration by creating a new database
Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Redshift from the dropdown menu.
Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Redshift Spectrum integration.
Edit the field you want to change. Note that any grayed-out field is not editable; to change it, the integration must be disabled and re-installed.
Download the Edit Script and run it in the Immuta Database in Amazon Redshift.
In Immuta, enter the credentials used to initially configure the integration.
Click Save.
Disabling Amazon Redshift Spectrum
Disabling the Amazon Redshift Spectrum integration is not supported when you set the fields nativeWorkspaceName, nativeViewName, and nativeSchemaName to create Redshift Spectrum data sources. Disabling the integration when these fields are used in metadata ingestion causes undefined behavior.
Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Amazon Redshift Spectrum integration.
Click the checkbox to disable the integration.
Enter the credentials that were used to initially configure the integration.
Click cleanup script to download the script.
Click Save.
Run the cleanup script in Amazon Redshift.
CAN MANAGE Databricks privilege on the cluster
A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)
A cluster that uses one of these supported Databricks Runtimes:
11.3 LTS
14.3 LTS
Supported languages
Python
R (not supported for Databricks Runtime 14.3 LTS)
A Databricks cluster that is one of these supported compute types:
Custom access mode
A Databricks workspace and cluster with the ability to directly make HTTP calls to the Immuta web service. The Immuta web service also must be able to connect to and perform queries on the Databricks cluster, and to call the Databricks API.
Enable OAuth M2M authentication (recommended) or personal access tokens.
Disable Photon by setting runtime_engine to STANDARD using the Clusters API. Immuta does not support clusters with Photon enabled. Photon is enabled by default on compute running Databricks Runtime 9.1 LTS or newer and must be manually disabled before setting up the integration with Immuta.
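As a sketch of the Photon step above, the Clusters API's clusters/edit endpoint accepts a runtime_engine field. The workspace URL, token, and cluster settings below are placeholders; clusters/edit requires the full cluster specification, so retrieve the current spec (e.g., with clusters/get) and resubmit it with runtime_engine changed:

```
curl -X POST "https://<workspace-url>/api/2.0/clusters/edit" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "1234-567890-abcde123",
    "cluster_name": "immuta-cluster",
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "runtime_engine": "STANDARD"
  }'
```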
Restrict the set of Databricks principals who have CAN MANAGE on the cluster where the Spark plugin is installed. This is to prevent editing the cluster configuration, editing cluster policies, or removing the Spark plugin from the cluster, any of which would cause the Spark plugin to stop working.
If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster. See the section below for guidance.
If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:
Navigate to the App Settings page and click Integration Settings.
Uncheck the Enable Unity Catalog checkbox.
Click the App Settings icon in Immuta.
Navigate to HDFS > System API Key and click Generate Key.
Click Save and then Confirm. If you do not save and confirm, the system API key will not be saved.
Scroll to the Integration Settings section.
Click + Add Native Integration and select Databricks Spark Integration from the dropdown menu.
Complete the Hostname field.
Enter a Unique ID for the integration. The unique ID is used to name cluster policies clearly, which is important when managing several Databricks Spark integrations. As cluster policies are workspace-scoped, but multiple integrations might be made in one workspace, this ID lets you distinguish between different sets of cluster policies.
Select the identity manager that should be used when mapping the current Spark user to their corresponding identity in Immuta from the Immuta IAM dropdown menu. This should be set to reflect the identity manager you use in Immuta (such as Entra ID or Okta).
Choose an Access Model. The Protected until made available by policy option blocks access to registered data until a policy explicitly grants it, whereas the Available until protected by policy option allows it.
Behavior change
If a table is registered in Immuta and does not have a subscription policy applied to it, that data will be visible to users, even if the Protected until made available by policy setting is enabled.
If you have enabled this setting, author an "Allow individually selected users" global subscription policy that applies to all data sources.
Select the Storage Access Type from the dropdown menu.
Opt to add any Additional Hadoop Configuration Files.
Click Add Native Integration, and then click Save and Confirm. This will restart the application and save your Databricks Spark integration. (It is normal for this restart to take some time.)
The Databricks Spark integration will not do anything until your cluster policies are configured, so even though your integration is saved, continue to the next section to configure your cluster policies so the Spark plugin can manage authorization on the Databricks cluster.
Click Configure Cluster Policies.
Select one or more cluster policies in the matrix. Clusters running Immuta with Databricks Runtime 14.3 can only use Python and SQL. You can make changes to the policy by clicking Additional Policy Changes and editing the environment variables in the text field or by downloading it. See the Spark environment variables reference guide for information about each variable and its default value. Some common settings are linked below:
Select your Databricks Runtime.
Use one of the two installation types described below to apply the policies to your cluster:
Automatically push cluster policies: This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.
Select the Automatically Push Cluster Policies radio button.
Click Close, and then click Save and Confirm.
Apply the cluster policy generated by Immuta to the cluster with the Spark plugin installed by following the Databricks documentation.
Give users the Can Attach To permission on the cluster.
Before you can run spark-submit jobs on Databricks, complete the following steps.
Initialize the Spark session:
Enter these settings into the R submit script to allow the R script to access Immuta data sources, scratch paths, and workspace tables: immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
Once the script is written, upload the script to a location in dbfs/S3/ABFS to give the Databricks cluster access to it.
Because of how some user properties are populated in Databricks, load the SparkR library in a separate cell before attempting to use any SparkR functions.
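Putting the settings above together, a minimal R submit script might begin as follows. This is a sketch: only the two Immuta settings are taken from this guide, and everything else is illustrative.

```r
library(SparkR)

# Required settings from above for spark-submit R jobs on an Immuta cluster.
sparkR.session(sparkConfig = list(
  immuta.spark.acl.assume.not.privileged = "true",
  spark.hadoop.immuta.databricks.config.update.service.enabled = "false"
))

# ... your job logic here ...

sparkR.session.stop()
```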
To create the R spark-submit job,
Go to the Databricks jobs page.
Create a new job, and select Configure spark-submit.
Set up the parameters:
Note: The path dbfs:/path/to/script.R can be in S3 or ABFS (on Azure Databricks), assuming the cluster is configured with access to that path.
Edit the cluster configuration, and change the Databricks Runtime to a supported Databricks Runtime version.
Configure the cluster's environment variables as you normally would for an Immuta cluster.
Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.
Configure the Spark session with immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
Note: Stop your Spark session (spark.stop()) at the end of your job or the cluster will not terminate.
The spark-submit job needs to be launched using a different classloader that points at the designated user JARs directory. The following Scala template can be used to launch your submit code with a separate classloader:
To create the Scala spark-submit job,
Build and upload your JAR to dbfs/S3/ABFS where the cluster has access to it.
Select Configure spark-submit, and configure the parameters:
Note: The fully-qualified class name of the class whose main function will be used as the entry point for your code in the --class parameter.
Note: The path dbfs:/path/to/code.jar can be in S3 or ABFS (on Azure Databricks) assuming the cluster is configured with access to that path.
Edit the cluster configuration, and change the Databricks Runtime to a supported Databricks Runtime version.
Include IMMUTA_INIT_ADDITIONAL_JARS_URI=dbfs:/path/to/code.jar in the "Environment Variables" (where dbfs:/path/to/code.jar is the path to your jar) so that the jar is uploaded to all the cluster nodes.
The user mapping works differently from notebooks because spark-submit clusters are not configured with access to the Databricks SCIM API. The cluster tags are read to get the cluster creator and match that user to an Immuta user.
Privileged users (Databricks admins and allowlisted users) must be tied to an Immuta user and given access through Immuta to access data through spark-submit jobs because the setting immuta.spark.acl.assume.not.privileged="true" is used.
Alternatively, you can use the immuta.api.key setting with an Immuta API key generated on the Immuta profile page.
Currently, generating a new API key invalidates the previous key. This can cause issues if a user works on multiple clusters in parallel, since each cluster will generate a new API key for that Immuta user. To avoid these issues, manually generate the API key in Immuta and set immuta.api.key on all the clusters, or use a designated job user for the submit job.
Oracle is configured and data is registered through connections, an Immuta feature that lets you register your data objects through a single connection, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes, so that data sources are added and removed automatically to reflect the state of data in your data platform.
When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in Oracle after registration is complete:
Host
Database
Schema
Table
Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your Oracle integration and register data, see the Register an Oracle connection guide.
The privileges that the Oracle integration requires align with the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.
GRANT ANY ROLE or GRANT ANY PRIVILEGE system privilege
Setup user
This privilege allows the user registering the connection to assign the SELECT_CATALOG_ROLE or SELECT privileges to the Immuta system account so that it can register and manage the connection.
SELECT on all the system views listed below:
V$DATABASE
CDB_PDBS
Immuta system account
This privilege provides access to all the Oracle system views necessary to register the connection and maintain state between the Oracle database and Immuta.
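For illustration, the grants above might be issued as the following sketch; immuta_system is a placeholder account name, only a few system views are shown, and note that in Oracle, grants against V$ views are made on the underlying V_$ objects:

```sql
-- Run as the setup user (who holds GRANT ANY ROLE / GRANT ANY PRIVILEGE).
-- immuta_system is a placeholder for the Immuta system account.
GRANT SELECT_CATALOG_ROLE TO immuta_system;
-- Or grant SELECT on the individual system views instead, e.g.:
GRANT SELECT ON SYS.V_$DATABASE TO immuta_system;  -- V$DATABASE is a synonym for this view
GRANT SELECT ON SYS.CDB_PDBS TO immuta_system;
GRANT SELECT ON SYS.DBA_TABLES TO immuta_system;
```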
The following user actions spur various processes in the Oracle integration so that Immuta data remains synchronous with data in Oracle:
Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.
While you can author and apply subscription and data policies on Oracle data sources in Immuta, these policies will not be enforced natively in the Oracle platform.
Tables
❌
❌
✅
Views
❌
❌
✅
Materialized views
❌
Immuta will not apply policies in this integration.
The Oracle integration supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the Register an Oracle connection guide.
The following Immuta features are unsupported:
Subscription and data policies
Tag ingestion
Query audit
In the Databricks Spark integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.
The sequence diagram below breaks down this process of events when an Immuta user queries data in Databricks.
When data owners register Databricks securables in Immuta, the securable metadata is registered and Immuta creates a corresponding data source for those securables. The data source metadata is stored in the Immuta Metadata Database so that it can be referenced in policy definitions.
The image below illustrates what happens when a data owner registers the Accounts, Claims, and Customers securables in Immuta.
Users who are subscribed to the data source in Immuta can then query the corresponding securable directly in their Databricks notebook or workspace.
See the registration guide for details about the authentication methods supported for registering data.
When schema monitoring is enabled, Immuta monitors your servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with your data environment.
For Databricks Spark, the automatic schema detection job is disabled; instead, Immuta requires you to download a schema detection job template (a Python script) and import that into your Databricks workspace.
See the schema monitoring documentation for instructions on enabling it.
In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.
Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.
When a user runs a Spark job in Databricks, the Immuta plugin automatically submits ephemeral overrides for that user to Immuta. Consequently, subsequent metadata operations for that user will use the current cluster as compute.
See the documentation on ephemeral overrides for more details about how to configure or disable them.
The Spark plugin has the capability to send ephemeral override requests to Immuta. These requests are distinct from ephemeral overrides themselves. Ephemeral overrides cannot be turned off, but the Spark plugin can be configured to not send ephemeral override requests.
Tags can be used in Immuta in a variety of ways:
Use tags for global subscription or data policies that will apply to all data sources in the organization. In doing this, company-wide data security restrictions can be controlled by the administrators and governors, while the users and data owners need only to worry about tagging the data correctly.
Generate Immuta reports from tags for insider threat surveillance or data access monitoring.
Filter search results with tags in the Immuta UI.
The Databricks Spark integration cannot ingest tags from Databricks, but you can connect a supported external catalog to work with your integration.
You can also manage tags in Immuta by manually adding tags to your data sources and columns. Alternatively, you can use identification to automatically tag your sensitive data.
Immuta allows you to author subscription and data policies to automate access controls on your Databricks data.
Subscription policies: After registering data sources in Immuta, you can control who has access to specific securables in Databricks through Immuta subscription policies. Data users will only see the immuta database with no tables until they are granted access to those tables as Immuta data sources. See the subscription policies reference guide for a list of supported policy types.
Data policies: You can create data policies to apply fine-grained access controls (such as restricting rows or masking columns) to manage what users can see in each table after they are subscribed to a data source. See the data policies reference guide for details about the specific types of data policies supported.
The image below illustrates how Immuta enforces a subscription policy that only allows users in the Analysts group to access the yellow-table.
See the policy documentation for details about the benefits of using Immuta subscription and data policies.
Once a Databricks user who is subscribed to the data source in Immuta queries it directly in their workspace, Spark analysis initiates and the following events take place:
Spark calls down to the Metastore to get table metadata.
Immuta intercepts the call to retrieve table metadata from the Metastore.
Immuta modifies the Logical Plan to enforce policies that apply to that user.
The image below illustrates what happens when an Immuta user who is subscribed to the Customers data source queries the securable in Databricks.
Regardless of the policies on the data source, the users will be able to read raw data on the cluster if they meet one of the criteria listed below:
A Databricks administrator is tied to an Immuta account
A Databricks user is listed as an ignored user (users can be specified in the integration configuration to become ignored users)
Generally, Immuta prevents users from seeing data unless they are explicitly given access, which blocks access to raw sources in the underlying databases.
Databricks non-admin users will only see sources to which they are subscribed in Immuta. This can present problems if an organization has a data lake full of non-sensitive data and Immuta removes access to all of it. To address this challenge, Immuta allows administrators to change this default setting when configuring the integration so that Immuta users can access securables that are not registered as data sources. Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.
See the configuration guide for details about this setting.
Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is for a single user to organize their data sources or to control an entire schema of data sources through a single project screen. Most often, however, a project represents an Immuta purpose for which the data has been approved to be used, and it restricts access to data and streamlines team collaboration. Consequently, data owners can restrict access to data for a specified purpose through projects.
When a user is working within the context of a project, they will only see the data in that project. This helps to prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose.
When users change project contexts (either through the Immuta UI or programmatically), queries reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
See the projects documentation for details about how to prevent users from switching project contexts in a session.
Users can have additional write access in their integration using project workspaces. You can integrate one or multiple workspaces with a single Immuta tenant.
See the project workspaces documentation for more details.
Immuta policies will not be automatically enforced in MariaDB
While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.
To use this integration, contact your Immuta representative.
The MariaDB integration allows you to register data from MariaDB in Immuta. Immuta supports MariaDB on Amazon RDS.
MariaDB is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.
When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in MariaDB after registration is complete:
Host
Database
Data object
Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your MariaDB integration and register data, see the Register a MariaDB connection guide.
The privileges that the MariaDB integration requires align with the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.
The following user actions spur various processes in the MariaDB integration so that Immuta data remains synchronous with data in MariaDB:
Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.
Immuta will not apply policies in this integration.
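Since policies are not enforced natively in MariaDB, the callout above suggests reacting to Immuta webhook notifications with your own process. The sketch below is a minimal Python illustration of that idea; the event shape and field names are assumptions for illustration, not Immuta's actual webhook schema:

```python
# Translate a (hypothetical) Immuta access-change event into a MariaDB
# statement that your own process could execute. Field names are illustrative.
def access_change_to_sql(event: dict) -> str:
    user = event["user"]
    table = event["table"]  # e.g. "marketing-data"
    if event["action"] == "subscribed":
        return f"GRANT SELECT ON `{table}` TO '{user}'@'%';"
    return f"REVOKE SELECT ON `{table}` FROM '{user}'@'%';"

stmt = access_change_to_sql(
    {"action": "unsubscribed", "user": "alice", "table": "marketing-data"}
)
```

A real process would receive these events at a webhook endpoint and run the generated statements against MariaDB with appropriate auditing.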
The MariaDB integration supports the following authentication methods to register a connection:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MariaDB database. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.
Access using access key and secret access key: These credentials are used by Immuta to register the connection and maintain state between Immuta and MariaDB. The access key ID and secret access key provided must be for an AWS account with the privileges listed in the Register a MariaDB connection guide.
The following Immuta features are unsupported:
Subscription and data policies
Identification
Tag ingestion
Query audit
Immuta policies will not be automatically enforced in MySQL
While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MySQL using your own process.
To use this integration, contact your Immuta representative.
The MySQL integration uses connections to register data from MySQL in Immuta. Immuta supports the following deployment methods:
Amazon Aurora with MySQL
Amazon RDS with MySQL
MySQL is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.
When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in MySQL after registration is complete:
Host
Database
Data object
Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your MySQL connection, see the Register a MySQL connection guide.
The privileges that the MySQL integration requires align with the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.
The following user actions spur various processes in the MySQL integration so that Immuta data remains synchronous with data in MySQL:
Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database.
While you can author and apply subscription and data policies on MySQL data sources in Immuta, these policies will not be enforced natively in the MySQL platform.
Immuta will not apply policies in this integration.
The MySQL integration supports the following authentication methods when registering a connection:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MySQL database. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.
Access using access key and secret access key: These credentials are used by Immuta to register the connection and maintain state between Immuta and MySQL. The access key ID and secret access key provided must be for an AWS account with the privileges listed in the Register a MySQL connection guide.
The following Immuta features are unsupported:
Subscription and data policies
Identification
Tag ingestion
Query audit
[
"--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
"--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
"--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
"dbfs:/path/to/script.R",
"arg1", "arg2", "..."
]
package com.example.job

import java.net.URLClassLoader
import java.io.File

import org.apache.spark.sql.SparkSession

object ImmutaSparkSubmitExample {
  def main(args: Array[String]): Unit = {
    val jarDir = new File("/databricks/immuta/jars/")
    val urls = jarDir.listFiles.map(_.toURI.toURL)
    // Configure a new ClassLoader which will load jars from the additional jars directory
    val cl = new URLClassLoader(urls)
    val jobClass = cl.loadClass(classOf[ImmutaSparkSubmitExample].getName)
    val job = jobClass.newInstance
    jobClass.getMethod("runJob").invoke(job)
  }
}

class ImmutaSparkSubmitExample {
  def getSparkSession(): SparkSession = {
    SparkSession.builder()
      .appName("Example Spark Submit")
      .enableHiveSupport()
      .config("immuta.spark.acl.assume.not.privileged", "true")
      .config("spark.hadoop.immuta.databricks.config.update.service.enabled", "false")
      .getOrCreate()
  }

  def runJob(): Unit = {
    val spark = getSparkSession
    try {
      val df = spark.table("immuta.<YOUR DATASOURCE>")
      // Run Immuta Spark queries...
    } finally {
      spark.stop()
    }
  }
}
[
"--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
"--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
"--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
"--class","org.youorg.package.MainClass",
"dbfs:/path/to/code.jar",
"arg1", "arg2", "..."
]
Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called kid (key identifier).
Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
GRANT TEMP ON DATABASE
REVOKE ALL PRIVILEGES ON DATABASE
Complete the Host and Port fields.
Enter an Immuta Database. This is a new database where all secure schemas and Immuta-created views will be stored.
Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user.
Select Manual and download both of the bootstrap scripts from the Setup section. The specified role used to run the bootstrap needs to have the permissions listed above for a new database.
Run the bootstrap script (initial database) in the Redshift initial database.
Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.
Choose username and password as your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.
Click Save.
Then, add your external tables to the Immuta Database.
SYS.DBA_USERS
SYS.DBA_TABLES
SYS.DBA_VIEWS
SYS.DBA_MVIEWS
SYS.DBA_TAB_COLUMNS
SYS.DBA_OBJECTS
SYS.DBA_CONSTRAINTS
SYS.DBA_CONS_COLUMNS
❌
✅

The Physical Plan is applied and filters out and transforms raw data coming back to the user.
The user sees policy-enforced data.




Username and password: These credentials are used by Immuta to register the connection and maintain state between Immuta and MariaDB. The credentials provided must be for a MariaDB user account with the privileges listed in the Register a MariaDB connection guide.
Root user or GRANT OPTION privilege
Setup user
This privilege is required so that the setup user can grant privileges to the Immuta system account.
SHOW DATABASES on all databases in the server
Immuta system account
This privilege allows the Immuta system account to discover new databases to keep data in MariaDB and Immuta in sync.
SHOW VIEW on all views in the server
Immuta system account
This privilege allows the Immuta system account to access view definitions.
SELECT on all databases, tables, and views in the server
Immuta system account
Base tables
❌
❌
✅
Views
❌
❌
✅

This privilege allows the Immuta system account to connect to MariaDB and register the databases and their objects.
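As a sketch, the setup user might issue the grants described above like this; the immuta_system account name and the host wildcard are placeholders for your environment:

```sql
-- Run as root or a user with GRANT OPTION.
GRANT SHOW DATABASES ON *.* TO 'immuta_system'@'%';  -- discover new databases
GRANT SHOW VIEW ON *.* TO 'immuta_system'@'%';       -- read view definitions
GRANT SELECT ON *.* TO 'immuta_system'@'%';          -- register objects and collect metadata
```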
Username and password: These credentials are used by Immuta to register the connection and maintain state between Immuta and MySQL. The credentials provided must be for a MySQL user account with the privileges listed in the Register a MySQL connection guide.
Root user or GRANT OPTION privilege
Setup user
This privilege is required so that the setup user can grant privileges to the Immuta system account.
SHOW DATABASES on all databases in the server
Immuta system account
This privilege allows the Immuta system account to discover new databases to keep data in MySQL and Immuta in sync.
SHOW VIEW on all views in the server
Immuta system account
This privilege allows the Immuta system account to access view definitions.
SELECT on all databases, tables, and views in the server
Immuta system account
Base tables
❌
❌
✅
Views
❌
❌
✅

This privilege allows the Immuta system account to list columns required for collecting metadata about the data objects.
Click Save.
Enter your Admin Token. This token must be for a user who has the required Databricks privilege. This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.
Click Apply Policies.
Manually push cluster policies: Enabling this option allows you to manually push the cluster policies and the init script to the configured Databricks workspace.
Select the Manually Push Cluster Policies radio button.
Click Download Init Script and set the Immuta plugin init script as a cluster-scoped init script in Databricks by following the Databricks documentation.
Click Download Policies, and then apply those policies in your Databricks workspace.
Ensure that the init_scripts.0.workspace.destination in the policy matches the file path to the init script you configured above.
The Immuta cluster policy references Databricks Secrets for several of the sensitive fields. These secrets must be manually created if the cluster policy is not automatically pushed. Use Databricks API or CLI to push the proper secrets.
Databricks service principal with the following privileges. For instructions on setting up this user, see the Creating the Databricks service principal section:
USE CATALOG and MANAGE on all catalogs containing securables you want registered as Immuta data sources.
USE SCHEMA on all schemas containing securables you want registered as Immuta data sources.
MODIFY and SELECT on all securables you want registered as Immuta data sources.
Additional privileges are required for query audit:
USE CATALOG on the system catalog
USE SCHEMA on the system.access and system.query schemas
Databricks user to run the script to register the connection with the following privileges:
Metastore admin and account admin
CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables
See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
Unity Catalog metastore created and attached to a Databricks workspace.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
Create a separate Immuta catalog for each Immuta tenant
If multiple Immuta tenants are connected to your Databricks environment, create a separate Immuta catalog for each of those tenants. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.
Click Data and select the Connections tab in the navigation menu.
Click the + Add Connection button.
Select the Databricks data platform tile.
Enter the connection information:
Host: The hostname of your Databricks workspace.
Port: Your Databricks port.
HTTP Path: The HTTP path of your Databricks cluster or SQL warehouse.
Click Next.
Select your authentication method from the dropdown:
Access Token: Enter the Access Token in the Immuta System Account Credentials section. This is the access token for the Immuta service principal, which must have the required privileges on the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the connection to continue to function. This authentication information will be included in the script populated later on the page.
OAuth M2M:
Copy the provided script and run it in Databricks as a user with the privileges listed in the requirements above.
Click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
Private preview: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
Requirements:
A configured Databricks Unity Catalog connection
Fewer than 10,000 Databricks Unity Catalog data sources registered in Immuta
To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.
If you need instruction for setting up your Databricks service principal before registering your connection, see the steps below.
In Databricks, create a service principal with the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.
USE CATALOG and MANAGE on all catalogs containing securables you want registered as Immuta data sources.
USE SCHEMA on all schemas containing securables you want registered as Immuta data sources.
MODIFY and SELECT on all securables you want registered as Immuta data sources. The MODIFY privilege is not required for materialized views registered as Immuta data sources, since MODIFY is not a supported privilege on that object type in Unity Catalog.
MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
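A sketch of catalog-level grants implementing the inheritance approach described above; the catalog name sales and the principal immuta-sp are placeholders:

```sql
-- Run by a metastore admin. Catalog-level grants are inherited by all
-- current and future schemas and securables in the catalog.
GRANT USE CATALOG, MANAGE ON CATALOG sales TO `immuta-sp`;
GRANT USE SCHEMA, SELECT, MODIFY ON CATALOG sales TO `immuta-sp`;
```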
Audit is enabled by default on all Databricks Unity Catalog connections. If you need to turn audit off, create the connection with the connections API and set audit to false in the payload.
Grant the service principal access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.
USE CATALOG on the system catalog
USE SCHEMA on the system.access and system.query schemas
SELECT on the following system tables:
system.access.table_lineage
system.access.column_lineage
system.access.audit
Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE_SCHEMA and SELECT privileges on the system schemas to the service principal. See the Databricks documentation for details.
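Under those requirements, the grants might look like the following sketch; immuta-sp is a placeholder for your Immuta service principal:

```sql
-- Run by a user who is both a metastore admin and an account admin.
GRANT USE CATALOG ON CATALOG system TO `immuta-sp`;
GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-sp`;
GRANT USE SCHEMA ON SCHEMA system.query TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-sp`;
GRANT SELECT ON TABLE system.access.audit TO `immuta-sp`;
```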
Immuta supports the following PostgreSQL versions:
PostgreSQL 16
PostgreSQL 17
Data consumers must access data directly through PostgreSQL; Immuta governs PostgreSQL data only for consumers accessing it directly. Transactional use cases where users access data through downstream applications that are writing data from PostgreSQL are outside the scope of Immuta's governance.
The user registering the connection must have the permissions below.
APPLICATION_ADMIN Immuta permission
The account credentials you provide to register the connection must have these PostgreSQL privileges:
Database superuser OR all of the privileges listed below.
CREATEROLE
CONNECT on the databases to be protected WITH GRANT OPTION
USAGE on the schemas to be protected WITH GRANT OPTION
The following privileges on tables to be protected WITH GRANT OPTION:
SELECT
DELETE
For descriptions and explanations of the privileges Immuta needs to enforce policies and maintain state in PostgreSQL, see the PostgreSQL integration reference guide.
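A sketch of the non-superuser option above; the role and object names are placeholders for your environment:

```sql
-- immuta_setup, analytics, and public are placeholder names.
ALTER ROLE immuta_setup CREATEROLE;
GRANT CONNECT ON DATABASE analytics TO immuta_setup WITH GRANT OPTION;
GRANT USAGE ON SCHEMA public TO immuta_setup WITH GRANT OPTION;
GRANT SELECT, DELETE ON ALL TABLES IN SCHEMA public TO immuta_setup WITH GRANT OPTION;
```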
In your PostgreSQL environment, create an Immuta database that Immuta can use to connect to your PostgreSQL instance to register the connection and maintain state with PostgreSQL.
Having this separate database for Immuta prevents custom ETL processes or jobs from deleting the database you use to register the connection, which would break the connection.
In Immuta, click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the PostgreSQL tile.
Select your deployment type:
Self-Managed
Aurora
Enter the host connection information:
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid using periods (.) or other special characters.
Enter privileged credentials to register the connection. Select your deployment method below for guidance.
Select an authentication method from the dropdown menu.
Provide the access key ID and secret access key for an AWS account with the PostgreSQL privileges outlined above.
Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
Create an IAM policy with the following permissions. You will attach this to your service principal once created.
Create an IAM role and select AWS Account as the trusted entity type. This role will be used by Immuta to set up the connection and orchestrate policies.
Add the IAM policy from step 1 to your service principal. These permissions will allow the service principal to register data sources and apply policies on Immuta's behalf. Before proceeding, contact your Immuta representative and provide your service principal's IAM role. Immuta will allowlist the service principal so that Immuta can successfully assume that role. Your Immuta representative will provide the account and external ID to add to your trust relationship. Then, complete the steps below.
Add a trust policy to your service principal, replacing <AWS ACCOUNT ID> and <EXTERNAL ID> with the values provided by your Immuta representative.
In Immuta, enter the IAM role name of your service principal in the Username field.
Enter the Role ARN. Immuta will assume this role when interacting with AWS.
Enter the External ID provided by Immuta.
Username and Password
Enter the Username and Password of a PostgreSQL account with the PostgreSQL privileges outlined above.
Click Save connection.
Requirement: USER_ADMIN Immuta permission
Map AWS IAM principals or PostgreSQL usernames to each Immuta user account to ensure Immuta properly enforces policies.
The instructions below illustrate how to do this for individual users, but you can also configure user mapping in your IAM connection on the app settings page.
Click People and select Users in the navigation menu.
Click the user's name to navigate to their page and scroll to the External User Mapping section.
Select your deployment method below for guidance on mapping users.
If your PostgreSQL users authenticate with AWS IAM roles or AWS IAM Identity Center (IDC) users:
Click Edit in the AWS User row.
Use the dropdown menu to select the User Type. User and role names are case-sensitive.
AWS IAM roles: Only a single Immuta user can be mapped to an IAM role. This restriction prevents enforcing policies on the individual AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role has the permissions applied specifically to it.
AWS Identity Center user IDs: You must use the numeric user ID.
Unset (fallback to Immuta username): When selecting this option, the AWS username is assumed to be the same as the Immuta username.
Click Save.
See the PostgreSQL reference guide for details about supported principals.
If your PostgreSQL users are using PostgreSQL usernames,
Click Edit in the PostgreSQL row.
Select one of the following options from the dropdown:
Select PostgreSQL Username to map the PostgreSQL username to the Immuta user and enter the PostgreSQL username in the field. Username mapping is case insensitive.
Select Unset (fallback to Immuta username) to use the Immuta username as the assumed PostgreSQL username. Use this option if the user's PostgreSQL username exactly matches the user's Immuta username. Username mapping is case insensitive.
Select None (user does not exist in PostgreSQL) if this is an Immuta-only user. This option will improve performance for Immuta users who do not have a mapping to PostgreSQL users and will be automatically selected by Immuta if an Immuta user is not found in PostgreSQL. To ensure your PostgreSQL users have policies correctly applied, manually map their usernames using the first option above.
Click Save.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The Amazon Redshift integration allows you to configure your integration and register data from Amazon Redshift in Immuta in a single step. Once data is registered, Immuta can enforce policies on that data.
The Amazon Redshift integration is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.
When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in Amazon Redshift after registration is complete:
Host
Database
Schema
Table or view
Beyond making the registration of your data more intuitive, the connections feature provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your Amazon Redshift integration and register data, see the Register an Amazon Redshift connection guide.
Immuta enforces read and write subscription policies on Amazon Redshift tables by issuing SQL statements in Amazon Redshift that grant and revoke access to tables according to the policy.
When a user is subscribed to a data object registered in Immuta,
Immuta creates a role for that user in Amazon Redshift, if one doesn't already exist.
Amazon Redshift stores that role in its internal system catalog.
Immuta issues grants to that user's role in Amazon Redshift to enforce policy. The Protecting data page provides an example of this policy enforcement.
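The three steps above can be pictured with a sketch like the following. Immuta issues its own internal statements; the role, user, and table names here are illustrative assumptions only:

```sql
-- Role is created once per user and reused across data sources
CREATE ROLE immuta_user_alice;
GRANT ROLE immuta_user_alice TO alice;
-- Subscription approved: grant read access on the protected table
GRANT SELECT ON TABLE public.marketing_data TO ROLE immuta_user_alice;
-- Subscription revoked: take the privilege back
REVOKE SELECT ON TABLE public.marketing_data FROM ROLE immuta_user_alice;
```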
You can author data policies in Immuta to enforce fine-grained access controls on Amazon Redshift data objects registered as Immuta data sources.
Once a data policy is applied to an Amazon Redshift data source in Immuta,
Immuta generates a masking or row-level policy in Amazon Redshift and attaches the policy to the data object it applies to.
When users query that data source in Amazon Redshift, the policy will dynamically apply to that data object so that users see policy-enforced data.
See the supported policies section below for a list of data policies supported for this integration.
See the Subscription policy access types page for details about the Amazon Redshift privileges granted to users when they are subscribed to a data source protected by a subscription policy.
The privileges that the Amazon Redshift integration requires align to the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.
The following user actions initiate processes that keep Immuta data synchronous with data in Amazon Redshift:
Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database.
User account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.
Datashares privilege requirement
To allow Immuta to enforce access controls on datashares, you must include the WITH PERMISSIONS clause when creating the database from the datashare. You cannot add WITH PERMISSIONS after the database has been created. See the Amazon Redshift documentation for details.
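For example, on the consumer cluster the datashare database might be created like this. The database name, share name, and namespace value are placeholders:

```sql
-- WITH PERMISSIONS must be specified at creation time; it cannot be added later
CREATE DATABASE marketing_share
FROM DATASHARE sales_share OF NAMESPACE '<producer-namespace-guid>'
WITH PERMISSIONS;
```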
The Amazon Redshift integration allows users to author subscription policies and data policies to enforce access controls.
The following data policies are supported:
See the applying policies section for details about policy enforcement.
The Amazon Redshift role configured as the policy exemption role in Immuta will be exempt from Immuta data policy enforcement. This role is created and managed in Amazon Redshift, not in Immuta.
If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to this role in Amazon Redshift. Then, role members will be exempt from having data policies applied to them when they query Immuta-protected tables in Amazon Redshift.
Typically, service or system accounts that perform the following actions are added to an exemption role in Amazon Redshift:
Automated queries
ETL
Report generation
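Assuming the exemption role already exists in Amazon Redshift, adding a service account to it is a single grant. Both names below are hypothetical:

```sql
-- Members of this role bypass masking and row-level policy enforcement
GRANT ROLE immuta_exemption_role TO etl_service_account;
```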
The system account used to register data sources in Immuta will be automatically added to the exemption role for the Amazon Redshift securables it registers.
The Amazon Redshift integration supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the privileges section of this page.
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead. Each of the supported IAM protocols includes a set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
For policies to impact the right users, the user account in Immuta must be mapped to the user account in Amazon Redshift. You can ensure these accounts are mapped correctly in the following ways:
Automatically: If usernames in Amazon Redshift align with usernames in the external IAM and those accounts align with an IAM attribute, you can enter that IAM attribute on the app settings page to automatically map user IDs in Immuta to Amazon Redshift.
Manually: You can manually map user IDs for individual users.
For guidance on connecting your IAM to Immuta, see the how-to guide for your protocol.
The following Immuta features are unsupported:
Amazon Redshift Spectrum: See the Amazon Redshift Spectrum documentation for details about registering Amazon Redshift Spectrum data sources in Immuta. However, if you are using data policies on your Redshift Spectrum data sources, you cannot use the AWS Lake Formation integration. Instead, use the Amazon Redshift Spectrum integration.
Several data policy types are unsupported. See the policies section above for a list of supported data policies.
Impersonation
Deprecation notice
Support for configuring the Snowflake integration using this legacy workflow has been deprecated. Instead, configure your integration and register your data using connections.
Public preview: This integration is available to all accounts that request to enable it for their tenant. Contact your Immuta representative to enable it.
The PostgreSQL integration allows you to register data from PostgreSQL in Immuta and enforce subscription policies on that data. Immuta supports the following deployment methods:
Amazon Aurora with PostgreSQL
Amazon RDS with PostgreSQL
Crunchy Data
Neon
Self-managed PostgreSQL
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "rds-db:connect",
"Resource": "arn:aws:rds-db:<REGION>:<AWS_ACCOUNT_ID>:dbuser:<RDS_DB_RESOURCE_ID>/<DB_USERNAME>"
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "assumeRole",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<AWS ACCOUNT ID>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<EXTERNAL ID>"
}
}
}
]
}
SELECT on the following system tables:
system.access.table_lineage
system.access.column_lineage
system.access.audit
system.query.history
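A hedged sketch of granting a service principal read access to those system tables in Databricks SQL follows; the principal name is an assumption:

```sql
GRANT SELECT ON TABLE system.query.history TO `immuta-service-principal`;
GRANT SELECT ON TABLE system.access.audit TO `immuta-service-principal`;
```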
Display Name: The display name represents the unique name of your connection and will be used as prefix in the name for all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection. Avoid the use of periods (.) or restricted words in your connection name.
AWS Databricks:
Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed for the metastore associated with the Databricks workspace.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols used as a public identifier for the Immuta service principal.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the Databricks OAuth documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Azure Databricks:
Follow the Azure documentation to create a service principal within Azure and then add it to your Databricks account and workspace.
Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
Each user is granted an immuta_<username> role, which allows them to use the privileges granted to that role by Immuta.
The following privileges WITH GRANT OPTION on objects registered in Immuta:
DELETE
INSERT
Immuta system account
These privileges allow Immuta to apply read and write subscription policies on tables registered in Immuta.
User subscribed to a data source: When a user is added to a data source by a data owner or through a subscription policy, Immuta creates a role for that user (if a role for them does not already exist) and grants Amazon Redshift privileges to that role.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and grants or revokes users' privileges on the data. See the Protecting data page for details about this process.
Subscription policy deleted: Immuta revokes privileges from the affected roles.
Data policy created or updated: Immuta calculates the users and data sources affected by the data policy change and attaches the policy to the data object in Amazon Redshift.
Data policy deleted: Immuta removes the policy from the data object in Amazon Redshift.
User removed from a data source: Immuta revokes privileges from the user's role.
✅
Only show rows (matching)
Query audit
Tag ingestion
Database superuser or the following privileges:
CREATEDB
CREATE USER
sys:secadmin role
USAGE on all databases and schemas that contain data you want to register
The following privileges WITH GRANT OPTION on objects registered in Immuta:
DELETE
INSERT
CREATE ROLE
Setup user
These privileges allow the user registering the connection to
assign the required roles and privileges to the Immuta system account so that it can register the connection and manage the integration.
create an Immuta database that Immuta will use to connect to the Amazon Redshift instance and maintain state with the registered databases.
create a policy exemption role.
USAGE on all the databases and schemas that will be registered
Immuta system account
This privilege allows Immuta to crawl the database and discover database objects so it can register the Amazon Redshift data objects.
CREATE ROLE
Immuta system account
This privilege is required so that Immuta can create Redshift roles to enforce access controls.
Database superuser or have the sys:secadmin role
Immuta system account
This role allows Immuta to apply masking and row-level policies to Redshift securables.
Tables
✅
✅
✅
Views
✅
✅
✅
Datashares
✅

✅
Before configuring the integration, review the Warehouse sizing recommendations guide to ensure that you use Snowflake compute resources cost effectively.
The permissions outlined in this section are the Snowflake privileges required for a basic configuration. See the Snowflake reference guide for a list of privileges necessary for additional features and settings.
APPLICATION_ADMIN Immuta permission
The Snowflake user running the installation script must have the following privileges:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
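Granting those account-level privileges to a setup role could look like the following sketch. The role name immuta_setup is an assumption; ACCOUNTADMIN already holds these privileges, so this is only needed for a dedicated setup role:

```sql
GRANT CREATE DATABASE ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
GRANT CREATE ROLE ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
GRANT CREATE USER ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
GRANT MANAGE GRANTS ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE immuta_setup WITH GRANT OPTION;
```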
The Snowflake user must have the following privileges on all securables:
USAGE on all databases and schemas with registered data sources
REFERENCES on all tables and views registered in Immuta
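For example, a data source registration role might be granted the following; the database and role names are assumptions:

```sql
GRANT USAGE ON DATABASE analytics TO ROLE immuta_data_owner;
GRANT USAGE ON ALL SCHEMAS IN DATABASE analytics TO ROLE immuta_data_owner;
GRANT REFERENCES ON ALL TABLES IN DATABASE analytics TO ROLE immuta_data_owner;
GRANT REFERENCES ON ALL VIEWS IN DATABASE analytics TO ROLE immuta_data_owner;
```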
Different accounts
The setup account used to enable the integration must be different from the account used to register data sources in Immuta.
Snowflake resource names: Use uppercase for the names of the Snowflake resources you create below.
Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Project Workspace box. This will allow for managed write access within Snowflake. Note: Project workspaces still use Snowflake views, so the default role of the account used to create the data sources in the project must be added to the Excepted Roles List. This option is unavailable when Snowflake table grants is enabled.
Opt to check the Enable Impersonation box and customize the Impersonation Role to allow Immuta users to impersonate another user. You cannot edit this choice after you configure the integration. Once you finish configuring the integration, you can grant the IMPERSONATE_USER permission to Immuta users. See the user impersonation guide for instructions.
Query audit is enabled by default.
To configure the audit sync schedule, scroll to Integrations Settings and find the Snowflake Audit Sync Schedule section.
Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
Altering parameters in Snowflake at the account level may cause unexpected behavior of the Snowflake integration in Immuta
The QUOTED_IDENTIFIERS_IGNORE_CASE parameter must be set to false (the default setting in Snowflake) at the account level. Changing this value to true causes unexpected behavior of the Snowflake integration.
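You can confirm the account-level value before configuring the integration:

```sql
-- Should report 'false' (the Snowflake default)
SHOW PARAMETERS LIKE 'QUOTED_IDENTIFIERS_IGNORE_CASE' IN ACCOUNT;
-- If it was changed, an ACCOUNTADMIN can restore the default:
ALTER ACCOUNT SET QUOTED_IDENTIFIERS_IGNORE_CASE = FALSE;
```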
You have two options for configuring your Snowflake environment:
Automatic setup: Grant Immuta one-time use of credentials to automatically configure your Snowflake environment and the integration.
Manual setup: Run the Immuta script in your Snowflake environment yourself to configure your Snowflake environment and the integration.
Required permissions: When performing an automatic setup, the credentials provided must have the permissions listed above.
The setup will use the provided credentials to create a user called IMMUTA_SYSTEM_ACCOUNT and grant the following privileges to that user:
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Alternatively, you can use the manual setup method and edit the provided script to grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes.
These credentials will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.
You can create a new account for Immuta to use that has these privileges, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate privileges is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.
From the Select Authentication Method Dropdown, select one of the following authentication methods:
Username and Password (Not recommended): Complete the Username, Password, and Role fields.
Complete the Username field. This user must be assigned the public key in Snowflake.
When using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
Click Key Pair (Required), and upload a Snowflake private key pair file.
Complete the Role field.
Required permissions: When performing a manual setup, the Snowflake user running the script must have the permissions listed above.
It will create a user called IMMUTA_SYSTEM_ACCOUNT, and grant the following privileges to that user:
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Alternatively, you can grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes.
Select Manual.
Use the Dropdown Menu to select your Authentication Method:
Username and password (Not recommended): Enter the Username and Password and set them in the bootstrap script for the Immuta system account credentials.
Key Pair: Upload the Key Pair file. When using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
Snowflake External OAuth:
Create a security integration in Snowflake. Note that if you have an existing security integration, you can modify it rather than creating a new one. The Immuta system role will be the name of the Immuta database provided above with _SYSTEM appended. If you used the default database name, it will be IMMUTA_SYSTEM.
In the Setup section, click bootstrap script to download the script. Then, fill out the appropriate fields and run the bootstrap script in Snowflake.
If you enabled a Snowflake workspace, select Warehouses from the dropdown menu that will be available to project owners when creating Snowflake workspaces. Select from a list of all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.
Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma. Wildcards are unsupported.
Excepted roles/users will have no policies applied to queries
Any user with the username or acting under the role in this list will have no policies applied to them when querying Immuta protected Snowflake tables in Snowflake. Therefore, this list should be used for service or system accounts and the default role of the account used to create the data sources in the Immuta projects (if you have Snowflake workspace enabled).
Click Save.
To allow Immuta to automatically import table and column tags from Snowflake, enable Snowflake tag ingestion in the external catalog section of the Immuta app settings page.
Requirements:
A configured Snowflake integration or connection
The Snowflake user configuring the Snowflake tag ingestion must have the following privileges and should be able to access all securables registered as data sources:
IMPORTED PRIVILEGES ON DATABASE snowflake
APPLY TAG ON ACCOUNT
Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Snowflake from the dropdown menu.
Enter the Account.
Enter the Authentication information based on your authentication method:
Username and password: Fill out Username and Password.
Key pair:
Enter the additional Snowflake details: Port, Default Warehouse, and Role.
Opt to enter the Proxy Host and Proxy Port.
Click the Test Connection button.
Click the Test Data Source Link.
Once both tests are successful, click Save.
Amazon Aurora with PostgreSQL
Amazon RDS with PostgreSQL
Crunchy Data
Neon
Self-managed PostgreSQL
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries that data in PostgreSQL.
PostgreSQL is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schema and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.
During connection registration, you provide Immuta credentials with the privileges outlined on the Register a PostgreSQL connection page. When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database.
In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in PostgreSQL after registration is complete:
Host
Database
Schema
Table
Beyond making the registration of your data more intuitive, the connections feature provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the Connections reference guide for details about connections and how to manage them. To configure your PostgreSQL integration and register data, see the Register a PostgreSQL connection guide.
Immuta enforces read and write subscription policies on PostgreSQL tables by issuing SQL statements in PostgreSQL that grant and revoke access to tables according to the policy.
When a user is subscribed to a table registered in Immuta,
Immuta creates a role for that user in PostgreSQL, if one doesn't already exist.
PostgreSQL stores that role in its internal system catalog.
Immuta issues grants to that user's role in PostgreSQL to enforce policy. The Protecting data page provides an example of this policy enforcement.
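In PostgreSQL terms, the flow above amounts to statements like these. Immuta issues its own internal SQL; the role, user, and table names here are illustrative only:

```sql
-- One role per subscribed user, reused across data sources
CREATE ROLE immuta_alice;
GRANT immuta_alice TO alice;
-- Read subscription policy grants SELECT; a write policy would also grant INSERT
GRANT SELECT ON public.marketing_data TO immuta_alice;
-- User unsubscribed: revoke the privilege
REVOKE SELECT ON public.marketing_data FROM immuta_alice;
```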
See the Subscription policy access types page for details about the PostgreSQL privileges granted to users when they are subscribed to a data source protected by a subscription policy.
The privileges that the PostgreSQL integration requires align to the least privilege security principle. The table below describes each privilege required by the IMMUTA_SYSTEM_ACCOUNT user.
Database superuser OR all of the privileges listed below
This privilege is required so that the setup user can create and grant permissions to the Immuta system account role.
CONNECT on the database Immuta will protect WITH GRANT OPTION
This privilege allows Immuta to connect to the PostgreSQL database that contains the tables Immuta will protect.
USAGE on the schema Immuta will protect WITH GRANT OPTION
This privilege allows the Immuta system account to access schemas that contain tables it will protect.
CREATEROLE
Because PostgreSQL privileges are granted to roles, this privilege is required so that Immuta can create PostgreSQL roles and manage role membership to enforce access controls.
The following privileges WITH GRANT OPTION on tables registered in Immuta:
SELECT
INSERT
ALTER TABLE
These privileges allow Immuta to apply read and write subscription policies on tables registered in Immuta. The ALTER TABLE privilege allows Immuta to enforce row-level policies, which will be available in a subsequent release.
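A sketch of those grants to the system account role; the role, schema, and WITH GRANT OPTION usage follow the table above, while the names are assumptions:

```sql
-- Granted WITH GRANT OPTION so Immuta can pass privileges on to user roles
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public
  TO immuta_system_account WITH GRANT OPTION;
```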
The following user actions spur various processes in the PostgreSQL integration so that Immuta data remains synchronous with data in PostgreSQL:
Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.
User account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.
User subscribed to a data source: When a user is added to a data source by a data owner or through a subscription policy, Immuta creates a role for that user (if a role for them does not already exist) and grants PostgreSQL privileges to their role.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and grants or revokes users' privileges on the PostgreSQL table. See the Protecting data page for details about this process.
Subscription policy deleted: Immuta revokes privileges from the affected roles.
User removed from a data source: Immuta revokes privileges from the user's role.
The PostgreSQL integration allows users to author subscription policies to enforce access controls. Data policies are unsupported.
See the applying policies section for details about policy enforcement.
Table
✅
❌
✅
View
✅
❌
✅
Materialized view
✅
Immuta will not ingest the following objects:
The default postgres database and its objects will be ignored by object sync and will not be ingested into Immuta.
The PostgreSQL integration supports the following authentication methods to register a connection:
Amazon Aurora and Amazon RDS deployments
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the permissions listed in the .
Neon and PostgreSQL deployments
Username and password: These credentials are used temporarily by Immuta to register the connection. The credentials provided must be for an account with the permissions listed in the .
The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead. Each of the supported IAM protocols includes a set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
For policies to impact the right users, the user account in Immuta must be mapped to the user account in PostgreSQL or AWS. You can ensure these accounts are mapped correctly in the following ways:
Automatically: If usernames in PostgreSQL or AWS align with usernames in the external IAM and those accounts align with an IAM attribute, you can enter that IAM attribute on the app settings page to automatically map user IDs in Immuta to PostgreSQL.
Manually: You can manually map user IDs for individual users.
For guidance on connecting your IAM to Immuta, see the how-to guide for your protocol.
Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports all three methods for user provisioning in the Amazon Aurora or Amazon RDS with PostgreSQL deployments.
However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.
See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.
Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an IAM role max limit of 5,000 per AWS account.
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
Names are case-sensitive
The IAM role name and IAM user name are case-sensitive. See the AWS documentation for details.
Immuta supports mapping an Immuta user to AWS in one of the following ways:
IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
See the map users section for instructions on mapping principals to user accounts in Immuta.
The following Immuta features are unsupported:
Data policies
Impersonation
Tag ingestion
Query audit
Public preview: This feature is available to all accounts.
AWS Lake Formation must be set up in an AWS account, referred to as the admin account. This is the account that you will use to initially configure IAM and AWS Lake Formation permissions to give the Immuta service principal access to perform operations. The user in this account must be able to manage IAM permissions and Lake Formation permissions for all data in the Glue Data Catalog.
No other AWS Lake Formation connection for the same Glue Data Catalog may be configured in the same Immuta instance.
The databases and tables you want Immuta to govern must be managed by Lake Formation access control. Immuta cannot govern resources that use IAM access control or hybrid access mode. To ensure Immuta can govern your resources, verify that the default Data Catalog settings ("Use only IAM access control" for new databases and tables) are unchecked in AWS. See the AWS documentation for instructions on changing these settings.
Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. See the map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.
These are permissions that the user registering the connection must have in order to successfully complete setup.
APPLICATION_ADMIN Immuta permission to register the connection
Create LF-Tag AWS permission
The Immuta service principal is the AWS IAM role that Immuta will assume to perform operations in your AWS account. This role must have all the necessary permissions in AWS Glue and AWS Lake Formation to allow Immuta to register data sources and apply policies.
Create an IAM policy with the following AWS Lake Formation and AWS Glue permissions. You will attach this to your service principal once created.
Create an IAM role and select AWS Account as the trusted entity type. This role will be used by Immuta to set up the connection and orchestrate AWS Lake Formation policies. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
Add the IAM policy from step 1 to your service principal. These permissions will allow the service principal to register data sources and apply policies on Immuta's behalf.
Add the service principal as an LF-Tag creator.
This method follows the principle of least privilege and is the most flexible way of granting permissions to the service principal. LF-Tags cascade down from databases to tables, while allowing for exceptions. This means that when you apply this tag to a database, it will automatically apply to all tables within that database and allow you to remove it from any tables if those should be out of the scope of Immuta’s governance.
Create a new LF-Tag, giving yourself permissions to grant that tag to a user, which will ultimately be your service principal.
In the Lake Formation console, navigate to LF-Tags and permissions and click Add LF-Tag. You will need the Create LF-Tag permission to do this.
Click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the AWS Lake Formation tile.
Requirement: USER_ADMIN Immuta permission
Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies.
Click People and select Users in the navigation menu.
Click the user's name to navigate to their page and scroll to the External User Mapping section.
Click Edit in the AWS User row.
See the reference guide for details about supported principals.
Public preview: This feature is available to all accounts.
In the AWS Lake Formation integration, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:
Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum
The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.
See the AWS documentation for more details about Lake Formation access controls.
AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows you to register your data objects in a technology through a single connection to make data registration more scalable for your organization. Instead of registering schema and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data on your data platform.
After you set up a data lake in Lake Formation and a Glue Data Catalog, you provide Immuta an AWS IAM role with the required permissions to register the Lake Formation connection.
Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.
In the example below, the Immuta application administrator connects the Glue Data Catalog that contains marketing-data, research-data, and cs-data metadata. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.
Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.
See the connections documentation for details about connections and how to manage them. To configure your Lake Formation integration and register data, see the getting started guide.
When an Immuta subscription policy is applied to data sources, Immuta calculates and stores the policy logic in the Immuta metadata database and generates an LF-Tag key and value that is applied to the table in AWS. When users are subscribed to the data source, Immuta issues grants either directly to the table (if they are manually subscribed to the data source by a data owner) or to the LF-Tag (if they are subscribed by an automatic subscription policy). See the subscription policies documentation for details about these policy types.
The table below outlines how two different automatic subscription policies authored in Immuta are orchestrated in Lake Formation.
The privileges Immuta issues to users when they are subscribed to a data source vary depending on the object type. See the reference guide for an outline of the privileges granted by Immuta.
The following user actions spur various processes in the Lake Formation integration so that Immuta data remains synchronous with data in Lake Formation. The list below provides an overview of each process:
Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes LF-Tags from that AWS resource.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and generates an LF-Tag key and value.
The image below illustrates these processes.
Private preview: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
When registering an AWS Lake Formation connection, you can opt to ingest Lake Formation Tags. If this option is enabled, then every data source in Immuta will have the Lake Formation Tags pulled in and automatically applied. Immuta will check every 24 hours for any relevant metadata changes in AWS Lake Formation.
Tag ingestion is an integration-wide setting and, once enabled, cannot be disabled on a data-source-by-data-source basis. Additionally, if enabled, no other external catalog can be linked to the AWS Lake Formation data sources.
The AWS Lake Formation integration allows users to author subscription policies to enforce access controls. Data policies are unsupported.
See the applying policies section for details about subscription policy enforcement.
The Lake Formation integration supports the following authentication methods to register a connection:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Contact your Immuta representative for the AWS account to add to your trust policy.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the AWS permissions listed in the set up the Immuta service principal section.
Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports all three methods for user provisioning in the Lake Formation integration.
However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.
See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.
Enable AWS IAM Identity Center (IDC) (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an IAM role max limit of 5,000 per AWS account.
Names are case-sensitive
The IAM role name and IAM user name are case-sensitive. See the AWS documentation for details.
Immuta supports mapping an Immuta user to AWS in one of the following ways:
IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
See the map users section for instructions on mapping principals to user accounts in Immuta.
Existing Amazon S3 integrations have no impact on AWS Lake Formation integrations; they can be used in tandem.
While the Amazon S3 integration offers access control for raw object storage, the Lake Formation integration provides access control for Glue Data Catalog views and tables. Together, they offer support for every cloud-native data warehouse and lakehouse for AWS users.
You cannot use the AWS Lake Formation integration if you are using data policies on Redshift Spectrum data sources, since granting access to the underlying Glue table via the AWS Lake Formation integration would allow a user to bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly. Instead, use the Amazon Redshift Spectrum integration for that data.
The following Immuta features are unsupported:
Impersonation
User query audit
Private preview: This integration is only available to select accounts. Contact your Immuta representative to enable it.
The Google BigQuery integration allows users to query policy protected data directly in BigQuery as secure views within an Immuta-created dataset. Immuta controls who can see what within the views, allowing data governors to create complex ABAC policies and data users to query the right data within the BigQuery console.
Google BigQuery is configured through the Immuta console and a script provided by Immuta. While you can complete some steps within the BigQuery console, it is easiest to install using gcloud and the Immuta script.
Once Google BigQuery has been configured, BigQuery admins can start creating subscription and data policies to meet compliance requirements and users can start querying policy protected data directly in BigQuery.
Create a global subscription or data policy.
What permissions will Immuta have in my BigQuery environment?
You can find a list of the permissions the custom Immuta role has in the configuration steps below.
What integration features will Immuta support for BigQuery?
In this policy push integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. Access controls are applied in the view, allowing users to leverage Immuta’s powerful set of attribute-based policies and query data directly in BigQuery.
BigQuery is organized by projects (which can be thought of as databases), datasets (which can be compared to schemas), tables, and views. When you enable the integration, an Immuta dataset is created in BigQuery that contains the Immuta-required user entitlements information. These objects within the Immuta dataset are intended to only be used and altered by the Immuta application.
After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push the Immuta data sources as views into a mirrored dataset of the original table. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.
The Immuta integration uses a mirrored dataset approach. That is, if the source dataset is named mydataset, Immuta will create a dataset named mydataset_secure, assuming that _secure is the specified Immuta dataset suffix. This mirrored dataset is an authorized dataset, allowing it to access the data of the original dataset. It will contain the Immuta-managed views, which have identical names to the original tables they’re based on.
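The mirrored-dataset naming convention above can be sketched as a small helper. This is illustrative only: the suffix is configurable, and the function name is not part of any Immuta API.

```python
def mirrored_dataset(dataset: str, suffix: str = "_secure") -> str:
    """Return the name of the Immuta-managed mirrored dataset.

    Immuta places the policy-enforced views for `dataset` inside this
    mirrored dataset; each view keeps the name of the table it wraps.
    """
    return dataset + suffix

# A view over mydataset.orders would live at mydataset_secure.orders.
print(mirrored_dataset("mydataset"))  # mydataset_secure
```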
Following the principle of least privilege, Immuta does not have permission to manage Google Cloud Platform users, specifically in granting or denying access to a project and its datasets. This means that data governors should limit user access to original datasets to ensure data users are accessing the data through the Immuta created views and not the backing tables. The only users who need to have access to the backing tables are the credentials used to register the tables in Immuta.
Additionally, a data governor must grant users access to the mirrored datasets that Immuta will create and populate with views. Immuta and BigQuery’s best practice recommendation is to grant access via groups in Google Cloud Platform. Because users still must be registered in Immuta and subscribed to an Immuta data source to be able to query Immuta views, all Immuta users can be granted access to the mirrored datasets that Immuta creates.
The definitions for each status and the state of configured data platform integrations are available in the reference guide.
This integration can only be enabled through a manual bootstrap using the Immuta API.
This integration can only be enabled to work in a single region.
BigQuery does not allow views partitioned by pseudo-columns. Partitioning a table by a pseudo-column while having Immuta govern it therefore requires additional configuration steps.
This integration supports the following policy types:
Column masking
Mask using hashing (SHA256())
Mask by making NULL
See the resources below to start implementing and using the BigQuery integration:
Building global subscription and data policies to govern data
Follow this guide to connect your Google BigQuery data warehouse to Immuta.
The Google BigQuery integration requires you to create a Google Cloud service account and role that will be used by Immuta to
create a Google BigQuery dataset that will be used to store a table of user entitlements, UDFs for policy enforcement, etc.
manage the table of user entitlements via updates when entitlements change in Immuta.
create datasets and secure views with access control policies enforced, which mirror tables inside of datasets you ingest as Immuta data sources.
You have two options to create the required Google Cloud service account and role:
The bootstrap.sh script is a shell script provided by Immuta that creates prerequisite Google Cloud IAM objects for the integration to connect. When you run this script from your command line, it will create the following items, scoped at the project-level:
A new Google Cloud IAM role
A new Google Cloud service account, which will be granted the newly-created role
A JSON keyfile for the newly-created service account
You will need to use the objects created in these steps to configure the integration in Immuta.
Google Cloud IAM roles required to run the script
To execute bootstrap.sh from your command line, you must be authenticated to the gcloud CLI utility as a user with all of the following roles:
roles/iam.roleAdmin
roles/iam.serviceAccountAdmin
roles/serviceusage.serviceUsageAdmin
Having these three roles is the least-privilege set of Google Cloud IAM roles required to successfully run the bootstrap.sh script from your command line. However, having either of the following Google Cloud IAM roles will also allow you to run the script successfully:
roles/editor
roles/owner
Install the gcloud CLI.
Set the account property in the core section for Google Cloud CLI to the account gcloud should use for authentication. (You can run gcloud auth list to see your currently available accounts.)
In Immuta, navigate to the App Settings page and click the Integrations tab.
Alternatively, you may use the Google Cloud Console to create the prerequisite role, service account, and private key file for the integration to connect to Google BigQuery.
Create a custom role with the following privileges:
bigquery.datasets.create
bigquery.datasets.delete
Once the Google Cloud IAM custom role and service account are created, you can enable the Google BigQuery integration. This section illustrates how to enable the integration on the Immuta app settings page. To configure this integration via the Immuta API, see the API documentation.
In Immuta, navigate to the App Settings page and click the Integrations tab.
Click Add Integration and select Google BigQuery from the dropdown menu.
Click Select Authentication Method and select Key File.
GCP location must match dataset region
The region set for the GCP location must match the region of your datasets. Set GCP location to a general region (for example, US) to include child regions.
You can disable the Google BigQuery integration automatically or manually.
Click the App Settings icon, and then click the Integrations tab.
Select the Google BigQuery integration you would like to disable, and select the Disable Integration checkbox.
Click Save.
The privileges required to run the cleanup script are the same as the Google Cloud IAM roles required to run the bootstrap.sh script.
Click the App Settings icon, and then click the Integrations tab.
Select the Google BigQuery integration you would like to disable, and click Download Scripts.
Click Save. Wait until Immuta has finished saving your configuration changes before proceeding.
Build and apply policies to securely collaborate on analytical workloads
Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal (note that Azure Databricks uses the Azure SP Client ID; it will be identical).
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
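The fields above configure a standard OAuth 2.0 client-credentials exchange. The sketch below assembles the request those fields produce; the workspace URL, client ID, and secret are placeholders, and the helper name is not part of any Immuta or Databricks API.

```python
import urllib.parse


def build_token_request(token_endpoint: str, client_id: str,
                        client_secret: str, scope: str = "all-apis"):
    """Assemble the OAuth 2.0 client-credentials request sent to the
    Databricks token endpoint. Returns the URL and form-encoded body."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return token_endpoint, body


url, body = build_token_request(
    "https://myworkspace.azuredatabricks.net/oidc/v1/token",
    "my-client-id",      # placeholder service principal client ID
    "my-client-secret",  # placeholder client secret
)
# POSTing `body` to `url` returns a JSON document containing the access token.
```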
Immuta creates an LF-Tag key and value.
Immuta_policy=1234
Immuta_policy=5678
Immuta assigns the LF-Tag to the AWS resource in Lake Formation.
Assign tag Immuta_policy=1234 to research-data and marketing-data
Assign tag Immuta_policy=5678 to cs-data
Immuta grants the LF-Tag to users in Lake Formation.
GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Alex
GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Taylor
GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Deepu
GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Casey
GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Mary
GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Catherine
User manually subscribed to a data source: When a user is manually added to a data source by a data owner, Immuta grants the user direct access to the table in Lake Formation.
Automatic subscription policy deleted: Immuta deletes the LF-Tag key and values.
AWS user account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.
User removed from a data source: Immuta revokes the user's access to the table or the LF-Tag.
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
50 tag limit per resource
1000 tag limit total
1000 values per tag
See the AWS documentation for details.
Immuta is actively making improvements to the AWS Lake Formation integration throughout the preview phases. Be aware of these temporary limitations during the early preview stages:
Immuta will only synchronize policies on a 1-minute schedule, so it could be up to 1 minute from you taking an action in Immuta until Immuta starts synchronizing policies. Note that this 1-minute schedule refers to Immuta starting to synchronize, not the time it will take to complete synchronization.
LF-Tags created for automatic subscription policies are not removed when no longer applicable. This can result in growth of the LF-Tag value space and may hit quotas if many policy changes occur over time. LF-Tags can be manually removed to free up space if quota is a concern.
Scale constraints are limited to 2000 data sources and 100 users.
Multiple AWS Lake Formation integrations are not supported on a single Immuta tenant.
Immuta does not ingest existing LF-Tags.
Governor authors a global policy in Immuta.
"Users may subscribe to data sources tagged Research when they are members of group Research."
"Users may subscribe to data sources tagged CS when they have the attribute training.complete."
Immuta calculates the data sources affected.
research-data
marketing-data
cs-data
Immuta calculates users affected.
Alex
Taylor
Deepu
Casey
Mary
Catherine
Immuta generates a group identifier for the users and data sources affected.
1234
5678
SELECT on all tables and views registered in Immuta
Continue with your integration configuration.
Fill out the Client ID. This is the subject of the generated token.
Select the method Immuta will use to obtain an access token:
Certificate
Keep the Use Certificate checkbox enabled.
Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called sub (Subject).
Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
Client secret
Uncheck the Use Certificate checkbox.
Enter the Scope (string). The scope limits the operations and roles allowed in Snowflake by the access token. See the OAuth 2.0 documentation for details about scopes.
Fill out Username.
Click Upload Certificates to enter the Certificate Authority, Certificate File, and Key File.
Close the modal and opt to enter the Encrypted Key File Passphrase.
All databases that should be registered in the connection
All tables that should be registered in the connection
Any LF-Tags you are using on the resources that should be registered in the connection
The AWS account credentials or AWS IAM role you provide for the Immuta service principal must have permissions to perform the following actions to register data and apply policies:
glue:GetDatabase
glue:GetTables
glue:GetDatabases
glue:GetTable
lakeformation:ListPermissions
lakeformation:BatchGrantPermissions
DESCRIBE Lake Formation permission on any LF-Tags you want to have pulled into Immuta and applied to data sources through tag ingestion
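Collected into a single IAM policy document, the Glue and Lake Formation actions listed above look roughly like the sketch below. This is a minimal illustration, not a complete or authoritative policy: the wildcard Resource should be tightened to your own catalog ARNs, and the DESCRIBE permission on LF-Tags is granted separately in the Lake Formation console, not through IAM.

```python
import json

# Minimal IAM policy granting the Glue and Lake Formation actions the
# Immuta service principal needs, per the list above. Resource "*" is
# illustrative; scope it to your Glue Data Catalog resources.
service_principal_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "lakeformation:ListPermissions",
                "lakeformation:BatchGrantPermissions",
            ],
            "Resource": "*",
        }
    ],
}

print(json.dumps(service_principal_policy, indent=2))
```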
In the Lake Formation console, navigate to Permissions.
Select LF-Tags and permissions.
Select LF-Tag creators, and then Add LF-Tag creators.
Enter your service principal, and grant it the Create LF-Tag permission with the grantable option.
Click Add to save your changes.
Grant the service principal permissions on any tables that will be registered in Immuta. There are two ways to give the service principal these permissions: either make a new LF-Tag that gives the appropriate permissions and apply it to all databases or tables that Immuta will manage, or make the role a superuser in Lake Formation.
Create a single tag key with one tag value. For example,
Tag key: immuta_governed
Tag value: true
On the LF-Tag key-value pair, grant the ASSOCIATE LF-Tag permission to your own IAM principal.
Grant this tag to the Immuta service principal.
In the Lake Formation console, navigate to Data permissions and click Grant.
Enter the service principal’s IAM role.
Add the key-value pair of the tag you created in step 1.
Under Table Permissions, select the following grantable permissions: SELECT, DESCRIBE, INSERT, DELETE.
Click Grant.
Apply this tag to the resources you would like Immuta to govern. The Immuta service principal will now have the minimum required permissions on these resources. If new resources are created in AWS, you must repeat this process of applying this tag to those resources if you want Immuta to govern them.
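The console grant above can also be expressed as a Lake Formation GrantPermissions request. The sketch below only builds the request body; the role ARN is a placeholder, and the immuta_governed=true tag is the example from step 1. Under the assumption that you use boto3, the dict could be passed to `lakeformation.grant_permissions(**grant)`.

```python
# Request body for AWS Lake Formation GrantPermissions, mirroring the
# console steps above: grant the service principal the four grantable
# table permissions on every table matching the LF-Tag expression.
grant = {
    "Principal": {
        # Placeholder ARN for the Immuta service principal's IAM role
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::123456789012:role/immuta-service-principal",
    },
    "Resource": {
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [
                {"TagKey": "immuta_governed", "TagValues": ["true"]},
            ],
        },
    },
    "Permissions": ["SELECT", "DESCRIBE", "INSERT", "DELETE"],
    "PermissionsWithGrantOption": ["SELECT", "DESCRIBE", "INSERT", "DELETE"],
}
# With boto3: boto3.client("lakeformation").grant_permissions(**grant)
```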
This option enables all Lake Formation operations on all data in the Glue Data Catalog. This is highly privileged and runs the risk of managing permissions on data you did not intend to.
This method will grant all necessary permissions to the service principal, but grants more than the service principal needs without being as flexible, since it does not allow for exceptions like the LF-Tag method. You can make the service principal a superuser on the entire catalog or specify individual resources.
In the Lake Formation console, navigate to Data permissions and click Grant.
Enter your service principal’s IAM role.
Select Named Data Catalog resources, and input the Glue Data Catalog ID and any databases or tables you wish to specify.
Under Grantable permissions, select Super and click Grant.
Follow the AWS documentation to grant ALL permissions to the DataLakePrincipalIdentifier for the Immuta service principal ARN.
Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page. Avoid the use of periods (.) or restricted words in your connection name.
AWS Account ID: The ID of the AWS account associated with the Glue Data Catalog.
AWS Region: The region of the AWS account associated with the Glue Data Catalog.
Opt to enable Immuta to Ingest Lake Formation Tags (private preview): This will ensure your Lake Formation Tags are applied to your data sources in Immuta.
Click Next.
Select an authentication method from the dropdown menu.
AWS Access Key and Secret Access Key: Provide the access key ID and secret access key for an AWS account with the AWS permissions listed in the set up the Immuta service principal section.
AWS IAM Role (recommended): Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. Before proceeding, contact your Immuta representative and provide your service principal's IAM role. Immuta will allowlist the service principal's IAM role so that Immuta can successfully assume that role. Then, complete the steps below.
Enter your service principal's role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
Copy the trust policy displayed below the AWS IAM Role field. You will paste this in your service principal's trust policy in the next step.
In AWS, navigate to your service principal's Trust Relationships tab and edit the existing trust relationship. Paste the trust policy you copied from the Immuta UI and save your changes.
In Immuta, ensure that you have the correct permissions and click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
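The trust policy you paste in the steps above generally has the standard cross-account shape shown below. This is a representative sketch only; the account ID and external ID are placeholders, and you should always use the exact policy displayed in the Immuta UI.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<IMMUTA_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<EXTERNAL_ID_FROM_IMMUTA>" }
      }
    }
  ]
}
```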
Use the dropdown menu to select the User Type. User and role names are case-sensitive. See the AWS documentation for details.
AWS IAM role: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address.
Unset (fallback to Immuta username): When selecting this option, the AWS username is assumed to be the same as the Immuta username.
Click Save.

Users query data from the Immuta-created datasets directly in BigQuery.
Immuta can enforce specific policies on data in a single BigQuery project. At this time, workspaces, tag ingestion, user impersonation, query audit, and multiple integrations are not supported.
Create a view in BigQuery of the partitioned table, with the pseudo-column aliased. For example:
create view `sales`.`emea`.`sales_view` as SELECT *, _PARTITIONTIME as __partitiontime from `sales`.`emea`.`sales`
Register this view as a BigQuery data source in Immuta.
Immuta will then be able to create Immuta-managed views off of this view with the pseudo-column aliased.
Mask using a regular expression
Mask by date rounding
Mask by numeric rounding
Mask using custom functions
Row-level masking
Row visibility based on user attributes and/or object attributes
Only show rows that fall within a given time window
Minimize rows
Filter rows using custom WHERE clause
Always hide rows
Click Select Authentication Method and select Key File.
Click Download Script(s).
Before you run the script, update your permissions to execute it:
chmod 755 <path to downloaded script>
Run the script, where
PROJECT_ID is the Google Cloud Platform project to operate on.
ROLE_ID is the name of the custom role to create.
NAME will create a service account with the provided name.
OUTPUT_FILE is the path where the resulting private key should be written. File system write permission will be checked on the specified path prior to the key creation.
undelete-role (optional) will undelete the custom role from the project. Roles that have been deleted for a long time can't be undeleted. This option can fail for the following reasons:
The role specified does not exist.
The active user does not have permission to access the given role.
enable-api (optional) will enable the Google BigQuery API service, provided you’ve been granted access to enable it.
The full invocation has this shape:
$ bootstrap.sh \
--project PROJECT_ID \
--role ROLE_ID \
--service_account NAME \
--keyfile OUTPUT_FILE \
[--undelete-role] \
[--enable-api]
bigquery.datasets.get
bigquery.datasets.update
bigquery.jobs.create
bigquery.jobs.get
bigquery.jobs.list
bigquery.jobs.listAll
bigquery.routines.create
bigquery.routines.delete
bigquery.routines.get
bigquery.routines.list
bigquery.routines.update
bigquery.tables.create
bigquery.tables.delete
bigquery.tables.export
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.list
bigquery.tables.setCategory
bigquery.tables.update
bigquery.tables.updateData
bigquery.tables.updateTag
Create a service account and grant it the custom role you just created.
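If you create the custom role outside the bootstrap script, the role definition passed to the IAM projects.roles.create REST method carries the permissions listed above. The sketch below only builds that request body; the role title is a hypothetical placeholder.

```python
# Permissions copied from the list above.
IMMUTA_BQ_PERMISSIONS = [
    "bigquery.datasets.get",
    "bigquery.datasets.update",
    "bigquery.jobs.create",
    "bigquery.jobs.get",
    "bigquery.jobs.list",
    "bigquery.jobs.listAll",
    "bigquery.routines.create",
    "bigquery.routines.delete",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.routines.update",
    "bigquery.tables.create",
    "bigquery.tables.delete",
    "bigquery.tables.export",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.tables.setCategory",
    "bigquery.tables.update",
    "bigquery.tables.updateData",
    "bigquery.tables.updateTag",
]

def build_role_body(role_id: str) -> dict:
    """Request body for the IAM projects.roles.create REST method."""
    return {
        "roleId": role_id,
        "role": {
            "title": "Immuta BigQuery role",  # hypothetical title
            "includedPermissions": IMMUTA_BQ_PERMISSIONS,
            "stage": "GA",
        },
    }

body = build_role_body("immuta_bq_role")
```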
Upload your GCP Service Account Key File. This is the private key file generated in create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery. Uploading this file will auto-populate the following fields:
Project Id: The Google Cloud Platform project to operate on, where your Google BigQuery data warehouse is located. A new dataset will be provisioned in this Google BigQuery project to store the integration configuration.
Service Account: The service account you created in create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery.
Complete the following fields:
Immuta Dataset: The name of the Google BigQuery dataset to provision inside of the project. Important: if you are using multiple environments in the same Google BigQuery project, this dataset name must be unique across environments.
Immuta Role: The custom role you created in create a Google Cloud service account and role for Immuta to use to connect to Google BigQuery.
Dataset Suffix: The suffix that will be appended to the name of each dataset created to store secure views (one per dataset for which you register a table as a data source in Immuta). Important: if you are using multiple environments in the same Google BigQuery project, this suffix must be unique across environments.
GCP Location: The dataset’s location. After a dataset is created, the location can't be changed. Note that if you choose EU for the dataset location, your Core BigQuery Customer Data resides in the EU.
Click Test Google BigQuery Integration.
Click Save.
Before you run the script, update your permissions to execute it:
chmod 755 <path to downloaded script>
Run the cleanup script.

You can customize the Databricks Spark integration settings using these components Immuta provides:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetTables",
"glue:GetDatabases",
"glue:GetTable",
"lakeformation:ListPermissions",
"lakeformation:BatchGrantPermissions",
"lakeformation:BatchRevokePermissions",
"lakeformation:CreateLFTag",
"lakeformation:UpdateLFTag",
"lakeformation:DeleteLFTag",
"lakeformation:AddLFTagsToResource",
"lakeformation:RemoveLFTagsFromResource",
"lakeformation:GetResourceLFTags",
"lakeformation:ListLFTags",
"lakeformation:GetLFTag",
"lakeformation:SearchTablesByLFTags",
"lakeformation:SearchDatabasesByLFTags"
],
"Resource": "*"
}
]
}
Immuta provides cluster policies that set the Spark environment variables and configuration on your Databricks cluster once you apply a policy to your cluster. These Immuta-generated policies must be applied to your cluster manually. The Configure a Databricks Spark integration guide includes instructions for generating and applying these cluster policies. Each cluster policy is described below.
This is the most performant policy configuration.
In this configuration, Immuta is able to rely on Databricks-native security controls, reducing overhead. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes. This Immuta cluster configuration relies on Py4J security being enabled. Consequently, the following Databricks features are unsupported:
Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier)
dbutils.fs
Databricks Connect client library
For full details on Databricks’ best practices in configuring clusters, read their cluster configuration documentation.
Additional overhead: Compared to the Python and SQL cluster policy, this configuration trades some additional overhead for added support of the R language.
In this configuration, you are able to rely on the Databricks-native security controls. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes.
Like the Python & SQL configuration, Py4J security is enabled for the Python & SQL & R configuration. However, because R has been added, Immuta enables the Security Manager in addition to Py4J security to provide more security guarantees. For example, by default all actions in R execute as the root user; among other things, this permits access to the entire filesystem (including sensitive configuration data), and, without iptable restrictions, a user may freely access the cluster’s cloud storage credentials. To address these security issues, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user with limited filesystem and network access, and installs the Immuta Security Manager, which prevents users from bypassing policies and protects against the above vulnerabilities from within the JVM.
Consequently, the cost of introducing R is that the Security Manager incurs a small increase in performance overhead; however, average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
The following Databricks features are unsupported when this cluster policy is applied:
Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier)
dbutils.fs
Databricks Connect client library
For full details on Databricks’ best practices in configuring clusters, read their cluster configuration documentation.
Py4J security disabled: In addition to support for Python, SQL, and R, this configuration adds support for additional Python libraries and utilities by disabling Databricks-native Py4J security.
This configuration does not rely on Databricks-native Py4J security to secure the cluster, while process isolation is still enabled to secure filesystem and network access from within Python processes. On an Immuta-enabled cluster, once Py4J security is disabled the Immuta Security Manager is installed to prevent nefarious actions from Python in the JVM. Disabling Py4J security also allows for expanded Python library support, including many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs.
By default, all actions in R will execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data). And without iptable restrictions, a user may freely access the cluster’s cloud storage credentials. To properly support the use of the R language, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user. This user has limited filesystem and network access. The Immuta Security Manager is also installed to prevent users from bypassing policies and protects against the above vulnerabilities from within the JVM.
The Security Manager will incur a small increase in performance overhead; average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
A homogeneous cluster is recommended for configurations where Py4J security is disabled. If all users have the same level of authorization, there would not be any data leakage, even if a nefarious action was taken.
For full details on Databricks’ best practices in configuring clusters, read their cluster configuration documentation.
Scala clusters: This configuration is for Scala-only clusters.
Where Scala language support is needed, this configuration can be used in the Custom access mode.
According to Databricks’ cluster type support documentation, Scala clusters are intended for single users only. However, nothing inherently prevents a Scala cluster from being configured for multiple users. Even with the Immuta Security Manager enabled, there are limitations to user isolation within a Scala job.
For a secure configuration, it is recommended that clusters intended for Scala workloads are limited to Scala jobs only and are made homogeneous through the use of project equalization or externally via convention/cluster ACLs. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
For full details on Databricks’ best practices in configuring clusters, read their cluster configuration documentation.
Single-user clusters recommended: Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.
Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).
Single-User Clusters: Credential Passthrough (required on Databricks) allows a single-user cluster to be created. This setting automatically configures the cluster to assume the role of the attached user when reading from storage. Because Immuta requires that raw data is readable by the cluster, the instance profile associated with the cluster should be used rather than a role assigned to the attached user.
Multi-User Clusters (discouraged): Because Immuta cannot guarantee user isolation in a multi-user sparklyr cluster, deploying a multi-user cluster is not recommended. To force all users to act under the same set of attributes, groups, and purposes with respect to their data access and eliminate the risk of a data leak, all sparklyr multi-user clusters must be equalized, either by convention (all users able to attach to the cluster have the same level of data access in Immuta) or by configuration (detailed below).
Single-user cluster configuration
1 - Enable sparklyr
In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:
This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.
2 - Set up a sparklyr connection in Databricks
Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.
Set up a sparklyr connection:
3 - Configure a single-user cluster
Add the following items to the Spark Config section of the cluster:
The trustedFileSystems setting is required to allow Immuta’s wrapper FileSystem (used in conjunction with the Security Manager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster’s instance profile for data access, rather than a role associated with the attached user.
Multi-user cluster configuration
Avoid deploying multi-user clusters with sparklyr configuration
It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.
The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta native workspaces.
Add the following environment variables to the Environment Variables section of your cluster configuration:
Add the following items to the Spark Config section:
Limitations
Immuta’s integration with sparklyr does not currently support
spark-submit jobs
UDFs
The Spark environment variables reference guide lists the various possible settings controlled by these variables that you can set in your cluster policy before attaching it to your cluster.
In some cases it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration to allow Spark to read data.
For example, when accessing external tables stored in Azure Data Lake Gen2, Spark must have credentials to access the target containers or filesystems in Azure Data Lake Gen2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access Azure Data Lake Gen2.
To use an additional Hadoop configuration file, set the IMMUTA_INIT_ADDITIONAL_CONF_URI Spark environment variable to be the full URI to this file.
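For the Azure Data Lake Gen2 example above, the additional configuration file is a standard Hadoop-style XML file. The sketch below is an assumption about its contents; the storage account name and key are placeholders, and `fs.azure.account.key.*` is the standard ABFS account-key property.

```xml
<configuration>
  <property>
    <!-- Placeholder storage account; substitute your own. -->
    <name>fs.azure.account.key.mystorageaccount.dfs.core.windows.net</name>
    <value>STORAGE_ACCOUNT_KEY</value>
  </property>
</configuration>
```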
Databricks non-privileged users will only see sources to which they are subscribed in Immuta, and this can present problems if organizations have a data lake full of non-sensitive data and Immuta removes access to all of it. Immuta addresses this challenge by allowing Immuta users to access any tables that are not protected by Immuta (i.e., not registered as a data source or a table in a native workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.
Protected until made available by policy: This setting means that users can only see tables that Immuta has explicitly subscribed them to.
Behavior change
If a table is registered in Immuta and does not have a subscription policy applied to it, that data will be visible to users, even if the Protected until made available by policy setting is enabled.
If you have enabled this setting, author an "Allow individually selected users" global subscription policy that applies to all data sources.
Available until protected by policy: This setting means all tables are open until explicitly registered and protected by Immuta. This setting allows both non-Immuta reads and non-Immuta writes:
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS: Immuta users with regular (non-privileged) Databricks roles may SELECT from tables that are not registered in Immuta. This setting does not allow reading data directly with commands like spark.read.format("x"). Users are still required to read data and query tables using Spark SQL. When non-Immuta reads are enabled through the cluster policy, Immuta users will see all databases and tables when they run show databases or show tables. However, this does not mean they will be able to query all of them.
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES: Immuta users with regular (non-privileged) Databricks roles can run DDL commands and data-modifying commands against tables or spaces that are not registered in Immuta. With non-Immuta writes enabled through the cluster policy, users on the cluster can mix any policy-enforced data they may have access to via registered data sources in Immuta with non-Immuta data and write the result to a non-Immuta write space, where it would be visible to others. If this is not a desired possibility, configure the cluster to use only Immuta’s project workspaces instead.
The Configure a Databricks Spark integration guide includes instructions for applying these settings to your cluster.
In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.
Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.
When a user runs a Spark job in Databricks, the Immuta plugin automatically submits ephemeral overrides for that user to Immuta for all applicable data sources to use the current cluster as compute for all subsequent metadata operations for that user against the applicable data sources.
For more details about ephemeral overrides and how to configure or disable them, see the Ephemeral overrides page.
Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is simply for a single user to organize their data sources or to control an entire schema of data sources through a single project screen; most often, however, a project represents an Immuta purpose for which the data has been approved to be used, restricting access to data and streamlining team collaboration. Consequently, data owners can restrict access to data for a specified purpose through projects.
When a user is working within the context of a project, data users will only see the data in that project. This helps to prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. Consider adjusting the following project settings to suit your organization's needs:
Project UDFs (web service and on-cluster caches): Immuta caches a mapping of user accounts and users' current projects in the Immuta Web Service and on-cluster. When users change their project with UDFs instead of the Immuta UI, Immuta invalidates all the caches on-cluster (so that everything changes immediately) and the cluster submits a request to change the project context to a web worker. Immediately after that request, another call is made to a web worker to refresh the current project. To allow use of project UDFs in Spark jobs, raise the caching on-cluster and lower the cache timeouts for the Immuta Web Service. Otherwise, caching could cause dissonance among the requests and calls to multiple web workers when users try to change their project contexts.
Preventing users from changing projects within a session: If your compliance requirements restrict users from changing projects within a session, you can block the use of Immuta's project UDFs on a Databricks Spark cluster. To do so, configure the IMMUTA_SPARK_DATABRICKS_DISABLED_UDFS Spark environment variable.
This section describes how Immuta interacts with common Databricks features.
Databricks users can see the Databricks change data feed (CDF) on queried tables if they are allowed to read raw data and meet specific qualifications. Immuta does not support applying policies to the changed data, and the CDF cannot be read for data source tables if the user does not have access to the raw data in Databricks or for streaming queries.
The CDF can be read if the querying user is allowed to read the raw data and ONE of the following statements is true:
the table is in the current workspace
the table is in a scratch path
non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting
non-Immuta reads are enabled AND the table is not part of an Immuta data source
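The conditions above can be expressed as a simple predicate. The names below are illustrative only, not an Immuta API; the sketch just encodes the "raw-read access AND at least one of the four conditions" rule.

```python
def can_read_cdf(
    can_read_raw: bool,
    in_current_workspace: bool,
    in_scratch_path: bool,
    non_immuta_reads_enabled: bool,
    intersects_foreign_workspace: bool,
    is_immuta_data_source: bool,
) -> bool:
    """Return True if the change data feed is readable: the user must be
    allowed to read raw data AND one of the four listed conditions holds."""
    if not can_read_raw:
        return False
    return (
        in_current_workspace
        or in_scratch_path
        or (non_immuta_reads_enabled and not intersects_foreign_workspace)
        or (non_immuta_reads_enabled and not is_immuta_data_source)
    )
```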
Security vulnerability
Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether this risk applies to your environment or use case.
The trusted libraries feature allows Databricks cluster administrators to avoid Immuta Security Manager errors when using third-party libraries. An administrator can specify an installed library as trusted, which will enable that library's code to bypass the Immuta security manager. This feature does not impact Immuta's ability to apply policies; trusting a library only allows code through that otherwise would have been blocked by the Security Manager.
The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:
Library source is Upload, DBFS or DBFS/S3 and the Library Type is Jar.
Library source is Maven.
When users install third-party libraries, those libraries will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta. See the Install a trusted library guide to add a trusted library to your configuration.
Limitations
Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.
Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either
waiting for library installation to complete before running any third-party library commands or
executing a Spark query. This will force Immuta to wait for any trusted Immuta libraries to complete installation before proceeding.
When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or using the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code paths that include a class from a transitive dependency but do not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency of a file-util library, the user will not be able to directly use the file-util library to read a sensitive file that is normally protected by the Immuta security manager.
In many cases, it is not a problem if dependent libraries aren't trusted, because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if the dependent library itself needs to be trusted, there is a workaround: install the dependency directly as its own library so that it can be listed as trusted.
Connect any of these supported external catalogs to work with your Databricks Spark integration so data owners can tag their data.
Immuta supports the use of external metastores in local or remote mode:
Local mode: The metastore client running inside a cluster connects to the underlying metastore database directly via JDBC.
Remote mode: Instead of connecting to the underlying database directly, the metastore client connects to a separate metastore service via the Thrift protocol. The metastore service connects to the underlying database. When running a metastore in remote mode, DBFS is not supported.
For more details about these deployment modes, see how to set up Databricks clusters to connect to an existing external Apache Hive metastore.
Download the metastore jars and point to them as specified in Databricks documentation. Metastore jars must end up on the cluster's local disk at this explicit path: /databricks/hive_metastore_jars.
If using DBR 7.x with Hive 2.3.x, either
Set spark.sql.hive.metastore.version to 2.3.7 and spark.sql.hive.metastore.jars to builtin or
Download the metastore jars and set spark.sql.hive.metastore.jars to /databricks/hive_metastore_jars/* as before.
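In Spark config terms, the two options above look like the following (choose one); these are the standard `spark.sql.hive.metastore.*` settings referenced in the steps:

```
# Option 1: use the builtin Hive 2.3.7 client
spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin

# Option 2: point to downloaded metastore jars
spark.sql.hive.metastore.jars /databricks/hive_metastore_jars/*
```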
To use AWS Glue Data Catalog as the metastore for Databricks, see the Databricks documentation.
Users on Databricks Runtimes 8+ can manage notebook-scoped libraries with %pip commands.
However, this functionality differs from the support for Databricks trusted libraries, and Python libraries are not supported as trusted libraries. The Immuta Security Manager will deny the code of libraries installed with %pip access to sensitive resources.
Scratch paths are cluster-specific remote file paths that Databricks users are allowed to directly read from and write to without restriction. The creator of a Databricks cluster specifies the set of remote file paths that are designated as scratch paths on that cluster when they configure a Databricks cluster. Scratch paths are useful for scenarios where non-sensitive data needs to be written out to a specific location using a Databricks cluster protected by Immuta.
To configure a scratch path, use the IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS Spark environment variable.
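As an example, a cluster policy might set an environment variable value like the one below. The bucket name is a placeholder, and the exact value format (for example, the delimiter used for multiple paths) should be confirmed in the Spark environment variables reference guide.

```
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://analyst-scratch-bucket/scratch
```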
In the Databricks Spark integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.
The sequence diagram below breaks down this process of events when an Immuta user queries data in Databricks.
A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)
A cluster that uses one of these supported Databricks Runtimes:
11.3 LTS
14.3 LTS
For a comparison of features supported for both Databricks Runtimes, see the runtime comparison below.
Supported languages
Python
R (not supported for Databricks Runtime 14.3 LTS)
A Databricks cluster that is one of these supported compute types:
Custom access mode
A Databricks workspace and cluster with the ability to make HTTP calls directly to the Immuta web service. The Immuta web service must also be able to connect to and perform queries on the Databricks cluster.
When an administrator configures the Databricks Spark integration, Immuta generates a cluster policy that the administrator then applies to the Databricks cluster. When the cluster starts with the cluster policy applied, the init script that Immuta provides downloads the Spark plugin artifacts onto the cluster and puts them in the appropriate locations on local disk for use by Spark.
Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for authorization and policy enforcement.
Immuta adds the following artifacts to your Databricks environment:
The Databricks Spark integration injects this Immuta-maintained Spark plugin into the SparkSQL stack at cluster startup time. Policy determinations are obtained from the connected Immuta tenant and applied before returning results to the user. The plugin includes wrappers and Immuta analysis hook plan rewrites to enforce policies.
Note: The Security Manager is disabled when Scala and R are not being used; see the performance note below.
The Immuta Security Manager ensures users can't perform unauthorized actions when using Scala and R, since those languages have features that allow users to circumvent policies without the Security Manager enabled. The Immuta Security Manager blocks users from executing code that could allow them to gain access to sensitive data by only allowing select code paths to access sensitive files and methods. These select code paths provide Immuta's code access to sensitive resources while blocking end users from these sensitive resources directly.
Performance
The Security Manager must inspect the call stack every time a permission check is triggered, which adds overhead to queries. To improve Immuta's query performance on Databricks, Immuta disables the Security Manager when Scala and R are not being used.
immuta database
When a table is registered in Immuta as a data source, users can see that table in the native Databricks database and in the immuta database. This allows for an option to use a single database (immuta) for all tables.
The immuta database on Immuta-enabled clusters allows Immuta to track Immuta-managed data sources separately from remote Databricks tables so that policies and other security features can be applied. However, Immuta supports raw tables in Databricks, so table-backed queries do not need to reference this database.
Once the Immuta-enabled cluster is running, the following user actions spur various processes. The list below provides an overview of each process:
Data source is registered: When a data owner registers a Databricks securable as a data source, data source metadata (column type, securable name, column names, etc.) is retrieved from the Metastore and stored in the Immuta Metadata Database. If tags are then applied to the data source, Immuta stores this metadata in the Metadata Database as well.
Data source is deleted: When a data source is deleted, the data source metadata is deleted from the Metadata Database. Depending on the settings configured for the integration, users will either be able to query that data now that it is no longer registered in Immuta, or access to the securable will be revoked for all users. See the integration configuration documentation for details about this setting.
The image below illustrates these processes and how they interact.
The Databricks Spark integration allows users to author subscription and data policies to enforce access controls. See the subscription policy and data policy pages for details about the specific types of policies supported.
Immuta supports clusters on Databricks Runtime 14.3. The integration for this Databricks Runtime differs from the integration for Databricks Runtime 11.3 in the following ways:
Security Manager disabled: The Security Manager is disabled for Databricks Runtime 14.3. Because the Security Manager is used to prevent users from circumventing access controls when using R and Scala, those languages are unsupported. Only Python and SQL clusters are supported.
Py4J security and process isolation automatically enabled: Immuta relies on Databricks process isolation and Py4J security to prevent user code from performing unauthorized actions. After selecting Runtime 14.3 during configuration, Immuta will automatically enable process isolation and Py4J security.
dbutils is unsupported: Immuta relies on Databricks process isolation and Py4J security to prevent user code from performing unauthorized actions. This means that dbutils is not supported for Databricks Spark integrations using Databricks Runtime 14.3 LTS.
The table below compares the features supported for clusters on Databricks Runtime 11.3 and Databricks Runtime 14.3.
The Databricks Spark integration supports the following authentication methods to configure the integration:
OAuth machine-to-machine (M2M): Immuta integrates with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal's identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth M2M documentation for more details.
Personal access token (PAT): This token gives Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace when configuring the integration or to register securables as Immuta data sources.
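The M2M token lifecycle described above (cache a one-hour token, request a new one on expiry) can be sketched as a small cache. This is a minimal illustration, not Immuta's implementation; the `fetch_token` callable stands in for the real Databricks OAuth token endpoint:

```python
import time

class OAuthTokenCache:
    """Caches a client-credentials token and refreshes it after expiry."""

    def __init__(self, fetch_token, ttl_seconds=3600, clock=time.time):
        self._fetch_token = fetch_token  # stand-in for the Databricks OAuth token endpoint
        self._ttl = ttl_seconds          # M2M tokens expire after one hour
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Request a new temporary token only when the cached one has expired.
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token
```

The injected clock is only there so the refresh behavior can be exercised without waiting an hour.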
Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing. To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.
Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not. To configure Immuta to do so, enable the corresponding audit setting in the Spark cluster configuration when configuring your integration.
See the audit documentation for more details about the audit capabilities in the Databricks Spark integration.
Non-administrator users on an Immuta-enabled Databricks cluster must not have access to view or modify Immuta configuration or the immuta-spark-hive.jar file, as this poses a security loophole around Immuta policy enforcement. Databricks secrets allow you to securely apply environment variables to Immuta-enabled clusters.
Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable. For example, if a user wanted to make the MY_SECRET_ENV_VAR=abcd_1234 value secret, they could instead create a Databricks secret and reference it as the value of that variable by following these steps:
Create the secret scope my_secrets and add a secret with the key my_secret_env_var containing the sensitive environment variable.
Reference the secret in the environment variables section as MY_SECRET_ENV_VAR={{secrets/my_secrets/my_secret_env_var}}.
At runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.
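The substitution behaves like a simple template lookup. The sketch below mimics it with a plain dictionary standing in for the Databricks secret store; this is illustration only, since on a real cluster Databricks performs this resolution itself:

```python
import re

# Stand-in for a Databricks secret scope; on a real cluster these values
# live in the secret store, not in code.
SECRETS = {"my_secrets/my_secret_env_var": "abcd_1234"}

_SECRET_REF = re.compile(r"^\{\{secrets/(.+)\}\}$")

def resolve_env_var(value):
    """Replace a {{secrets/<scope>/<key>}} reference with the secret's value."""
    match = _SECRET_REF.match(value)
    if match is None:
        return value                 # plain value, passed through unchanged
    return SECRETS[match.group(1)]   # cluster owner must have access to the secret
```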
There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta’s Security Manager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it's impossible to distinguish between which user’s data is composed in each file/block. If you are concerned about this vulnerability, Immuta suggests that you
limit Scala clusters to Scala jobs only and
require equalized projects, which will force all users to act under the same set of attributes, groups, and purposes with respect to their data access. To require that Scala clusters be used in equalized projects and avoid the risk described above, set IMMUTA_SPARK_REQUIRE_EQUALIZATION to true.
Once this configuration is complete, users on the cluster will need to switch to an Immuta equalized project before running a job. Once the first job is run using that equalized project, all subsequent jobs, no matter the user, must also be run under that same equalized project. If you need to change a cluster's project, you must restart the cluster.
When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the leaf level of the physical Spark plan, so a policy such as "Mask using hashing the column social_security_number for everyone" would be implemented as an expression on a project node right above the FileSourceScanExec/LeafExec node at the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently, from ending up in SPARK_LOCAL_DIR.
This policy implementation coupled with an equalized project guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.
The troubleshooting guide has guidance for resolving issues with your installation.
Add the transitive dependency jar paths to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS Spark environment variable. In the driver log4j logs, Databricks outputs the source jar locations when it installs transitive dependencies. In the cluster driver logs, look for a log message similar to the following:

INFO LibraryDownloadManager: Downloaded library dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as local file /local_disk0/tmp/addedFile8569165920223626894slf4j_api_1_7_25-784af.jar

In the above example, where slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster.

Related configuration snippets:

Enable sparklyr support and connect from R:
IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED=true
sc <- spark_connect(method = "databricks")

Credential passthrough Spark configuration:
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.trustedFilesystems com.databricks.s3a.S3AFileSystem,shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,com.databricks.adl.AdlFileSystem,shaded.databricks.V2_1_4.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem,shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem,shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem,org.apache.hadoop.fs.ImmutaSecureFileSystemWrapper
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.InstanceProfileCredentialsProvider

Project equalization and SCIM fallback environment variables:
IMMUTA_SPARK_REQUIRE_EQUALIZATION=true
IMMUTA_SPARK_CURRENT_USER_SCIM_FALLBACK=false

Non-privileged ACL mode and user API key:
immuta.spark.acl.assume.not.privileged true
immuta.api.key=<user's API key>

Query the immuta database from R:
dbGetQuery(sc, "show tables in immuta")
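Extracting the source jar path from that log message can be automated. This hypothetical helper pulls the dbfs: URI out of a LibraryDownloadManager line so it can be appended to IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS:

```python
import re

def trusted_lib_uri(log_line):
    """Return the dbfs: source path from a LibraryDownloadManager log message, or None."""
    match = re.search(r"Downloaded library (dbfs:\S+) as", log_line)
    return match.group(1) if match else None
```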
The cluster init script checks the cluster’s configuration and automatically removes the Security Manager configuration when
spark.databricks.repl.allowedlanguages is a subset of {python, sql}
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED is true
When the cluster is configured this way, Immuta can rely on Databricks' process isolation and Py4J security to prevent user code from performing unauthorized actions.
Note: Immuta still expects the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to be set and pointing at the Security Manager.
Beyond disabling the Security Manager, Immuta will skip several startup tasks that are required to secure the cluster when Scala and R are configured, and fewer permission checks will occur on the Driver and Executors in the Databricks cluster, reducing overhead and improving performance.
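The init script's decision can be expressed as a predicate over the two settings above. This is a simplified sketch of the stated conditions, not the actual script:

```python
def can_disable_security_manager(allowed_languages, py4j_strict_enabled):
    """Security Manager can be removed only for python/sql clusters with Py4J security on."""
    return set(allowed_languages) <= {"python", "sql"} and py4j_strict_enabled
```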
Caveats
There are still cases that require the Security Manager; in those instances, Immuta creates a fallback Security Manager to check the code path, so the IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI environment variable must always point to a valid calling class file.
Databricks’ dbutils is blocked by their Py4J security; therefore, it can’t be used to access scratch paths.
Immuta can hide the immuta database from any calls to SHOW DATABASES so that users are not confused or misled by that database. Hiding the database does not disable access to it: queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether the database is hidden. To hide the immuta database, set the IMMUTA_SPARK_SHOW_IMMUTA_DATABASE environment variable to false in the Spark cluster configuration when configuring your integration. Immuta will then omit this database from the results of SHOW DATABASES queries.
A policy is deleted: When a policy is deleted, the policy information is deleted from the Metadata Database. If users were granted access to the data source by that policy, their access is revoked.
Databricks user is mapped to Immuta: When a Databricks user is mapped to Immuta, their metadata is stored in the Metadata Database.
Databricks user queries data: When a user queries the data in Databricks, Immuta intercepts the call from Spark down to the Metastore. Then, the Immuta-maintained Spark plugin retrieves the policy information, the user metadata, and the data source metadata from the Metadata Database and injects this information as policy logic into the Spark logical plan. Once the physical plan is applied, Databricks returns policy-enforced data to the user.
Databricks Connect is unsupported: Databricks Connect is unsupported because Py4J security must be enabled to use it.
| Feature | Databricks Runtime 11.3 | Databricks Runtime 14.3 |
|---|---|---|
| Non-Immuta reads and writes | ✅ | ✅ |
| Python | ✅ | ✅ |
| SQL | ✅ | ✅ |
| R | ✅ | ❌ |
| Scala | ✅ | ❌ |
| Immuta project workspaces | ✅ | ❌ |
| Smart mask ordering | ✅ | ❌ |
| Masking and tagging complex columns (STRUCT, ARRAY, MAP) | ✅ | ❌ |
| Photon support | ✅ | ❌ |
| dbutils | ✅ | ❌ |
| Databricks Connect | ✅ | ❌ |
| Write policies | ❌ | ❌ |
| Support for allowlisting networks or local filesystem paths | ❌ | ✅ |
| Subscription policies | ✅ | ✅ |
| Data policies | ✅ | ✅ |
IMMUTA_SPARK_SHOW_IMMUTA_DATABASE=false
Private preview: This integration is available to select accounts. Contact your Immuta representative for details.
Immuta's Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.
You cannot add an existing location that is registered in your S3 Access Grants instance to the Immuta configuration. Immuta will create and register a location for you.
This integration is in private preview; contact your Immuta representative to get this feature enabled.
APPLICATION_ADMIN Immuta permission to configure the integration
CREATE_S3_DATASOURCE Immuta permission to register S3 prefixes
There are two AWS roles that you will set up in AWS before configuring the integration in Immuta:
Create an S3 Access Grants instance in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Create a permissions policy with the following permissions. You will attach this permissions policy to your location IAM role once created.
If you use server-side encryption with AWS Key Management Service (AWS KMS) keys to encrypt your data, the following permissions are required for the IAM role in the policy. If you do not use this feature, do not include these permissions in your IAM policy:
kms:Decrypt
Create a permissions policy with the permissions in the sample policy below.
Replace <location_role_arn> and <access_grants_instance_arn> in the example below with the ARNs of the location IAM role you created and your Access Grants instance, respectively.
The Access Grants instance resource ARN should be scoped to apply to any future locations that will be created under this Access Grants instance. For example, "Resource": "arn:aws:s3:us-east-2:6********499:access-grants/default*" ensures that the role would have permissions for both of these locations:
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation1
A sample trust policy for this role is provided in the next section. Once you begin configuring the integration in Immuta, you will add this role to your integration configuration so that Immuta can authenticate with AWS and set up the integration. In a step in the next section, you will return to AWS to edit the trust policy for this role and add the AWS account Immuta provided and the external ID displayed in the Immuta console.
In Immuta, click the App Settings icon in the navigation menu and click the Integrations tab.
Click + Add Integration.
Select Amazon S3 from the dropdown menu and click Continue Configuration.
You can edit the following settings for an existing Amazon S3 integration on the app settings page:
friendly name
authentication type and values (access key, secret, and role)
To edit settings for an existing integration via the API, see the integrations API documentation.
Follow the data source registration guide to register prefixes in Immuta.
To create an S3 data source using the API, see the data source API documentation.
Requirements: USER_ADMIN Immuta permission and either the GOVERNANCE or CREATE_S3_DATASOURCE Immuta permission
AWS IAM principals must be mapped to user accounts in Immuta to enforce access controls.
Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies:
Click Identities in the navigation menu and select Users.
Requirement: User must be subscribed to the data source in Immuta
Request temporary credentials from S3 Access Grants. If you're accessing S3 data through one of the supported AWS services (such as Amazon EMR on EC2), that application will make this request on your behalf, so you can skip this step.
Use the temporary credentials to access the data in S3.
Immuta's Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.
With this integration, users can avoid
hand-writing AWS IAM policies
managing AWS IAM role limits
manually tracking what user or role has access to what files in AWS S3 and verifying those are consistent with intent
To enforce controls on S3 data, Immuta interacts with several S3 Access Grants components:
Access Grants instance: An Access Grants instance is a logical container for individual grants that specify who can access what level of data in S3 in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Location: A location specifies what data the Access Grants instance can grant access to. For example, registering a location with a scope of s3:// allows Access Grants to manage access to all S3 buckets in that AWS account and region, whereas setting the bucket s3://research-data as the scope limits Access Grants to managing access to that single bucket for that location. When you configure the S3 integration in Immuta, you specify a location's scope and IAM assumed role, and Immuta registers the location in your Access Grants instance and associates it with the provided IAM role for you. Each S3 integration you configure in Immuta is associated with one location, and Immuta manages all grants in that location. Therefore, grants cannot be manually created by users in an Access Grants instance location that Immuta has registered and manages. During data source registration, this location scope is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location scope of s3://research-data would be prepended to the data source prefix /demographics to build the final path s3://research-data/demographics.
The diagram below illustrates how these S3 Access Grants components interact.
For more details about these Access Grants concepts, see the AWS S3 Access Grants documentation.
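The path construction described above (the location scope prepended to a registered data source prefix) amounts to a simple join. The helper below is a hypothetical sketch of that rule, not Immuta code:

```python
def final_s3_path(location_scope, data_source_prefix):
    """Prepend the Access Grants location scope to a registered data source prefix."""
    return location_scope.rstrip("/") + "/" + data_source_prefix.lstrip("/")
```

For example, a scope of s3://research-data and a prefix of /demographics combine into s3://research-data/demographics.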
After an administrator creates an Access Grants instance and an assumed IAM role in their AWS account, an application administrator configures the Amazon S3 integration in Immuta. During configuration, the administrator provides the following connection information so that Immuta can create and register a location in that Access Grants instance:
AWS account ID and region
ARN for the existing Access Grants instance
ARN for the assumed IAM role
When Immuta registers this location, it associates the assumed IAM role with the location. This allows the IAM role to create temporary credentials with access scoped to a particular S3 prefix, bucket, or object in the location. The IAM role you create for this location must have all the object- and bucket-level permissions listed in the sample permissions policy on all buckets and objects in the location; if it is missing permissions, the IAM role will not be able to grant those missing permissions to users or applications requesting temporary credentials.
In the example below, an application administrator registers the following location prefix and IAM role for their Access Grants instance in AWS account 123456:
Location path: s3://. This path allows a single Amazon S3 integration to manage all objects in S3 in that AWS account and region. Data owners can scope down access further when registering specific S3 prefixes and applying policies.
Location IAM role: The arn:aws:iam::123456:role/access-grants-role IAM role will be used to vend temporary credentials to users and applications.
Immuta registers this location and associated IAM role in the user's Access Grants instance:
After the S3 integration is configured, a data owner can register S3 prefixes and buckets that are in the configured Access Grants location path to enforce access controls on resources. Immuta stores the connection information for the prefix so that the metadata can be used to create and enforce subscription policies on S3 data.
A data owner or governor can apply a subscription policy to a registered prefix, bucket, or object to control who can access objects beginning with that prefix or in that bucket after it is registered in Immuta. Once a subscription policy is created and Immuta users are subscribed to the prefix, bucket, or object, Immuta calls the Access Grants API to create a grant for each subscribed user, specifying the following parameters in the payload so that Access Grants can create and store a grant for each user:
Access Grants location
READ access
User or role principal
In the example below, a data owner registers the s3://research-data/* bucket, and Immuta stores the connection information in the Immuta metadata database. Once the user, Taylor, is subscribed to s3://research-data/*, Immuta calls the Access Grants API to create a grant for that user to allow them to read and write S3 data in that bucket:
The definitions for each status and the state of configured data platform integrations are available in the integrations API documentation.
To access S3 data registered in Immuta, users must be subscribed to the prefix, bucket, or object in Immuta, and their principals must be . Once users are subscribed, they request temporary credentials from S3 Access Grants. Access Grants looks up the grant ID associated with the requester. If no matching grant exists, they receive an access denied error. If one exists, Access Grants assumes the IAM role associated with the location and requests temporary credentials that are scoped to the prefix, bucket, or object and permissions specified by the individual grant. Access Grants vends the credentials to the requester, who uses those temporary credentials to access the data in S3.
In the example below, Taylor requests temporary credentials from S3 Access Grants. Access Grants looks up the grant ID (1) for that user, assumes the arn:aws:iam::123456:role/access-grants-role IAM role for the location, and vends temporary credentials to Taylor, who then uses the credentials to access the research-data bucket in S3:
Note that when accessing data through S3 Access Grants, the user or application interacts directly with the Access Grants API to request temporary credentials; Immuta does not act in this process at all. See the diagram below for an illustration of the process for accessing data through S3 Access Grants.
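The lookup-then-vend flow can be simulated in a few lines. The grant table and credential shape here are illustrative stand-ins for the Access Grants service, not a real AWS API:

```python
class AccessDenied(Exception):
    pass

# Illustrative grant table: principal -> (scope, permission), as stored by Access Grants.
GRANTS = {
    "arn:aws:iam::123456:user/taylor": ("s3://research-data/*", "READ"),
}

def get_data_access(principal):
    """Mimic the credential-vending check: no matching grant means access denied."""
    grant = GRANTS.get(principal)
    if grant is None:
        raise AccessDenied(principal)
    scope, permission = grant
    # The real service would assume the location IAM role here and return
    # temporary credentials scoped to this prefix and permission.
    return {"scope": scope, "permission": permission}
```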
AWS services that support S3 Access Grants will request temporary credentials for users automatically. If users are not using a service that supports S3 Access Grants, they must have the to to request temporary credentials to access data through the access grant.
For a list of AWS services that support S3 Access Grants, see the .
Immuta's S3 integration allows data owners and governors to apply object-level access controls on data in S3 through subscription policies. When a user is subscribed to a registered prefix, bucket, or object, Immuta calls the Access Grants API to create an individual grant that narrows the scope of access within the location to that registered prefix, bucket, or object. See the diagram below for a visualization of this process.
When a user's entitlements change or a subscription policy is added to, updated, or deleted from a prefix, Immuta performs one of the following processes for each user subscribed to the registered prefix:
User added to the prefix: Immuta specifies a permission (READ or READWRITE) for each user and uses the Access Grants API to create an individual grant for each user.
User updated: Immuta deletes the current grant ID and creates a new one using the Access Grants API.
User deleted from the prefix: Immuta deletes the user's grant ID using the Access Grants API.
Immuta offers two to manage read and write access to data in S3:
Read access policies manage who can get objects from S3.
Write access policies manage who can modify data in S3.
Data policies, which provide more granular controls by redacting or masking values in a table, are not supported for S3.
Data owners can register an S3 prefix at any level in the S3 path by registering it as an Immuta data source. During this process, Immuta stores the connection information for use in policy enforcement.
Each prefix added in the data registration workflow is created as a single Immuta data source, and a subscription policy added to a data source applies to any objects in that bucket or beginning with that prefix:
Therefore, data owners should register prefixes or buckets at the lowest level of access control they need for that data. Using the example above, if the data owner needed to allow different users to access s3://yellow-bucket/research-data/* than those who should access s3://yellow-bucket/analyst-data/*, the data owner must register the research-data/* and analyst-data/* prefixes separately and then apply a subscription policy to those prefixes:
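Determining which registered data source governs a given object comes down to a longest-prefix match. This hypothetical helper (not part of Immuta) illustrates the rule:

```python
def governing_prefix(object_path, registered_prefixes):
    """Return the most specific registered prefix covering an S3 object, if any."""
    matches = [p for p in registered_prefixes if object_path.startswith(p.rstrip("*"))]
    return max(matches, key=len) if matches else None
```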
When an S3 data source is deleted, Immuta deletes all the grants associated with that prefix, bucket, or object in that location.
Access can be managed in AWS using IAM users, roles, or Identity Center (IDC). Immuta supports each of these principal types for user provisioning in the S3 integration.
However, if you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. This means that if users share IAM roles, you could end up in a situation where you over-provision access to everyone in the IAM role.
See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.
Enable IDC (recommended): IDC is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the instructions below for mapping users from AWS IDC to user accounts in Immuta.
Create an IAM role per user: If you do not have IDC enabled, create an IAM role per user that is unique to that user and assign that IAM role to each corresponding user in Immuta. Ensure that the IAM role cannot be shared with other users. This approach can be a challenge because there is an AWS quota on the number of IAM roles per account.
Names are case-sensitive
The IAM role name and IAM user name are case-sensitive. See the for details.
Immuta supports mapping an Immuta user to AWS in one of the following ways:
AWS IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
See the instructions below for mapping principals to user accounts in Immuta.
The Amazon S3 integration will not interfere with existing legacy S3 integrations, and multiple S3 integrations can exist in a single Immuta tenant.
During private preview, Immuta supports up to 500 prefixes (data sources) and up to 20 Immuta users mapped to S3 principals. This is a preview limitation that will be removed in a future phase of the integration.
S3 Access Grants allows 100,000 grants per region per account. Thus, if you have 5 Immuta users with access to 20,000 registered prefixes, you would reach this limit. See the AWS documentation for details.
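The limit arithmetic is simple multiplication, since Immuta creates one grant per subscribed user per registered prefix:

```python
def grants_needed(num_users, num_prefixes):
    """S3 Access Grants stores one grant per subscribed user per registered prefix."""
    return num_users * num_prefixes

AWS_GRANT_LIMIT = 100_000  # per region per account
```

For example, 5 users subscribed to 20,000 prefixes consumes exactly the 100,000-grant limit, while the preview limits (20 users, 500 prefixes) stay well under it.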
Some Immuta features are not currently supported by the integration in private preview.
Location IAM role: The S3 Access Grants service assumes this role to vend credentials to the querying user. Permissions required for this IAM location role are provided in the Set up S3 Access Grants instance and IAM roles section.
Authentication IAM role (or user): Immuta uses this role to authenticate with AWS, set up the integration, and issue grants. This entity must have the necessary permissions to create locations and issue grants. Permissions required for this IAM authentication role are provided in the Set up S3 Access Grants instance and IAM roles section.
kms:GenerateDataKey
Replace <bucket_arn> in the example below with the ARN of the bucket scope that contains data you want to grant access to. Note: The resource for object-level permissions must end with a wildcard so that Immuta can grant access to objects inside that prefix.
Create an AWS IAM role and select Custom trust policy as the trusted entity type.
Edit the trust policy to give the S3 Access Grants service principal access to this role in the resource policy file. The trust policy for this role should include at least the permissions provided in the example below, but might need additional permissions depending on other local setup factors.
Attach the permissions policy you created in the previous step to this IAM role to grant the permissions to the role. Once you begin configuring the integration in Immuta, you will add this role to your integration configuration so that Immuta can register this role with your Access Grants location.
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation2
If you use AWS IAM Identity Center, associate your IAM Identity Center instance with your S3 Access Grants instance. Then add the permissions listed in the sample policy below to your IAM authentication policy.
Copy the JSON below and replace the following bracketed placeholder values with your own. For details about the actions and resource values, see the IAM Identity Center API reference documentation.
<iam_identity_center_instance_arn>: The ARN of the instance of IAM Identity Center (InstanceArn) that is configured with the application.
<iam_identity_center_application_arn_for_s3_access_grants>: The ARN of the S3 Access Grants application configured with IAM Identity Center.
<aws_account>: Your AWS account ID.
<identity_store_id>: The globally unique identifier of the identity store that is connected to the Identity Center instance. This value is generated when a new identity store is created.
Create an AWS IAM role (recommended) and select AWS account as the trusted entity type.
Attach the permissions policy you created in the previous step(s) to the authentication IAM role to grant the permissions to the role.
Immuta will assume this authentication IAM role from Immuta's AWS account in order to perform operations in your AWS account. Contact your Immuta representative before proceeding, and Immuta will
Provide the Immuta AWS account to add to your trust policy.
Update the Immuta AWS configuration to allow Immuta to assume this role.
Complete the connection details fields, where
Friendly Name is a name for the integration that is unique across all Amazon S3 integrations configured in Immuta.
AWS Account ID is the ID of your AWS account.
AWS Region is the AWS region to use.
S3 Access Grants Location IAM Role is the role the S3 Access Grants service assumes to vend credentials to the grantee. When a grantee accesses S3 data, the Access Grants service attaches session policies and assumes this role in order to vend credentials scoped to a prefix or bucket to the grantee. This role needs full access to all paths under the S3 location prefix.
S3 Access Grants S3 Location Scope is the base S3 location that Immuta will use for this connection when registering S3 prefixes. This path must be unique across all S3 integrations configured in Immuta. During data source registration, this prefix is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location prefix of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Select your authentication method:
Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account in order to perform operations in your AWS account. You should have already contacted your Immuta representative so that they could give you the AWS account to add to your trust policy and update the Immuta AWS configuration to allow Immuta to assume this role. If you have not contacted your Immuta representative yet, please follow the instructions in the section above before completing these steps:
Enter the role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
Set the external ID provided in a condition on the trust relationship for the cross-account IAM role specified above. See the AWS documentation on external IDs for guidance. An example trust policy is provided below. Replace the values in placeholder brackets with your own:
Access using access key and secret access key: Provide your AWS Access Key ID and AWS Secret Access Key. The credentials you provide should have the permissions included in the sample permissions policy for the authentication role (or user).
Click Verify Credentials.
Click Next to review and confirm your connection information, and then click Complete Setup.
Navigate to the user's page and click the more actions icon next to their username.
Select Change S3 User or AWS IAM Role from the dropdown menu.
Use the dropdown menu to select the User Type. Then complete the S3 username or IAM role field. User and role names are case-sensitive. See the AWS documentation for details.
AWS IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address. Ensure that you have added the content to your IAM policy JSON as outlined in the section above to allow Immuta to use AWS Identity Center.
Unset (fallback to Immuta username): When selecting this option, the S3 username is assumed to be the same as the Immuta username.
Click Save.
See the Mapping IAM principals in Immuta section for details about supported principals.
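The fallback behavior for unset mappings can be sketched as follows (a hypothetical helper; the mapping keys and values are examples, not real identities):

```python
def resolve_s3_principal(immuta_username: str, mapping: dict) -> str:
    """Return the S3 principal mapped to an Immuta user.

    If no explicit S3 user or IAM role mapping is set ("Unset"), the S3
    username is assumed to equal the Immuta username, as described above.
    """
    return mapping.get(immuta_username, immuta_username)

mapping = {"alice@example.com": "arn:aws:iam::123456789012:role/analyst"}
print(resolve_s3_principal("alice@example.com", mapping))  # the mapped IAM role
print(resolve_s3_principal("bob@example.com", mapping))    # falls back to the Immuta username
```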
Individual grants: Individual permission grants in S3 Access Grants specify the identity that can access the data, the access level, and the location of the S3 data. Immuta creates a grant for each user subscribed to a prefix, bucket, or object by interacting with the Access Grants API. Each grant has its own ID and gives the user or role principal access to the data.
IAM assumed role: This is an IAM role you create in S3 that has full access to all prefixes, buckets, and objects in the Access Grants location registered by Immuta. This IAM role is used to vend temporary credentials to users or applications. When a grantee requests temporary credentials, the S3 Access Grants service assumes this role to vend credentials scoped to the prefix, bucket, or object specified in the grant to the grantee. The grantee then uses these credentials to access S3 data. When configuring the integration in Immuta, you specify this role, and then Immuta associates this role with the registered location in the Access Grants instance.
Temporary credentials: These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant and scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant and vends these temporary credentials to the requester. These credentials have a default timeout of 1 hour, but this duration can be changed by the requester.
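A simplified model of the grant-evaluation step described above, matching a credential request to the most specific grant for that principal, might look like this (the data shapes are hypothetical; the S3 Access Grants service is authoritative):

```python
def find_matching_grant(grants, principal, requested_path):
    """Pick the most specific grant whose location covers the requested path.

    Models, in simplified form, how a credential request is evaluated: the
    grant must belong to the requesting principal and its location must be
    a prefix of the requested S3 path.
    """
    candidates = [
        g for g in grants
        if g["principal"] == principal
        and requested_path.startswith(g["path"].rstrip("*"))
    ]
    # Prefer the longest (most specific) matching location.
    return max(candidates, key=lambda g: len(g["path"]), default=None)

grants = [
    {"principal": "alice", "path": "s3://research-data/*", "permission": "READ"},
    {"principal": "alice", "path": "s3://research-data/demographics/*", "permission": "READWRITE"},
]
match = find_matching_grant(grants, "alice", "s3://research-data/demographics/2024.csv")
print(match["permission"])  # READWRITE
```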
Request on behalf of IAM roles (not recommended): Create users in Immuta that map to each of your existing IAM roles. Then, when users request access to data, they request on behalf of the IAM role user rather than themselves. This approach is not recommended because everyone in that role will gain access to data when granted access through a policy, and adding future users to that role will also grant access. Furthermore, it requires policy authors and approvers to understand what role should have access to what data.
Audit
Data policies
Impersonation
Schema monitoring
Tag ingestion








{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1234567891011",
"Effect": "Allow",
"Principal": {
"Service":"access-grants.s3.amazonaws.com"
},
"Action": [
"sts:AssumeRole",
"sts:SetSourceIdentity"
]
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ObjectLevelReadPermissions",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectAcl",
"s3:GetObjectVersionAcl",
"s3:ListMultipartUploadParts"
],
"Resource": "<bucket_arn>/*"
},
{
"Sid": "ObjectLevelWritePermissions",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:PutObjectAcl",
"s3:PutObjectVersionAcl",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:AbortMultipartUpload"
],
"Resource": "<bucket_arn>/*"
},
{
"Sid": "BucketLevelReadPermissions",
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": "<bucket_arn>"
},
{
"Sid": "ListAllBuckets",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "*"
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RolePermissions",
"Effect": "Allow",
"Action": [
"iam:GetRole",
"iam:PassRole"
],
"Resource": "<location_role_arn>"
},
{
"Sid": "AccessGrants",
"Effect": "Allow",
"Action": [
"s3:CreateAccessGrant",
"s3:DeleteAccessGrantsLocation",
"s3:GetAccessGrantsLocation",
"s3:CreateAccessGrantsLocation",
"s3:GetAccessGrantsInstance",
"s3:GetAccessGrantsInstanceForPrefix",
"s3:GetAccessGrantsInstanceResourcePolicy",
"s3:ListAccessGrants",
"s3:ListAccessGrantsLocations",
"s3:ListAccessGrantsInstances",
"s3:DeleteAccessGrant",
"s3:GetAccessGrant"
],
"Resource": [
"<access_grants_instance_arn>"
]
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "sso",
"Effect": "Allow",
"Action": [
"sso:DescribeInstance",
"sso:DescribeApplication",
"sso-directory:DescribeUsers"
],
"Resource": [
"<iam_identity_center_instance_arn>",
"<iam_identity_center_application_arn_for_s3_access_grants>",
"arn:aws:identitystore:::user/*",
"arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
]
},
{
"Sid": "idc",
"Effect": "Allow",
"Action": [
"identitystore:DescribeUser",
"identitystore:DescribeGroup"
],
"Resource": [
"<iam_identity_center_instance_arn>",
"<iam_identity_center_application_arn_for_s3_access_grants>",
"arn:aws:identitystore:::user/*",
"arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
]
}
]
}

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<IMMUTA_AWS_ACCOUNT_ID>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "<EXTERNAL_ID>"
}
}
}
]
}

Snowflake Enterprise Edition required
In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.
Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.
When an administrator configures the Snowflake integration with Immuta, Immuta creates an IMMUTA database and schemas (immuta_procedures, immuta_policies, and immuta_functions) within Snowflake to contain policy definitions and user entitlements. Immuta then creates a system role and gives that system account the privileges required to orchestrate policies in Snowflake and maintain state between Snowflake and Immuta. See the privileges table below for a list of privileges, the user they must be granted to, and an explanation of why they must be granted.
An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
A data owner registers Snowflake tables in Immuta as data sources.
When Immuta users create policies, they are then pushed into the Immuta database within Snowflake; there, the Immuta system account orchestrates Snowflake row access policies and column masking policies directly onto Snowflake tables. Changes in Immuta policies, user attributes, or data sources trigger webhooks that keep the Snowflake policies up-to-date.
For a user to query Immuta-protected data, they must meet two qualifications:
They must be subscribed to the Immuta data source.
They must be granted SELECT access on the table by the Snowflake object owner or automatically via the table grants feature.
After a user has met these qualifications they can query Snowflake tables directly.
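The two qualifications combine with a simple AND, which can be sketched as (an illustrative check, not an Immuta API):

```python
def can_query(subscribed_in_immuta: bool, has_snowflake_select: bool) -> bool:
    """A user sees policy-enforced data only when both qualifications hold."""
    return subscribed_in_immuta and has_snowflake_select

assert can_query(True, True)
assert not can_query(True, False)   # subscribed, but no SELECT grant in Snowflake
assert not can_query(False, True)   # SELECT granted, but not subscribed in Immuta
```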
See the integration support matrix for a list of supported data policy types in Snowflake.
The privileges Immuta issues to users when they are subscribed to a data source vary depending on the object type. See the supported Snowflake object types listed below.
When a user applies a masking policy to a Snowflake data source, Immuta truncates masked values to align with Snowflake column length (VARCHAR types) and precision (NUMBER types) requirements.
Consider these columns in a data source that have the following masking policies applied.
Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Querying this data source in Snowflake would return the following values:
Hashing collisions
Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.
For more details about Snowflake column length and precision requirements, see the Snowflake documentation.
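The truncation behavior can be illustrated in a few lines (a sketch; SHA-256 stands in here for whatever hashing is actually applied):

```python
import hashlib

def mask_with_hash(value: str, column_length: int) -> str:
    # Hash, then truncate to the column's VARCHAR length, as described above.
    return hashlib.sha256(value.encode()).hexdigest()[:column_length]

def mask_with_constant(constant: str, column_length: int) -> str:
    # A constant such as REDACTED is truncated to fit, e.g. VARCHAR(5) -> 'REDAC'.
    return constant[:column_length]

print(mask_with_constant("REDACTED", 5))  # REDAC
print(len(mask_with_hash("alice", 6)))    # 6 (short truncations raise collision risk)
```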
When a policy is applied to a column, Immuta uses Snowflake memoizable functions to cache the result of the called function. Then, when a user queries a column that has that policy applied to it, Immuta uses that cached result to dramatically improve query performance.
The privileges the Snowflake integration requires align with the principle of least privilege. The table below describes each privilege required in Snowflake for the setup user, the IMMUTA_SYSTEM_ACCOUNT user, or the metadata registration user. The references to IMMUTA_DB, IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table can be replaced with the names you chose for your Immuta database, warehouse, and impersonation role when setting up the integration, respectively.
The definitions for each status and the state of configured data platform integrations are available via the integrations API.
Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that account, ensuring that your integration works with the following use cases:
Deprecation notice
Support for Snowflake project workspaces has been deprecated. See the deprecation notices for EOL dates.
Project workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.
Using views and tables within Immuta: Because this integration uses Snowflake governance policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, the owner of the view must be an excepted role; otherwise, the backing table's policies will be applied to that view.
Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.
Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running identification or applying policies.
To use this feature, see the bulk data source creation API documentation.
Based on performance tests that create 100,000 data sources, Immuta recommends a SaaS XL environment.
Performance gains are limited when enabling identification at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.
Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.
Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that will check if a user is acting under one of the listed usernames or roles. If a user is, then the policy will not be acted on the queried table. If the user is not, then the policy will be executed like normal. Immuta does not distinguish between role and username, so if you have a role and user with the exact same name, both the user and any user acting under that role will have full access to the data sources and no policies will be enforced for them.
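The CASE-statement pattern described above can be sketched like this (an illustrative simplification, not the exact SQL Immuta generates; CURRENT_USER and CURRENT_ROLE are real Snowflake context functions):

```python
def wrap_policy_with_exemptions(policy_sql: str, excepted: list) -> str:
    """Build a policy body that short-circuits for excepted users and roles.

    The same name list is checked against both the user and the role,
    mirroring the note above that Immuta does not distinguish between them.
    """
    names = ", ".join(f"'{n}'" for n in excepted)
    return (
        f"CASE WHEN CURRENT_USER() IN ({names}) "
        f"OR CURRENT_ROLE() IN ({names}) THEN TRUE "
        f"ELSE ({policy_sql}) END"
    )

print(wrap_policy_with_exemptions("region = 'US'", ["ETL_SERVICE"]))
```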
The Snowflake integration supports the following authentication methods to configure the integration and create data sources.
Username and password: Users can authenticate with their Snowflake username and password.
Key pair: Users can authenticate with a Snowflake key pair.
Snowflake External OAuth: Users can authenticate with Snowflake External OAuth.
Immuta's OAuth authentication method uses the client credentials flow to integrate with Snowflake External OAuth. When a user configures the Snowflake integration or connects a Snowflake data source, Immuta uses the token credentials (obtained using a certificate or passing a client secret) to craft an authenticated access token to connect with Snowflake. This allows organizations that already use Snowflake External OAuth to use that secure authentication with Immuta.
An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
The Immuta Snowflake integration supports the following Snowflake features.
While Immuta does not persist any of your data, some data does pass through Immuta, like when a user generates a data source fingerprint. This data is encrypted using TLS from the data source to Immuta as it traverses the public internet. Alternatively, Immuta can be connected to a user's Snowflake account over either AWS PrivateLink or Azure Private Link so that any data moving between the user's data source and the Immuta tenant stays on a private network.
External tables: You cannot add a masking policy to an external table column while creating the external table in Snowflake, because masking policies cannot be attached to virtual columns.
Project workspaces: Users can have additional write access in their integration using project workspaces.
Tag ingestion: Immuta automatically ingests Snowflake object tags from your Snowflake instance and adds them to the appropriate data sources.
User impersonation: Impersonation allows users to query data as another Immuta user. For details about enabling impersonation, see the impersonation section below.
Deprecation notice
Support for this feature has been deprecated. See the deprecation notices for EOL dates.
Immuta system account required Snowflake privileges: CREATE [OR REPLACE] PROCEDURE, DROP ROLE, and REVOKE ROLE.
Users can have additional write access in their integration using project workspaces. For more details, see the project workspaces page.
To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.
Impersonation allows Immuta users or system accounts to query data sources they are subscribed to as another Immuta user.
Consider the following users and their data source subscriptions.
User 1 data source subscriptions
HR data source
Research data source
User 2 data source subscriptions
HR data source
Consumer report data source
The table below illustrates what data is returned when User 1, User 2, and User 2 impersonating User 1 query these data sources.
Users with the APPLICATION_ADMIN Immuta permission can enable impersonation when configuring a Snowflake integration for the first time or edit an existing integration to enable impersonation (or change the impersonation role):
New integrations: Enable impersonation when configuring a new Snowflake integration.
Existing integrations: Edit the integration to enable impersonation, or to change the impersonation role for an integration that already has impersonation enabled.
Snowflake query audit will show the user running the queries as the user logged in to Snowflake, not as the user they are impersonating.
You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.
The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.
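The hierarchical propagation can be modeled as collecting tags from every ancestor of an object (the paths and tag names below are hypothetical examples):

```python
def propagate_tags(object_path: str, tags_by_path: dict) -> set:
    """Collect tags applied at any ancestor level (database, schema, table)
    of a dotted object path, reflecting the behavior described above."""
    parts = object_path.split(".")
    tags = set()
    for i in range(1, len(parts) + 1):
        tags |= tags_by_path.get(".".join(parts[:i]), set())
    return tags

# A database-level PII tag flows down to every column inside it.
tags_by_path = {"sales_db": {"PII"}, "sales_db.crm.customers.email": {"EMAIL"}}
print(propagate_tags("sales_db.crm.customers.email", tags_by_path))
```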
Snowflake tag ingestion supports two authentication methods:
Username and password
Key pair
To enable Snowflake tag ingestion, see the app settings page.
Credentials
If you want all Snowflake data sources to have Snowflake data tags ingested into Immuta, ensure the credentials provided for the external catalog feature can access all the data sources registered in Immuta. Any data sources the credentials cannot access will not be tagged in Immuta. In practice, it is recommended to use the same credentials for the integration (or data source registration) and tag ingestion.
Snowflake has some latency before newly created tags appear in its metadata views. If you manually refresh the governance page to see all tags created globally, users can experience a delay of up to two hours. However, if you run schema detection or a health check to find where those tags are applied, the delay will not occur, because Immuta only refreshes tags for those specific tables.
The Snowflake integration audits Immuta user queries run in the integration's warehouses by running a query in Snowflake to retrieve user query histories. Those histories are then populated into audit logs. See the audit log reference for details about the contents of the logs.
The audit ingest frequency is set when the integration is configured. The default is every hour, but this can be changed on the app settings page. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. When manually requested, it will only search for new queries created since the last query that was audited. The job runs in the background, so new queries will not be immediately available.
A user can connect multiple Snowflake integrations to a single Immuta tenant and use them dynamically.
There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for policies to apply.
Projects can only be configured to use one Snowflake host.
If there are errors in generating or applying policies natively in Snowflake, the data source will be locked, and only users on the excepted roles/users list and the credentials used to create the data source will be able to access the data.
Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
The Immuta Snowflake integration uses Snowflake governance features to let users query data natively in Snowflake. This means that Immuta also inherits some Snowflake limitations on correlated subqueries used with row access policies and column masking policies. These limitations appear when writing custom WHERE policies, but do not remove the utility of row-level policies.
Requirement for a custom WHERE policy: The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery. The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.
Any subqueries that error in Snowflake will also error in Immuta.
Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations, see the Snowflake documentation.
A data owner, data governor, or administrator creates or changes a policy or a user's attributes change in Immuta.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
Immuta manages and applies Snowflake governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
If Snowflake table grants is not enabled, Snowflake object owner or user with the global MANAGE GRANTS privilege grants SELECT privilege on relevant Snowflake tables to users. Note: Although they are GRANTed access, if they are not subscribed to the table via Immuta-authored policies, they will not see data.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.
The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration.
MANAGE GRANTS ON ACCOUNT
Setup user
All
The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. All privileges that are documented here are granted to the IMMUTA_SYSTEM_ACCOUNT user by this setup user.
OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE
IMMUTA_SYSTEM_ACCOUNT user
Impersonation
If impersonation is enabled, Immuta must be able to manage the Snowflake roles used for impersonation, which is created when the setup script runs, in order to manage the impersonation feature.
ALL PRIVILEGES ON DATABASE IMMUTA_DB
ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB
USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
IMMUTA_SYSTEM_ACCOUNT user
All
The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed.
USAGE ON WAREHOUSE IMMUTA_WH
IMMUTA_SYSTEM_ACCOUNT user
All
To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML and require compute.
IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE
IMMUTA_SYSTEM_ACCOUNT user
Audit
To ingest audit information from Snowflake, Immuta must have access to the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view. See the Snowflake documentation for details.
APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
IMMUTA_SYSTEM_ACCOUNT user
Snowflake integration with governance features enabled
Immuta must be able to apply policies to objects throughout your organization's Snowflake account and query for existing policies on objects using the POLICY_REFERENCES table function.
MANAGE GRANTS ON ACCOUNT
IMMUTA_SYSTEM_ACCOUNT user
Table grants
Immuta must be able to MANAGE GRANTS on objects throughout your organization's Snowflake account.
CREATE ROLE ON ACCOUNT
IMMUTA_SYSTEM_ACCOUNT user
Table grants
When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in your organization’s Snowflake account.
USAGE on all databases and schemas with registered data sources
REFERENCES on all tables and views registered in Immuta
Metadata registration user
Data source registration
Immuta must be able to see metadata on securables to register them as data sources and populate the data dictionary.
SELECT on all tables and views registered in Immuta
Metadata registration user
Identification and specialized masking policies that require fingerprinting
Immuta must have this privilege to run the necessary queries for identification and fingerprinting on your data sources.
APPLY TAG ON ACCOUNT
Metadata registration user
Tag ingestion
To ingest table, view, and column tag information from Snowflake, Immuta must have this permission. Immuta reads from the TAG_REFERENCES table function.
IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE
Metadata registration user
Tag ingestion
To ingest table, view, and column tag information from Snowflake, Immuta must have access to the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view. See the Snowflake documentation for details.
USAGE ON DATABASE IMMUTA_DB
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS
PUBLIC role
All
Immuta has stored procedures and functions that are used for policy enforcement and do not expose or contain any sensitive information. These objects must be accessible by all users to facilitate the use and creation of policies or views to enforce Immuta policies in Snowflake.
SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST
PUBLIC role
All
Immuta retains a list of excepted roles and users when using the Snowflake integration. The roles and users in this list will be exempt from policies applied to tables in Snowflake to give organizations flexibility in case there are entities that should not be bound to Immuta policies in Snowflake (for example, a system or application role or user).
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.
External tables, event tables, Iceberg tables, and dynamic tables are all supported (✅) Snowflake object types.
Query audit: Immuta audits queries run in Snowflake against Snowflake data registered as Immuta data sources.
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates.
Snowflake table grants: This feature allows Immuta to manage privileges on your Snowflake tables and views according to the subscription policies on the corresponding Immuta data sources.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
If you create an Immuta data source from a Snowflake view created using a select * from query, Immuta column detection will not work as expected because Snowflake views are not automatically updated based on backing table changes. To remedy this, you can create views that have the specific columns you want or you can CREATE AND REPLACE the view in Snowflake whenever the backing table is updated and manually run the column detection job on the data source page.
If a user is created in Snowflake after that user is already registered in Immuta, Immuta does not grant usage on the per-user role automatically - meaning Immuta does not govern this user's access without manual intervention. If a Snowflake user is created after that user is registered in Immuta, the user account must be disabled and re-enabled to trigger a sync of Immuta policies to govern that user. Whenever possible, Snowflake users should be created before registering those users in Immuta.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.
Impersonation is not supported in Snowflake if table grants or low row access policy mode is enabled.
Example query results showing the truncated masked values:

Column A  Column B  Column C
5w4502    REDAC     null
6e3611    REDAC     null
9s7934    REDAC     null
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
Setup user
All
The setup script this user runs creates an Immuta database in your organization's Snowflake account where all Immuta managed objects (UDFs, masking policies, row access policies, and user entitlements) will be written and stored.
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
Setup user
All
The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized.
CREATE USER ON ACCOUNT WITH GRANT OPTION
Setup user
All
The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta uses to manage the integration.

Tables, views, and materialized views are all supported (✅) Snowflake object types.
User 1: name column is masked (HR data source); ssn column is masked with hashing (research data source); ❌ denied access (consumer report data source).
User 2: all data visible (HR data source); ❌ denied access (research data source); address column is masked (consumer report data source).
User 2 impersonating User 1: name column is masked (HR data source); ssn column is masked with hashing (research data source); ❌ denied access (consumer report data source).
Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:
Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.
Data policies: Immuta data policies enforce row- and column-level security.
(Additional Snowflake privileges granted during setup: USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM; SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE.)
Unity Catalog uses the following hierarchy of data objects:
Metastore: Created at the account level and is attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas
Schema: Organizes tables and views
Tables and other objects: tables (managed or external), views, volumes, models, and functions
For details about the Unity Catalog object model, see the Databricks Unity Catalog documentation.
The Databricks Unity Catalog integration supports:
applying column masks and row filters on specific securable objects
applying subscription policies on tables and views
enforcing Unity Catalog access controls, even if Immuta becomes disconnected
allowing non-Immuta reads and writes
using Photon
using a proxy server
Unity Catalog supports managing permissions account-wide in Databricks through controls applied directly to objects in the metastore. To establish a connection with Databricks and apply controls to securable objects within the metastore, Immuta requires a service principal with privileges to manage all data protected by Immuta. Databricks OAuth for service principals (OAuth M2M) or a personal access token (PAT) can be provided for Immuta to authenticate as the service principal. See the Databricks Unity Catalog privileges section for a list of specific Databricks privileges.
Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta will create a catalog that contains these schemas:
immuta_system: Contains internal Immuta data.
immuta_policies_n: Contains policy UDFs.
When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.
Workspace-catalog binding allows users to leverage Databricks’ catalog isolation mode to limit catalog access to specific Databricks workspaces. The default isolation mode is OPEN, meaning all workspaces can access the catalog (with the exception of the automatically-created workspace catalog), provided they are in the metastore attached to the catalog. Setting this mode to ISOLATED allows the catalog owner to specify a workspace-catalog binding, which means the owner can dictate which workspaces are authorized to access the catalog. This prevents other workspaces from accessing the specified catalogs. To bind a catalog to a specific workspace in Databricks Unity Catalog, see the Databricks documentation.
Typical use cases for binding a catalog to specific workspaces include
Ensuring users can only access production data from a production workspace environment.
For example, you may have production data in a prod_catalog, as well as a production workspace you are introducing to your organization. Binding the prod_catalog to the prod_workspace ensures that workspace admins and users can only access prod_catalog from the prod_workspace environment.
Ensuring users can only process sensitive data from a specific workspace. Limiting the environments from which users can access sensitive data helps better secure your organization’s data. Limiting access to one workspace also simplifies any monitoring, auditing, and understanding of which users are accessing specific data. This would entail a similar setup as the example above.
Giving users read-only access to production data from a developer workspace.
This enables your organization to effectively conduct development and testing, while minimizing risk to production data. All user access to this catalog from this workspace can be specified as read-only, ensuring developers can access the data they need for testing without risk of any unwanted updates.
This feature is currently only supported by the legacy Databricks Unity Catalog integration.
Immuta’s Databricks Unity Catalog integration allows users to configure additional workspace connections to support using Databricks' workspace-catalog binding feature. Users can configure additional workspace connections in their Immuta integrations to be consistent with the workspace-catalog bindings that are set up in Databricks. Immuta will use each additional workspace connection to govern the catalog(s) that workspace is bound to in Databricks. If desired, each set of bound catalogs can also be configured to run on its own compute.
To use this feature, you should first set up a workspace-catalog binding in your Databricks account. Once that is configured, you can use Immuta's Integrations API to configure an additional workspace connection. This can be added when you initially set up the integration or by updating your existing integration configuration.
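The workspace-catalog binding itself is configured on the Databricks side. As a sketch of what that Databricks-side call involves, the helper below builds the request for the Databricks workspace-bindings REST endpoint; the endpoint path and payload field names are assumptions based on the Databricks API and should be verified against the Databricks documentation for your platform version:

```python
# Sketch: build the PATCH request for binding a catalog to specific workspaces
# in Databricks Unity Catalog. Endpoint path and payload fields follow the
# Databricks workspace-bindings API (verify against the Databricks docs).

def build_catalog_binding_request(host, catalog, assign_workspace_ids,
                                  unassign_workspace_ids=()):
    """Return (url, payload) for updating a catalog's workspace bindings."""
    url = f"{host}/api/2.1/unity-catalog/workspace-bindings/catalogs/{catalog}"
    payload = {
        "assign_workspaces": list(assign_workspace_ids),
        "unassign_workspaces": list(unassign_workspace_ids),
    }
    return url, payload

url, body = build_catalog_binding_request(
    "https://my-workspace.cloud.databricks.com", "prod_catalog", [1234567890])
```

Once the binding exists in Databricks, the matching additional workspace connection is configured in Immuta through the Integrations API, as described above.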
Limitations
Additional workspace connections in Databricks Unity Catalog are not currently supported in Immuta's connections.
Each additional workspace connection must be in the same metastore as the primary workspace used to set up the integration.
No two additional workspace connections can be responsible for the same catalog.
The privileges the Databricks Unity Catalog integration requires align to the least privilege security principle. The table below describes each privilege required in Databricks Unity Catalog for the setup user and the Immuta service principal.
Setup user privileges:
Account admin: Allows the setup user to grant the Immuta service principal the necessary permissions to orchestrate Unity Catalog access controls and maintain state between Immuta and Databricks Unity Catalog.
CREATE CATALOG on the Unity Catalog metastore: Allows the setup user to create an Immuta-owned catalog and tables.
Metastore admin: Required only if enabling query audit, which requires granting access to system tables to the Immuta service principal. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal.

Immuta service principal privileges:
USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources, and USE SCHEMA on all schemas containing securables registered as Immuta data sources: These privileges allow the service principal to apply row filters and column masks on those securables.
Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.
Table-level security: Immuta manages REVOKE and GRANT privileges on Databricks securable objects that have been registered as Immuta data sources. When you register a data source in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks for users registered in Immuta.
Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
On securable objects
If you register a Databricks Unity Catalog securable object as an Immuta data source, Immuta will only manage users' access to that data object after a subscription policy grants them access to the data source. Immuta preserves all preexisting grants in Databricks and only revokes user access when explicitly dictated by an Immuta policy.
Expand the collapsible blocks below to see how Immuta-managed grants and Databricks-managed grants are affected in various scenarios.
Immuta only manages grants for Immuta users once a subscription policy applies to them. Any grants performed on Databricks objects outside Immuta will not be revoked.
The table below illustrates how the following policy enforces access controls for 4 different users, some of whom have Databricks-managed grants:
Allow users to subscribe to the data source if they are a member of group
HR
In this example,
User A is granted access to the table by Immuta.
User B is granted access to the table by Databricks.
User C cannot access the table because they do not meet the conditions of the Immuta policy and they have not been granted access to the table in Databricks outside of Immuta.
If a subscription policy is edited, Immuta only affects the Immuta-managed grants on the data object.
The table below illustrates how editing the following policy enforces access controls for 4 different users, some of whom have Databricks-managed grants:
Previous policy
Allow users to subscribe to the data source if they are a member of group
HR
Updated policy
Allow users to subscribe if they are a member of group
Engineers
In this example,
User A is revoked access to the table by Immuta. Because this user was previously granted access to the data source by Immuta, that Immuta-managed grant is revoked.
User B is granted access to the table by Databricks and Immuta. The Immuta-managed SELECT grant coexists with their Databricks-managed SELECT grant.
If a subscription policy is deleted, users' Immuta grants are revoked for that policy. Users will retain access to the data source if they have Immuta-granted access through another subscription policy that overlaps or if they have a Databricks-managed grant for that data source.
The table below illustrates how deleting the following Immuta policy affects access for 4 different users, some of whom have Databricks-managed grants:
Allow users to subscribe to the data source if they are a member of group
HR
In this example,
User A is revoked access to the table by Immuta. Because this user was previously granted access to the data source by Immuta, that Immuta-managed grant is revoked when the subscription policy is deleted.
User B is granted access to the table by Databricks.
User C cannot access the table because they have not been granted access by an Immuta policy and they have not been granted access to the table in Databricks.
If a data source is disabled, users' access reverts to what the Databricks-managed access was previously, before the data object was registered in Immuta.
The table below illustrates how disabling the data source in Immuta affects access for 4 different users, some of whom have Databricks-managed grants.
In this example,
User A is revoked access to the table by Immuta. The Immuta-managed grant is revoked when the data source is disabled because the state of the grants on this data object reverts to what it was before the data object was registered in Immuta.
User B is granted access to the table by Databricks. The Databricks-managed grant remains because the state of the grants on this data object reverts to what it was before the data object was registered in Immuta.
User C cannot access the table because they had not been granted access to the table in Databricks.
If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.
On schemas and catalogs
By default, Immuta will revoke Immuta users' USE CATALOG and USE SCHEMA privileges in Unity Catalog for users that do not have access to any of the underlying securables within that catalog/schema. If users have any Immuta-managed or Databricks-managed grants to a securable, Immuta will not revoke that catalog/schema access.
If you disable this setting, Immuta will only revoke the permissions granted on the securable objects themselves, and users' USE CATALOG and USE SCHEMA permissions will remain even if the user does not have access to any resource in that catalog/schema.
See the App settings page for instructions on changing this setting.
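The default pruning behavior described above amounts to a simple set check, sketched below with hypothetical names; this is a conceptual illustration, not Immuta's implementation:

```python
# Conceptual sketch of the default behavior: revoke USE CATALOG / USE SCHEMA
# for a user only when they hold no grant (Immuta- or Databricks-managed) on
# any securable inside that catalog or schema. Names are illustrative.

def should_revoke_use(user_grants, securables_in_scope):
    """True if the user holds no grant on any securable in the scope."""
    return set(user_grants).isdisjoint(securables_in_scope)

schema_tables = {"cat.sch.t1", "cat.sch.t2"}
assert should_revoke_use(set(), schema_tables) is True          # no grants: revoke USE
assert should_revoke_use({"cat.sch.t2"}, schema_tables) is False  # keep USE
```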
The Unity Catalog integration supports the following policy types:

Masking policies:
Conditional masking
Constant
Custom masking
Hashing
Null (including on ARRAY, MAP, and STRUCT type columns)
Regex: You must use the global regex flag (g) and cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the regex examples in the limitations section.
Rounding (date and numeric rounding)

Row-level policies:
Matching (only show rows where)
Custom WHERE
The privileges Immuta issues to users when they are subscribed to a data source vary depending on the object type. See an outline of privileges granted by Immuta on the Subscription policy access types page.
Project-scoped purpose exceptions for Databricks Unity Catalog integrations allow you to apply purpose-based policies to Databricks data sources in a project. As a result, users can only access that data when they are working within that specific project.
If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:
The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).
This feature allows masked columns to be joined across data sources that belong to the same project. When data sources do not belong to a project, Immuta uses a unique salt per data source for hashing to prevent masked values from being joined. (See the Why use masked joins? guide for an explanation of that behavior.) However, once you add Databricks Unity Catalog data sources to a project and enable masked joins, Immuta uses a consistent salt across all the data sources in that project to allow the join.
For more information about masked joins and enabling them for your project, see the Masked joins section of documentation.
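The salt behavior described above can be illustrated with a toy hash; Immuta's actual masking scheme is internal, so treat this only as a conceptual sketch:

```python
import hashlib

# Why a shared salt enables masked joins: hashing the same value with
# different per-data-source salts yields different masked outputs, so the
# join keys no longer match. A project-wide salt keeps them consistent.
# (Conceptual only; Immuta's real hashing scheme may differ.)

def mask(value, salt):
    return hashlib.sha256((salt + value).encode()).hexdigest()

ssn = "123-45-6789"
per_source = (mask(ssn, "salt_for_source_a"), mask(ssn, "salt_for_source_b"))
shared = (mask(ssn, "project_salt"), mask(ssn, "project_salt"))

assert per_source[0] != per_source[1]  # masked values cannot be joined
assert shared[0] == shared[1]          # a consistent salt preserves the join
```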
The Databricks group configured as the policy exemption group in Immuta will be exempt from Immuta data policy enforcement. This account-level group is created and managed in Databricks, not in Immuta.
If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to an account-level group in Databricks and include this group name in the Databricks Unity Catalog configuration in Immuta. Then, group members will be excluded from having data policies applied to them when they query Immuta-protected tables in Databricks.
Typically, service or system accounts that perform the following actions are added to an exemption group in Databricks:
Automated queries
ETL
Report generation
If you have multiple groups that must be exempt from data policies, add each group to a single group in Databricks that you then set as the policy exemption group in Immuta.
The service principal used to register data sources in Immuta will be automatically added to the exemption group for the Databricks securables it registers. Consequently, accounts added to the exemption group and used to register data sources in Immuta should be limited to service accounts.
To configure a policy exemption group, use the groupPattern object when setting up the integration using the connections API.
When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.
However, with Databricks metastore magic you can use hive_metastore and enforce subscription and data policies with the Databricks Spark integration.
The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:
Personal access token (PAT): This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the permissions section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.
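As a sketch of the client credentials exchange described above: the helper below builds the token request form. The token endpoint path and scope value are assumptions to verify against the Databricks OAuth M2M documentation:

```python
# Sketch of the OAuth client_credentials exchange Immuta performs against
# Databricks. The endpoint path ("/oidc/v1/token") and scope ("all-apis")
# are assumptions; the form fields follow the standard client_credentials
# grant. The returned token is short-lived and must be refreshed on expiry.

def build_token_request(workspace_host, client_id, client_secret):
    """Return (url, form_body) for a client_credentials token request."""
    url = f"{workspace_host}/oidc/v1/token"   # assumed endpoint path
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "all-apis",                  # assumed scope
    }
    return url, form
```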
The Unity Catalog data object model introduces a 3-tiered namespace, as outlined above. Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
| Securable | Data source registration | Data policies | Subscription policies |
|---|---|---|---|
| Table | ✅ | ✅ | ✅ |
| View | ✅ | ❌ | ✅ |
| Materialized view | ✅ | ✅ | ✅ |
External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.
Access requirements
For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:
USE CATALOG on the system catalog
USE SCHEMA on the system.access and system.query schemas
SELECT on the following system tables:
system.access.table_lineage
system.access.column_lineage
Immuta uses Databricks tables from the system catalog to understand the queries users make and present them in the query audit logs. See the Databricks Unity Catalog audit page for details about the contents of the logs.
The audit ingest is set when registering the connection and can be scoped to only ingest specific workspaces if needed. The default ingest frequency is every hour, but this can be configured to a different frequency on the Immuta app settings page. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. When manually requested, the job only searches for queries created since the last audited query. The job runs in the background, so new queries will not be immediately available.
Private preview: This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
You can enable tag ingestion to allow Immuta to ingest Databricks Unity Catalog table and column tags so that you can use them in Immuta policies to enforce access controls. When you enable this feature, Immuta uses the credentials and connection information from the Databricks Unity Catalog integration to pull tags from Databricks and apply them to data sources as they are registered in Immuta. If Databricks data sources were registered before tag ingestion was enabled, those data sources automatically sync to the catalog and the tags are applied.
Immuta checks for changes to tags in Databricks and syncs Immuta data sources to those changes every hour by default. Immuta's tag ingestion process uses delta logic to identify only the resources whose tags or descriptions have changed in Databricks Unity Catalog within a given timeframe, which reduces processing time and compute cost.
Access requirements for Databricks Unity Catalog tag ingestion (delta logic)
Since the delta logic leverages the system.access.audit table in Databricks, Immuta must have, at minimum, the following access:
USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system table:
system.access.audit
Note that without these permissions, Immuta will not be able to process any tag changes after the initial onboarding of data sources.
Once external tags are applied to Databricks data sources, those tags can be used to create subscription and data policies.
To enable Databricks Unity Catalog tag ingestion, see the Register a Databricks Unity Catalog connection page.
After making changes to tags in Databricks, you can manually sync the catalog so that the changes immediately apply to the data sources in Immuta. Otherwise, tag changes automatically sync within one hour. This timeframe may be exceeded when Immuta has to process a large number of tag changes.
When syncing data sources to Databricks Unity Catalog tags, Immuta pulls the following information:
Table tags: These tags apply to the table and appear on the data source details tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Column tags: These tags are applied to data source columns and appear on the columns listed in the data dictionary tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Table comments field: This content appears as the data source description on the data source details tab.
Column comments field: This content appears as dictionary column descriptions on the data dictionary tab.
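The key/value-to-hierarchy translation described in the list above can be sketched as a one-line helper (the function name is illustrative):

```python
# Sketch of the tag translation described above: a Databricks Unity Catalog
# key/value tag becomes a dot-delimited hierarchy in Immuta, e.g.
# Location: US -> Location.US. A key with no value maps to the key alone.

def to_immuta_tag(key, value=None):
    return f"{key}.{value}" if value else key

assert to_immuta_tag("Location", "US") == "Location.US"
assert to_immuta_tag("PII") == "PII"
```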
Only tags that apply to Databricks data sources in Immuta are available to build policies in Immuta. Immuta will not pull tags in from Databricks Unity Catalog unless those tags apply to registered data sources.
Cost implications: Tag ingestion in Databricks Unity Catalog requires compute resources. Therefore, having many Databricks data sources or frequently manually syncing data sources to Databricks Unity Catalog may incur additional costs.
Databricks Unity Catalog tag ingestion only supports tenants with fewer than 10,000 data sources registered.
See the Register a Databricks Unity Catalog connection guide for a list of requirements.
Row access policies with more than 1023 columns are unsupported; this is an underlying limitation of UDFs in Databricks. Immuta creates row access policies referencing only the minimum number of columns, so the limit applies to the number of columns referenced in the policy, not the total number of columns in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
If multiple Immuta tenants are connected to your Databricks environment, you must create a separate Immuta catalog for each of those tenants during configuration. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.
You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the examples below for guidance:
regex with a global flag (supported): /^ssn|social ?security$/g
regex without a global flag (unsupported): /^ssn|social ?security$/
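A minimal validator for these flag rules might look like the following; the helper and its parsing approach are illustrative, not part of Immuta:

```python
import re

# Validates the flag rules above for a /pattern/flags expression: the global
# flag (g) must be present and the case-insensitive flag (i) must not be.
# Illustrative helper, not an Immuta API.

def valid_regex_policy(expr):
    m = re.fullmatch(r"/(?P<pattern>.*)/(?P<flags>[a-z]*)", expr)
    if not m:
        return False
    flags = m.group("flags")
    return "g" in flags and "i" not in flags

assert valid_regex_policy("/^ssn|social ?security$/g") is True
assert valid_regex_policy("/^ssn|social ?security$/") is False
assert valid_regex_policy("/^ssn|social ?security$/gi") is False
```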
If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.
Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.
The following features are currently unsupported:
Immuta project workspaces
Multiple IAMs on a single cluster
Row filters and column masking policies on the following object types:
Functions
Models
Views
Volumes
Mixing masking policies on the same column
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies (except for masking with NULL) on ARRAY, MAP, or STRUCT type columns
Shallow clones
How User D is affected in each scenario:
Policy applied: User D is granted access to the table by Immuta and Databricks. The Immuta-managed SELECT grant coexists with their Databricks-managed SELECT grant.
Policy edited: User D (HR group) is revoked access to the table by the change to the Immuta policy. The Immuta-managed SELECT grant from the previous policy and the pre-existing Databricks-managed SELECT grant are both revoked because the Immuta policy explicitly dictated that their access should be removed.
Policy deleted: User D is revoked access to the table by Immuta. Because this user was previously granted access to the data source by Immuta, Immuta took over managing this user's grants on the table, and that Immuta-managed grant is revoked when the subscription policy is deleted.
Data source disabled: User D is granted access to the table by Databricks. The Immuta-managed grant is revoked and the Databricks grant remains when the data source is disabled because the state of the grants on this data object reverts to what it was before the data object was registered in Immuta.
Row-level policy types also supported:
Where user
Where value in column
Minimization
Time-based restrictions

Additional system tables required for query audit:
system.access.audit
system.query.history

Additional regex flag examples:
regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi
regex without a case insensitive flag (supported): /^ssn|social ?security$/g
Additional Immuta service principal privileges:
MODIFY and SELECT on all securables registered as Immuta data sources: These privileges allow the service principal to apply row filters and column masks on the securable. Additionally, they are required for identification to run on the securable.
OWNER on the Immuta catalog: The Immuta service principal must own the catalog Immuta creates during setup that stores the Immuta policy information. The Immuta setup script grants ownership of this catalog to the service principal when you configure the integration.
USE CATALOG on the system catalog, USE SCHEMA on the system.access and system.query schemas, and SELECT on the system.access.table_lineage and system.access.column_lineage system tables: These privileges allow Immuta to audit user queries in Databricks Unity Catalog.
USE CATALOG on the system catalog, USE SCHEMA on the system.access schema, and SELECT on the system.access.audit system table: These privileges support the delta logic for tag ingestion.
Scenario: subscription policy applied

| User | Immuta-managed grant | Databricks-managed grant | Access |
|---|---|---|---|
| User A (HR group) | SELECT | None | ✅ |
| User B (Engineering group) | None | SELECT | ✅ |
| User C | None | None | ❌ |
| User D (HR group) | SELECT | SELECT | ✅ |
Scenario: subscription policy edited

| User | Immuta-managed grant | Databricks-managed grant | Access |
|---|---|---|---|
| User A (HR group) | Revoked | None | ❌ |
| User B (Engineering group) | SELECT | SELECT | ✅ |
| User C | None | None | ❌ |
| User D (HR group) | Revoked | Revoked | ❌ |
Scenario: subscription policy deleted

| User | Immuta-managed grant | Databricks-managed grant | Access |
|---|---|---|---|
| User A (HR group) | None | None | ❌ |
| User B (Engineering group) | None | SELECT | ✅ |
| User C | None | None | ❌ |
| User D (HR group) | None | None | ❌ |
Scenario: data source disabled

| User | Immuta-managed grant | Databricks-managed grant | Access |
|---|---|---|---|
| User A (HR group) | None | None | ❌ |
| User B (Engineering group) | None | SELECT | ✅ |
| User C | None | None | ❌ |
| User D (HR group) | None | SELECT | ✅ |
| Securable | Data source registration | Data policies | Subscription policies |
|---|---|---|---|
| Streaming table | ✅ | ✅ | ✅ |
| External table | ✅ | ✅ | ✅ |
| Foreign table | ✅ | ✅ | ✅ |
| Volumes (external and managed) | ✅ | ❌ | ✅ |
| Models | ✅ | ❌ | ✅ |
| Functions | ✅ | ❌ | ✅ |
| Delta Shares | ✅ | ❌ | ✅ |
Immuta does not require users to learn a new API or language to access protected data. Instead, Immuta integrates with existing tools and data platforms while remaining invisible to downstream consumers.
The table below outlines features supported by each of Immuta's data platform integrations.
The table below illustrates the subscription policy access types supported by each integration. If a data platform isn't included in the table, that integration does not support any subscription policies. For more details about read and write access policy support for these data platforms, see the Subscription policy access types page.
The table below outlines the types of data policies supported for various data platforms. If a data platform isn't included in the table, that integration does not support any data policies.
For details about each of these policies, see the data policies documentation.
Identification has varied support for data sources from different technologies based on the identifier type. For details about how identification works in Immuta, see the identification documentation.
The table below outlines what information is included in the query audit logs for each integration where query audit is supported.
Legend:
✅ This is available and the information is included in audit logs.
❌ This is not available and the information is not included in audit logs.
✅
❌ View-based integrations are read-only
✅
✅
✅
❌ Write access is controlled through and
✅
✅
✅
❌ View-based integrations are read-only
✅
✅
✅
✅
✅
✅
✅
❌
✅
❌
✅
✅
❌
Custom function
✅
✅
✅
✅
✅
✅
✅
✅
❌
Format preserving masking
❌
❌
❌
❌
❌
❌
✅
❌
❌
Hashing
✅
✅
✅
✅
✅
✅
✅
✅
❌
Limit to purpose
❌
✅
✅
✅
Supported with caveats
✅
✅
✅
❌
Masking fields within STRUCT columns
❌
❌
❌
✅
Supported with caveats
❌
❌
❌
❌
Minimize
❌
✅
✅
✅
✅
✅
✅
✅
❌
Only show data by time
❌
✅
✅
✅
✅
✅
✅
✅
❌
Only show rows (matching)
✅
✅
✅
✅
✅
✅
✅
✅
✅
Randomized response
❌
❌
❌
❌
❌
❌
✅
❌
❌
Regex
❌
✅
❌
✅
✅
✅
✅
✅
❌
Replace with NULL
✅
✅
✅
Supported with caveats
✅
✅
✅
✅
✅
Replace with constant
✅
✅
✅
Supported with caveats
✅
✅
✅
✅
❌
Reversible masking
❌
✅
❌
✅
❌
❌
✅
✅
❌
Rounding
❌
✅
✅
✅
✅
✅
✅
✅
❌
WHERE clause
✅
✅
✅
✅
✅
✅
✅
✅
✅
AWS Lake Formation
❌
❌
✅
Azure Synapse Analytics
❌
❌
✅
Databricks Lakebase
❌
❌
✅
Databricks Spark
✅
✅
✅
Databricks Unity Catalog
✅
✅
✅
Google BigQuery
❌
❌
✅
MariaDB
❌
❌
❌
MySQL
❌
❌
❌
Oracle
❌
❌
✅
PostgreSQL
❌
❌
✅
Snowflake
✅
✅
✅
SQL Server
❌
❌
✅
Starburst (Trino)
✅
✅
✅
Teradata
❌
❌
✅
Columns returned
✅
❌
✅
✅
Rows returned
✅
❌
✅
✅
Query text
✅
✅
✅
✅
Unauthorized information
Limited support
✅
✅
❌
✅
✅
✅
✅
❌
❌
❌
✅
✅
✅
✅
✅
❌
❌
✅
✅
❌
✅
❌
❌
❌
✅
✅
❌
✅
❌
❌
✅
✅
✅
✅
✅
✅
❌
❌
✅
✅
❌
✅
❌
❌
❌
✅
✅
✅
✅
✅
✅
❌
✅
✅
✅
✅
❌
✅
✅
✅
✅
✅
✅
❌
❌
❌
✅
❌
❌
❌
❌
❌
❌
✅
❌
❌
❌
❌
❌
❌
✅
❌
❌
✅
❌
N/A
❌
✅
✅
❌
✅
❌
❌
❌
✅
✅
✅
✅
Supported with caveats
✅
✅
✅
❌
❌
✅
❌
❌
❌
✅
✅
✅
✅
✅
✅
❌
✅
✅
✅
✅
❌
❌
❌
✅
✅
✅
❌ View-based integrations are read-only
✅
✅
✅
Cell-level masking
❌
✅
✅
Amazon Redshift
❌
❌
✅
Amazon Redshift Spectrum
✅
✅
✅
Amazon S3
❌
❌
Table and user coverage
Registered data sources and users
Registered data sources and users
All tables and users
Registered data sources and users
Object queried
✅
✅
✅
✅
✅
✅
✅


