1 of 93

Integrations

Immuta integrates with your data platforms and external catalogs so you can register your data and effectively manage access controls on that data.

This section includes concept, reference, and how-to guides for configuring your data platform integration, registering data sources, and connecting your external catalog so that you can discover, monitor, and protect sensitive data.

Immuta integrations

This reference guide outlines the features, policies, and audit capabilities supported by each integration.

Snowflake

This section includes how-to and reference guides for Snowflake and how it integrates with Immuta.

Databricks Unity Catalog

This section includes how-to and reference guides for Databricks Unity Catalog and how it integrates with Immuta.

Databricks Spark

This section includes how-to and reference guides for Databricks Spark and how it integrates with Immuta.

Starburst (Trino)

This section includes how-to and reference guides for Starburst (Trino) and how it integrates with Immuta.

Redshift

This section includes how-to and reference guides for Redshift and how it integrates with Immuta.

Azure Synapse Analytics

This section includes how-to and reference guides for Azure Synapse Analytics and how it integrates with Immuta.

Amazon S3

This page includes how-to and reference content for Amazon S3 and how it integrates with Immuta.

Google BigQuery

This page includes how-to and reference content for Google BigQuery and how it integrates with Immuta.

Catalogs

This section covers the various data catalogs Immuta integrates with.

Immuta Integrations

Immuta does not require users to learn a new API or language to access protected data. Instead, Immuta integrates with existing tools and ongoing work while remaining invisible to downstream consumers.

The following data platforms integrate with Immuta:

Snowflake integration: With this integration, policies administered in Immuta are pushed down into Snowflake as Snowflake governance features (row access policies and masking policies).
Databricks:
- Databricks Unity Catalog integration: This integration allows you to manage multiple Databricks workspaces through Unity Catalog while protecting your data with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and enforce Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouse.
- Databricks Spark integration: This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for users' queries, all executed directly against Databricks tables.
Google BigQuery: In this integration, Immuta generates policy-enforced views in your configured Google BigQuery dataset for tables registered as Immuta data sources.
Starburst (Trino) integration: The Starburst (Trino) integration allows you to access policy-protected data directly in your Starburst (Trino) catalogs without rewriting queries or changing your workflows. Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.
Redshift integration: With the Redshift integration, Immuta applies policies directly in Redshift. This allows data analysts to query their data directly in Redshift instead of going through a proxy.
Azure Synapse Analytics integration: The Azure Synapse Analytics integration allows Immuta to apply policies directly in Azure Synapse Analytics dedicated SQL pools without needing users to go through a proxy. Instead, users can work within their existing Synapse Studio and have per-user policies dynamically applied at query time.
Amazon S3 integration: The Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.

Feature support

The table below outlines the features supported by each of Immuta's integrations.

Project workspaces

Tag ingestion

User impersonation

Query audit

Multiple integrations

Snowflake

Databricks Unity Catalog

Databricks Spark

Google BigQuery

Starburst

Redshift

Azure Synapse Analytics

Amazon S3

Policy support

Certain policies are unsupported or supported with caveats*, depending on the integration:

*Supported with Caveats:

On Databricks data sources, joins will not be allowed on data protected with replace with NULL or constant policies.
Databricks Unity Catalog ARRAY, MAP, or STRUCT type columns only support masking with NULL.
On Starburst data sources, the @iam interpolation function can block the creation of a view.

For details about each of these policies, see the Policies in Immuta page.

Audit support for platform queries

The table below outlines what information is included in the query audit logs for each integration where query audit is supported.

Snowflake

Databricks Spark

Databricks Unity Catalog

Starburst (Trino)

Table and user coverage

Registered data sources and users

All tables and users

Registered data sources and users

Object queried

Columns returned

Query text

Unauthorized information

Policy details

User's entitlements

Column tags

Table tags

Legend:

Snowflake

Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

Getting started

This getting started guide outlines how to integrate your Snowflake account with Immuta.

How-to guides

Configure a Snowflake integration: Configure the Snowflake integration.
Snowflake table grants migration: Migrate to using Snowflake table grants in your Snowflake integration.
Edit or remove an existing integration: Manage integration settings or delete your existing Snowflake integration.
Integration settings:
- Enable Snowflake table grants: Enable Snowflake table grants and configure the Snowflake role prefix.
- Use Snowflake data sharing with Immuta: Use Snowflake data sharing with table grants or project workspaces.
- Snowflake low row access policy mode: Enable Snowflake low row access policy mode.
- Snowflake lineage tag propagation: Configure your Snowflake integration to automatically apply tags added to a Snowflake table to its descendant data source columns in Immuta.

Reference guides

Phased Snowflake onboarding approach: A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and policies. This guide describes the settings and requirements for implementing this phased approach.
Snowflake integration reference guide: This reference guide describes the design and features of the Snowflake integration.
Snowflake data sharing with Immuta: Organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This guide describes the components of using Immuta with Snowflake data shares.
Snowflake lineage tag propagation: Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration. To do so, this mode decreases the number of Snowflake row access policies Immuta creates and uses table grants to manage user access. This guide describes the design and requirements of this mode.
Snowflake table grants: Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of manually granting users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. This guide describes the components of Snowflake table grants and how they are used in Immuta's Snowflake integration.
Warehouse sizing recommendations: Adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.

Getting Started

The how-to guides linked on this page illustrate how to integrate Snowflake with Immuta.

Requirement: Snowflake Enterprise Edition

Configure your Snowflake integration

These guides provide information on the recommended features to enable with Snowflake, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

Configure your Snowflake integration with the following features enabled:
- Snowflake table grants (enabled by default)
- Snowflake low row access policy mode (enabled by default)
- Snowflake query audit (enabled by default)
Select None as your default subscription policy.
Integrate an IAM with Immuta.
Map external user IDs from Snowflake to Immuta.

Register metadata

These guides provide instructions for organizing your Snowflake data to align with your governance structure.

Detect your user activity

These guides provide step-by-step instructions for auditing and detecting your users' activity, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

Discover your data

These guides provide step-by-step instructions for discovering, classifying, and tagging your data.

Enable sensitive data discovery (SDD).
Register a subset of your tables to configure and validate SDD.
Configure SDD to discover entities of interest for your policy needs.
Validate that the SDD tags are applied correctly.
Register your remaining tables at the schema level with schema monitoring turned on.
Implement classification to categorize and tag sensitive data.

Secure your data

These guides provide instructions for configuring and securing your data with governance policies, or see the Secure use cases for a comprehensive guide on creating policies to fit your organization's use case.

Create a global subscription policy.
Validate the policy. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see:
1. Validate that the Immuta users impacted now have an Immuta role in Snowflake dedicated to them.
2. Validate that when acting under the Immuta role those users have access to the table(s) in question.
3. Validate that users without access in Immuta can still access the table with a different Snowflake role that has access.
4. Validate that a user with SECONDARY ROLES ALL enabled retains access if
  - they were not granted access by Immuta and
  - they have a role that provides them access, even if they are not currently acting under that role.
Create a global data policy.
Validate that a user with a role that can access the table in question (whether it's an Immuta role or not) sees the impact of that data policy.
Once all Immuta policies are in place, remove or alter old roles.

How-to Guides

Configure a Snowflake Integration

Permissions

The permissions outlined in this section are the Snowflake privileges required for a basic configuration. See the Snowflake reference guide for a list of privileges necessary for additional features and settings.

Automatic setup permissions

When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following permissions:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

These permissions will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

You can create a new account for Immuta to use that has these permissions, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate permissions is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.

Alternatively, you can create the IMMUTA database within the specified Snowflake instance manually using the manual setup option.

Manual setup permissions

The specified role used to run the bootstrap needs to have the following privileges:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

It will create a user called IMMUTA_SYSTEM_ACCOUNT, and grant the following privileges to that user:

APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
Additional grants associated with the IMMUTA database

Configure the integration

Snowflake resource names: Use uppercase for the names of the Snowflake resources you create below.

Click the App Settings icon in the navigation panel.
Click the Integrations tab.
Click the +Add Integration button and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Project Workspace box. This will allow for managed write access within Snowflake. Note: Project workspaces still use Snowflake views, so the default role of the account used to create the data sources in the project must be added to the Excepted Roles List. This option is unavailable when table grants is enabled.
Opt to check the Enable Impersonation box and customize the Impersonation Role to allow users to natively impersonate another user. You cannot edit this choice after you configure the integration.
Snowflake query audit is enabled by default; you can disable it by clicking the Enable Query Audit checkbox.
1. Configure the audit frequency by scrolling to Integrations Settings and find the Snowflake Audit Sync Schedule section.
2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
3. Continue with your integration configuration.

Select your configuration method

Altering parameters in Snowflake at the account level may cause unexpected behavior of the Snowflake integration in Immuta

The QUOTED_IDENTIFIERS_IGNORE_CASE parameter must be set to false (the default setting in Snowflake) at the account level. Changing this value to true causes unexpected behavior of the Snowflake integration.

You have two options for configuring your Snowflake environment:

Automatic setup: Grant Immuta one-time use of credentials to automatically configure your Snowflake environment and the integration.
Manual setup: Run the Immuta script in your Snowflake environment yourself to configure your Snowflake environment and the integration.

Automatic setup

Required permissions: When performing an automated installation, Immuta requires temporary, one-time use of credentials with the Snowflake permissions listed above.

From the Select Authentication Method Dropdown, select one of the following authentication methods:

Username and Password: Complete the Username, Password, and Role fields.
Key Pair Authentication:
1. Complete the Username field.
2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
3. Click Key Pair (Required), and upload a Snowflake key pair file.
4. Complete the Role field.

Manual setup

Account creation best practice

The account you create for Immuta should only be used for the integration and should not be used as the credentials for creating data sources in Immuta; doing so will cause issues. Instead, create a separate, dedicated READ-ONLY account for creating and registering data sources within Immuta.

Required privileges: The specified role used to run the bootstrap needs to have the Snowflake permissions listed above.

Select Manual.
Use the Dropdown Menu to select your Authentication Method:
- Username and password: Enter the Username and Password and set them in the bootstrap script for the Immuta system account credentials.
- Key pair authentication: Upload the Key Pair file and when using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
- Snowflake External OAuth:
  1. Create a security integration for your Snowflake External OAuth. Note that if you have an existing security integration, then the Immuta system role must be added to the existing EXTERNAL_OAUTH_ALLOWED_ROLES_LIST. The Immuta system role will be the Immuta database provided above with _SYSTEM. If you used the default database name it will be IMMUTA_SYSTEM.
  2. Fill out the Token Endpoint. This is where the generated token is sent.
  3. Fill out the Client ID. This is the subject of the generated token.
  4. Select the method Immuta will use to obtain an access token:
    Certificate
    Keep the Use Certificate checkbox enabled.
    Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
    Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as `x5t` or is called `sub` (Subject).
    Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
    Client secret
    Uncheck the Use Certificate checkbox.
    Enter the Scope (string). The scope limits the operations and roles allowed in Snowflake by the access token. See the OAuth 2.0 scopes documentation for details about scopes.
    Enter the Client Secret (string). Immuta uses this secret to authenticate with the authorization server when it requests a token.
In the Setup section, click bootstrap script to download the script. Then, fill out the appropriate fields and run the bootstrap script in Snowflake.

Different accounts

The account used to enable the integration must be different from the account used to create data sources in Immuta. Otherwise, workspace views won't be generated properly.

Select available warehouses (optional)

If you enabled a Snowflake workspace, select Warehouses from the dropdown menu that will be available to project owners when creating Snowflake workspaces. Select from a list of all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.

Select excepted roles and users

Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma. Wildcards are unsupported.

Excepted roles/users will have no policies applied to queries

Any user with the username or acting under the role in this list will have no policies applied to them when querying Immuta protected Snowflake tables in Snowflake. Therefore, this list should be used for service or system accounts and the default role of the account used to create the data sources in the Immuta projects (if you have Snowflake workspace enabled).

Save the configuration

Click Save.

Opt to enable Snowflake tag ingestion

To allow Immuta to automatically import table and column tags from Snowflake, enable Snowflake tag ingestion in the external catalog section of the Immuta app settings page.

Snowflake user authentication

To configure Snowflake tag ingestion, which syncs Snowflake tags into Immuta, you must provide a Snowflake user who has, at minimum, the ability to set the following privileges:

GRANT IMPORTED PRIVILEGES ON DATABASE snowflake
GRANT APPLY TAG ON ACCOUNT

Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Snowflake from the dropdown menu.
Enter the Account.
Enter the Authentication information: Username, Password, Port, Default Warehouse, and Role.
Opt to enter the Proxy Host, Proxy Port, and Encrypted Key File Passphrase.
Opt to Upload Certificates.
Click the Test Connection button.
Click the Test Data Source Link.
Once both tests are successful, click Save.

Register data

Snowflake Table Grants Migration

To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.

Navigate to the App Settings page.
Click Integration Settings in the left panel, and scroll to the Global Integrations Settings section.
Uncheck the Snowflake Table Grants checkbox to disable the feature.
Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.
Use the Enable Snowflake table grants tutorial to re-enable the feature.

Edit or Remove Your Snowflake Integration

To edit or remove a Snowflake integration, you have two options:

Automatic: Grant Immuta one-time use of credentials to automatically edit or remove the integration.
The credentials provided must have the following permissions:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Manual: Run the Immuta script in your Snowflake environment yourself to edit or remove the integration.
The specified role used to run the bootstrap needs to have the following privileges:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
- APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
- APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

Edit a Snowflake integration

Select one of the following options for editing your integration:

Automatic: Grant Immuta one-time use of credentials to automatically edit the integration.
Manual: Run the Immuta script in your Snowflake environment yourself to edit the integration.

Automatic edit

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check a checkbox of a feature you would like to enable. Note any field shadowed is not editable, and the integration must be disabled and re-installed to change it.
From the Select Authentication Method Dropdown, select either Username and Password or Key Pair Authentication:
- Username and Password option: Complete the Username, Password, and Role fields.
- Key Pair Authentication option:
  1. Complete the Username field.
  2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
  3. Click Key Pair (Required), and upload a Snowflake key pair file.
  4. Complete the Role field.
Click Save.

Manual edit

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check a checkbox of a feature you would like to enable. Note any field shadowed is not editable, and the integration must be disabled and re-installed to change it.
Click edit script to download the script, and then run it in Snowflake.
Click Save.

Remove a Snowflake integration

Select one of the following options for deleting your integration:

Automatic: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.
Manual: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.

Automatic removal

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Enter the Username, Password, and Role that was entered when the integration was configured.
Click Save.

Manual removal

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Click cleanup script to download the script.
Click Save.
Run the cleanup script in Snowflake.

Integration Settings

Enable Snowflake Table Grants

Navigate to the App Settings page.
Scroll to the Global Integrations Settings section.
Opt to change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements and be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
Finish configuring your integration by following one of these guidelines:
- New Snowflake integration: Set up a new Snowflake integration by following the configuration guide.
- Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have Snowflake privileges to run these privilege grants.
- Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.

Snowflake table grants private preview migration

To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the migration guide.

Use Snowflake Data Sharing with Immuta

Prerequisites:

Create Immuta Policies to Protect the Data

Required Permission: Immuta: GOVERNANCE

Build Immuta data policies to fit your organization's compliance requirements.

It's important to understand that subscription policies are not relevant to Snowflake data shares, because the act of sharing the data is the subscription policy. Data policies can be enforced on the consuming account from the producer account on a share following these instructions.

Register the Snowflake Data Consumer with Immuta

Required Permission: Immuta: USER_ADMIN

To register the Snowflake data consumer in Immuta,

Create a new Immuta user.
Update the Immuta user's Snowflake username to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.
Give the Immuta user the appropriate attributes and groups for your organization's policies.
Subscribe the Immuta user to the data sources.

Required Permission: Snowflake ACCOUNTADMIN

To share the policy-protected data source,

Create a Snowflake Data Share of the Snowflake table that has been registered in Immuta.
Grant reference usage on the Immuta database to the share you created:
```
GRANT REFERENCE_USAGE ON DATABASE "<Immuta database of the provider account>" TO SHARE "<DATA_SHARE>";
```
Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.

Configure Snowflake Lineage Tag Propagation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Contact your Immuta representative to enable this feature in your Immuta tenant.

Configure the Snowflake integration

Navigate to the App Setting page and click the Integration tab.
Click +Add Integration and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Enable Query Audit.
Enable Lineage and complete the following fields:
- Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.
- Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression that excludes / from the beginning and end to filter tags. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.
Select Manual or Automatic Setup and follow the steps in this guide to configure the Snowflake integration

Trigger Snowflake lineage sync job

Prerequisite

Authenticate with the Immuta API.

Trigger the lineage job

The Snowflake lineage sync endpoint triggers the lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.

Copy the example and replace the Immuta URL and API key with your own.
Change the payload attribute values to your own, where
- tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.
- lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.
```
curl -X 'POST' \
    'https://www.organization.immuta.com/lineage/ingest/snowflake' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: 846e9e43c86a4ct1be14290d95127d13f' \
    -d '{
    "tableFilter": "MY_DATABASE\\MY_SCHEMA\\..*",
    "batchSize": 1,
    "lastTimestamp": "2022-06-29T09:47:06.012-07:00"
    }'
```

Next steps

Once the sync job is complete, you can complete the following steps:

Enable Snowflake Low Row Access Policy Mode

If you have Snowflake low row access policy mode enabled in private preview and have impersonation enabled, see these upgrade instructions. Otherwise, query performance will be negatively affected.

Click the App Settings icon in the sidebar and scroll to the Global Integration Settings section.
Click the Enable Snowflake Low Row Access Policy Mode checkbox to enable the feature.
Confirm to allow Immuta to automatically disable impersonation for the Snowflake integration. If you do not confirm, you will not be able to enable Snowflake low row access policy mode.
Click Save.

Configure your Snowflake integration

If you already have a Snowflake integration configured, you don't need to reconfigure your integration. Your Snowflake policies automatically refresh when you enable Snowflake low row access policy mode.

Configure your Snowflake integration. Note that you will not be able to enable project workspaces or user impersonation with Snowflake low row access policy mode enabled.
Click Save and Confirm your changes.

Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

This upgrade step is necessary if you meet both of the following criteria:

You have the Snowflake low row access policy mode enabled in private preview.
You have user impersonation enabled.

If you do not meet this criteria, follow the instructions on the configuration guide.

Upgrade to Snowflake low row access policy mode

To upgrade to the generally available version of the feature, disable your Snowflake integration on the app settings page and then re-enable it.

Reference Guides

Snowflake Integration

Snowflake Enterprise Edition required

In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.

How the integration works

When an administrator configures the Snowflake integration with Immuta, Immuta creates an IMMUTA database and schemas (immuta_procedures, immuta_policies, and immuta_functions) within Snowflake to contain policy definitions and user entitlements. Immuta then creates a system role and gives that system account the privileges required to orchestrate policies in Snowflake and maintain state between Snowflake and Immuta. See the Snowflake privileges section for a list of privileges, the user they must be granted to, and an explanation of why they must be granted.

Data flow

An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
A data owner registers Snowflake tables in Immuta as data sources.
If Snowflake tag ingestion was enabled during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.
A data owner, data governor, or administrator creates or changes a policy or a user's attributes change in Immuta.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
Immuta manages and applies Snowflake governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
If Snowflake table grants is not enabled, Snowflake object owner or user with the global MANAGE GRANTS privilege grants SELECT privilege on relevant Snowflake tables to users. Note: Although they are GRANTed access, if they are not subscribed to the table via Immuta-authored policies, they will not see data.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Policy enforcement

When Immuta users create policies, they are then pushed into the Immuta database within Snowflake; there, the Immuta system account orchestrates Snowflake row access policies and column masking policies directly onto Snowflake tables. Changes in Immuta policies, user attributes, or data sources trigger webhooks that keep the Snowflake policies up-to-date.

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.
They must be granted SELECT access on the table by the Snowflake object owner or automatically via the Snowflake table grants feature.

After a user has met these qualifications they can query Snowflake tables directly.

See the integration support matrix on the Data policy types reference guide for a list of supported data policy types in Snowflake.

Comply with column length and precision requirements in a Snowflake masking policy

When a user applies a masking policy to a Snowflake data source, Immuta truncates masked values to align with Snowflake column length (VARCHAR(X) types) and precision (NUMBER (X,Y) types) requirements.

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

Querying this data source in Snowflake would return the following values:

5w4502

REDAC

990

6e3611

REDAC

750

9s7934

REDAC

380

Hashing collisions

Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.

For more details about Snowflake column length and precision requirements, see the Snowflake behavior change release documentation.

Query performance

When a policy is applied to a column, Immuta uses Snowflake memoizable functions to cache the result of the called function. Then, when a user queries a column that has that policy applied to it, Immuta uses that cached result to dramatically improve query performance.

Snowflake privileges

The privilege grants the Snowflake integration requires align to the least privilege security principle. The table below describes each privilege required in Snowflake for the or the user. The references to IMMUTA_DB , IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table can be replaced with what you chose for the name of your Immuta database, warehouse, and impersonation role when setting up the integration, respectively.

Snowflake privilege

User requiring privilege

Features

Explanation

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

Setup user

All

The setup script this user runs creates an Immuta database in the customer Snowflake account where all Immuta managed objects (UDFs, masking policies, row access policies, and user entitlements) will be written and stored.

CREATE ROLE ON ACCOUNT WITH GRANT OPTION

Setup user

All

The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized.

CREATE USER ON ACCOUNT WITH GRANT OPTION

Setup user

All

The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration.

MANAGE GRANTS ON ACCOUNT

Setup user

All

The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. All privileges that are documented here are granted to the IMMUTA_SYSTEM_ACCOUNT user by this setup user.

OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE

IMMUTA_SYSTEM_ACCOUNT user

Impersonation

If impersonation is enabled, Immuta must be able to manage the Snowflake roles used for impersonation, which is created when the setup script runs, in order to manage the impersonation feature.

ALL PRIVILEGES ON DATABASE IMMUTA_DB

ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB

USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

IMMUTA_SYSTEM_ACCOUNT user

All

The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed.

USAGE ON WAREHOUSE IMMUTA_WH

IMMUTA_SYSTEM_ACCOUNT user

All

To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML and require compute.

IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE

IMMUTA_SYSTEM_ACCOUNT user

Audit

APPLY TAG ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Tag ingestion

APPLY MASKING POLICY ON ACCOUNT

APPLY ROW ACCESS POLICY ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Snowflake integration with governance features enabled

MANAGE GRANTS ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Table grants

Immuta must be able to MANAGE GRANTS on objects throughout the customer's Snowflake account.

CREATE ROLE ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Table grants

When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in the customer’s Snowflake account.

USAGE ON DATABASE IMMUTA_DB

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM

SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE

PUBLIC role

All

Immuta has stored procedures and functions that are used for policy enforcement and do not expose or contain any sensitive information. These objects must be accessible by all users to facilitate the use and creation of policies or views to enforce Immuta policies in Snowflake.

SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST

PUBLIC role

All

Immuta retains a list of excepted roles and users when using the Snowflake integration. The roles and users in this list will be exempt from policies applied to tables in Snowflake to give customers flexibility in case there are entities that should not be bound to Immuta policies in Snowflake (for example, a system or application role or user).

Registering data sources

Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that role, ensuring that your integration works with the following use cases:

Snowflake project workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.
Using views and tables within Immuta: Because this integration uses Snowflake governance policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, the owner of the view must be an excepted role; otherwise, the backing table’s policies will be applied to that view.

Snowflake bulk data source creation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

To use this feature, see the Bulk create Snowflake data sources guide.

Resource allocations

Based on performance tests that create 100,000 data sources, the following minimum resource allocations need to be applied to the appropriate pods in your Kubernetes environment for successful bulk data source creation.

Web

Database

Memory

4Gi

16Gi

CPU

Storage

8Gi

24Gi

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that will check if a user is acting under one of the listed usernames or roles. If a user is, then the policy will not be acted on the queried table. If the user is not, then the policy will be executed like normal. Immuta does not distinguish between role and username, so if you have a role and user with the exact same name, both the user and any user acting under that role will have full access to the data sources and no policies will be enforced for them.

Authentication methods

The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.
Key pair: Users can authenticate with a Snowflake key pair authentication.
Snowflake External OAuth: Users can authenticate with Snowflake External OAuth.

Snowflake External OAuth

Immuta's OAuth authentication method uses the Client Credentials Flow to integrate with Snowflake External OAuth. When a user configures the Snowflake integration or connects a Snowflake data source, Immuta uses the token credentials (obtained using a certificate or passing a client secret) to craft an authenticated access token to connect with Snowflake. This allows organizations that already use Snowflake External OAuth to use that secure authentication with Immuta.

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.

Supported Snowflake feature

The Immuta Snowflake integration supports Snowflake external tables. However, you cannot add a masking policy to an external table column while creating the external table in Snowflake because masking policies cannot be attached to virtual columns.

Supported Immuta features

The Snowflake integration supports the Immuta features outlined below. Click the links provided for more details.

Immuta project workspaces: Users can have additional write access in their integration using project workspaces.
Tag ingestion: Immuta automatically ingests Snowflake object tags from your Snowflake instance and adds them to the appropriate data sources.
User impersonation: Impersonation allows users to query data as another Immuta user. To enable user impersonation, see the Integration user impersonation page.
Query audit: Immuta audits queries run in Snowflake against Snowflake data registered as Immuta data sources.
Multiple Snowflake instances
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates.
Snowflake table grants: This feature allows Immuta to manage privileges on your Snowflake tables and views according to the subscription policies on the corresponding Immuta data sources.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Users can have additional write access in their integration using project workspaces. For more details, see the Snowflake project workspaces page.

Caveat

To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

Tag ingestion

You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.

To enable Snowflake tag ingestion, see the Configure a Snowflake integration page.

Caveats

Snowflake has some natural data latency. If you manually refresh the governance page to see all tags created globally, users can experience a delay of up to two hours. However, if you run schema detection or a health check to find where those tags are applied, the delay will not occur because Immuta will only refresh tags for those specific tables.

Query audit

Immuta system account required Snowflake privilege

IMPORTED PRIVILEGES ON DATABASE snowflake

Once this feature has been enabled with the Snowflake integration, Immuta will query Snowflake to retrieve user query histories. These histories provide audit records for queries against Snowflake data sources that are queried natively in Snowflake.

This process will happen automatically every hour by default but can be configured to a different frequency when configuring or editing the integration. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. When manually requested, it will only search for new queries that were created since the last query that had been audited. The job is run in the background, so the new queries will not be immediately available.

For details about prompting these logs and the contents of these audit logs, see the Snowflake query audit logs page.

Multiple Snowflake instances

A user can configure multiple integrations of Snowflake to a single Immuta tenant and use them dynamically or with workspaces.

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

If there are errors in generating or applying policies natively in Snowflake, the data source will be locked and only users on the excepted roles/users list and the credentials used to create the data source will be able to access the data.
Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
A Snowflake table can only have one set of policies enforced at a given time, so creating multiple data sources pointing to the same table is not supported. If this is a use case you need to support, create views in Snowflake and expose those instead.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
If you create an Immuta data source from a Snowflake view created using a select * from query, Immuta column detection will not work as expected because Snowflake views are not automatically updated based on backing table changes. To remedy this, you can create views that have the specific columns you want or you can CREATE AND REPLACE the view in Snowflake whenever the backing table is updated and manually run the column detection job on the data source page.
If a user is created in Snowflake after that user is already registered in Immuta, Immuta does not grant usage on the per-user role automatically - meaning Immuta does not govern this user's access without manual intervention. If a Snowflake user is created after that user is registered in Immuta, the user account must be disabled and re-enabled to trigger a sync of Immuta policies to govern that user. Whenever possible, Snowflake users should be created before registering those users in Immuta.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.

Custom WHERE clause limitations

The Immuta Snowflake integration uses Snowflake governance features to let users query data natively in Snowflake. This means that Immuta also inherits some Snowflake limitations using correlated subqueries with row access policies and column-level security. These limitations appear when writing custom WHERE policies, but do not remove the utility of row-level policies.

Requirements for a custom WHERE policy

All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations see
- Understanding column-level security
- Understanding row access policies

Snowflake Data Sharing

Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This integration gives data consumers a live connection to the data and relieves data providers of the legal and technical burden of creating static data copies that leave their Snowflake environment.

Requirements:

Snowflake Enterprise Edition or higher
Immuta's table grants feature

Configuration

This method requires that the data consumer account is registered as an Immuta user with the Snowflake user name equal to the consuming account.

At that point, the user that represents the account being shared with can have the appropriate attributes and groups assigned to them, relevant to the data policies that need to be enforced. Once that user has access to the share in the consuming account (not managed by Immuta), they can query the share with the data policies from the producer account enforced because Immuta is treating that account as if they are a single user in Immuta.

For a tutorial on this workflow, see the Using Snowflake Data Sharing page.

Benefits

Using Immuta with Snowflake Data Sharing allows the sharer to

Only need limited knowledge of the context or goals of the existing policies in place: Because the sharer is not editing or creating policies to share their data, they only need a limited knowledge of how the policies work. Their main responsibility is making sure they properly represent the attributes of the data consumer (the account being shared to).
Leave policies untouched.

Snowflake Lineage Tag Propagation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to the target columns in write operations, allowing data stewards to understand how sensitive data moves from ancestor tables to target tables so that they can

trace data back to its source to validate the integrity of dashboards and reports,
identify who performed write operations to meet compliance requirements,
evaluate data quality and pinpoint points of failure, and
tag sensitive data on source tables without having tag columns on their descendant tables.

However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.

Data flow

An application administrator enables the feature on the Immuta app settings page.
Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.
A data owner creates a new data source (or adds a new column to a Snowflake table) that initiates a job that applies all tags for each column from its ancestor columns.
A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
An audit record is created that includes which tags were applied and from which columns those tags originated.

Snowflake access history view and Immuta lineage job

The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.

To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.

Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.

Data source registration

After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.

By default all tags are propagated, but these tags can be filtered on the app settings page or using the Immuta API.

Managing tags

Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.

Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.

However the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. The only way a tag will be removed from descendant data sources is if no other ancestor of the descendant still prescribes the tag.

If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.

Sensitive data discovery

Sensitive data discovery will still run on data sources and can be manually triggered. Tags applied through sensitive data discovery will propagate as tags added through lineage to descendant Immuta data sources.

Snowflake lineage audit

Immuta audit records include Snowflake lineage tag events when a tag is added or removed.

The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:

{
  "id": "c8e020cb-232c-4ba9-a0d8-f3a84ba6808d",
  "dateTime": "1670355170336",
  "month": 1475,
  "profileId": 1,
  "userId": "immuta_system_account",
  "dataSourceId": 2,
  "dataSourceName": "Customer 2",
  "count": 1,
  "recordType": "nativeLineageDataSourceTagUpdate",
  "success": true,
  "component": "dataSource",
  "extra": {
    "sourceColumn": {
      "nativeColumnName": "\"MY_DATABASE\".\"PUBLIC\".\"CUSTOMER\".\"C_FIRST_NAME\"",
      "dataSourceId": 1,
      "columnName": "c_first_name"
    },
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [
      {
        "name": "SNOWFLAKE_TAGS.pii",
        "source": "immuta-us-east-1"
      }
    ]
  },
  "newAuditServiceFields": {
    "actorIp": null,
    "sessionId": null
  },
  "createdAt": "2022-12-06T19:32:50.372Z",
  "updatedAt": "2022-12-06T19:32:50.372Z"
}

Limitations

Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.
Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.
The lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.
There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.
Immuta does not ingest lineage information for views.
Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes.
Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.

Snowflake Low Row Access Policy Mode

The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates and by using table grants to manage user access.

Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query them directly in Snowflake while policies are enforced.

Without Snowflake low row access policy mode enabled, row access policies are created and administered by Immuta in the following scenarios:

Table grants are disabled and a subscription policy that does not automatically subscribe everyone to the data source is applied. Immuta administers Snowflake row access policies to filter out all the rows to restrict access to the entire table when the user doesn't have privileges to query it. However, if table grants are disabled and a subscription policy is applied that grants everyone access to the data source automatically, Immuta does not create a row access policy in Snowflake. See the subscription policies page for details about these policy types.
Purpose-based policy is applied to a data source. A row access policy filters out all the rows of the table if users aren't acting under the purpose specified in the policy when they query the table.
Row-level security policy is applied to a data source. A row access policy filters out rows querying users don't have access to.
User impersonation is enabled. A row access policy is created for every Snowflake table registered in Immuta.

Deprecation notice

Support for using the Snowflake integration with low row access policy mode disabled has been deprecated. You must enable this feature and table grants for your integration to continue working in future releases. See the release notes for EOL dates.

Reducing row access policies

Snowflake low row access policy mode is enabled by default to reduce the number of row access policies Immuta creates and improve query performance. Snowflake low row access policy mode requires

table grants to be enabled.
user impersonation to be disabled. User impersonation diminishes the performance of interactive queries because of the number of row access policies Immuta creates when it's enabled.

Project-scoped purpose exceptions for Snowflake with low row access policy mode enabled

Project-scoped purpose exceptions for Snowflake integrations allow you to apply purpose-based policies to Snowflake data sources in a project. As a result, users can only access that data when they are working within that specific project.

Masked joins for Snowflake with low row access policy mode enabled

This feature allows masked columns to be joined across data sources that belong to the same project. When data sources do not belong to a project, Immuta uses a unique salt per data source for hashing to prevent masked values from being joined. (See the Why use masked joins? guide for an explanation of that behavior.) However, once you add Snowflake data sources to a project and enable masked joins, Immuta uses a consistent salt across all the data sources in that project to allow the join.

For more information about masked joins and enabling them for your project, see the Masked joins section of documentation.

Limitations and considerations

Project workspaces are not compatible with this feature.
Impersonation is not supported when the Snowflake low row access policy mode is enabled.

Snowflake Table Grants

Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of having to manually grant users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. Then, users subscribed to a data source in Immuta can view and query the Snowflake table, while users who are not subscribed to the data source cannot view or query the Snowflake table.

Snowflake privileges

Enabling Snowflake table grants gives the following privileges to the Immuta Snowflake role:

MANAGE GRANTS ON ACCOUNT allows the Immuta Snowflake role to grant and revoke SELECT privileges on Snowflake tables and views that have been added as data sources in Immuta.
CREATE ROLE ON ACCOUNT allows for the creation of a Snowflake role for each user in Immuta, enabling fine-grained, attribute-based access controls to determine which tables are available to which individuals.

Table grants role

Since table privileges are granted to roles and not to users in Snowflake, Immuta's Snowflake table grants feature creates a new Snowflake role for each Immuta user. This design allows Immuta to manage table grants through fine-grained access controls that consider the individual attributes of users.

Each Snowflake user with an Immuta account will be granted a role that Immuta manages. The naming convention for this role is <IMMUTA>_USER_<username>, where

<IMMUTA> is the prefix you specified when enabling the feature on the Immuta app settings page.
<username> is the user's Immuta username.

Querying Snowflake tables managed by Immuta

Users are granted access to each Snowflake table or view automatically when they are subscribed to the corresponding data source in Immuta.

Users have two options for querying Snowflake tables that are managed by Immuta:

Use the role that Immuta creates and manages. (For example, USE ROLE IMMUTA_USER_<username>. See the section above for details about the role and name conventions.) If the current active primary role is used to query tables, USAGE on a Snowflake warehouse must be granted to the Immuta-managed Snowflake role for each user.
USE SECONDARY ROLES ALL, which allows users to use the privileges from all roles that they have been granted, including IMMUTA_USER_<username>, in addition to the current active primary role. Users may also set a value for DEFAULT_SECONDARY_ROLES as an object property on a Snowflake user. To learn more about primary roles and secondary roles in Snowflake, see Snowflake documentation.

Applying GRANTs and REVOKEs at scale

Immuta uses an algorithm to determine the most optimal way to group users in a role hierarchy in order to optimize the number of GRANTs (or REVOKES) executed in Snowflake. This is done by determining the least amount of possible permutations of access across tables and users based on the policies in place; then, those become intermediate roles in the hierarchy that each user is added to, based on the intermediate roles they belong to.

As an example, take the below users and data sources they have access to. To do this naively by individually granting every user to the tables they have access to would result in 37 grants:

Conversely, using the Immuta algorithm, we can optimize the number of grants in the same scenario down to 29:

It’s important to consider a few things here:

If the permutations of access are small, there will be a huge optimization realized (very few intermediate roles). If every user has their own unique permutation of access, the optimization will be negligible (an intermediate role per user). It is most common that the number of permutations of access will be many multiples smaller than the actual user count, so there should be large optimizations. In other words, a much smaller number of intermediate roles and the number of total overall grants reduced, since the tables are granted to roles and roles to users.
This only happens once up front. After that, changes are incremental based on policy changes and user attribute changes (smaller updates), unless there’s a policy that makes a sweeping change across all users. The addition of new users who have access becomes much more straightforward also due to the fact above. User’s access will be granted via the intermediate role, and, therefore, a lot of the work is front loaded in the intermediate role creation.

Limitations

Project workspaces are not supported when Snowflake table grants is enabled.
If an Immuta tenant is connected to an external IAM and that external IAM has a username identical to another username in Immuta's built-in IAM, those users will have the same Snowflake role, leading both to see the same data.
Sometimes the role generated can contain special characters such as @ because it's based on the user name configured from your identity manager. Because of this, it is recommended that any code references to the Immuta-generated role be enclosed with double quotes.

Warehouse Sizing Recommendations

The warehouse you select when configuring the Snowflake integration uses compute resources to set up the integration, register data sources, orchestrate policies, and run jobs like sensitive data discovery. Snowflake credit charges are based on the size of and amount of time the warehouse is active, not the number of queries run.

This document prescribes how and when to adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.

In general, increase the size of and number of clusters for the warehouse to handle heavy workloads and multiple queries. Workloads are typically lighter after data sources are onboarded and policies are established in Immuta, so compute resources can be reduced after those workloads complete.

Integration and data source registration warehouse use

The Snowflake integration uses warehouse compute resources to sync policies created in Immuta to the Snowflake objects registered as data sources and, if enabled, to run and . Follow the guidelines below to adjust the warehouse size and scale according to your needs.

Increase the of and of clusters for the warehouse during large policy syncs, updates, and changes.
Enable to optimize resource use in Snowflake. In the Snowflake UI, the lowest auto suspend time setting is 5 minutes. However, through SQL query, you can set auto_suspend to 61 seconds (since the minimum uptime for a warehouse is 60 seconds). For example,
```
ALTER WAREHOUSE "WH_NAME" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 61 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD' COMMENT = '';
```
Sensitive data discovery uses compute resources for each table registered if it is enabled. Consider when registering data sources if you have an or a tagging strategy in place.
Register data before creating global policies. By default, Immuta on registered data (unless an existing global policy applies to it), which allows Immuta to only pull metadata instead of also applying policies when data sources are created. Registering data before policies are created reduces the workload and the Snowflake compute resources needed.
Begin onboarding with a small dataset of tables, and then review and monitor query performance in the . Adjust the virtual warehouse accordingly to handle heavier loads.
uses the compute warehouse that was employed during the initial ingestion. If you expect a low number of new tables or minimal changes to the table structure, consider scaling down the warehouse size.

Resize the warehouse after after data sources are registered and policies are established. For example,

ALTER WAREHOUSE "INTEGRATION_WH" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 120 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD';

For more details and guidance about warehouse sizing, see the .

Identifying bulk jobs and heavy workloads

Even after your integration is configured, data sources are registered, and policies are established, changes to those data sources or policies may initiate heavy workloads. Follow the guidelines below to adjust your warehouse size and scale according to your needs.

Check how many credits queries have consumed:

SELECT h.* FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" h
INNER JOIN "SNOWFLAKE"."ACCOUNT_USAGE"."SESSIONS" s
ON s.session_id = h.session_id
WHERE GET(parse_json(s.client_environment), 'APPLICATION') = 'IMMUTA' limit 25;

Phased Snowflake Onboarding Concept Guide

While you're onboarding Snowflake data sources and designing policies, you don't want to disrupt your Snowflake users' existing workflows. Instead, you want to gradually onboard Immuta through a series of successive changes that will not impact your existing Snowflake users.

A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and configure policies.

Several features allow you to gradually onboard data sources and policies in Immuta:

: By default, no policy is applied at registration time; instead of applying a restrictive policy immediately upon registration, the table is registered in Immuta and waits for a policy to be applied, if ever.
There are several benefits to this design:
- All existing roles maintain access to the data and registration of the table or view with Immuta has zero impact on your data platform.
- It gives you time to configure tags on the Immuta registered tables and views, either manually or through automatic means, such as Immuta’s sensitive data detection (SDD), or an external catalog integration to include Snowflake tags.
- It gives you time to assess and validate the sensitive data tags that were applied.
- You can build only row and column controls with Immuta and let your existing roles manage table access instead of using Immuta subscription policies for table access.
coupled with Snowflake low row access policy mode: With these features enabled, Immuta manages access to tables (subscription policies) through GRANTs. This works by assigning each user their own unique role created by Immuta and all table access is managed using that single role.
Without these two features enabled, Immuta uses a Snowflake row access policy (RAP) to manage table access. A RAP only allows users to access rows in the table if they were explicitly granted access through an Immuta subscription policy; otherwise, the user sees no rows. This behavior means all existing Snowflake roles lose access to the table contents until explicitly granted access through Immuta subscription policies. Essentially, roles outside of Immuta don't control access anymore.
By using table grants and the low row access policy mode, users and roles outside Immuta continue to work.
There are two benefits to this approach:
- All pre-existing Snowflake roles retain access to the data until you explicitly revoke access (outside Immuta).
- It provides a way to test that Immuta GRANTs are working without impacting production workloads.

Requirements

The following configuration is required for phased Snowflake onboarding:

Impersonation is disabled
Project workspaces are disabled

If either of these capabilities is necessary for your use case, you cannot do phased Snowflake onboarding as described below.

Databricks Unity Catalog

This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.

How-to guides

: Configure the Databricks Unity Catalog integration.
: Migrate from the legacy Databricks Spark integrations to the Databricks Unity Catalog integration.

Reference guide

: This guide describes the design and components of the integration.

Getting Started

The how-to guides linked on this page illustrate how to integrate Databricks Unity Catalog with Immuta.

Requirements:

Unity Catalog and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore.

Configure your Databricks Unity Catalog integration

These guides provide information on the recommended features to enable with Databricks Unity Catalog, or see the for a comprehensive guide on the benefits of these features and other recommendations.

with the following feature enabled: (enabled by default)
Select None as your .
.
.

Register metadata

These guides provide instructions for organizing your Databricks Unity Catalog data to align with your governance structure.

These guides provide instructions for discovering, classifying, and tagging your data.

Validate the policies. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.
Once all Immuta policies are in place, remove or alter old permissions and revoke access to the ungoverned tables.

How-to Guides

Configure a Databricks Unity Catalog Integration

allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

Permissions

An Immuta user with the APPLICATION_ADMIN permission must configure a Databricks Unity Catalog integration.

A Databricks user authorized to create a Databricks service principal must create one for Immuta. This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:

USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources and USE SCHEMA on all schemas containing securables registered as Immuta data sources.
MODIFY and SELECT on all securables registered as Immuta data sources.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.

See the for more details about Unity Catalog privileges and securable objects.

Optionally, to include audit, the service principal needs the following additional privileges:

USE CATALOG on system catalog
USE SCHEMA on system.access schema
SELECT on system.access.audit table
SELECT on system.access.table_lineage table
SELECT on system.access.column_lineage table

Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal. See. The system.access schema must also be on the metastore before it can be used.

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
If you select single user access mode for your cluster, you must
- enable serverless compute for your workspace.

Unity Catalog best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

Create the Databricks service principal

Opt to enable query audit for Unity Catalog

- USE CATALOG on the system catalog
- USE SCHEMA on the system.access schema
- SELECT on the following system tables:
  - system.access.audit
  - system.access.table_lineage
  - system.access.column_lineage

Configure the Databricks Unity Catalog integration

You have two options for configuring your Databricks Unity Catalog integration:

Automatic setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.

Manual setup

Click the App Settings icon in the left sidebar.
Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering in Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.

Map Databricks users to Immuta

If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to each Immuta user account to ensure Immuta properly enforces policies using one of the methods linked below:

Opt to enable Databricks Unity Catalog tag ingestion

Design partner preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirement: A Databricks Unity Catalog integration must be configured for tags to be ingested.

To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.

Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.

Register data

Migrate to Unity Catalog

When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's . New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.

Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them. .

To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, will protect your existing data sources throughout the migration process.

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.
.

How-to Guides

Reference Guides

Configuration Settings

Databricks Spark Cluster Policies

How-to Guides

Reference Guides

Snowflake Integration

Snowflake Enterprise Edition required

Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.

How the integration works

Data flow

An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
A data owner registers Snowflake tables in Immuta as data sources.
If Snowflake tag ingestion was enabled during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.
A data owner, data governor, or administrator creates or changes a policy or a user's attributes change in Immuta.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
Immuta manages and applies Snowflake governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
If Snowflake table grants is not enabled, Snowflake object owner or user with the global MANAGE GRANTS privilege grants SELECT privilege on relevant Snowflake tables to users. Note: Although they are GRANTed access, if they are not subscribed to the table via Immuta-authored policies, they will not see data.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Policy enforcement

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.
They must be granted SELECT access on the table by the Snowflake object owner or automatically via the Snowflake table grants feature.

After a user has met these qualifications they can query Snowflake tables directly.

See the integration support matrix on the Data policy types reference guide for a list of supported data policy types in Snowflake.

Comply with column length and precision requirements in a Snowflake masking policy

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

Querying this data source in Snowflake would return the following values:

5w4502

REDAC

990

6e3611

REDAC

750

9s7934

REDAC

380

Hashing collisions

For more details about Snowflake column length and precision requirements, see the Snowflake behavior change release documentation.

Query performance

Snowflake privileges

Snowflake privilege

User requiring privilege

Features

Explanation

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

Setup user

All

CREATE ROLE ON ACCOUNT WITH GRANT OPTION

Setup user

All

The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized.

CREATE USER ON ACCOUNT WITH GRANT OPTION

Setup user

All

The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration.

MANAGE GRANTS ON ACCOUNT

Setup user

All

OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE

IMMUTA_SYSTEM_ACCOUNT user

Impersonation

If impersonation is enabled, Immuta must be able to manage the Snowflake roles used for impersonation, which is created when the setup script runs, in order to manage the impersonation feature.

ALL PRIVILEGES ON DATABASE IMMUTA_DB

ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB

USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

IMMUTA_SYSTEM_ACCOUNT user

All

The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed.

USAGE ON WAREHOUSE IMMUTA_WH

IMMUTA_SYSTEM_ACCOUNT user

All

To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML and require compute.

IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE

IMMUTA_SYSTEM_ACCOUNT user

Audit

To ingest audit information from Snowflake, Immuta must have access to the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view. See the for details.

APPLY TAG ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Tag ingestion

To ingest table, view, and column tag information from Snowflake, Immuta must have this permission. Immuta reads from the TAG_REFERENCES .

APPLY MASKING POLICY ON ACCOUNT

APPLY ROW ACCESS POLICY ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Snowflake integration with governance features enabled

Immuta must be able to apply policies to objects throughout the customer’s Snowflake account and query for existing policies on objects using the POLICY_REFERENCES .

MANAGE GRANTS ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Table grants

Immuta must be able to MANAGE GRANTS on objects throughout the customer's Snowflake account.

CREATE ROLE ON ACCOUNT

IMMUTA_SYSTEM_ACCOUNT user

Table grants

When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in the customer’s Snowflake account.

USAGE ON DATABASE IMMUTA_DB

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM

SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE

PUBLIC role

All

SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST

PUBLIC role

All

Registering data sources

Snowflake project workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.
Using views and tables within Immuta: Because this integration uses Snowflake governance policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, the owner of the view must be an excepted role; otherwise, the backing table’s policies will be applied to that view.

Snowflake bulk data source creation

Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

To use this feature, see the Bulk create Snowflake data sources guide.

Resource allocations

Web

Database

Memory

4Gi

16Gi

CPU

Storage

8Gi

24Gi

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Authentication methods

The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.
Key pair: Users can authenticate with a Snowflake key pair authentication.
Snowflake External OAuth: Users can authenticate with Snowflake External OAuth.

Snowflake External OAuth

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.

Supported Snowflake feature

Supported Immuta features

The Snowflake integration supports the Immuta features outlined below. Click the links provided for more details.

Immuta project workspaces: Users can have additional write access in their integration using project workspaces.
Tag ingestion: Immuta automatically ingests Snowflake object tags from your Snowflake instance and adds them to the appropriate data sources.
User impersonation: Impersonation allows users to query data as another Immuta user. To enable user impersonation, see the Integration user impersonation page.
Query audit: Immuta audits queries run in Snowflake against Snowflake data registered as Immuta data sources.
Multiple Snowflake instances
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates.
Snowflake table grants: This feature allows Immuta to manage privileges on your Snowflake tables and views according to the subscription policies on the corresponding Immuta data sources.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Users can have additional write access in their integration using project workspaces. For more details, see the Snowflake project workspaces page.

Caveat

Tag ingestion

You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

To enable Snowflake tag ingestion, see the Configure a Snowflake integration page.

Caveats

Query audit

Immuta system account required Snowflake privilege

IMPORTED PRIVILEGES ON DATABASE snowflake

For details about prompting these logs and the contents of these audit logs, see the Snowflake query audit logs page.

Multiple Snowflake instances

A user can configure multiple integrations of Snowflake to a single Immuta tenant and use them dynamically or with workspaces.

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

If there are errors in generating or applying policies natively in Snowflake, the data source will be locked and only users on the excepted roles/users list and the credentials used to create the data source will be able to access the data.
Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
A Snowflake table can only have one set of policies enforced at a given time, so creating multiple data sources pointing to the same table is not supported. If this is a use case you need to support, create views in Snowflake and expose those instead.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
If you create an Immuta data source from a Snowflake view created using a select * from query, Immuta column detection will not work as expected because Snowflake views are not automatically updated based on backing table changes. To remedy this, you can create views that have the specific columns you want or you can CREATE AND REPLACE the view in Snowflake whenever the backing table is updated and manually run the column detection job on the data source page.
If a user is created in Snowflake after that user is already registered in Immuta, Immuta does not grant usage on the per-user role automatically - meaning Immuta does not govern this user's access without manual intervention. If a Snowflake user is created after that user is registered in Immuta, the user account must be disabled and re-enabled to trigger a sync of Immuta policies to govern that user. Whenever possible, Snowflake users should be created before registering those users in Immuta.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.

Custom WHERE clause limitations

Requirements for a custom WHERE policy

All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations see
- Understanding column-level security
- Understanding row access policies

Manual Databricks Spark Configuration

This guide details the manual installation method for enabling access to Databricks with Immuta policies enforced. Before proceeding, ensure your Databricks workspace, instance, and permissions meet the guidelines outlined in the Installation Introduction.

Databricks Unity Catalog: If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

immuta_conf.xml is no longer required

The immuta_conf.xml file that was previously used to configure the Databricks Spark integration is no longer required to install Immuta, so it is no longer staged as a deployment artifact. However, you can use these snippets if you wish to deploy an immuta_conf.xml file to set properties.

The required Immuta base URL and Immuta system API key properties, along with any other valid properties, can still be specified as Spark environment variables or in the optional immuta_conf.xml file. As before, if the same property is specified in both locations, the Spark environment variable takes precedence.

If you have an existing immuta_conf.xml file, you can continue using it. However, it's recommended that you delete any default properties from the file that you have not explicitly overridden, or remove the file completely and rely on Spark environment variables. Either method will ensure that any property defaults changed in upcoming Immuta releases are propagated to your environment.

1 - Download and Configure Immuta Artifacts

Spark Version

Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.

Navigate to the Immuta archives page. If you are prompted to log in and need basic authentication credentials, contact your Immuta support professional.
Navigate to the Databricks folder for your Immuta version. Ex: https://archives.immuta.com/hadoop/databricks/2024.3.7/.
Download the .jar file (Immuta plugin) as well as the other scripts listed below, which will load the plugin at cluster startup.
```
allowedCallingClasses.json
immuta-benchmark-suite.dbc
immuta-spark-hive-X.X.X_YYYYMMDD-hadoop-Z.Z.Z-public.jar
immuta_cluster_init_script.sh
obscuredCommands.yaml
```
The immuta-benchmark-suite.dbc is a collection of notebooks packaged as a .dbc file. After you have added cluster policies to your cluster, you can import this file into Databricks to run performance tests and compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries. Note: Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.
Specify the following properties as Spark environment variables or in the optional immuta_conf.xml file. If the same property is specified in both locations, the Spark environment variable takes precedence. The variable names are the config names in all upper case with _ instead of .. For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
- immuta.system.api.key: Obtain this value from the Immuta Configuration UI under HDFS > System API Key. You will need to be a user with the APPLICATION_ADMIN role to complete this action.
- immuta.base.url: The full URL for the target Immuta tenant Ex: https://immuta.mycompany.com.
- immuta.user.mapping.iamid: If users authenticate to Immuta using an IAM different from Immuta's built-in IAM, you need to update the configuration file to reflect the ID of that IAM. The IAM ID is shown within the Immuta App Settings page within the Identity Management section. See Databricks to Immuta User Mapping for more details.

Environment variables with Google Cloud Platform

Do not use environment variables to set sensitive properties when using Google Cloud Platform. Set them directly in immuta_conf.xml.

Generating a key will destroy any previously generated HDFS keys. This will cause previously integrated HDFS systems to lose access to your Immuta console. The key will only be shown once when generated.

2 - Stage Immuta Artifacts

When configuring the Databricks cluster, a path will need to be provided to each of the artifacts downloaded/created in the previous step. To do this, those artifacts must be hosted somewhere that your Databricks instance can access. The following methods can be used for this step:

Host files in AWS/S3 and provide access by the cluster
Host files in Azure ADL Gen 1 or Gen 2 and provide access by the cluster
Host files on an HTTPS server accessible by the cluster
Host files in DBFS (Not recommended for production)

These artifacts will be downloaded to the required location within the clusters file-system by the init script downloaded in the previous step. In order for the init script to find these files, a URI will have to be provided through environment variables configured on the cluster. Each method's URI structure and setup is explained below.

AWS/S3

URI Structure: s3://[bucket]/[path]

Create an instance profile for clusters by following Databricks documentation.
Upload the configuration file, JSON file, and JAR file to an S3 bucket that the role from step 1 has access to.

Authenticating with Access Keys or Session Tokens (Optional)

If you wish to authenticate using access keys, add the following items to the cluster's environment variables:

IMMUTA_INIT_AWS_SECRET_ACCESS_KEY=<aws secret key>
IMMUTA_INIT_AWS_ACCESS_KEY_ID=<aws access key id>

If you've assumed a role and received a session token, that can be added here as well:

IMMUTA_INIT_AWS_SESSION_TOKEN=<aws session token>

Azure

ADL Gen 2

URI Structure: abfs(s)://[container]@[account].dfs.core.windows.net/[path]

Upload the configuration file, JSON file, and JAR file to an ADL gen 2 blob container.

Environment Variables:

If you want to authenticate using an account key, add the following to your cluster's environment variables:

IMMUTA_INIT_AZCOPY_CRED_TYPE=SharedKey
IMMUTA_INIT_ACCOUNT_NAME=<ADLg2 account name>
IMMUTA_INIT_ACCOUNT_KEY=<ADLg2 account key>

If you want to authenticate using an Azure SAS token, add the following to your cluster's environment variables:

IMMUTA_INIT_AZURE_SAS_TOKEN=<SAS token>

ADL Gen 1

URI Structure: adl://[account].azuredatalakestore.net/[path]

Upload the configuration file, JSON file, and JAR file to ADL gen 1.

Environment Variables:

If authenticating as a Microsoft Entra ID user,

IMMUTA_INIT_AZURE_AD_USER=<Microsoft Entra ID username>
IMMUTA_INIT_AZURE_PASSWORD=<Microsoft Entra ID password>

If authenticating using a service principal,

IMMUTA_INIT_AZURE_SERVICE_PRINCIPAL=<azure service principal>
IMMUTA_INIT_AZURE_PASSWORD=<azure service principal password>
IMMUTA_INIT_AZURE_TENANT=<tenant ID where principal was created>

HTTPS

URI Structure: http(s)://[host](:port)/[path]

Artifacts are available for download from Immuta using basic authentication. Your basic authentication credentials can be obtained from your Immuta support professional.

Environment Variables (Optional)

IMMUTA_INIT_HTTPS_USER=<basic auth username>
IMMUTA_INIT_HTTPS_PASSWORD=<basic auth password>

# Note: Credentials can also be included as part of the artifact URI. For example,
IMMUTA_INIT_JAR_URI=https://user:password@archives.immuta.com/path/to/file

DBFS

DBFS does not support access control

Any Databricks user can access DBFS via the Databricks command line utility. Files containing sensitive materials (such as Immuta API keys) should not be stored there in plain text. Use other methods described herein to properly secure such materials.

URI Structure: dbfs:/[path]

Upload the artifacts directly to DBFS using the Databricks CLI.

Since any user has access to everything in DBFS:

The artifacts can be stored anywhere in DBFS.
It's best to have a cluster-specific place for your artifacts in DBFS if you are testing to avoid overwriting or reusing someone else's artifacts accidentally.

3 - Protect Immuta Environment Variables with Databricks Secrets

It is important that non-administrator users on an Immuta-enabled Databricks cluster do not have access to view or modify Immuta configuration or the immuta-spark-hive.jar file, as this would potentially pose a security loophole around Immuta policy enforcement. Therefore, use Databricks secrets to apply environment variables to an Immuta-enabled cluster in a secure way.

Databricks secrets can be used in the Environment Variables configuration section for a cluster by referencing the secret path rather than the actual value of the environment variable. For example, if a user wanted to make the following value secret

MY_SECRET_ENV_VAR=super_secret_stuff

they could instead create a Databricks secret and reference it as the value of that variable. For instance, if the secret scope my_secrets was created, and the user added a secret with the key my_secret_env_var containing the desired sensitive environment variable, they would reference it in the Environment Variables section:

MY_SECRET_ENV_VAR={{secrets/my_secrets/my_secret_env_var}}

Then, at runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.

Best practice: Replace sensitive variables with secrets

Immuta recommends that any sensitive environment variables listed below in the various artifact deployment instructions be replaced with secrets.

4 - Create and Configure the Cluster

Cluster creation in an Immuta-enabled organization or Databricks workspace should be limited to administrative users to avoid allowing users to create non-Immuta enabled clusters.

Create a cluster in Databricks by following the Databricks documentation.
Select the Custom Access mode.
Opt to adjust the Autopilot Options and Worker Type settings. The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
In the Advanced Options section, click the Instances tab.
- IAM Role (AWS ONLY): Select the instance role you created for this cluster. (For access key authentication, you should instead use the environment variables listed in the AWS section.)

Click the Spark tab. In Spark Config field, add your configuration.

Cluster Configuration Requirements:

spark.executor.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
    -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
    -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.driver.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
    -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
    -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
spark.databricks.repl.allowedLanguages python,sql
spark.databricks.pyspark.enableProcessIsolation true
spark.databricks.isv.product Immuta

In the Environment Variables section, add the environment variables necessary for your configuration. Remember that these variables should be protected with Databricks secrets as mentioned above.

# Specify the URI to the artifacts that were hosted in the previous steps
# The URI must adhere to the supported types for each service mentioned above
IMMUTA_INIT_JAR_URI=<Full URI to immuta-spark-hive.jar>
IMMUTA_INIT_CONF_URI=<Full URI to Immuta configuration file>
IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI=<full URI to allowedCallingClasses.json>
IMMUTA_INIT_OBSCURED_COMMANDS_URI=<full URI to obscuredCommands.yaml>

# (OPTIONAL)
# Specify an additional configuration file to be added to the spark.sparkContext.hadoopConfiguration.
# This file allows administrators to add sensitive configuration needed by the SparkSession that
# should not viewable by users.
# Further explanation of this variable as well as examples are provided below.
IMMUTA_INIT_ADDITIONAL_CONF_URI=<full URI to additional configuration file>

Click the Init Scripts tab and set the following configurations:
- Destination: Specify the service you used to host the Immuta artifacts.
- File Path: Specify the full URI to the immuta_cluster_init_script.sh.
- Add the new key/value to the configuration.
Click the Permissions tab and configure the following setting:
- Who has access: Users or groups will need to have the permission Can Attach To to execute queries against Immuta configured data sources.
(Re)start the cluster.

Additional Hadoop Configuration File (Optional)

As mentioned in the "Environment Variables" section of the cluster configuration, there may be some cases where it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration in order to read the data composing Immuta data sources.

As an example, when accessing external tables stored in Azure Data Lake Gen 2, Spark must have credentials to access the target containers/filesystems in ADLg2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access ADLg2.

To use an additional Hadoop configuration file, you will need to set the IMMUTA_INIT_ADDITIONAL_CONF_URI environment variable referenced in the Create and configure the cluster section to be the full URI to this file.

The additional configuration file looks very similar to the Immuta Configuration file referenced above. Some example configuration files for accessing different storage layers are below.

Amazon S3

IAM role for S3 access

S3 can also be accessed using an IAM role attached to the cluster. See the Databricks documentation for more details.

<configuration>
    <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>[AWS access key ID]</value>
    </property>
    <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>[AWS secret key]</value>
    </property>
</configuration>

Azure Data Lake Gen 2

<configuration>
    <property>
        <name>fs.azure.account.key.[storage account name].dfs.core.windows.net</name>
        <value>[storage account key]</value>
    </property>
</configuration>

Azure Data Lake Gen 1

ADL prefix: Prior to Databricks Runtime version 6, the following configuration items should have a prefix of dfs.adls rather than fs.adl.

<configuration>
    <property>
        <name>fs.adl.oauth2.refresh.url</name>
        <value>https://login.microsoftonline.com/[directory ID]/oauth2/token</value>
    </property>
    <property>
        <name>fs.adl.oauth2.access.token.provider.type</name>
        <value>ClientCredential</value>
    </property>
    <property>
        <name>fs.adl.oauth2.credential</name>
        <value>[client secret from Azure]</value>
    </property>
    <property>
        <name>fs.adl.oauth2.client.id</name>
        <value>[client ID from Azure]</value>
    </property>
</configuration>

Azure Blob Storage

<configuration>
    <property>
        <name>fs.azure.account.key.[storage account name].blob.core.windows.net</name>
        <value>[storage account key]</value>
    </property>
</configuration>

5 - Register data

6 - Query Immuta Data

When the Immuta enabled Databricks cluster has been successfully started, users will see a new database labeled "immuta". This database is the virtual layer provided to access data sources configured within the connected Immuta instance.

Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster and GRANT the user access to the immuta database.

The following SQL query can be run as an administrator within a journal to give the user access to "Immuta":

%sql
GRANT SELECT,READ_METADATA ON DATABASE immuta TO `user@company.com`

Below are example queries that can be run to obtain data from an Immuta-configured data source. Because Immuta supports raw tables in Databricks, you do not have to use Immuta-qualified table names in your queries like the first example. Instead, you can run queries like the second example, which does not reference the immuta database.

%sql
select * from immuta.my_data_source limit 5;

%sql
select * from my_data_source limit 5;

Creating a Databricks Data Source

See the Databricks Data Source Creation guide for a detailed walkthrough.

Databricks to Immuta User Mapping

By default, the IAM used to map users between Databricks and Immuta is the BIM (Immuta's internal IAM). The Immuta Spark plugin will check the Databricks username against the username within the BIM to determine access. For a basic integration, this means the users email address in Databricks and the connected Immuta tenant must match.

It is possible within Immuta to have multiple users share the same username if they exist within different IAMs. In this case, the cluster can be configured to lookup users from a specified IAM. To do this, the value of immuta.user.mapping.iamid created and hosted in the previous steps must be updated to be the targeted IAM ID configured within the Immuta tenant. The IAM ID can be found on the App Settings page. Each Databricks cluster can only be mapped to one IAM.

Configure Starburst (Trino) Integration

The plugin comes pre-installed with Starburst Enterprise, so this page provides separate sets of guidelines for configuration:

Starburst Cluster Configuration: These instructions are specific to Starburst Enterprise clusters.
Trino Cluster Configuration: These instructions are specific to open-source Trino clusters.

Starburst Cluster Configuration

Requirement

A valid Starburst Enterprise license.

Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control providers such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on cluster.

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the Integration Type dropdown menu.
Click Save.

OAuth Authentication

If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, configure the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Starburst

Default configuration property values

If you use the default property values in the configuration file described in this section,

you will give users read and write access to tables that are not registered in Immuta and
results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that have already configured an access control provider. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

Kubernetes Deployment: Immuta generates a local certificate authority (CA) that signs certificates for each service by default. Ensure that the externalHostname you specified in the Immuta Enterprise Helm chart matches the Immuta hostname name specified in the Starburst (Trino) configuration.

If the hostnames in your certificate don't match the hostname specified in your Starburst (Trino) integration, you can set immuta.disable-hostname-verification to true in the Immuta access control config file to get the integration working in the interim.

The Starburst (Trino) integration uses the immuta.ca-file property to communicate with Immuta. When configuring the plugin in Starburst (outlined below), specify a path to your CA file using the immuta.ca-file property in the Immuta access control configuration file.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
The table below describes the properties that can be set during configuration.
Property
Starburst version
Required or optional
Description
access-control.name
392 and newer
Required
This property enables the integration.
access-control.config-files
392 and newer
Optional
immuta.allowed.immuta.datasource.operations
413 and newer
Optional
immuta.allowed.non.immuta.datasource.operations
392 and newer
Optional
immuta.apikey
392 and newer
Required
immuta.audit.legacy.enabled
392 and newer
Optional
This property allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your install.
immuta.ca-file
392 and newer
Optional
This property allows you to specify a path to your CA file.
immuta.cache.views.seconds
392 and newer
Optional
Amount of time in seconds for which a user's specific representation of an Immuta data source will be cached for. Changing this will impact how quickly policy changes are reflected for users actively querying Starburst. By default, cache expires after 30 seconds.
immuta.cache.datasource.seconds
392 and newer
Optional
Amount of time in seconds for which a user's available Immuta data sources will be cached for. Changing this will impact how quickly data sources will be available due to changing projects or subscriptions. By default, cache expires after 30 seconds.
immuta.endpoint
392 and newer
Required
The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Starburst (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.
immuta.filter.unallowed.table.metadata
392 and newer
Optional
When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.
immuta.group.admin
420 and newer
Required if immuta.user.admin is not set
This property identifies the Starburst group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).
immuta.user.admin
392 and newer
Required if immuta.group.admin is not set
This property identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).
Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

access-control.config-files=/etc/starburst/immuta-access-control.properties

Example Immuta System Access Control Configuration

The example configuration snippet below uses the default configuration settings for immuta.allowed.immuta.datasource.operations and immuta.allowed.non.immuta.datasource.operations, which allow read access for data registered as Immuta data sources and read and write access on data that is not registered in Immuta. See the Granting Starburst (Trino) privileges section for details about customizing and enforcing read and write access controls in Starburst.

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false

3 - Add Starburst Users to Immuta

Configure your external IAM to add users to Immuta.
Map their Starburst usernames when configuring your IAM (or map usernames manually) to Immuta.
- All Starburst users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Starburst username must be mapped to Immuta so they can query policy-enforced data.
- A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

4 - Register data

Trino Cluster Configuration

1 - Enable the Integration

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click Add Integration and select Trino from the dropdown menu.
Click Save.

OAuth Authentication

Click the App Settings page icon.
Click Advanced Settings and scroll to Advanced Configuration.
Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:
```
trino:
  globalAdminUsername: "admins@trino.com"
```

2 - Configure the Immuta System Access Control Plugin in Trino

A user with access to Immuta's Archives site is required to conduct the download in this step at https://archives.immuta.com. If you are prompted to log in and need basic authentication credentials, contact your Immuta support professional.

Default configuration property values

If you use the default property values in the configuration file described in this section,

you will give users read and write access to tables that are not registered in Immuta and
results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

However, the access-control.config-files property can be configured to allow Immuta to work with existing Trino installations that have already configured an access control provider. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

TLS Certificate Generation

If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

Kubernetes Deployment: Immuta generates a local certificate authority (CA) that signs certificates for each service by default. Ensure that the externalHostname you specified in the Immuta Helm Chart matches the Immuta hostname name specified in the Starburst (Trino) configuration.

The Immuta Trino plugin version is updated alongside Trino so that a matching version of the plugin is published for corresponding Trino releases. For example, the Immuta plugin version supporting Trino version 403 is simply version 403. Download the plugin from version from Immuta's Archives site that corresponds with the Trino version you use.
Enable Immuta on your cluster. Select the tab below that corresponds to your installation method for instructions:

Docker installations

Follow Trino's documentation to install the plugin archive on all nodes in your cluster.
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

immuta-trino Docker image

For Trino versions 414 and newer, an immuta-trino Docker image that includes the Trino plugin jars is available from registry.immuta.com. Before using this image, consider the following factors:

This image was designed to provide a method for customers to quickly set up and validate the integration, so it should be used in a development environment. Use the Docker installation method above for production environments.
Immuta only supports the Immuta Trino plugin on the Docker image, not any other software packaged on the image.
If you experience an issue with the image outside of the scope of the Immuta plugin, you must rebuild your own version of the image using the Docker installation method above.

To use this image,

Pull the image and start the container. The example below specifies the Immuta Trino plugin version 414 with the 414 tag, but any supported Trino version newer than 414 can be used:
```
docker run registry.immuta.com/immuta/immuta-trino:414
```
Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

Standalone installations

Follow Trino's documentation to install the plugin archive on all nodes in your cluster.
Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.

Configure the properties described in the table below.

Property

Trino version

Required or optional

Description

access-control.name

392 and newer

Required

This property enables the integration.

access-control.config-files

392 and newer

Optional

Trino allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

immuta.allowed.immuta.datasource.operations

413 and newer

Optional

immuta.allowed.non.immuta.datasource.operations

392 and newer

Optional

immuta.apikey

392 and newer

Required

immuta.audit.legacy.enabled

392 and newer

Optional

This property allows you to turn off the legacy Starburst (Trino) audit if you do not have Elasticsearch set up in your install.

immuta.ca-file

392 and newer

Optional

This property allows you to specify a path to your CA file.

immuta.cache.views.seconds

392 and newer

Optional

Amount of time in seconds for which a user's specific representation of an Immuta data source will be cached for. Changing this will impact how quickly policy changes are reflected for users actively querying Trino. By default, cache expires after 30 seconds.

immuta.cache.datasource.seconds

392 and newer

Optional

Amount of time in seconds for which a user's available Immuta data sources will be cached for. Changing this will impact how quickly data sources will be available due to changing projects or subscriptions. By default, cache expires after 30 seconds.

immuta.endpoint

392 and newer

Required

The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Trino (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.

immuta.filter.unallowed.table.metadata

392 and newer

Optional

When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.

immuta.group.admin

420 and newer

Required if immuta.user.admin is not set

This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).

immuta.user.admin

392 and newer

Required if immuta.group.admin is not set

This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe+svcacct@immuta\\.com).

Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the user name provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but the user does not have access to through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user has access to through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false

3 - Add Trino Users to Immuta

Configure your external IAM to add users to Immuta.
Map their Trino usernames when configuring your IAM (or map usernames manually) to Immuta.
- All Trino users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Trino username must be mapped to Immuta so they can query policy-enforced data.
- A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

4 - Register data

Amazon S3

Private preview: The Amazon S3 integration is available to select accounts. Reach out to your Immuta representative for details.

Getting started

Immuta's Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.

Requirements

No location is registered in your S3 Access Grants instance before configuring the integration in Immuta
Write policies private preview enabled for your account; contact your Immuta representative to get this feature enabled

Permissions

APPLICATION_ADMIN Immuta permission to configure the integration
CREATE_S3_DATASOURCE Immuta permission to register S3 prefixes
The AWS account credentials or optional AWS IAM role you provide Immuta to configure the integration must
- have ownership of the buckets Immuta will enforce policies on
- have the permissions to perform the following actions to create locations and issue grants:
  - accessgrantslocation resource:
    s3:CreateAccessGrant
    s3:DeleteAccessGrantsLocation
    s3:GetAccessGrantsLocation
    s3:UpdateAccessGrantsLocation
  - accessgrantsinstance resource:
    s3:CreateAccessGrantsInstance
    s3:CreateAccessGrantsLocation
    s3:DeleteAccessGrantsInstance
    s3:GetAccessGrantsInstance
    s3:GetAccessGrantsInstanceForPrefix
    s3:GetAccessGrantsInstanceResourcePolicy
    s3:ListAccessGrants
    s3:ListAccessGrantsLocations
  - accessgrant resource:
    s3:DeleteAccessGrant
    s3:GetAccessGrant
  - bucket resource: s3:ListBucket
  - role resource:
    iam:GetRole
    iam:PassRole
  - all resources: s3:ListAccessGrantsInstances

Set up S3 Access Grants instance

Follow AWS documentation to create an Access Grants instance using the S3 console, AWS CLI, AWS SDKs, or the REST API. AWS supports one Access Grants instance per region per AWS account.
Follow the instructions at the top of the "Register a location" page in AWS documentation to create an AWS IAM role and edit the trust policy to give the S3 Access Grants service principal access to this role in the resource policy file. You will add this role to your integration configuration in Immuta so that Immuta can register this role with your Access Grants location. The policy should include at least the following permissions, but might need additional permissions depending on other local setup factors. An example trust policy is provided below.
- sts:AssumeRole
- sts:SetSourceIdentity

IAM role trust policy example

{
  "Version": "2012-10-17",
    "Statement": [
    {
      "Sid": "Stmt1234567891011",
      "Effect": "Allow",
      "Principal": {
        "Service":"access-grants.s3.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole", 
        "sts:SetSourceIdentity"
      ]
    }
  ]
}

Follow the instructions at the top of the "Register a location" page in AWS documentation to create an IAM policy with the following permissions, and attach the policy to the IAM role you created to grant the permissions to the role. The policy should include the following permissions. An example policy is provided below.

s3:GetObject
s3:GetObjectVersion
s3:GetObjectAcl
s3:GetObjectVersionAcl
s3:ListMultipartUploadParts
s3:PutObject
s3:PutObjectAcl
s3:PutObjectVersionAcl
s3:DeleteObject
s3:DeleteObjectVersion
s3:AbortMultipartUpload
s3:ListBucket
s3:ListAllMyBuckets

IAM policy example

Replace <bucket_arn> in the example below with the ARN of the bucket scope that contains data you want to grant access to.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectVersionAcl",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                <bucket arn>
            ]
        },
        {
            "Sid": "ObjectLevelWritePermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                <bucket arn>
            ]
        },
        {
            "Sid": "BucketLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": [
                <bucket arn>
            ]
        }
    ]
}

If you use server-side encryption with AWS Key Management Service (AWS KMS) keys to encrypt your data, the following permissions are required for the IAM role in the policy. If you do not use this feature, do not include these permissions in your IAM policy:

kms:Decrypt
kms:GenerateDataKey

Opt to create an AWS IAM role that Immuta can use to create Access Grants locations and issue grants. This role must have the S3 permissions listed in the permissions section. An example policy is provided below.

IAM policy example

Replace <role_arn> and <access_grants_instance_arn> in the example below with the ARNs of the role you created and your Access Grants instance, respectively. The Access Grants instance resource ARN should be scoped to apply to any future locations that will be created under this Access Grants instance. For example, "Resource": "arn:aws:s3:us-east-2:6********499:access-grants/default*" ensures that the role would have permissions for both of these locations:

arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation1
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation2

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RolePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "<role_arn>"
        },
        {
            "Sid": "AccessGrants",
            "Effect": "Allow",
            "Action": [
                "s3:CreateAccessGrant",
                "s3:DeleteAccessGrantsLocation",
                "s3:GetAccessGrantsLocation",
                "s3:CreateAccessGrantsLocation",
                "s3:GetAccessGrantsInstance",
                "s3:GetAccessGrantsInstanceForPrefix",
                "s3:GetAccessGrantsInstanceResourcePolicy",
                "s3:ListAccessGrants",
                "s3:ListAccessGrantsLocations",
                "s3:ListAccessGrantsInstances",
                "s3:DeleteAccessGrant",
                "s3:GetAccessGrant"
            ],
            "Resource": [
                "<access_grants_instance_arn>"
            ]
        }
    ]
}

If you use AWS IAM Identity Center, associate your IAM Identity Center instance with your S3 Access Grants instance. Then add the permissions listed in the sample policy below to your IAM policy, and attach the policy to the IAM role you created to grant the permissions to the role.

IAM policy example

Copy the JSON below and replace the following bracketed placeholder values with your own. For details about the actions and resource values, see the IAM Identity Center API reference documentation.

<iam_identity_center_instance_arn>: The ARN of the instance of IAM Identity Center (InstanceArn) that is configured with the application.
<iam_identity_center_application_arn_for_s3_access_grants>: The ARN of the S3 Access Grants instance (ApplicationArn) configured with IAM Identity Center.
<aws_account>: Your AWS account ID.
<identity_store_id>: The globally unique identifier for the identity store (IdentityStoreId) that is connected to the Identity Center instance. This value is generated when a new identity store is created.

{
  "Sid": "sso",
  "Effect": "Allow",
  "Action": [
    "sso:DescribeInstance",
    "sso:DescribeApplication",
    "sso-directory:DescribeUsers"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}, {
  "Sid": "idc",
  "Effect": "Allow",
  "Action": [
    "identitystore:DescribeUser",
    "identitystore:DescribeGroup"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}

Configure the integration in Immuta

In Immuta, click App Settings in the navigation menu and click the Integrations tab.
Click + Add Integration.
Select Amazon S3 from the dropdown menu and click Continue Configuration.
Complete the connection details fields, where
- Friendly Name is a name for the integration that is unique across all Amazon S3 integrations configured in Immuta.
- AWS Account ID is the ID of your AWS account.
- AWS Region is the AWS region to use.
- S3 Access Grants Location IAM Role ARN is the role the S3 Access Grants service assumes to vend credentials to the grantee. When a grantee accesses S3 data, the Access Grants service attaches session policies and assumes this role in order to vend credentials scoped to a prefix or bucket to the grantee. This role needs full access to all paths under the S3 location prefix.
- S3 Access Grants S3 Location Scope is the base S3 location that Immuta will use for this connection when registering S3 prefixes. This path must be unique across all S3 integrations configured in Immuta. During data source registration, this prefix is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location prefix of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Select your authentication method:
- Automatically discover AWS credentials: Searches and obtains credentials using the AWS SDK's default credential provider chain. This method requires a configured IAM role for a service account. Work with your Immuta representative to customize your deployment and set up an IAM role for a service account that can give Immuta the credentials to set up the integration.
- Access using access key and secret access key: Provide your AWS Access Key ID and AWS Secret Access Key.
Click Verify Credentials.
Click Next to review and confirm your connection information, and then click Complete Setup.

Register S3 data

Follow the Create an S3 data source guide to register prefixes in Immuta.
Recommended: Organize your data sources into domains and assign domain permissions to accountable teams.

To create an S3 data source using the API, see the Configure an S3 integration and create an S3 data source API guide.

Editing an integration

You can edit the following settings for an existing Amazon S3 integration on the app settings page:

friendly name
authentication type and values (access key, secret, and role)

To edit settings for an existing integration via the API, see the Configure an Amazon S3 integration API guide.

Protect data

Requirements: USER_ADMIN Immuta permission and either the GOVERNANCE or CREATE_S3_DATASOURCE Immuta permission

Build read or write subscription policies in Immuta to enforce access controls.
Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies:
1. Click People and select Users in the navigation menu.
2. Navigate to the user's page and click the more actions icon next to their username.
3. Select Change S3 User or AWS IAM Role from the dropdown menu.
4. Use the dropdown menu to select the User Type. Then complete the S3 field. User and role names are case-sensitive. See the AWS documentation for details.
  - AWS IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
  - AWS IAM user principals
  - AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address. Ensure that you have added the content to your IAM policy JSON as outlined in the Set up S3 Access Grants instance section above to allow Immuta to use AWS Identity Center.
  - Unset (fallback to Immuta username): When selecting this option, the S3 username is assumed to be the same as the Immuta username.
5. Click Save.
See the Mapping IAM principals in Immuta section for details about supported principals.

Access data

Requirement: User must be subscribed to the data source in Immuta

Request access to Amazon S3 data through S3 Access Grants. If you're accessing S3 data through one of the supported S3 Access Grants integrations (such as Amazon EMR on EC2), that application will make this request on your behalf, so you can skip this step.
Use the temporary credentials you received in the previous step to access the data in S3.

S3 integration overview

With this integration, users can avoid

hand-writing AWS IAM policies
managing AWS IAM role limits
manually tracking what user or role has access to what files in AWS S3 and verifying those are consistent with intent

S3 Access Grants components

To enforce controls on S3 data, Immuta interacts with several S3 Access Grants components:

Access Grants instance: An Access Grants instance is a logical container for individual grants that specify who can access what level of data in S3 in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Location: A location specifies what data the Access Grants instance can grant access to. For example, registering a location with a scope of s3:// allows Access Grants to manage access to all S3 buckets in that AWS account and region, whereas setting the bucket s3://research-data as the scope limits Access Grants to managing access to that single bucket for that location. When you configure the S3 integration in Immuta, you specify a location's scope and IAM assumed role, and Immuta registers the location in your Access Grants instance and associates it with the provided IAM role for you. Each S3 integration you configure in Immuta is associated with one location, and Immuta manages all grants in that location. Therefore, grants cannot be manually created by users in an Access Grants instance location that Immuta has registered and manages. During data source registration, this location scope is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location scope of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Individual grants: Individual permission grants in S3 Access Grants specify the identity that can access the data, the access level, and the location of the S3 data. Immuta creates a grant for each user subscribed to a prefix, bucket, or object by interacting with the Access Grants API. Each grant has its own ID and gives the user or role principle access to the data.
IAM assumed role: This is an IAM role you create in S3 that has full access to all prefixes, buckets, and objects in the Access Grants location registered by Immuta. This IAM role is used to vend temporary credentials to users or applications. When a grantee requests temporary credentials, the S3 Access Grants service assumes this role to vend credentials scoped to the prefix, bucket, or object specified in the grant to the grantee. The grantee then uses these credentials to access S3 data. When configuring the integration in Immuta, you specify this role, and then Immuta associates this role with the registered location in the Access Grants instance.
Temporary credentials: These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant and scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant and vends these temporary credentials to the requester. These credentials have a default timeout of 1 hour, but this duration can be changed by the requester.

The diagram below illustrates how these S3 Access Grants components interact.

For more details about these Access Grants concepts, see the S3 Access Grants documentation.

How does the integration work?

After an administrator creates an Access Grants instance and an assumed IAM role in their AWS account, an application administrator configures the Amazon S3 integration in Immuta. During configuration, the administrator provides the following connection information so that Immuta can create and register a location in that Access Grants instance:

AWS account ID and region
ARN for the existing Access Grants instance
ARN for the assumed IAM role

When Immuta registers this location, it associates the assumed IAM role with the location. This allows the IAM role to create temporary credentials with access scoped to a particular S3 prefix, bucket, or object in the location. The IAM role you create for this location must have all the object- and bucket-level permissions listed in the set up S3 Access Grants instance section on all buckets and objects in the location; if it is missing permissions, the IAM role will not be able to grant those missing permissions to users or applications requesting temporary credentials.

In the example below, an application administrator registers the following location prefix and IAM role for their Access Grants instance in AWS account 123456:

Location path: s3://. This path allows a single Amazon S3 integration to manage all objects in S3 in that AWS account and region. Data owners can scope down access further when registering specific S3 prefixes and applying policies.
Location IAM role: The arn:aws:iam::123456:role/access-grants-role IAM role will be used to vend temporary credentials to users and applications.

Immuta registers this location and associated IAM role in the user's Access Grants instance:

After the S3 integration is configured, a data owner can register S3 prefixes and buckets that are in the configured Access Grants location path to enforce access controls on resources. Immuta stores the connection information for the prefix so that the metadata can be used to create and enforce subscription policies on S3 data.

A data owner or governor can apply a subscription policy to a registered prefix, bucket, or object to control who can access objects beginning with that prefix or in that bucket after it is registered in Immuta. Once a subscription policy is created and Immuta users are subscribed to the prefix, bucket, or object, Immuta calls the Access Grants API to create a grant for each subscribed user, specifying the following parameters in the payload so that Access Grants can create and store a grant for each user:

Access Grants location
READ access
User or role principle
Registered prefix, bucket, or object

In the example below, a data owner registers the s3://research-data/* bucket, and Immuta stores the connection information in the Immuta metadata database. Once the user, Taylor, is subscribed to s3://research-data/*, Immuta calls the Access Grants API to create a grant for that user to allow them to read and write S3 data in that bucket:

Accessing S3 data

To access S3 data registered in Immuta, users must be subscribed to the prefix, bucket, or object in Immuta, and their principals must be mapped to their Immuta user accounts. Once users are subscribed, they request temporary credentials from S3 Access Grants. Access Grants looks up the grant ID associated with the requester. If no matching grant exists, they receive an access denied error. If one exists, Access Grants assumes the IAM role associated with the location and requests temporary credentials that are scoped to the prefix, bucket, or object and permissions specified by the individual grant. Access Grants vends the credentials to the requester, who uses those temporary credentials to access the data in S3.

In the example below, Taylor requests temporary credentials from S3 Access Grants. Access Grants looks up the grant ID (1) for that user, assumes the arn:aws:iam::123456:role/access-grants-role IAM role for the location, and vends temporary credentials to Taylor, who then uses the credentials to access the research-data bucket in S3:

Note that when accessing data through S3 Access Grants, the user or application interacts directly with the Access Grants API to request temporary credentials; Immuta does not act in this process at all. See the diagram below for an illustration of the process for accessing data through S3 Access Grants.

AWS services that support S3 Access Grants will request temporary credentials for users automatically. If users are not using a service that supports S3 Access Grants, they must have the permissions listed in the AWS documentation to call the Access Grants API directly themselves to request temporary credentials to access data through the access grant.

For a list of AWS services that support S3 Access Grants, see the AWS documentation.

Policy enforcement

Immuta's S3 integration allows data owners and governors to apply object-level access controls on data in S3 through subscription policies. When a user is subscribed to a registered prefix, bucket, or object, Immuta calls the Access Grants API to create an individual grant that narrows the scope of access within the location to that registered prefix, bucket, or object. See the diagram below for a visualization of this process.

When a user's entitlements change or a subscription policy is added to, updated, or deleted from a prefix, Immuta performs one of the following processes for each user subscribed to the registered prefix:

User added to the prefix: Immuta specifies a permission (READ or READWRITE) for each user and uses the Access Grants API to create an individual grant for each user.
User updated: Immuta deletes the current grant ID and creates a new one using the Access Grants API.
User deleted: Immuta deletes the grant ID using the Access Grants API.

Immuta offers two subscription policy access types to manage read and write access to data in S3:

Read access policies manage who can get objects from S3.
Write access policies manage who can modify data in S3.

Data policies, which provide more granular controls by redacting or masking values in a table, are not supported for S3.

Prefix registration

Data owners can register an S3 prefix at any level in the S3 path by creating an Immuta data source. During this process, Immuta stores the connection information for use in subscription policies.

Each prefix added in the data registration workflow is created as a single Immuta data source, and a subscription policy added to a data source applies to any objects in that bucket or beginning with that prefix:

Therefore, data owners should register prefixes or buckets at the lowest level of access control they need for that data. Using the example above, if the data owner needed to allow different users to access s3://yellow-bucket/research-data/* than those who should access s3://yellow-bucket/analyst-data/*, the data owner must register the research-data/* and analyst-data/* prefixes separately and then apply a subscription policy to those prefixes:

Deleting registered prefixes

When an S3 data source is deleted, Immuta deletes all the grants associated with that prefix, bucket, or object in that location.

Mapping IAM principals in Immuta

Names are case-sensitive

The IAM role name and IAM user name are case-sensitive. See the AWS documentation for details.

Immuta supports mapping an Immuta user to AWS in one of the following ways:

AWS IAM Identity Center user IDs
IAM role principals: Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
IAM user principals

See the protect data section for instructions on mapping principals to user accounts in Immuta.

Existing S3 integrations

The Amazon S3 integration will not interfere with existing legacy S3 integrations, and multiple S3 integrations can exist in a single Immuta tenant.

Supported AWS services

For a list of AWS services that support S3 Access Grants, see the AWS documentation.

Limitations

During private preview, Immuta supports up to 500 prefixes (data sources) and up to 20 Immuta users that are mapped to S3 identities principals. This is a preview limitation that will be removed in a future phase of the integration.
S3 Access Grants allows 100,000 grants per region per account. Thus, if you have 5 Immuta users with access to 20,000 registered prefixes, you would reach this limit. See AWS documentation for details.
The following Immuta features are not currently supported by the integration in private preview:
- Audit
- Automatically syncing Immuta with AWS IAM identities: you cannot set the S3 User Type field to AWS IAM User when configuring your identity provider (IdP) in Immuta
- Data policies
- Schema monitoring
- Tag ingestion

Integrations

Immuta Integrations

Feature support

Policy support

Audit support for platform queries

Snowflake

How-to guides

Reference guides

Getting Started

Configure your Snowflake integration

Register metadata

How-to Guides

Configure a Snowflake Integration

Permissions

Automatic setup permissions

Manual setup permissions

Configure the integration

Select your configuration method

Automatic setup

Manual setup

Select available warehouses (optional)

Select excepted roles and users

Save the configuration

Opt to enable Snowflake tag ingestion

Register data

Snowflake Table Grants Migration

Edit or Remove Your Snowflake Integration

Edit a Snowflake integration

Automatic edit

Manual edit

Remove a Snowflake integration

Automatic removal

Manual removal

Integration Settings

Enable Snowflake Table Grants

Use Snowflake Data Sharing with Immuta

Create Immuta Policies to Protect the Data

Register the Snowflake Data Consumer with Immuta

Create the Snowflake Data Share

Configure Snowflake Lineage Tag Propagation

Configure the Snowflake integration

Trigger Snowflake lineage sync job

Prerequisite

Trigger the lineage job

Next steps

Enable Snowflake Low Row Access Policy Mode

Configure your Snowflake integration

Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

Upgrade to Snowflake low row access policy mode

Reference Guides

Snowflake Integration

How the integration works

Data flow

Policy enforcement

Comply with column length and precision requirements in a Snowflake masking policy

Query performance

Snowflake privileges

Registering data sources

Snowflake bulk data source creation

Resource allocations

Limitations

Excepted roles/users

Authentication methods

Snowflake External OAuth

Workflow

Supported Snowflake feature

Supported Immuta features

Immuta project workspaces

Caveat

Tag ingestion

Caveats

Query audit

Multiple Snowflake instances

Caveats

Limitations

Custom WHERE clause limitations

Requirements for a custom WHERE policy

Subquery limitations

Snowflake Data Sharing