Configure Your Integration

This section guides you through configuring your integrations. Once configuration is complete, data owners and governors can use tags to create policies.

Best practices for users, permissions, attributes, and tags

The best practices outlined below will also appear in callouts within relevant tutorials.

If sensitive data discovery has been enabled, then manually adding tags to columns in the data dictionary will be unnecessary in most cases. The data owner will need to verify that the Discovered tags are correct.
Turning on sensitive data discovery can improve your data's security with its automated tagging. Immuta highly recommends the use of this feature in tandem with vigilant verification of tags on all data sources.
Use an external IAM for authentication and Immuta's internal IAM to manage attributes.
Use the minimum number of tags possible to achieve the data privacy needed.
Start organizing attributes and groups in Immuta and transfer them to your IAM.

Section contents

This section includes concept, reference, and how-to guides for configuring your integrations, connecting your IAM and external catalog, and enabling sensitive data discovery. Some of these guides are provided below. See the left navigation for a complete list of resources.

Configure your integrations:
Connect an external IAM.
Connect an external catalog.
Enable sensitive data discovery.

Snowflake

Use case

While you're onboarding Snowflake data sources and designing policies, you don't want to disrupt your Snowflake users' existing workflows.

Instead, you want to gradually onboard Immuta through a series of successive changes that will not impact your existing Snowflake users.

A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and configure policies.

Several features allow you to gradually onboard data sources and policies in Immuta:

Subscription policy of “None” by default: By default, no policy is applied at registration time; instead of applying a restrictive policy immediately upon registration, the table is registered in Immuta and waits for a policy to be applied, if ever.
There are several benefits to this design:
- All existing roles maintain access to the data and registration of the table or view with Immuta has zero impact on your data platform.
- It gives you time to configure tags on the Immuta registered tables and views, either manually or through automatic means, such as Immuta’s sensitive data detection (SDD), or an external catalog integration to include Snowflake tags.
- It gives you time to assess and validate the sensitive data tags that were applied.
- You can build only row and column controls with Immuta and let your existing roles manage table access instead of using Immuta subscription policies for table access.
Snowflake table grants coupled with Snowflake low row access policy mode: With these features enabled, Immuta manages access to tables (subscription policies) through GRANTs. Immuta creates a unique role for each user and manages all of that user's table access through that single role.
Without these two features enabled, Immuta uses a Snowflake row access policy (RAP) to manage table access. A RAP only allows users to access rows in the table if they were explicitly granted access through an Immuta subscription policy; otherwise, the user sees no rows. This behavior means all existing Snowflake roles lose access to the table contents until explicitly granted access through Immuta subscription policies. Essentially, roles outside of Immuta don't control access anymore.
By using table grants and the low row access policy mode, users and roles outside Immuta continue to work.
There are two benefits to this approach:
- All pre-existing Snowflake roles retain access to the data until you explicitly revoke access (outside Immuta).
- It provides a way to test that Immuta GRANTs are working without impacting production workloads.

Requirements

The following configuration is required for phased Snowflake onboarding:

Impersonation is disabled
Project workspaces are disabled

If either user impersonation or project workspaces is necessary for your use case, you cannot do phased Snowflake onboarding as described below.

Configure your Snowflake integration

Configure your Snowflake integration with the following features enabled:
- Snowflake table grants (enabled by default)
- Snowflake low row access policy mode
Select None as your default subscription policy.

Register data and validate tags

Plan the policies you need to have in place, the tags that will apply to your data, and how the tags will be applied to your data.
Enable sensitive data discovery (SDD).
Register a subset of your tables to configure and validate SDD.
Configure SDD to discover entities of interest for your policy needs.
Validate that the SDD tags are applied correctly.
Register your remaining tables at the schema level with schema detection turned on. This setting allows Immuta to continuously monitor for schema changes (new or dropped tables, new or dropped columns, and changed column types).
Let SDD or external catalog synchronization complete, and then validate that SDD tags are applied correctly.
Further customize SDD as necessary.
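The kinds of schema changes that schema detection monitors can be illustrated with a small comparison of two schema snapshots. This is a minimal sketch for intuition only, not Immuta's implementation; the snapshot structure, the function name diff_schema, and the table and column names are all hypothetical.

```python
from typing import Dict

# A schema snapshot maps table name -> {column name -> column type}.
# This structure is an assumption made for the example.
Schema = Dict[str, Dict[str, str]]


def diff_schema(old: Schema, new: Schema) -> dict:
    """Report new/dropped tables, new/dropped columns, and changed column types."""
    changes = {
        "new_tables": sorted(set(new) - set(old)),
        "dropped_tables": sorted(set(old) - set(new)),
        "new_columns": [],
        "dropped_columns": [],
        "changed_types": [],
    }
    # Column-level changes only apply to tables present in both snapshots.
    for table in sorted(set(old) & set(new)):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(set(new_cols) - set(old_cols)):
            changes["new_columns"].append((table, col))
        for col in sorted(set(old_cols) - set(new_cols)):
            changes["dropped_columns"].append((table, col))
        for col in sorted(set(old_cols) & set(new_cols)):
            if old_cols[col] != new_cols[col]:
                changes["changed_types"].append((table, col))
    return changes


# Hypothetical snapshots taken before and after a schema change.
before = {"ORDERS": {"ID": "NUMBER", "EMAIL": "VARCHAR"}, "OLD_LOG": {"MSG": "VARCHAR"}}
after = {"ORDERS": {"ID": "VARCHAR", "PHONE": "VARCHAR"}, "CUSTOMERS": {"ID": "NUMBER"}}
result = diff_schema(before, after)
```

Each category in the result corresponds to one of the change types listed for schema detection.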

Write policies

At this point, no policies are in place because of the default subscription policy setting. Now you can write and apply the policies you planned. You do not have to do all policies at once.

In the steps below, you do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.

Subscription policies

Subscription policies grant or revoke access to Snowflake tables.

If necessary, you could use your existing roles for table access and only use Immuta for row access policies and masking policies.

Immuta roles are created for users once they are subscribed to a table by a policy. SECONDARY ROLES ALL allows you to combine warehouse access with the Immuta role.

Create a global subscription policy.
Validate that the Immuta users impacted now have an Immuta role in Snowflake dedicated to them.
Validate that when acting under the Immuta role those users have access to the table(s) in question.
Validate that users without access in Immuta can still access the table with a different Snowflake role that has access.
Validate that a user with SECONDARY ROLES ALL enabled retains access if
- they were not granted access by Immuta and
- they have a role that provides them access, even if they are not currently acting under that role.
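The validation cases above can be reasoned about with a small model of which roles are in effect for a query. This is a sketch under stated assumptions, not Snowflake's or Immuta's implementation: the role names (including IMMUTA_USER_ALICE), the table name, and the can_select helper are hypothetical, and the model only captures the idea that SECONDARY ROLES ALL makes the privileges of all of a user's roles available alongside the active role.

```python
from typing import Dict, Set


def can_select(table: str, active_role: str, user_roles: Set[str],
               secondary_roles_all: bool, grants: Dict[str, Set[str]]) -> bool:
    """Return True if any role in effect holds SELECT on the table."""
    # With secondary roles enabled, every role granted to the user counts.
    roles_in_effect = set(user_roles) if secondary_roles_all else set()
    roles_in_effect.add(active_role)
    return bool(roles_in_effect & grants.get(table, set()))


# Hypothetical grants: one Immuta-created per-user role and one pre-existing role.
grants = {"SALES.ORDERS": {"IMMUTA_USER_ALICE", "ANALYST"}}

# A subscribed user acting under their Immuta role has access.
alice = can_select("SALES.ORDERS", "IMMUTA_USER_ALICE", {"IMMUTA_USER_ALICE"}, False, grants)

# A user without Immuta access still has access through a pre-existing role.
bob = can_select("SALES.ORDERS", "ANALYST", {"ANALYST"}, False, grants)

# A user acting under a warehouse-only role keeps access only with secondary roles.
carol_secondary = can_select("SALES.ORDERS", "WH_ROLE", {"WH_ROLE", "ANALYST"}, True, grants)
carol_primary_only = can_select("SALES.ORDERS", "WH_ROLE", {"WH_ROLE", "ANALYST"}, False, grants)
```

The last two calls mirror the final validation step: the user is not acting under the role that grants access, so the outcome depends on whether secondary roles are enabled.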

Data policies

Data policies enforce fine-grained access controls on a table (for example, row access policies or masking policies).

Create a global data policy.
Validate that a user with a role that can access the table in question (whether it's an Immuta role or not) sees the impact of that data policy.

Remove or alter old roles

Once all Immuta policies are in place, remove or alter old roles.

Delete irrelevant roles instead of revoking access to avoid confusion.
Ensure deleting roles will not have other implications, like impacting warehouse access. If deleting those roles will have unintended effects, alter those roles to remove the access control logic instead of deleting them.

Concept Guide

Phased Snowflake Onboarding Approach

See the Getting started page for step-by-step guidance to implement phased Snowflake onboarding.

Reference Guides

Snowflake Integration Reference Guide

Snowflake Enterprise Edition required

This integration requires the Snowflake Enterprise Edition.

In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

As with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.

Architecture

When an administrator configures the Snowflake integration with Immuta, Immuta creates an IMMUTA database and schemas (immuta_procedures, immuta_policies, and immuta_functions) within Snowflake to contain policy definitions and user entitlements. Immuta then creates a system role and gives that system account the following privileges:

APPLY MASKING POLICY
APPLY ROW ACCESS POLICY
ALL PRIVILEGES ON DATABASE "IMMUTA" WITH GRANT OPTION
ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE "IMMUTA" WITH GRANT OPTION
USAGE ON FUTURE PROCEDURES IN SCHEMA "IMMUTA".immuta_procedures WITH GRANT OPTION
USAGE ON WAREHOUSE
OWNERSHIP ON SCHEMA "IMMUTA".immuta_policies TO ROLE "IMMUTA_SYSTEM" COPY CURRENT GRANTS
OWNERSHIP ON SCHEMA "IMMUTA".immuta_procedures TO ROLE "IMMUTA_SYSTEM" COPY CURRENT GRANTS
OWNERSHIP ON SCHEMA "IMMUTA".immuta_functions TO ROLE "IMMUTA_SYSTEM" COPY CURRENT GRANTS
OWNERSHIP ON SCHEMA "IMMUTA".public TO ROLE "IMMUTA_SYSTEM" COPY CURRENT GRANTS

Optional features, such as automatic object tagging and native query auditing, require additional permissions to be granted to the Immuta system account. These permissions are listed in the supported features section.

Policy enforcement

Snowflake is a policy push integration with Immuta. When Immuta users create policies, they are then pushed into the Immuta database within Snowflake; there, the Immuta system account applies Snowflake row access policies and column masking policies directly onto Snowflake tables. Changes in Immuta policies, user attributes, or data sources trigger webhooks that keep the Snowflake policies up-to-date.

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.
They must be granted SELECT access on the table by the Snowflake object owner or automatically via the Snowflake table grants feature.

After a user has met these qualifications they can query Snowflake tables directly.
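The two qualifications combine as in the sketch below. It is a simplified model, assuming a row access policy manages table access (that is, low row access policy mode is not in use); the function name and the outcome labels are hypothetical.

```python
def query_outcome(subscribed: bool, has_select: bool) -> str:
    """Describe what a user gets back when querying an Immuta-protected table."""
    if not has_select:
        # Without a SELECT grant, Snowflake rejects the query outright.
        return "access denied"
    if not subscribed:
        # The grant exists, but the row access policy filters out every row.
        return "no rows"
    return "policy-enforced data"


outcomes = {
    (subscribed, has_select): query_outcome(subscribed, has_select)
    for subscribed in (True, False)
    for has_select in (True, False)
}
```

Only the combination of an Immuta subscription and a SELECT grant yields policy-enforced data.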

Comply with column length and precision requirements in a Snowflake masking policy

When a user applies a masking policy to a Snowflake data source, Immuta truncates masked values to align with Snowflake column length (VARCHAR(X) types) and precision (NUMBER (X,Y) types) requirements.

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

Querying this data source in Snowflake would return the following values:

Hashing collisions

Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.

For more details about Snowflake column length and precision requirements, see the Snowflake behavior change release documentation.
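The effect of truncation on hashed values can be illustrated with a short experiment. This is a sketch only: SHA-256 is an assumption chosen for the example (the hash Immuta uses is not stated here), and the helper names and input values are hypothetical.

```python
import hashlib


def truncate_to_column(masked: str, max_length: int) -> str:
    """Cut a masked value down to the VARCHAR length of the column."""
    return masked[:max_length]


def hash_mask(value: str, max_length: int) -> str:
    # SHA-256 is an illustrative assumption, not necessarily what Immuta uses.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return truncate_to_column(digest, max_length)


def count_collisions(values, max_length: int) -> int:
    """Count values whose truncated hash was already produced by an earlier value."""
    seen = set()
    collisions = 0
    for value in values:
        hashed = hash_mask(value, max_length)
        if hashed in seen:
            collisions += 1
        seen.add(hashed)
    return collisions


values = [f"user-{i}" for i in range(20000)]
short_collisions = count_collisions(values, 5)   # column limited to 5 characters
long_collisions = count_collisions(values, 20)   # column limited to 20 characters

# A constant mask is truncated the same way as a hash.
constant_mask = truncate_to_column("REDACTED", 5)
```

With 5 hexadecimal characters there are only about a million possible hashed values, so distinct inputs start to share a masked value far sooner than with 20 characters.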

Query performance

When a policy is applied to a column, Immuta uses Snowflake memoizable functions to cache the result of the called function. Then, when a user queries a column that has that policy applied to it, Immuta uses that cached result to dramatically improve query performance.
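As an analogy for how memoization avoids repeated work, the Python sketch below caches the result of a lookup that a masking function calls for every row. It illustrates the general technique only; it is not how Snowflake memoizable functions or Immuta policies are implemented, and the entitlement lookup and names are hypothetical.

```python
from functools import lru_cache

# Counts how often the underlying lookup actually runs.
calls = {"count": 0}


@lru_cache(maxsize=None)
def user_entitlements(username: str) -> frozenset:
    """Hypothetical lookup; the result is cached per username."""
    calls["count"] += 1
    table = {"alice": frozenset({"PII"}), "bob": frozenset()}
    return table.get(username, frozenset())


def mask_email(value: str, username: str):
    # The lookup is called for every row, but only evaluated once per user.
    return value if "PII" in user_entitlements(username) else None


rows = ["a@example.com", "b@example.com", "c@example.com"]
visible_to_alice = [mask_email(row, "alice") for row in rows]
lookups_after_alice = calls["count"]
```

Three rows are masked, but the lookup body runs once; later rows reuse the cached result.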

Registering data sources

Best practice

Use a dedicated Snowflake role to register Snowflake tables as Immuta data sources. Then, include this role in the excepted roles/users list.

Register Snowflake data sources using a dedicated Snowflake role. No policies will apply to that role, ensuring that your integration works with the following use cases:

Snowflake project workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.
Using views and tables within Immuta: Because this integration uses Snowflake governance policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, the owner of the view must be an excepted role; otherwise, the backing table’s policies will be applied to that view.

Snowflake bulk data source creation

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

To use this feature, see the Bulk create Snowflake data sources guide.

Resource allocations

Based on performance tests that create 100,000 data sources, the following minimum resource allocations need to be applied to the appropriate pods in your Kubernetes environment for successful bulk data source creation.

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that checks whether a user is acting under one of the listed usernames or roles. If so, the policy is not applied to the queried table. If not, the policy is executed as normal. Immuta does not distinguish between role and username, so if you have a role and a user with the exact same name, both that user and any user acting under that role will have full access to the data sources, and no policies will be enforced for them.
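The wrapping logic can be modeled in a few lines. This is a simplified sketch of the behavior described above, not the SQL Immuta generates; wrap_policy, the masking function, and the name IMMUTA_REGISTRAR are hypothetical.

```python
def wrap_policy(policy, excepted_names):
    """Return a policy that is skipped for excepted usernames or roles."""
    excepted = set(excepted_names)

    def wrapped(value, current_user, current_role):
        # A single list is checked against both the username and the role,
        # which is why a user and a role with the same name are both exempt.
        if current_user in excepted or current_role in excepted:
            return value
        return policy(value)

    return wrapped


def mask_null(value):
    """Example policy: mask by making NULL."""
    return None


protected = wrap_policy(mask_null, ["IMMUTA_REGISTRAR"])

as_excepted_role = protected("123-45-6789", "ANALYST_1", "IMMUTA_REGISTRAR")
as_same_named_user = protected("123-45-6789", "IMMUTA_REGISTRAR", "PUBLIC")
as_regular_user = protected("123-45-6789", "ANALYST_1", "ANALYST")
```

The second call shows the naming caveat: a user who happens to share a name with an excepted role bypasses the policy even when acting under an unrelated role.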

Data flow

An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
A data owner registers Snowflake tables in Immuta as data sources.
If Snowflake tag ingestion was enabled during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.
A data owner, data governor, or administrator creates or changes a policy, or a user's attributes change in Immuta.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
Immuta manages and applies Snowflake governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
If Snowflake table grants is not enabled, a Snowflake object owner or a user with the global MANAGE GRANTS privilege grants the SELECT privilege on relevant Snowflake tables to users. Note: Although users are GRANTed access, if they are not subscribed to the table via Immuta-authored policies, they will not see data.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Authentication methods

The Snowflake integration supports the following authentication methods to install the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.
Key pair: Users can authenticate with Snowflake key pair authentication.
Snowflake External OAuth: Users can authenticate with Snowflake External OAuth when using Snowflake with governance features.

Snowflake External OAuth

Immuta's OAuth authentication method uses the Client Credentials Flow to integrate with Snowflake External OAuth. When a user configures the Snowflake integration or connects a Snowflake data source, Immuta uses the token credentials (obtained using a certificate or passing a client secret) to craft an authenticated access token to connect with Snowflake. This allows organizations that already use Snowflake External OAuth to use that secure authentication with Immuta.

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.
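For orientation, a generic OAuth 2.0 Client Credentials token request using a client secret looks roughly like the sketch below. It builds the form body only and sends nothing over the network. The client ID, secret, and scope values are hypothetical, the exact request Immuta sends is not documented here, and the certificate-based variant is not shown.

```python
from urllib.parse import parse_qs, urlencode


def build_token_request(client_id: str, client_secret: str, scope: str) -> str:
    """Build the form-encoded body of a client credentials token request."""
    return urlencode({
        "grant_type": "client_credentials",  # standard OAuth 2.0 grant type
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })


# Hypothetical values; a real deployment would read these from secure storage.
body = build_token_request("my-client-id", "my-client-secret", "session:role:ANALYST")

# Decode the body again to inspect the fields that would be sent.
fields = parse_qs(body)
```

The authorization server would answer such a request with an access token, which is then presented to Snowflake as described in the workflow.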

Supported Snowflake feature

The Immuta Snowflake integration supports Snowflake external tables. However, you cannot add a masking policy to an external table column while creating the external table in Snowflake because masking policies cannot be attached to virtual columns.

Supported Immuta features

The Snowflake integration with Snowflake governance features supports the Immuta features outlined below.

Immuta project workspaces: Users can have additional write access in their integration using project workspaces.
Tag ingestion: Immuta automatically ingests Snowflake object tags from your Snowflake instance and adds them to the appropriate data sources.
User impersonation: Native impersonation allows users to natively query data as another Immuta user. To enable native user impersonation, see the Integration user impersonation page.
Native query audit: Immuta audits queries run natively in Snowflake against Snowflake data registered as Immuta data sources.
Multiple Snowflake instances
Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates.
Snowflake table grants: This feature allows Immuta to manage privileges on your Snowflake tables and views according to the subscription policies on the corresponding Immuta data sources.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Users can have additional write access in their integration using project workspaces. For more details, see the Snowflake project workspaces page.

Caveat

To use project workspaces with the Snowflake integration with governance features, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

Tag ingestion

Immuta system account required Snowflake privileges

GRANT IMPORTED PRIVILEGES ON DATABASE snowflake
GRANT APPLY TAG ON ACCOUNT

When configuring a Snowflake integration, you can enable Snowflake tag ingestion as well. With this feature enabled, Immuta will automatically ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.
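The two-level mapping and the inheritance down the object hierarchy can be sketched as follows. This is an illustrative model, not Immuta's ingestion code; the object paths, tag names, and helper names are hypothetical.

```python
def two_level_tag(key: str, value: str) -> tuple:
    """Represent a Snowflake tag as (top level, second level)."""
    return (key, value)


def inherited_tags(object_path: tuple, tags_by_object: dict) -> list:
    """Collect tags set on the object and on every ancestor above it.

    object_path is (database, schema, table, column) or any prefix of it.
    """
    collected = []
    for depth in range(1, len(object_path) + 1):
        ancestor = object_path[:depth]
        for key, value in tags_by_object.get(ancestor, []):
            tag = two_level_tag(key, value)
            if tag not in collected:
                collected.append(tag)
    return collected


# Hypothetical tags: one on the database, one on a single column.
tags_by_object = {
    ("SALES",): [("SENSITIVITY", "PII")],
    ("SALES", "PUBLIC", "ORDERS", "EMAIL"): [("TYPE", "EMAIL")],
}

email_tags = inherited_tags(("SALES", "PUBLIC", "ORDERS", "EMAIL"), tags_by_object)
id_tags = inherited_tags(("SALES", "PUBLIC", "ORDERS", "ID"), tags_by_object)
```

Both columns pick up the database-level tag, while only the EMAIL column carries the tag set directly on it.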

To enable Snowflake tag ingestion, follow one of the tutorials below:

Manually enable Snowflake tag ingestion: This tutorial is intended for users who want Snowflake tags to be ingested into Immuta but do not want users to query data sources natively in Snowflake.
Automatically enable Snowflake tag ingestion: This tutorial illustrates how to enable Snowflake tag ingestion when configuring a Snowflake integration.

Caveats

Snowflake has some natural data latency. If you manually refresh the governance page to see all tags created globally, users can experience a delay of up to two hours. However, if you run schema detection or a health check to find where those tags are applied, the delay will not occur because Immuta will only refresh tags for those specific tables.

Native query audit

Immuta system account required Snowflake privileges

IMPORTED PRIVILEGES ON DATABASE snowflake

Once this feature has been enabled with the Snowflake integration, Immuta will query Snowflake to retrieve user query histories. These histories provide audit records for queries against Snowflake data sources that are queried natively in Snowflake.

This process will happen automatically every hour by default but can be configured to a different frequency when configuring or editing the integration. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. When manually requested, it will only search for new queries that were created since the last native query that had been audited. The job is run in the background, so the new queries will not be immediately available.

For details about prompting these logs and the contents of these audit logs, see the Snowflake query audit logs page.

Multiple Snowflake instances

A user can configure multiple integrations of Snowflake to a single Immuta instance and use them dynamically or with workspaces.

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

If there are errors in generating or applying policies natively in Snowflake, the data source will be locked and only users on the excepted roles/users list and the credentials used to create the data source will be able to access the data.
Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to install the integration.
When configuring one Snowflake instance with multiple Immuta instances, the user or system account that enables the integration on the app settings page must be unique for each Immuta instance.
A Snowflake table can only have one set of policies enforced at a given time, so creating multiple data sources pointing to the same table is not supported. If this is a use case you need to support, create views in Snowflake and expose those instead.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
If you create an Immuta data source from a Snowflake view created using a select * from query, Immuta column detection will not work as expected because Snowflake views are not automatically updated based on backing table changes. To remedy this, you can create views that have the specific columns you want, or you can CREATE OR REPLACE the view in Snowflake whenever the backing table is updated and manually run the column detection job on the data source page.
If a user is created in Snowflake after that user is already registered in Immuta, Immuta does not grant usage on the per-user role automatically, meaning Immuta does not govern this user's access without manual intervention. In that case, the user account must be disabled and re-enabled to trigger a sync of Immuta policies to govern that user. Whenever possible, create Snowflake users before registering them in Immuta.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.
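For the select * limitation above, one way to keep a view's column list explicit is to generate the view definition from the columns you want to expose. The helper below is a minimal sketch that only builds the SQL text. The function name and object names are hypothetical, and identifiers are not quoted or validated, so it should not be used with untrusted input.

```python
def explicit_view_ddl(view_name: str, table_name: str, columns: list) -> str:
    """Build a CREATE OR REPLACE VIEW statement with a fixed column list."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return f"CREATE OR REPLACE VIEW {view_name} AS SELECT {column_list} FROM {table_name}"


# Hypothetical object names for illustration.
ddl = explicit_view_ddl("ANALYTICS.ORDERS_V", "SALES.ORDERS", ["ID", "EMAIL", "TOTAL"])
```

Rerunning the generated statement after the backing table changes refreshes the view definition. Column detection must still be run on the data source afterward.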

Custom WHERE clause limitations

The Immuta Snowflake integration uses Snowflake governance features to let users query data natively in Snowflake. This means that Immuta also inherits some Snowflake limitations using correlated subqueries with row access policies and column-level security. These limitations appear when writing custom WHERE policies, but do not remove the utility of row-level policies.

Requirements for a custom WHERE policy

All column names must be fully qualified:
1. Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery:
1. The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations, see the following Snowflake documentation:
- Understanding column-level security
- Understanding row access policies

Legacy Snowflake Integration

In this legacy integration, all enforcement is done by creating views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. All policy-enforced views are accessible through the PUBLIC role and access controls are applied in the view, allowing customers to leverage Immuta's powerful set of attribute-based policies. Additionally, users can continue using roles to enforce compute-based policies through "warehouse" roles, without needing to grant each of those roles access to the underlying data or create multiple views of the data for each specific business unit.

Architecture

This integration leverages webhooks to keep Snowflake views up-to-date with the corresponding Immuta data sources. Whenever a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the Snowflake view with Immuta policies.

The SQL that makes up all views includes a join to the secure view: immuta_system.user_profile. This view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. This secure view is readable by all users and will only display the data that corresponds to the user executing the query.
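The effect of the immuta__userid = current_user() constraint can be modeled as a simple filter over the profile table. This is a sketch for intuition only; apart from immuta__userid, the field names and sample rows are hypothetical.

```python
def user_profile_view(profile_rows: list, current_user: str) -> list:
    """Return only the profile row(s) belonging to the user running the query."""
    return [row for row in profile_rows if row["immuta__userid"] == current_user]


# Hypothetical contents of the profile table.
profile = [
    {"immuta__userid": "ALICE", "groups": ["finance"], "tables": ["ORDERS"]},
    {"immuta__userid": "BOB", "groups": ["marketing"], "tables": []},
]

alice_view = user_profile_view(profile, "ALICE")
unknown_view = user_profile_view(profile, "MALLORY")
```

Because every policy-enforced view joins to this filtered result, each user's query is evaluated only against that user's own groups, attributes, and table access.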

Note: The immuta_system.profile table is updated through webhooks whenever a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.

By default, all views are created within the immuta database, which is accessible by the PUBLIC role, so users acting under any Snowflake role can connect. All views within the database have the SELECT permission granted to the PUBLIC role as well, and access is enforced by the access_check function built into the individual views. Consequently, there is no need for users to manage any role-based access to any of the database objects managed by Immuta.

Secure and non-secure views

When creating a Snowflake data source, users have the option to use a regular view (traditional database view) or a secure view; however, according to Snowflake's documentation, "the Snowflake query optimizer, when evaluating secure views, bypasses certain optimizations used for regular views. This may result in some impact on query performance for secure views." To use the data source with both Snowflake and Snowflake workspaces, secure views are necessary. Note: If HIPAA compliance is required, secure views must be used.

Non-secure view policy implications

When using a non-secure view, certain policies may leak sensitive information. In addition to the concerns outlined in this section, there is a risk that someone could exploit the query optimizer to discover that a row excluded by row-level policies exists in a table. The Snowflake documentation describes this attack.

Policies that will not leak sensitive information

Masking by making NULL, using a constant, or by rounding (date/numeric)
Minimization row-level policies
Date-based row-level policies
K-anonymization masking policies

Policies that could leak sensitive information

Masking using a regex will show the regex being applied. In general this should be safe, but if you have a regex policy that removes a specific selector to redact (e.g., a regex of /123-45-6789/g to specifically remove a single SSN from a column), then someone would be able to identify columns with that value.
In conditional masking and custom WHERE clauses including “Right To Be Forgotten,” the custom SQL will be visible, so for a policy like "only show rows where COUNTRY NOT IN(‘UK’, ‘AUS’)," users will know that it’s possible there is data in that table containing those values.
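The regex case above can be illustrated with a short Python sketch. It assumes a hypothetical policy pattern that targets one specific SSN and a made-up two-row column; the point is only that a reader of a non-secure view definition learns the pattern and can therefore infer which cells originally held that value.

```python
import re

# Hypothetical policy regex that targets one specific SSN, as in the example above.
POLICY_PATTERN = r"123-45-6789"


def mask(value: str) -> str:
    """Apply the regex mask the way a view definition would."""
    return re.sub(POLICY_PATTERN, "REDACTED", value)


column = ["123-45-6789", "987-65-4321"]
masked = [mask(v) for v in column]

# Anyone who can read the view definition sees POLICY_PATTERN, so a masked
# cell tells them the original value matched that exact SSN.
leaked = [POLICY_PATTERN if m == "REDACTED" else None for m in masked]
print(masked, leaked)
```

A generic pattern (for example, one that matches any SSN-shaped string) does not have this problem, because the pattern itself carries no specific value.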

Policies that will leak potentially sensitive information

These policies leak information sensitive to Immuta, but in most cases would require an attacker to reverse the algorithm. In general these policies should be used with secure views:

Masking using hashing will include the salt used.
Numeric and categorical randomized response will include the salt used.
Reversible masking will include both a key and an IV.
Format preserving masking will include a tweak, key, an alphabet range, prefix, pad to length, and checksum id if used.
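To see why an exposed salt matters, consider the following sketch. It assumes SHA-256 over salt plus value, which is an illustrative choice and not necessarily the algorithm Immuta uses; the salt and the candidate values are hypothetical. When the column has a small set of possible values, an attacker who knows the salt can hash each candidate once and invert the masked column by lookup.

```python
import hashlib


def salted_hash(value: str, salt: str) -> str:
    """SHA-256 over salt + value; an assumed scheme, for illustration only."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


SALT = "example-salt"  # hypothetical; visible in a non-secure view definition
masked_column = [salted_hash(v, SALT) for v in ["UK", "AUS", "UK"]]

# With the salt known, an attacker hashes every candidate value once and
# inverts the masked column by lookup.
candidates = ["UK", "AUS", "US", "FR"]
lookup = {salted_hash(c, SALT): c for c in candidates}
recovered = [lookup.get(h) for h in masked_column]
print(recovered)
```

With a secure view the salt is not visible in the view definition, so this lookup table cannot be built the same way.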

Policy enforcement

The data sources themselves have all the Data policies included in the SQL through a series of CASE statements that determine which view of the data a user will see. Row-level policies are applied as top-level WHERE clauses, and usage policies (purpose-based or subscription-level) are applied as WHERE clauses against the user_profile JOIN. The access_check function allows Immuta to throw custom errors when a user lacks access to a data source because they are not subscribed to the data source, they are operating under the wrong project, or they cannot view any data because of policies enforced on the data source.
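The enforcement order described above can be modeled roughly in Python. This is a sketch under stated assumptions, not the generated view SQL: the `PROFILE` and `ROWS` data, the `AccessError` class, the "admins" exemption, and the country filter are all hypothetical.

```python
class AccessError(Exception):
    """Raised where the real view would surface a custom access_check error."""


# Hypothetical profile row for the current user (the user_profile JOIN).
PROFILE = {"ALICE": {"subscribed": {"CUSTOMER"}, "groups": {"analysts"}}}

ROWS = [
    {"name": "Ann", "ssn": "123-45-6789", "country": "US"},
    {"name": "Raj", "ssn": "987-65-4321", "country": "UK"},
]


def query_customer(current_user: str) -> list[dict]:
    profile = PROFILE.get(current_user)
    # access_check: fail with a custom error when the user is not subscribed.
    if profile is None or "CUSTOMER" not in profile["subscribed"]:
        raise AccessError(f"{current_user} is not subscribed to CUSTOMER")

    result = []
    for row in ROWS:
        # Row-level policy: the equivalent of a top-level WHERE clause.
        if row["country"] != "US":
            continue
        # Data policy: the equivalent of CASE WHEN <exempt> THEN ssn ELSE NULL END.
        ssn = row["ssn"] if "admins" in profile["groups"] else None
        result.append({**row, "ssn": ssn})
    return result


print(query_customer("ALICE"))
```

Calling `query_customer("BOB")` raises `AccessError`, mirroring the case where a user lacks a subscription to the data source.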

Integration migration

Migration troubleshooting

If multiple Snowflake integrations are enabled, they will all migrate together. If one fails, they will all revert to the Snowflake Standard integration.
If an error occurs during migration and the integration cannot be reverted, the integration must be disabled and re-enabled.

You can migrate from a Snowflake integration without governance features to a Snowflake integration with governance features on the app settings page. Once prompted, Immuta will migrate the integration, allowing users to seamlessly transition workloads from the legacy Immuta views to the direct Snowflake tables.

After the migration is complete, Immuta views will still exist for pre-existing Snowflake data sources to support existing workflows. However, disabling the Immuta data source will drop the Immuta view, and, if the data source is re-enabled, the view will not be recreated.

Limitations

Certain interpolation functions can block the creation of a view, specifically @interpolatedComparison() and @iam.
When configuring one Snowflake instance with multiple Immuta instances, the user or system account that enables the integration on the app settings page must be unique for each Immuta instance.

Snowflake Data Sharing with Immuta

Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This integration gives data consumers a live connection to the data and relieves data providers of the legal and technical burden of creating static data copies that leave their Snowflake environment.

There are two options to use Snowflake Data Sharing with Immuta:

Snowflake Data Shares with Immuta Users (Public Preview): This option utilizes Snowflake table grants and requires the data viewer to be registered as an Immuta user.
Snowflake Data Shares with Non-Immuta Users: This option utilizes Snowflake project workspaces to share policy-protected data without data viewers being registered as Immuta users.

Snowflake Data Shares with Immuta Users (Public Preview)

This method allows data providers to share policy-enforced data with data consumers registered in Immuta.

The data consumer will register in Immuta as a user with the appropriate Immuta attributes and groups. Once that user has subscribed to the data source, they will be able to see the policy-protected data of a Snowflake data share.

For a tutorial on this workflow, see the Using Snowflake Data Sharing page.

Requirements

Snowflake Enterprise Edition or higher
Immuta's table grants feature

Benefits

Using Immuta users with Snowflake Data Sharing allows the sharer to

Only need limited knowledge of the context or goals of the existing policies in place: Because the sharer is not editing or creating policies to share their data, they only need a limited knowledge of how the policies work. Their main responsibility is making sure they properly represent the attributes of the data consumer.
Leave policies untouched.

Snowflake Data Shares with Non-Immuta Users

In this method, Immuta projects can be used to protect and share data with data consumers, even without those users being registered in Immuta.

Using Immuta projects, organizations can create projects and then adjust the equalized entitlements of the project to represent attributes and groups of the data consumer. This allows the project to function as a user, with the data being protected for a particular set of attributes and groups. Once the entitlements have been set, the project owner can enable a project workspace that will create a Snowflake secure view of that policy-protected data that is ready to share with the data consumer. Because of the Immuta project, equalized entitlements, and workspace, the data is restricted to data consumers who possess the relevant attributes and groups.

For a tutorial on this workflow, see the Using Snowflake Data Sharing page.

Requirements

Any Snowflake integration
Immuta attribute based access control (ABAC) data policies

Benefits

Using Immuta project workspaces with Snowflake Data Sharing allows the sharer to

Only need limited knowledge of the context or goals of the existing policies in place: Because the sharer is not editing or creating policies to share their data, they only need a limited knowledge of how the policies work. Their main responsibility is making sure they properly represent the attributes of the data consumer.
Leave policies untouched.
Only share data that the sharer is allowed to see: Users who can create data shares shouldn’t necessarily be the same users who can make policy changes.
Let Immuta create the policy-enforced secure view, ready to share.

Limitations

Project workspaces are generally recommended to allow WRITE access; however, Snowflake's Data Sharing feature does not support WRITE access to shared data.
Actions of the data consumer after the data has been shared are not audited when using project workspaces.

Snowflake Lineage Tag Propagation

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to the target columns in write operations, allowing data stewards to understand how sensitive data moves from ancestor tables to target tables so that they can

trace data back to its source to validate the integrity of dashboards and reports,
identify who performed write operations to meet compliance requirements,
evaluate data quality and pinpoint points of failure, and
tag sensitive data on source tables without having to tag columns on their descendant tables.

However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.

Data flow

An application administrator enables the feature on the Immuta app settings page.
Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.
A data owner creates a new data source (or adds a new column to a Snowflake table) that initiates a job that applies all tags for each column from its ancestor columns.
A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
An audit record is created that includes which tags were applied and from which columns those tags originated.

Snowflake access history view and Immuta lineage job

The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.

To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.

Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.

Data source registration

After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.

By default, all tags are propagated, but they can be filtered on the app settings page or through the Immuta API.

Managing tags

Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2

Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.

Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.

However, the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. The only way a tag will be removed from descendant data sources is if no other ancestor of the descendant still prescribes the tag.
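The Customer example can be traced with a small Python model of the propagation rule. It is a sketch of the behavior described above, not Immuta's implementation; the `CHILDREN` graph and the function names are hypothetical.

```python
# Hypothetical lineage graph: parent data source -> direct descendants.
CHILDREN = {"Customer": ["Customer 2"], "Customer 2": ["Customer 3"]}

TAG = "Discovered.Electronic Mail Address"


def descendants(source: str) -> set[str]:
    """All data sources reachable downstream of `source`."""
    found, stack = set(), [source]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def propagate(tagged_sources: set[str], soft_deleted: set[str]) -> set[str]:
    """Data sources that carry TAG after propagation.

    A descendant keeps the tag while any ancestor still prescribes it,
    unless the tag was soft deleted on that descendant.
    """
    carrying = set(tagged_sources)
    for source in tagged_sources:
        carrying |= descendants(source)
    return carrying - soft_deleted


print(propagate({"Customer"}, soft_deleted=set()))
print(propagate({"Customer"}, soft_deleted={"Customer 2"}))
```

In the second call, Customer 3 still carries the tag because Customer, one of its ancestors, still prescribes it.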

If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.

Sensitive data discovery

Sensitive data discovery will still run on data sources and can be manually triggered. Tags applied through sensitive data discovery will propagate as tags added through lineage to descendant Immuta data sources.

Snowflake lineage audit

Immuta audit records include Snowflake lineage tag events when a tag is added or removed.

The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:

{
  "id": "c8e020cb-232c-4ba9-a0d8-f3a84ba6808d",
  "dateTime": "1670355170336",
  "month": 1475,
  "profileId": 1,
  "userId": "immuta_system_account",
  "dataSourceId": 2,
  "dataSourceName": "Customer 2",
  "count": 1,
  "recordType": "nativeLineageDataSourceTagUpdate",
  "success": true,
  "component": "dataSource",
  "extra": {
    "sourceColumn": {
      "nativeColumnName": "\"MY_DATABASE\".\"PUBLIC\".\"CUSTOMER\".\"C_FIRST_NAME\"",
      "dataSourceId": 1,
      "columnName": "c_first_name"
    },
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [
      {
        "name": "SNOWFLAKE_TAGS.pii",
        "source": "immuta-us-east-1"
      }
    ]
  },
  "newAuditServiceFields": {
    "actorIp": null,
    "sessionId": null
  },
  "createdAt": "2022-12-06T19:32:50.372Z",
  "updatedAt": "2022-12-06T19:32:50.372Z"
}
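If you export audit records like the one above, the lineage details can be pulled out with standard JSON tooling. The sketch below uses an abbreviated copy of the example record. It assumes that dateTime holds epoch milliseconds, which is consistent with the createdAt value in the example but is not stated explicitly; the `summary` structure is hypothetical.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the example record above.
record_json = """
{
  "dateTime": "1670355170336",
  "dataSourceName": "Customer 2",
  "recordType": "nativeLineageDataSourceTagUpdate",
  "success": true,
  "extra": {
    "sourceColumn": {"dataSourceId": 1, "columnName": "c_first_name"},
    "dataSourceId": 2,
    "columnName": "c_first_name",
    "tagPropagationDirection": "downstream",
    "tags": [{"name": "SNOWFLAKE_TAGS.pii", "source": "immuta-us-east-1"}]
  }
}
"""

record = json.loads(record_json)
extra = record["extra"]

# Assumption: dateTime is a string of epoch milliseconds.
when = datetime.fromtimestamp(int(record["dateTime"]) / 1000, tz=timezone.utc)

summary = {
    "target": (extra["dataSourceId"], extra["columnName"]),
    "origin": (
        extra["sourceColumn"]["dataSourceId"],
        extra["sourceColumn"]["columnName"],
    ),
    "tags": [tag["name"] for tag in extra["tags"]],
    "date": when.date().isoformat(),
}
print(summary)
```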

Limitations

Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.
Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.
The native lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.
There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.
Immuta does not ingest lineage information for views.
Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes.
Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.

Snowflake Table Grants

Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of having to manually grant users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. Then, users subscribed to a data source in Immuta can view and query the Snowflake table, while users who are not subscribed to the data source cannot view or query the Snowflake table.

Snowflake privileges

Enabling Snowflake table grants gives the following privileges to the Immuta Snowflake role:

MANAGE GRANTS ON ACCOUNT allows the Immuta Snowflake role to grant and revoke SELECT privileges on Snowflake tables and views that have been added as data sources in Immuta.
CREATE ROLE ON ACCOUNT allows for the creation of a Snowflake role for each user in Immuta, enabling fine-grained, attribute-based access controls to determine which tables are available to which individuals.

Table grants role

Since table privileges are granted to roles and not to users in Snowflake, Immuta's Snowflake table grants feature creates a new Snowflake role for each Immuta user. This design allows Immuta to manage table grants through fine-grained access controls that consider the individual attributes of users.

Each Snowflake user with an Immuta account will be granted a role that Immuta manages. The naming convention for this role is <IMMUTA>_USER_<username>, where

<IMMUTA> is the prefix you specified when enabling the feature on the Immuta app settings page.
<username> is the user's Immuta username.

Querying Snowflake tables managed by Immuta

Users are granted access to each Snowflake table or view automatically when they are subscribed to the corresponding data source in Immuta.

Users have two options for querying Snowflake tables that are managed by Immuta:

Use the role that Immuta creates and manages. (For example, USE ROLE IMMUTA_USER_<username>. See the Table grants role section for details about the role and name conventions.) If the current active primary role is used to query tables, USAGE on a Snowflake warehouse must be granted to the Immuta-managed Snowflake role for each user.
Use secondary roles (USE SECONDARY ROLES ALL), which allows users to use the privileges from all roles that they have been granted, including IMMUTA_USER_<username>, in addition to the current active primary role. Users may also set a value for DEFAULT_SECONDARY_ROLES on a Snowflake user. To learn more about primary roles and secondary roles in Snowflake, see the Snowflake documentation.

Limitations

If an Immuta instance is connected to an external IAM and that external IAM has a username identical to another username in Immuta's built-in IAM, those users will have the same Snowflake role, leading both to see the same data.
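The collision follows directly from the naming convention, because the role name carries no IAM identifier. The sketch below illustrates this; the `table_grants_role` helper, the IAM labels, and the sample username are hypothetical, and any case normalization Immuta may apply is not modeled.

```python
def table_grants_role(prefix: str, username: str) -> str:
    """Build the Immuta-managed role name <prefix>_USER_<username>."""
    return f"{prefix}_USER_{username}"


# Hypothetical users: same username, different IAMs.
users = [
    {"iam": "external-iam", "username": "jdoe"},
    {"iam": "built-in-iam", "username": "jdoe"},
]

roles = {table_grants_role("IMMUTA", u["username"]) for u in users}

# Two distinct users collapse into a single Snowflake role.
print(roles)
```

Keeping usernames unique across IAMs avoids the overlap.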

How-to Guides

Installation

Snowflake Governance Features Integration

This page details how to install the Snowflake governance features integration for users on Snowflake Enterprise. If you currently use Snowflake Standard, see the Legacy Snowflake Integration page.

Snowflake resource names

Use uppercase for the names of the Snowflake resources you create below.

Click the Integrations tab on the app settings page.
Click the +Add Native Integration button and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Project Workspace box. This will allow for managed write access within Snowflake. Note: Project workspaces still use Snowflake views, so the default role of the account used to create the data sources in the project must be added to the Excepted Roles List. This option is unavailable when is enabled.
Opt to check the Enable Impersonation box and customize the Impersonation Role to allow users to natively impersonate another user. You cannot edit this choice after you configure the integration.
Native query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox.
1. Configure the audit sync schedule by scrolling to Integrations Settings and finding the Snowflake Audit Sync Schedule section.
2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
3. Continue with your integration configuration.
Opt to check the Automatically ingest Snowflake object tags box to allow Immuta to automatically import table and column tags from Snowflake.

Select your configuration method

in Snowflake at the account level may cause unexpected behavior of the Snowflake integration in Immuta

The must be set to false (the default setting in Snowflake) at the account level. Changing this value to true causes unexpected behavior of the Snowflake integration.

You have two options for configuring your Snowflake environment:

Automatic setup

Known issue

To configure your Snowflake integration using password-only authentication in the automatic setup option, upgrade to Immuta v2024.2.7 or newer. Otherwise, Immuta will return an error.

Immuta requires temporary, one-time use of credentials with specific permissions.

When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following permissions:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

These permissions will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

You can create a new account for Immuta to use that has these permissions, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate permissions is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.
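If you create a dedicated role for the one-time setup, the privileges listed above map onto Snowflake GRANT statements. The following sketch only generates the statement text so it can be reviewed before anyone runs it; the role name IMMUTA_SETUP_ROLE and the helper function are hypothetical.

```python
REQUIRED_PRIVILEGES = [
    "CREATE DATABASE",
    "CREATE ROLE",
    "CREATE USER",
    "MANAGE GRANTS",
    "APPLY MASKING POLICY",
    "APPLY ROW ACCESS POLICY",
]


def grant_statements(role: str) -> list[str]:
    """Render one account-level GRANT per required privilege."""
    return [
        f"GRANT {privilege} ON ACCOUNT TO ROLE {role} WITH GRANT OPTION;"
        for privilege in REQUIRED_PRIVILEGES
    ]


# IMMUTA_SETUP_ROLE is a made-up name; substitute your own role.
for statement in grant_statements("IMMUTA_SETUP_ROLE"):
    print(statement)
```

Because the role is only needed during setup, it can be dropped once the integration is configured.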

From the Select Authentication Method Dropdown, select one of the following authentication methods:

Username and Password: Complete the Username, Password, and Role fields.
Key Pair Authentication:
1. Complete the Username field.
2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
3. Click Key Pair (Required), and upload a Snowflake key pair file.
4. Complete the Role field.

Manual setup

Best practices: account creation

The account you create for Immuta should only be used for the integration and should not be used as the credentials for creating data sources in Immuta; doing so will cause issues. Instead, create a separate, dedicated READ-ONLY account for creating and registering data sources within Immuta.

Required privileges

The specified role used to run the bootstrap needs to have the following privileges:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

It will create a user called IMMUTA_SYSTEM_ACCOUNT, and grant the following privileges to that user:

APPLY MASKING POLICY ON ACCOUNT
APPLY ROW ACCESS POLICY ON ACCOUNT
Additional grants associated with the IMMUTA database
- GRANT IMPORTED PRIVILEGES ON DATABASE snowflake
- GRANT APPLY TAG ON ACCOUNT

Run the script

Select Manual.
Use the Dropdown Menu to select your Authentication Method:
- Username and password: Enter the Username and Password and set them in the bootstrap script for the Immuta system account credentials.
- Key pair authentication: Upload the Key Pair file and, when using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
- Snowflake External OAuth:
  2. Fill out the Token Endpoint. This is where the generated token is sent and is also known as aud (Audience) and iss (Issuer).
  3. Fill out the Client ID. This is the subject of the generated token and is also known as sub (Subject).
  4. Select the method Immuta will use to obtain an access token:
    Certificate:
    Keep the Use Certificate checkbox enabled.
    Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
    Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called sub (Subject).
    Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
    Client secret:
    Uncheck the Use Certificate checkbox.
    Enter the Client Secret (string). Immuta uses this secret to authenticate with the authorization server when it requests a token.
Download, fill out the appropriate fields, and run the bootstrap script linked in the Setup section.

Warning: different accounts

Select available warehouses (optional)

If you enabled a Snowflake workspace, select Warehouses from the dropdown menu that will be available to project owners when creating native Snowflake workspaces. Select from a list of all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.

Select excepted roles and users

Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma.

Excepted roles/users will have no policies applied to queries.

Any user with the username or acting under the role in this list will have no policies applied to them when querying Immuta protected Snowflake tables in Snowflake. Therefore, this list should be used for service or system accounts and the default role of the account used to create the data sources in the Immuta projects (if you have Snowflake workspace enabled).
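The effect of the list can be sketched as a simple membership check. This is an illustration, not Immuta's logic: the helper names and sample entries are hypothetical, and trimming whitespace around entries is an assumption. Note that matching is case-sensitive, as stated above.

```python
def parse_excepted(raw: str) -> set[str]:
    """Split the comma-separated list; names stay case-sensitive."""
    return {name.strip() for name in raw.split(",") if name.strip()}


def policies_apply(username: str, role: str, excepted: set[str]) -> bool:
    """False when the user or their active role is excepted."""
    return username not in excepted and role not in excepted


# Hypothetical entries: one service account and one role.
excepted = parse_excepted("SVC_ETL, DATA_SOURCE_OWNER_ROLE")

print(policies_apply("SVC_ETL", "PUBLIC", excepted))
print(policies_apply("svc_etl", "PUBLIC", excepted))
```

The second call differs from the first only in case, so that user is not treated as excepted.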

Test the connection and save the configuration

Click Test Snowflake Connection.
Once the credentials are successfully tested, click Save and Confirm your changes.

Register data

Legacy Snowflake Integration

Deprecation notice

Support for this integration has been deprecated.

This page details how to install the legacy Snowflake integration for users on Snowflake Standard. If you currently use Snowflake Enterprise, see the Snowflake Governance Features Integration page.

Snowflake resource names

Use uppercase for the names of the Snowflake resources you create below.

Click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click the +Add Native Integration button and select Snowflake from the dropdown menu.
Scroll down and uncheck the box for Snowflake Governance Features.
Scroll back up and complete the Host, Port, and Default Warehouse fields.
Opt to check the Enable Project Workspace box. This will allow for managed Write access within Snowflake.
Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user. Note that you cannot edit this choice after you configure the integration.
Native query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox.
1. Configure the audit sync schedule by scrolling to Integrations Settings and finding the Snowflake Audit Sync Schedule section.
2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.
3. Continue with your integration configuration.
Opt to check the Automatically ingest Snowflake object tags box. This will enable Immuta to automatically import table and column tags from Snowflake. Note that this feature requires an Enterprise Edition of Snowflake.
You have two options for installing the Snowflake and Snowflake Workspace access patterns: automatic or manual setup.

Known issue

To configure your Snowflake integration using password-only authentication in the automatic setup option, upgrade to Immuta v2024.2.7 or newer. Otherwise, Immuta will return an error.

Immuta requires temporary, one-time use of credentials with specific permissions.

When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following permissions:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT

Alternatively, you can create the IMMUTA database within the specified Snowflake instance manually using the Manual Setup option.

From the Select Authentication Method Dropdown, select either Username and Password or Key Pair Authentication:
- Username and Password: Fill out the Username, Password, and Role fields.
- Key Pair Authentication:
  1. Complete the Username field.
  2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
  3. Click Key Pair (Required), and upload a Snowflake key pair file.
  4. Complete the Role field.

Best Practices: Account Creation

The account you create for Immuta should only be used for the integration and should NOT be used as the credentials when creating data sources within Immuta; doing so will cause issues.

Create a dedicated READ-ONLY account for creating and registering data sources within Immuta. This account should also not be the account used to configure the integration.

The specified role used to run the bootstrap needs to have the following privileges:

CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT

Warning: Different Accounts

Download and run the bootstrap script linked in the Setup section. Take note of the username and password used in the script.
Use the Dropdown Menu to select your Authentication Method:
- Username and Password: Enter the Username and Password that were set in the bootstrap script for the Immuta System Account Credentials.
- Key Pair Authentication: Upload the Key Pair file and, when using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>

If you enabled a Snowflake workspace, select Warehouses from the dropdown menu that will be available to project owners when creating native Snowflake workspaces. Select from a list of all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.
Click Test Snowflake Connection.
Once the credentials are successfully tested, click Save.

Now that Snowflake has been enabled, all future Snowflake data sources will also be created natively within the immuta database of the linked Snowflake instance. In addition to creating views, Immuta will also periodically sync user metadata to a system table within the Snowflake instance.

Register data

Migration

Snowflake Governance Features Integration Migration

Migration troubleshooting

If multiple Snowflake integrations are enabled, they will all migrate together. If one fails, they will all revert to the Snowflake Standard integration.
If an error occurs during migration and the integration cannot be reverted, the integration must be disabled and re-enabled.

Click the App Settings icon in the left sidebar.
Click Preview Features in the left panel.
Scroll to the Native Snowflake Governance Controls modal and check the checkbox.
Using the credentials entered to enable the Snowflake integration, fill out the Username and Password or Key Pair.
Click Save.
Click Confirm.

Snowflake Table Grants Migration

To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.

Navigate to the App Settings page.
Click Integration Settings in the left panel, and scroll to the Global Integration Settings section.
Uncheck the Snowflake Table Grants checkbox to disable the feature.
Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.
Use the Enable Snowflake table grants tutorial to re-enable the feature.
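The wait time in the steps above is simple arithmetic. As a rough sketch, the helper below applies the "about 1 minute per 1000 users" guidance; rounding up and the one-minute floor are assumptions added here, and the function name is hypothetical.

```python
import math


def estimated_wait_minutes(user_count: int) -> int:
    """About 1 minute per 1000 users, rounded up, with a 1-minute floor."""
    return max(1, math.ceil(user_count / 1000))


# For example, an instance with 2,500 users.
print(estimated_wait_minutes(2500))
```

Treat the result as a lower bound and allow extra time before re-enabling the feature.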

Edit or Remove a Snowflake Integration

To edit or remove a Snowflake integration, you have two options:

Automatic: Grant Immuta one-time use of credentials to automatically edit or remove the integration.
The credentials provided must have the following permissions:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
Manual: Run the Immuta script in your Snowflake environment yourself to edit or remove the integration.
The specified role used to run the bootstrap needs to have the following privileges:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
- APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
- APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
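
If the role does not yet hold these privileges, an administrator could grant them with statements like the ones this sketch prints. The privilege names come from the lists above; the role name IMMUTA_SETUP_ROLE and the helper function are hypothetical placeholders.

```python
# Privileges copied from the lists above.
AUTOMATIC_PRIVILEGES = [
    "CREATE DATABASE",
    "CREATE ROLE",
    "CREATE USER",
    "MANAGE GRANTS",
]
# The manual option lists two additional privileges.
MANUAL_EXTRA_PRIVILEGES = [
    "APPLY MASKING POLICY",
    "APPLY ROW ACCESS POLICY",
]

def account_grants(role, manual=False):
    """Return one GRANT ... ON ACCOUNT statement per required privilege."""
    privileges = AUTOMATIC_PRIVILEGES + (MANUAL_EXTRA_PRIVILEGES if manual else [])
    return [
        f"GRANT {privilege} ON ACCOUNT TO ROLE {role} WITH GRANT OPTION;"
        for privilege in privileges
    ]

# IMMUTA_SETUP_ROLE is a placeholder role name.
for statement in account_grants("IMMUTA_SETUP_ROLE", manual=True):
    print(statement)
```

Review the generated statements before running them in Snowflake, since they grant account-level privileges.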

Edit a Snowflake integration

Select one of the following options for editing your integration:

Automatic: Grant Immuta one-time use of credentials to automatically edit the integration.
Manual: Run the Immuta script in your Snowflake environment yourself to edit the integration.

Automatic edit

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.
From the Select Authentication Method dropdown, select either Username and Password or Key Pair Authentication:
- Username and Password option: Complete the Username, Password, and Role fields.
- Key Pair Authentication option:
  1. Complete the Username field.
  2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>
  3. Click Key Pair (Required), and upload a Snowflake key pair file.
  4. Complete the Role field.
Click Save.

Manual edit

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Edit the field you want to change or check the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.
Download the Edit Script and run it in Snowflake.
Click Save.

Remove a Snowflake integration

Select one of the following options for deleting your integration:

Automatic: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.
Manual: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.

Automatic removal

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Enter the Username, Password, and Role that were entered when the integration was configured.
Click Validate Credentials.
Click Save.

Manual removal

Click the App Settings icon in the left sidebar.
Click the Integrations tab and click the down arrow next to the Snowflake integration.
Click the checkbox to disable the integration.
Download the Cleanup Script.
Click Save.
Run the cleanup script in Snowflake.

Integration Settings

Enable Snowflake Table Grants

Navigate to the App Settings page.
Scroll to the Global Integration Settings section.
Ensure the Snowflake Governance Features checkbox is checked. It is enabled by default.
Ensure the Snowflake Table Grants checkbox is checked. It is enabled by default.
Opt to change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements and be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
Finish configuring your integration by following one of these guidelines:
- New Snowflake integration: Set up a new Snowflake integration by following the configuration tutorial.
- Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have the Snowflake privileges required to run these grants.
- Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.
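
Before saving, the Role Prefix constraints can be checked with a small sketch like the one below. It assumes the prefix is written as an unquoted Snowflake identifier (a letter or underscore followed by letters, digits, underscores, or dollar signs); the function name is hypothetical.

```python
import re

# Unquoted Snowflake identifiers start with a letter or underscore and then
# use letters, digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

def is_valid_role_prefix(prefix):
    """Check the two stated rules: identifier syntax and fewer than 50 characters."""
    return len(prefix) < 50 and _UNQUOTED_IDENTIFIER.fullmatch(prefix) is not None

print(is_valid_role_prefix("IMMUTA_PROD"))
print(is_valid_role_prefix("1_STARTS_WITH_DIGIT"))
```

Because the prefix cannot be modified once saved, a check of this kind is most useful before the first save.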

Snowflake table grants private preview migration

To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the migration guide.

Use Snowflake Data Sharing with Immuta

Immuta is compatible with Snowflake Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. See below for instructions on using Snowflake Data Sharing with Immuta users and with non-Immuta users.

Workflow with Immuta Users (Public Preview)

Prerequisites:

1 - Create Immuta Policies to Protect the Data

Required Permission: Immuta: GOVERNANCE

Create Immuta policies to fit your organization's compliance requirements.

2 - Register the Snowflake Data Consumer with Immuta

Required Permission: Immuta: USER_ADMIN

To register the Snowflake data consumer in Immuta,

.
to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.
for your organization's policies.
.

Required Permission: Snowflake: ACCOUNTADMIN

To share the policy-protected data source,

Grant reference usage on the Immuta database to the share you created:
Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.
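
A statement of this kind generally follows Snowflake's GRANT REFERENCE_USAGE ON DATABASE ... TO SHARE form. The sketch below builds one from two values; the function name and the example database and share names are placeholders, not values from this guide.

```python
def reference_usage_grant(immuta_database, share_name):
    """Build the grant that lets a share reference objects in the Immuta database."""
    return f"GRANT REFERENCE_USAGE ON DATABASE {immuta_database} TO SHARE {share_name};"

# IMMUTA and MY_DATA_SHARE are placeholder names for illustration only.
print(reference_usage_grant("IMMUTA", "MY_DATA_SHARE"))
```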

Workflow with Non-Immuta Users

Prerequisites:

Use Case

As you follow this tutorial, these callouts will have examples centered around the same use case and will further explain the steps necessary to meet the following compliance requirement:

Compliance Requirement: Users can only see data from their country.

1 - Create Immuta Policies

Use Case: Create Policies

The Immuta user will create a global data policy that restricts the rows users can see based on their attributes, which identify their country. In the example below, users with the attribute Country.JP would only see rows that have JP as a value in the CREDIT POINT OF SALE column.
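
The effect of this policy can be pictured with a small sketch. The sample rows, the transaction_id field, and the visible_rows helper are hypothetical; Immuta enforces the policy in Snowflake, not in application code like this.

```python
# Hypothetical sample data; only the column name comes from the use case.
rows = [
    {"transaction_id": 1, "CREDIT POINT OF SALE": "JP"},
    {"transaction_id": 2, "CREDIT POINT OF SALE": "US"},
    {"transaction_id": 3, "CREDIT POINT OF SALE": "JP"},
]

def visible_rows(rows, user_attributes, column="CREDIT POINT OF SALE"):
    """Keep only rows whose column value matches one of the user's Country values."""
    allowed = set(user_attributes.get("Country", []))
    return [row for row in rows if row[column] in allowed]

# A user holding the attribute Country.JP sees only the JP rows.
print(visible_rows(rows, {"Country": ["JP"]}))
```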

Required Permission: Immuta: GOVERNANCE

2 - Create an Immuta Project

Use Case: Create Project

The Immuta user will create a project for the data share. In the example below, the user creates a Japan Data Share project that will only be shared with data consumers in Japan.

Required Permission: Immuta: CREATE_PROJECT

Use Case

Because data consumers have the attribute "Country.JP", this will be the equalized entitlement added to the project. The Immuta user editing the equalized entitlement must also have the attribute "Country.JP" to ensure they have access to the data they will share.

Required Permission: Immuta: CREATE_PROJECT or PROJECT_MANAGEMENT

Required Permission: Snowflake: ACCOUNTADMIN

The commands run in Snowflake should look similar to this:

Snowflake Lineage Tag Propagation

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirement

Snowflake Enterprise Edition

Prerequisite

Contact your Immuta representative to enable this feature in your Immuta tenant.

Configure the Snowflake integration

Navigate to the App Settings page and click the Integrations tab.
Click +Add Native Integration and select Snowflake from the dropdown menu.
Complete the Host, Port, and Default Warehouse fields.
Enable Native Query Audit.
Enable Native Lineage and complete the following fields:
- Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.
- Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter a regular expression, omitting the / delimiters from the beginning and end, to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression, omitting the / delimiters from the beginning and end, to filter tags. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.
Opt to enable Automatically ingest Snowflake object tags.
Select Manual or Automatic Setup and follow the steps in this guide to configure the Snowflake integration.

Trigger Snowflake lineage sync job

Prerequisite

Authenticate with the Immuta API.

Trigger the lineage job

The Snowflake lineage sync endpoint triggers the native lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.

Copy the example and replace the Immuta URL and API key with your own.
Change the payload attribute values to your own, where
- tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter a regular expression, omitting the / delimiters from the beginning and end, to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.
- batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.
- lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.
```
curl -X 'POST' \
    'https://www.organization.immuta.com/lineage/ingest/snowflake' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: 846e9e43c86a4ct1be14290d95127d13f' \
    -d '{
    "tableFilter": "MY_DATABASE\\.MY_SCHEMA\\..*",
    "batchSize": 1,
    "lastTimestamp": "2022-06-29T09:47:06.012-07:00"
    }'
```
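
For teams that script this call, the sketch below builds an equivalent request with Python's standard library and checks the payload values first. The function name is hypothetical, the URL and API key are placeholders, and Python's re module is used only as a sanity check; the regular expression engine Immuta applies may differ.

```python
import json
import re
import urllib.request
from datetime import datetime

def build_lineage_request(base_url, api_key, table_filter, batch_size, last_timestamp):
    """Validate the payload values and build (but do not send) the POST request."""
    if batch_size < 1:
        raise ValueError("batchSize must be at least 1")
    re.compile(table_filter)                # raises re.error on an invalid pattern
    datetime.fromisoformat(last_timestamp)  # raises ValueError on a malformed timestamp
    payload = {
        "tableFilter": table_filter,
        "batchSize": batch_size,
        "lastTimestamp": last_timestamp,
    }
    return urllib.request.Request(
        url=f"{base_url}/lineage/ingest/snowflake",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": api_key,
        },
        method="POST",
    )

# Placeholder URL and key; replace them with your own values.
request = build_lineage_request(
    "https://www.organization.immuta.com",
    "<your-api-key>",
    r"MY_DATABASE\.MY_SCHEMA\..*",
    1,
    "2022-06-29T09:47:06.012-07:00",
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that call is left out so the sketch has no side effects.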

Next steps

Once the sync job is complete, you can complete the following steps:

Snowflake Low Row Access Policy Mode

Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

The steps outlined on this page are necessary if you meet both of the following criteria:

You have the Snowflake low row access policy mode enabled in private preview.
You have user impersonation enabled.

If you do not meet these criteria, follow the instructions in the configuration guide.

Upgrade to Snowflake low row access policy mode

To upgrade to the generally available version of the feature, either

disable your Snowflake integration on the app settings page and then re-enable it, OR
disable Snowflake low row access policy mode on the app settings page and re-enable it.

Databricks Unity Catalog

Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

Permissions

APPLICATION_ADMIN Immuta permission for the user configuring the integration in Immuta.
Databricks privileges:
- An account with the CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables. For automatic setups, this privilege must be granted to the Immuta system account user. For manual setups, the user running the Immuta script must have this privilege.
- An Immuta system account user requires the following Databricks privileges:
  - OWNER permission on the Immuta catalog you configure.
  - OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. This permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user to allow for multiple owners. If the OWNER permission cannot be applied at the catalog- or schema-level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
  - USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
  - SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
  - USE CATALOG on the system catalog for native query audit.
  - USE SCHEMA on the system.access schema for native query audit.
  - SELECT on the following system tables for native query audit:
    - system.access.audit
    - system.access.table_lineage
    - system.access.column_lineage
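
The table-level privileges in this list map onto Databricks GRANT statements. The sketch below prints them for one registered table; the catalog, schema, table, and principal names are placeholders, and the OWNER requirement is not covered here because ownership is assigned separately from GRANT.

```python
def data_source_grants(catalog, schema, table, principal):
    """Build the USE CATALOG, USE SCHEMA, and SELECT/MODIFY grants for one table."""
    target = f"`{principal}`"  # principals are quoted with backticks
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {target};",
        f"GRANT SELECT, MODIFY ON TABLE {catalog}.{schema}.{table} TO {target};",
    ]

# All four names below are hypothetical examples.
for statement in data_source_grants("sales", "emea", "orders", "immuta-system-account"):
    print(statement)
```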

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta instance rather than a cluster for both performance and availability reasons.
Personal access token generated for the user that Immuta will use to manage policies in Unity Catalog.
No Databricks SQL integrations are configured in your Immuta instance. The Databricks Unity Catalog integration replaces the Databricks SQL integration entirely and cannot coexist with it. If there are configured Databricks SQL integrations, remove them and add a Databricks Unity Catalog integration in their place. Databricks data sources will also need to be migrated if they are defined in the hive_metastore catalog.
No Databricks Spark integrations with Unity Catalog support are configured in your Immuta instance. Immuta does not support that integration and the Databricks Unity Catalog integration concurrently. See the Unity Catalog overview for supported cluster configurations.
Unity Catalog system tables enabled for native query audit.

Best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog native integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

Disable existing Databricks SQL and Databricks Spark with Unity Catalog Support integrations.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.

Configure the Databricks Unity Catalog integration

Existing data source migration

If you have existing Databricks data sources, complete these migration steps before proceeding.

You have two options for configuring your Databricks Unity Catalog integration:

Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured personal access token.
Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs.

Automatic setup

Required permissions

When performing an automatic setup, the Databricks personal access token you configure below must be attached to an account with the following permissions for the metastore associated with the specified Databricks workspace:

USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
OWNER permission on the Immuta catalog created below.
OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. This permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user to allow for multiple owners. If the OWNER permission cannot be applied at the catalog- or schema-level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
CREATE CATALOG on the workspace metastore.
USE CATALOG on the system catalog for native query audit.
USE SCHEMA on the system.access schema for native query audit.
SELECT on the following system tables for native query audit:
- system.access.audit
- system.access.table_lineage
- system.access.column_lineage

Click the App Settings icon in the left sidebar.
Scroll to the Global Integration Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox. The additional settings in this section are only relevant to the Databricks Spark with Unity Catalog integration and will not have any effect on the Unity Catalog integration. These can be left with their default values.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Before configuring the integration in Immuta, create this account-level group for privileged users and service accounts that require an unmasked view of data.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta system account.
1. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
Click Test Databricks Unity Catalog Connection.
Save and Confirm your changes.
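
Two of the fields above have simple format rules that can be checked before you save. This is a minimal sketch with hypothetical function names; it encodes only the rules stated in the steps (letters, numbers, and underscores with no leading number for the catalog name, and an integer from 1 to 24 for the audit frequency) and treats letters as ASCII, which is an assumption.

```python
import re

# Letters, numbers, and underscores only; the first character cannot be a number.
_CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_valid_catalog_name(name):
    """Check the Immuta Catalog naming rule described in the steps above."""
    return _CATALOG_NAME.fullmatch(name) is not None

def is_valid_audit_frequency(hours):
    """Check that the audit sync frequency is a whole number of hours from 1 to 24."""
    return isinstance(hours, int) and not isinstance(hours, bool) and 1 <= hours <= 24

print(is_valid_catalog_name("immuta_catalog"))
print(is_valid_audit_frequency(6))
```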

Manual setup

Required permissions

When performing a manual setup, the following Databricks permissions are required:

The user running the script must have the CREATE CATALOG permission on the workspace metastore.
The Databricks personal access token you configure below must be attached to an account with the following permissions:
- USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta system account user can interact with those tables.
- SELECT and MODIFY on all tables registered as Immuta data sources so that the system account user can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
- OWNER permission on the Immuta catalog created below.
- OWNER permission on catalogs with schemas and tables registered as Immuta data sources so that Immuta can administer Unity Catalog row-level and column-level security controls. This permission can be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta system account user to allow for multiple owners. If the OWNER permission cannot be applied at the catalog- or schema-level, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta system account user.
- USE CATALOG on the system catalog for native query audit.
- USE SCHEMA on the system.access schema for native query audit.
- SELECT on the following system tables for native query audit:
  - system.access.audit
  - system.access.table_lineage
  - system.access.column_lineage

Click the App Settings icon in the left sidebar.
Scroll to the Global Integration Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox. The additional settings in this section are only relevant to the Databricks Spark with Unity Catalog integration and will not have any effect on the Unity Catalog integration. These can be left with their default values.
Click the Integrations tab.
Click + Add Native Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Before configuring the integration in Immuta, create this account-level group for privileged users and service accounts that require an unmasked view of data.
Unity Catalog query audit is enabled by default; you can disable it by clicking the Enable Native Query Audit checkbox. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta system account.
1. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Test Databricks Unity Catalog Connection.
Save and Confirm your changes.

Enable native query audit for Unity Catalog

To enable native query audit for Unity Catalog, complete the following steps before configuring the integration:

Enable a system schema where the <SCHEMA_NAME> is access.
Grant your Immuta system account user access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.
- USE CATALOG on the system catalog
- USE SCHEMA on the system.access schema
- SELECT on the following system tables:
  - system.access.audit
  - system.access.table_lineage
  - system.access.column_lineage
Enable verbose audit logs in Unity Catalog.
Use the Databricks Personal Access Token in the configuration above for the account you just granted system table access. This account will be the Immuta system account user.
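
The minimum access listed in the steps above can be expressed as Databricks GRANT statements. The sketch below prints them for a given principal; the principal name is a placeholder and the helper function is hypothetical.

```python
# System tables named in the steps above.
SYSTEM_TABLES = [
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
]

def audit_grants(principal):
    """Build the minimum grants listed above for native query audit."""
    target = f"`{principal}`"  # principals are quoted with backticks
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {target};",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {target};",
    ]
    statements += [f"GRANT SELECT ON TABLE {table} TO {target};" for table in SYSTEM_TABLES]
    return statements

# immuta-system-account is a placeholder principal name.
for statement in audit_grants("immuta-system-account"):
    print(statement)
```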

Register your data

External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

Protect your data

Map Databricks usernames to Immuta to ensure Immuta properly enforces policies and audits user queries.
Build global policies in Immuta to enforce table-, column-, and row-level security.

Unity Catalog Integration Reference

Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces and introduces fine-grained access controls in Databricks.

Immuta’s integration with Unity Catalog allows you to manage multiple Databricks workspaces through Unity Catalog while protecting your data with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and enforce Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

Subscription policies: Immuta subscription policies automatically grant and revoke access to Databricks tables.
Data policies: Immuta data policies enforce row- and column-level security without creating views, so users can query tables as they always have without their workflows being disrupted.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

Metastore: Created at the account level and attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore, and all workspaces that are configured to use a single metastore share those objects.
Catalog: A catalog sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas.
Schema: Organizes tables and views.
Table: Tables can be managed or external tables.

For details about the Unity Catalog object model, see the Databricks Unity Catalog documentation.
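
Within a metastore, this hierarchy gives every table a three-part name of the form catalog.schema.table. The sketch below splits such a name into its parts; the class and function names are hypothetical, and the parser is a simplification that does not handle backtick-quoted names containing dots.

```python
from typing import NamedTuple

class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

def parse_table_name(full_name):
    """Split a catalog.schema.table name into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)

# sales.emea.orders is a made-up example name.
print(parse_table_name("sales.emea.orders"))
```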

Feature support

The Databricks Unity Catalog integration supports

managing and accessing data across multiple Databricks workspaces
enforcing Unity Catalog row-, column-, and table-level access controls on Databricks clusters and SQL warehouses:
- applying column masking and row-redaction policies on tables
- applying subscription policies on tables and views
enforcing Unity Catalog access controls, even if Immuta becomes disconnected
auditing activity of both Immuta users and non-Immuta users
Delta and Parquet files
allowing non-Immuta reads and writes
using Photon
using a proxy server

Architecture

Unity Catalog supports managing permissions at the Databricks account level through controls applied directly to objects in the metastore. To interact with the metastore and apply controls to any table, Immuta requires a personal access token (PAT) for an Immuta system account user with permissions to manage all data protected by Immuta. See the permissions requirements section for a list of specific Databricks privileges.

Immuta uses this Immuta system account user to run queries that set up all the tables, user-defined functions (UDFs), and other data necessary for policy enforcement. Upon enabling the native integration, Immuta will create a catalog named after your provided workspaceName that contains two schemas:

immuta_system: Contains internal Immuta data.
immuta_policies: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies schema and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL endpoint, so compute must be available for these policies to succeed.

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

Table-level security: Immuta manages REVOKE and GRANT privileges on securable objects in Databricks through subscription policies. When you create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks for every user affected by that subscription policy.
Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
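
To illustrate how table-level access can be kept in sync with a subscription policy, the sketch below compares the users who currently have access with the users the policy allows and lists the resulting changes. It is not Immuta's implementation: the source states that Immuta works through the Unity Catalog API, and the SQL rendering, the SELECT privilege, and all names here are assumptions made for illustration.

```python
def subscription_changes(table, current_users, desired_users):
    """List the grants and revokes needed to move from current to desired access."""
    current, desired = set(current_users), set(desired_users)
    # SELECT is an assumed privilege used only for this illustration.
    grants = [
        f"GRANT SELECT ON TABLE {table} TO `{user}`;"
        for user in sorted(desired - current)
    ]
    revokes = [
        f"REVOKE SELECT ON TABLE {table} FROM `{user}`;"
        for user in sorted(current - desired)
    ]
    return grants + revokes

# Hypothetical table and users.
for statement in subscription_changes(
    "sales.emea.orders",
    current_users=["ana@example.com", "bo@example.com"],
    desired_users=["bo@example.com", "cy@example.com"],
):
    print(statement)
```

Users present in both sets produce no statement, so re-running the comparison after the changes are applied yields an empty list.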

The Unity Catalog integration supports the following policy types:

Subscription policies
Select masking policies
- Conditional masking
- Constant
- Custom masking
- Hashing
- Null
- Regex: You must use the global regex flag (g) when creating a regex masking policy in this integration. You cannot use the case-insensitive regex flag (i) when creating a regex masking policy in this integration. See the limitations section for examples.
- Rounding (date and numeric rounding)
Row-level policies
- Matching (only show rows where)
  - Custom WHERE
  - Never
  - Where user
  - Where value in column
- Minimization
- Time-based restrictions

Policy exemption groups

Some users may need to be exempt from masking and row-level policy enforcement. When you add user accounts to the configured exemption group in Databricks, Immuta will not enforce policies for those users. Exemption groups are created when the Unity Catalog integration is configured, and no policies will apply to these users' queries, despite any policies enforced on the tables they query.

The principal used to register data sources in Immuta will be automatically added to this exemption group for that Databricks table. Consequently, limit the accounts that are added to this group and used to register data sources in Immuta to service accounts.

Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

However, with Databricks metastore magic you can use hive_metastore and enforce subscription and data policies with the Databricks Spark integration.

Authentication method

The Databricks Unity Catalog integration supports the access token method to configure the integration and create data sources in Immuta. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the permissions section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

Immuta data sources in Unity Catalog

The Unity Catalog data object model introduces a 3-tiered namespace, as outlined above. Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
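As a minimal sketch of the three-level reference described above, the helper below splits a fully qualified name into its catalog, schema, and table parts. The names TableRef and parse_table_ref, and the example table main.sales.orders, are hypothetical and are not part of Immuta or Databricks. The sketch assumes plain identifiers and does not handle backtick-quoted names that contain periods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical container for a three-level Unity Catalog table name."""
    catalog: str
    schema: str
    table: str


def parse_table_ref(full_name: str) -> TableRef:
    """Split 'catalog.schema.table' into its three parts.

    Assumes simple identifiers without embedded periods.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected catalog.schema.table, got {full_name!r}"
        )
    return TableRef(*parts)


if __name__ == "__main__":
    ref = parse_table_ref("main.sales.orders")
    print(ref.catalog, ref.schema, ref.table)
```

A two-part name such as sales.orders raises a ValueError in this sketch, which mirrors the point that data sources in Unity Catalog always carry a catalog.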

External data connectors and query-federated tables

Native query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
- system.access.audit
- system.access.table_lineage
- system.access.column_lineage
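The access requirements above could be expressed as GRANT statements. The sketch below only builds the statement text; it does not connect to Databricks. The function name audit_grant_statements and the principal immuta-service-principal are hypothetical, and the statements assume standard Unity Catalog GRANT syntax, so verify them against your workspace before running them.

```python
from typing import List

# System tables listed in the access requirements above.
AUDIT_TABLES = [
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
]


def audit_grant_statements(principal: str) -> List[str]:
    """Build the GRANT statements for the minimum audit access.

    The principal is wrapped in backticks; embedded backticks are doubled.
    """
    quoted = "`" + principal.replace("`", "``") + "`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted}",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {table} TO {quoted}" for table in AUDIT_TABLES
    ]
    return statements


if __name__ == "__main__":
    # Hypothetical principal name, used for illustration only.
    for statement in audit_grant_statements("immuta-service-principal"):
        print(statement + ";")
```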

The Databricks Unity Catalog integration audits user queries run on clusters or SQL warehouses. Audit ingest is configured when you set up the integration, and it can be scoped to ingest logs from specific workspaces only, if needed.

See the Unity Catalog native audit page for details about manually prompting ingest of audit logs and the contents of the logs.

Configuration requirements

See the Enable Unity Catalog guide for a list of requirements.

Supported Databricks cluster configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

Legend:

Unity Catalog caveats

Unity Catalog row- and column-level security controls are unsupported for single-user clusters. See the Databricks documentation for details about this limitation.
Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
Regex masking policies in this integration must use the global regex flag (g) and cannot use the case-insensitive regex flag (i). See the examples below for guidance:
- regex with a global flag (supported): /^ssn|social ?security$/g
- regex without a global flag (unsupported): /^ssn|social ?security$/
- regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi
- regex without a case insensitive flag (supported): /^ssn|social ?security$/g
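The flag rule above can be checked before a policy is created. The following is a minimal sketch, not an Immuta API: the function name is_supported_regex_literal is hypothetical, and it assumes the regex is written in the /pattern/flags form used in the examples.

```python
def is_supported_regex_literal(literal: str) -> bool:
    """Return True if a /pattern/flags literal satisfies the flag rule.

    The global flag (g) must be present and the case-insensitive
    flag (i) must be absent.
    """
    if not literal.startswith("/"):
        return False
    closing = literal.rfind("/")
    if closing == 0:
        # No closing slash, so this is not a /pattern/flags literal.
        return False
    flags = literal[closing + 1:]
    return "g" in flags and "i" not in flags


if __name__ == "__main__":
    for candidate in (
        "/^ssn|social ?security$/g",
        "/^ssn|social ?security$/",
        "/^ssn|social ?security$/gi",
    ):
        print(candidate, is_supported_regex_literal(candidate))
```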

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

Feature limitations

The following features are currently unsupported:

Databricks change data feed support
Immuta projects
Multiple IAMs on a single cluster
Column masking policies on views
Mixing masking policies on the same column
Row-redaction policies on views
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies on ARRAY, MAP, or STRUCT type columns

Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.

Configure the Databricks Unity Catalog integration.

Migrate to Unity Catalog

When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's three-level namespace. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.

Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them.

To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, Databricks metastore magic will protect your existing data sources throughout the migration process.

Disable all existing Databricks Spark integrations with Unity Catalog support or Databricks SQL integrations. Note: Immuta supports running the Databricks Spark integration with the Unity Catalog integration concurrently, so Databricks Spark integrations do not have to be disabled before migrating to Unity Catalog.
Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Databricks Spark

Audience: Data Owners and Data Users
Content Summary: This page provides an overview of the Databricks integration. For installation instructions, see the Simplified Databricks Configuration guide and the Manual Databricks Installation guide.

Overview

Databricks is a plugin integration with Immuta. This integration allows you to protect access to tables and manage row-, column-, and cell-level controls without enabling table ACLs or credential passthrough. Policies are applied to the plan that Spark builds for a user's query and enforced live on-cluster.

Architecture

An Application Admin will configure Databricks with either the

Simplified Databricks configuration on the Immuta App Settings page
Manual Databricks installation, where Immuta artifacts must be downloaded and staged to your Databricks clusters

In both configuration options, the Immuta init script adds the Immuta plugin in Databricks: the Immuta Security Manager, wrappers, and Immuta analysis hook plan rewrite. Once an administrator gives users Can Attach To entitlements on the cluster, they can query Immuta-registered data sources directly in their Databricks notebooks.

Simplified Databricks Configuration Additional Entitlements

The credentials used for the Simplified Databricks configuration with automatic cluster policy push must have the following entitlement:

Allow cluster creation

This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.

Policy Enforcement

Immuta Best Practices: Test User

Test the integration on an Immuta-enabled cluster with a user that is not a Databricks administrator.

Registering Data Sources

Table Access

The `immuta` Database

When a table is registered in Immuta as a data source, users can see that table in the native Databricks database and in the immuta database. This allows for an option to use a single database (immuta) for all tables.

Fine-grained Access Control

Accessing Data

All access controls must go through SQL.

Note: With R, you must load the SparkR library in a cell before accessing the data.

Mapping Users

Data Flow

The Immuta init script adds the Immuta plugin in Databricks: the Immuta Security Manager, wrappers, and Immuta analysis hook plan rewrite.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
During Spark Analysis, Spark calls down to the Metastore to get table metadata.
Immuta intercepts the call to retrieve table metadata from the Metastore.
Immuta modifies the Logical Plan to enforce policies that apply to that user.
Immuta wraps the Physical Plan with specific Java classes to signal to the SecurityManager that it is a trusted node and is allowed to scan raw data. Immuta blocks direct access to S3 unless it backs a registered table in Immuta.
The Physical Plan is applied and filters out and transforms raw data coming back to the user.
The user sees policy-enforced data.

How-to Guides

Installation

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Prerequisites

Databricks instance: Premium tier workspace and
Databricks instance has network level access to Immuta instance
Access to
Permissions and access to download (outside Internet access) or transfer files to the host machine

Recommended Databricks Workspace Configurations:

Note: Azure Databricks authenticates users with Microsoft Entra ID. Be sure to configure your Immuta instance with an IAM that uses the same user ID as does Microsoft Entra ID. Immuta's Spark security plugin will look to match this user ID between the two systems. See this for details.

Supported Databricks Runtime Versions

Use the table below to determine which version of Immuta supports your Databricks Runtime version:

Databricks Runtime Version

Immuta Version

Supported Databricks Cluster Configurations

Legend:

Supported Access Mode and Languages

Immuta supports the Custom access mode.

Supported Languages:
- Python
- SQL
- R (requires advanced configuration; work with your Immuta support professional to use R)
- Scala (requires advanced configuration; work with your Immuta support professional to use Scala)

Databricks Installation Overview

Users Who Can Read Raw Tables On-Cluster

If a Databricks Admin is tied to an Immuta account, they will have the ability to read raw tables on-cluster.
If a Databricks user is listed as an "ignored" user, they will have the ability to read raw tables on-cluster. Users can be added to the immuta.spark.acl.whitelist configuration to become ignored users.

The Immuta Databricks integration injects an Immuta plugin into the SparkSQL stack at cluster startup. The Immuta plugin creates an "immuta" database that is available for querying and intercepts all queries executed against it. For these queries, policy determinations will be obtained from the connected Immuta instance and applied before returning the results to the user.

The Databricks cluster init script provided by Immuta downloads the Immuta artifacts onto the target cluster and puts them in the appropriate locations on local disk for use by Spark. Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for policy enforcement.

The cluster init script uses environment variables to:

Determine the location of the required artifacts for downloading.
Authenticate with the service/storage containing the artifacts.

Note: Each target system/storage layer (HTTPS, for example) can only have one set of environment variables, so the cluster init script assumes that any artifact retrieved from that system uses the same environment variables.

Limitations

Installation Methods

There are two installation options for Databricks. Each is outlined below and covered in its own tutorial:

Simplified Databricks configuration:
1. Adding the integration on the App Settings page.
2. Downloading or automatically pushing cluster policies to your Databricks workspace.
3. Creating or restarting your cluster.

Manual Databricks installation:
1. Downloading and configuring Immuta artifacts.
2. Staging Immuta artifacts somewhere the cluster can read from during its startup procedures.
3. Protecting Immuta environment variables with Databricks Secrets.
4. Creating and configuring the cluster to start with the init script and load Immuta into its SparkSQL environment.

Debugging Immuta Installation Issues

For easier debugging of the Immuta Databricks installation, enable cluster init script logging. In the cluster page in Databricks for the target cluster, under Advanced Options -> Logging, change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be added onto the end of the provided path.

For debugging issues between the Immuta web service and Databricks, you can view the Spark UI on your target Databricks cluster. On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

Using the Validation and Debugging Notebook

The Validation and Debugging Notebook (immuta-validation.ipynb) is packaged with other Databricks release artifacts (for manual installations), or it can be downloaded from the App Settings page when configuring native Databricks through the Immuta UI. This notebook is designed to be used by or under the guidance of an Immuta Support Professional.

Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.
Click the arrow next to your name and select Import.
Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.

Simplified Databricks Configuration

Audience: System Administrators
Content Summary: This guide details the simplified installation method for enabling native access to Databricks with Immuta policies enforced.
Prerequisites: Ensure your Databricks workspace, instance, and permissions meet the guidelines outlined in the installation prerequisites.

Databricks Unity Catalog

If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

1 - Add the Integration on the App Settings Page

Log in to Immuta and click the App Settings icon in the left sidebar.
Scroll to the System API Key subsection under HDFS and click Generate Key.
Click Save and then Confirm.
Scroll to the Integration Settings section, and click + Add a Native Integration.
Select Databricks Integration from the dropdown menu.
Complete the Hostname field.
Enter a Unique ID for the integration. By default, your Immuta instance URL populates this field. This ID is used to tie the set of cluster policies to your instance of Immuta and allows multiple instances of Immuta to access the same Databricks workspace without cluster policy conflicts.
Select your configured Immuta IAM from the dropdown menu.
Choose one of the following options for your data access model:
- Protected until made available by policy: All tables are hidden until a user is granted access through an Immuta policy. This is how most databases work; it assumes least-privilege access and requires you to register all tables with Immuta.
- Available until protected by policy: All tables are open until explicitly registered and protected by Immuta. This works well if most of your tables are non-sensitive and you only need to protect selected tables.
Select the Storage Access Type from the dropdown menu.
Opt to add any Additional Hadoop Configuration Files.
Click Add Native Integration.

2 - Configure Cluster Policies

Several cluster policies are available on the App Settings page when configuring this integration:

Click a link above to read more about each of these cluster policies before continuing with the tutorial.

Click Configure Cluster Policies.
Select one or more cluster policies in the matrix by clicking the Select button(s).
Opt to make changes to these cluster policies by clicking Additional Policy Changes and editing the text field.
Use one of the two Installation Types described in the tabs below to apply the policies to your cluster:

Automatically Push Cluster Policies

This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.

Select the Automatically Push Cluster Policies radio button.
Enter your Admin Token. This token must be for a user who can create cluster policies in Databricks.
Click Apply Policies.

Manually Push Cluster Policies

Enabling this option will allow you to manually push the cluster policies to the configured Databricks workspace. There will be various files to download and manually push to the configured Databricks workspace.

Select the Manually Push Cluster Policies radio button.
Click Download Init Script.
Follow the steps in the Instructions to upload the init script to DBFS section.
Click Download Policies, and then manually add these Cluster Policies in Databricks.

Opt to click Download the Benchmarking Suite to compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries.
Click Close, and then click Save and Confirm.

3 - Add Policies to Your Cluster

In the Policy dropdown, select the Cluster Policies you pushed or manually added from Immuta.
Select the Custom Access mode.
Opt to adjust Autopilot Options and Worker Type settings: The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
Opt to configure the Instances tab in the Advanced Options section:
Click Create Cluster.

4 - Register Data

5 - Query Immuta Data

Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster.

Example Queries

Manual Databricks Installation

Audience: System Administrators
Content Summary: This guide details the manual installation method for enabling native access to Databricks with Immuta policies enforced.
Prerequisites: Ensure your Databricks workspace, instance, and permissions meet the guidelines outlined in the installation prerequisites.

Databricks Unity Catalog

If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

The immuta_conf.xml is no longer required.

The immuta_conf.xml file that was previously used to configure the native Databricks integration is no longer required to install Immuta, so it is no longer staged as a deployment artifact. However, you can use if you wish to deploy an immuta_conf.xml file to set properties.

The required Immuta base URL and Immuta system API key properties, along with any other valid properties, can still be specified as Spark environment variables or in the optional immuta_conf.xml file. As before, if the same property is specified in both locations, the Spark environment variable takes precedence.

If you have an existing immuta_conf.xml file, you can continue using it. However, it's recommended that you delete any default properties from the file that you have not explicitly overridden, or remove the file completely and rely on Spark environment variables. Either method will ensure that any property defaults changed in upcoming Immuta releases are propagated to your environment.

1 - Download and Configure Immuta Artifacts

Spark Version

Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.
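The version rule above can be sketched as a small lookup. The function name spark_major_version is hypothetical; it only encodes the rule stated here (Spark 2 before Databricks Runtime 7.x, Spark 3 from 7.x onward) and does not check which runtimes a given Immuta release supports.

```python
def spark_major_version(runtime_version: str) -> int:
    """Return the Spark major version to pair with a Databricks Runtime.

    Only the leading major number of the runtime string is inspected,
    so values such as '6.4', '7.x', or '10.4 LTS' are accepted.
    """
    major = int(runtime_version.split(".")[0])
    return 3 if major >= 7 else 2


if __name__ == "__main__":
    for runtime in ("6.4", "7.3", "10.4 LTS"):
        print(runtime, "-> Spark", spark_major_version(runtime))
```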

Navigate to the Immuta archives page. If you are prompted to log in and need basic authentication credentials, contact your Immuta support professional.
Navigate to the Databricks folder for your Immuta version. Ex: https://archives.immuta.com/hadoop/databricks/2024.1.13/.
Download the .jar file (Immuta plugin) as well as the other scripts listed below, which will load the plugin at cluster startup.
The immuta-benchmark-suite.dbc is a collection of notebooks packaged as a .dbc file. After you have added cluster policies to your cluster, you can import this file into Databricks to run performance tests and compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries.
Specify the following properties as Spark environment variables or in the optional immuta_conf.xml file. If the same property is specified in both locations, the Spark environment variable takes precedence. The variable names are the config names in all upper case with underscores (_) in place of periods (.). For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
- immuta.system.api.key: Obtain this value from the App Settings page under HDFS > System API Key. You will need to be a user with the APPLICATION_ADMIN role to complete this action. Warning: Generating a key will destroy any previously generated HDFS keys. This will cause previously integrated HDFS systems to lose access to your Immuta console. The key will only be shown once when generated.
- immuta.base.url: The full URL for the target Immuta instance. Ex: https://immuta.mycompany.com.
- immuta.user.mapping.iamid: If users authenticate to Immuta using an IAM different from Immuta's built-in IAM, you need to update the configuration file to reflect the ID of that IAM. The IAM ID is shown within the Immuta App Settings page within the Identity Management section. See Databricks to Immuta User Mapping for more details.
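The naming and precedence rules for these properties can be sketched as follows. This is an illustration only, not Immuta code: the function names to_env_var and resolve_property are hypothetical, and the two dictionaries stand in for the cluster's environment variables and the parsed contents of immuta_conf.xml.

```python
from typing import Mapping, Optional


def to_env_var(config_name: str) -> str:
    """Convert a config name to its environment variable form.

    Upper-cases the name and replaces each period with an underscore.
    """
    return config_name.upper().replace(".", "_")


def resolve_property(
    config_name: str,
    env: Mapping[str, str],
    conf_xml: Mapping[str, str],
) -> Optional[str]:
    """Look up a property, preferring the environment variable.

    Falls back to the value from immuta_conf.xml, or None if unset.
    """
    env_name = to_env_var(config_name)
    if env_name in env:
        return env[env_name]
    return conf_xml.get(config_name)


if __name__ == "__main__":
    env = {"IMMUTA_BASE_URL": "https://immuta.mycompany.com"}
    # Hypothetical stale value in the XML file, overridden by the variable.
    conf_xml = {"immuta.base.url": "https://old.mycompany.com"}
    print(to_env_var("immuta.system.api.key"))
    print(resolve_property("immuta.base.url", env, conf_xml))
```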

Environment Variables with Google Cloud Platform

Do not use environment variables to set sensitive properties when using Google Cloud Platform. Set them directly in immuta_conf.xml.

2 - Stage Immuta Artifacts

When configuring the Databricks cluster, a path will need to be provided to each of the artifacts downloaded/created in the previous step. To do this, those artifacts must be hosted somewhere that your Databricks instance can access. The following methods can be used for this step:

These artifacts will be downloaded to the required location within the cluster's file system by the init script downloaded in the previous step. In order for the init script to find these files, a URI will have to be provided through environment variables configured on the cluster. Each method's URI structure and setup is explained below.

AWS/S3

URI Structure: s3://[bucket]/[path]

Upload the configuration file, JSON file, and JAR file to an S3 bucket that the role from step 1 has access to.

Authenticating with Access Keys or Session Tokens (Optional)

If you wish to authenticate using access keys, add the following items to the cluster's environment variables:

If you've assumed a role and received a session token, that can be added here as well:

Azure

ADL Gen 2

URI Structure: abfs(s)://[container]@[account].dfs.core.windows.net/[path]

Environment Variables:

If you want to authenticate using an account key, add the following to your cluster's environment variables:

If you want to authenticate using an Azure SAS token, add the following to your cluster's environment variables:

ADL Gen 1

URI Structure: adl://[account].azuredatalakestore.net/[path]

Environment Variables:

If authenticating as a Microsoft Entra ID user,

If authenticating using a service principal,

HTTPS

URI Structure: http(s)://[host](:port)/[path]

Artifacts are available for download from Immuta using basic authentication. Your basic authentication credentials can be obtained from your Immuta support professional.

Environment Variables (Optional)

DBFS

DBFS does not support access control. Any Databricks user can access DBFS via the Databricks command line utility. Files containing sensitive materials (such as Immuta API keys) should not be stored there in plain text. Use other methods described herein to properly secure such materials.

URI Structure: dbfs:/[path]

Since any user has access to everything in DBFS:

The artifacts can be stored anywhere in DBFS.
It's best to have a cluster-specific place for your artifacts in DBFS if you are testing to avoid overwriting or reusing someone else's artifacts accidentally.
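The URI structures listed in this step differ mainly by scheme. As a rough sketch, the helper below maps an artifact URI to the staging method it corresponds to, which can be handy for catching typos before starting a cluster. The function name staging_method and the example URIs are hypothetical; the init script itself may validate URIs differently.

```python
from urllib.parse import urlparse

# Scheme-to-method mapping taken from the URI structures in this step.
STAGING_BY_SCHEME = {
    "s3": "AWS/S3",
    "abfs": "ADL Gen 2",
    "abfss": "ADL Gen 2",
    "adl": "ADL Gen 1",
    "http": "HTTPS",
    "https": "HTTPS",
    "dbfs": "DBFS",
}


def staging_method(uri: str) -> str:
    """Return the staging method name for an artifact URI."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in STAGING_BY_SCHEME:
        raise ValueError(f"unrecognized artifact URI scheme in {uri!r}")
    return STAGING_BY_SCHEME[scheme]


if __name__ == "__main__":
    # Hypothetical bucket, container, and path names.
    for uri in (
        "s3://my-bucket/immuta/artifacts",
        "abfss://container@account.dfs.core.windows.net/immuta",
        "dbfs:/immuta/artifacts",
    ):
        print(uri, "->", staging_method(uri))
```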

3 - Protect Immuta Environment Variables with Databricks Secrets

Databricks secrets can be used in the Environment Variables configuration section for a cluster by referencing the secret path rather than the actual value of the environment variable. For example, if a user wanted to make the following value secret

they could instead create a Databricks secret and reference it as the value of that variable. For instance, if the secret scope my_secrets was created, and the user added a secret with the key my_secret_env_var containing the desired sensitive environment variable, they would reference it in the Environment Variables section:

Then, at runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.
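The reference format can be sketched with a small helper that builds the placeholder for a given scope and key. The function name secret_reference and the variable name MY_SECRET_ENV_VAR are hypothetical; only the {{secrets/<scope>/<key>}} form comes from the text above.

```python
def secret_reference(scope: str, key: str) -> str:
    """Build the placeholder that Databricks resolves at runtime."""
    # Plain concatenation avoids brace escaping in format strings.
    return "{{secrets/" + scope + "/" + key + "}}"


def env_var_line(name: str, scope: str, key: str) -> str:
    """Format one line for the cluster's Environment Variables section."""
    return name + "=" + secret_reference(scope, key)


if __name__ == "__main__":
    # Hypothetical variable name; scope and key match the example above.
    print(env_var_line("MY_SECRET_ENV_VAR", "my_secrets", "my_secret_env_var"))
```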

Best Practice: Replace Sensitive Variables with Secrets

Immuta recommends that ANY SENSITIVE environment variables listed below in the various artifact deployment instructions be replaced with secrets.

4 - Create and Configure the Cluster

Cluster creation in an Immuta-enabled organization or Databricks workspace should be limited to administrative users to avoid allowing users to create non-Immuta enabled clusters.

Select the Custom Access mode.
Opt to adjust the Autopilot Options and Worker Type settings. The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
In the Advanced Options section, click the Instances tab.
Click the Spark tab. In Spark Config field, add your configuration.
- Cluster Configuration Requirements:
Click the Init Scripts tab and set the following configurations:
- Destination: Specify the service you used to host the Immuta artifacts.
- File Path: Specify the full URI to the immuta_cluster_init_script.sh.
- Add the new key/value to the configuration.
Click the Permissions tab and configure the following setting:
- Who has access: Users or groups will need to have the permission Can Attach To to execute queries against Immuta configured data sources.
(Re)start the cluster.

Additional Hadoop Configuration File (Optional)

As mentioned in the "Environment Variables" section of the cluster configuration, there may be some cases where it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration in order to read the data composing Immuta data sources.

As an example, when accessing external tables stored in Azure Data Lake Gen 2, Spark must have credentials to access the target containers/filesystems in ADLg2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access ADLg2.

The additional configuration file looks very similar to the Immuta Configuration file referenced above. Some example configuration files for accessing different storage layers are below.

Amazon S3

IAM Role for S3 Access

Azure Data Lake Gen 2

Azure Data Lake Gen 1

ADL Prefix

Prior to Databricks Runtime version 6, the following configuration items should have a prefix of dfs.adls rather than fs.adl.

Azure Blob Storage

5 - Register Data

6 - Query Immuta Data

When the Immuta-enabled Databricks cluster has been successfully started, users will see a new database labeled "immuta". This database is the virtual layer provided to access data sources configured within the connected Immuta instance.

Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster and GRANT the user access to the immuta database.

The following SQL query can be run as an administrator within a notebook to give the user access to "Immuta":

Creating a Databricks Data Source

Databricks to Immuta User Mapping

By default, the IAM used to map users between Databricks and Immuta is the BIM (Immuta's internal IAM). The Immuta Spark plugin will check the Databricks username against the username within the BIM to determine access. For a basic integration, this means the user's email address in Databricks and the connected Immuta instance must match.

Install a Trusted Library

Audience: System Administrators
Content Summary: This page outlines how to install and configure trusted third-party libraries for Databricks.

1 - Install the Library

Specifying More than One Trusted Library

To specify more than one trusted library, comma delimit the URIs:

In the Databricks Clusters UI, install your third-party library .jar or Maven artifact with Library Source Upload, DBFS, DBFS/S3, or Maven. Alternatively, use the Databricks libraries API.
In the Databricks Clusters UI, add the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS property as a Spark environment variable and set it to your artifact's URI:

Maven Artifacts

For Maven artifacts, the URI is maven:/<maven_coordinates>, where <maven_coordinates> is the Coordinates field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. Here's an example of an installed artifact:

In this example, you would add the following Spark environment variable:

.jar Artifacts

For jar artifacts, the URI is the Source field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. For artifacts installed from DBFS or S3, this ends up being the original URI to your artifact. For uploaded artifacts, Databricks will rename your .jar and put it in a directory in DBFS. Here's an example of an installed artifact:

In this example, you would add the following Spark environment variable:

Restart the cluster.
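
As an illustration of the URI rules in this step, the following minimal Python sketch assembles a value for IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS from a mix of Maven coordinates and .jar source paths. The helper names, the sample coordinates, and the sample path are hypothetical; only the maven:/<maven_coordinates> form, the use of the Source field for .jar artifacts, and the comma delimiter come from this page.

```python
from typing import Iterable

ENV_VAR = "IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS"


def maven_uri(coordinates: str) -> str:
    """Return the trusted-library URI for a Maven artifact.

    `coordinates` is the Coordinates field shown for the installed artifact.
    """
    return f"maven:/{coordinates}"


def trusted_lib_setting(uris: Iterable[str]) -> str:
    """Join one or more trusted-library URIs into a KEY=value line."""
    cleaned = [u.strip() for u in uris if u.strip()]
    if not cleaned:
        raise ValueError("at least one trusted library URI is required")
    return f"{ENV_VAR}=" + ",".join(cleaned)


# Hypothetical artifacts, used only to show the shape of the value.
example = trusted_lib_setting([
    maven_uri("com.example:example-lib:1.0.0"),  # Maven artifact
    "dbfs:/path/to/example.jar",                 # .jar artifact (Source field)
])
print(example)
```

The printed line is what would be entered as a Spark environment variable before restarting the cluster.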

2 - Execute a Command in a Notebook

Once the cluster is up, execute a command in a notebook. If the trusted library installation is successful, you should see driver log messages like this:

Hiding the Immuta Database in Databricks

Audience: System Administrators
Content Summary: This page describes how to hide the immuta database in Databricks.

Hiding the database does not disable access to it

Queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.

Overview

The immuta database on Immuta-enabled clusters allows Immuta to track Immuta-managed data sources separately from remote Databricks tables so that policies and other security features can be applied. However, Immuta supports raw tables in Databricks, so table-backed queries do not need to reference this database. When configuring a Databricks cluster, you can hide immuta from any calls to SHOW DATABASES so that users are not confused or misled by that database.

Hide the `immuta` Database

When configuring a Databricks cluster, hide immuta by using the following environment variable in the cluster configuration:

Then, Immuta will not show this database when a SHOW DATABASES query is performed.

Run spark-submit Jobs on Databricks

Audience: System Administrators
Content Summary: This guide illustrates how to run R and Scala spark-submit jobs on Databricks, including prerequisites and caveats.

Language Support

R and Scala are supported, but require advanced configuration; work with your Immuta support professional to use these languages. Python spark-submit jobs are not supported by the Databricks Spark integration.

Using R in a Notebook

Because of how some user properties are populated in Databricks, users should load the SparkR library in a separate cell before attempting to use any SparkR functions.

R `spark-submit`

Prerequisites

Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

Initialize the Spark session by entering these settings into the R submit script: immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
This will enable the R script to access Immuta data sources, scratch paths, and workspace tables.
Once the script is written, upload the script to a location in dbfs/S3/ABFS to give the Databricks cluster access to it.

Create the R `spark-submit` Job

To create the R spark-submit job,

Go to the Databricks jobs page.
Create a new job, and select Configure spark-submit.
Set up the parameters:
Note: The path dbfs:/path/to/script.R can be in S3 or ABFS (on Azure Databricks), assuming the cluster is configured with access to that path.
Edit the cluster configuration, and change the Databricks Runtime to be a supported version (5.5, 6.4, 7.3, or 7.4).
Configure the Environment Variables section as you normally would for an Immuta cluster.

Scala spark-submit

Prerequisites

Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

Configure the Spark session with immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".
Note: Stop your Spark session (spark.stop()) at the end of your job or the cluster will not terminate.
The spark-submit job needs to be launched using a different classloader, which will point at the designated user JARs directory. The following Scala template can be used to handle launching your submit code using a separate classloader:

Create the Scala `spark-submit` Job

To create the Scala spark-submit job,

Build and upload your JAR to dbfs/S3/ABFS where the cluster has access to it.
Select Configure spark-submit, and configure the parameters:
Note: In the --class parameter, specify the fully-qualified class name of the class whose main function will be used as the entry point for your code.
Note: The path dbfs:/path/to/code.jar can be in S3 or ABFS (on Azure Databricks) assuming the cluster is configured with access to that path.
Edit the cluster configuration, and change the Databricks Runtime to a supported version (5.5, 6.4, 7.3, or 7.4).
Include IMMUTA_INIT_ADDITIONAL_JARS_URI=dbfs:/path/to/code.jar in the "Environment Variables" (where dbfs:/path/to/code.jar is the path to your jar) so that the jar is uploaded to all the cluster nodes.
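
The notes in the steps above can be sketched in Python as follows. This is a minimal illustration, not the complete parameter list for an Immuta-enabled job: the full parameters are not reproduced on this page and may require further entries. The function names and the class name org.example.Main are hypothetical; the --class parameter, the dbfs:/path/to/code.jar placeholder, and IMMUTA_INIT_ADDITIONAL_JARS_URI come from the steps above.

```python
import json


def scala_submit_parameters(main_class, jar_path, app_args=()):
    """Build a parameter list: --class <entry point>, the JAR path, then app arguments."""
    if not main_class or not jar_path:
        raise ValueError("main_class and jar_path are required")
    return ["--class", main_class, jar_path, *app_args]


def additional_jars_env(jar_path):
    """Environment variable line that points Immuta at the uploaded JAR."""
    return f"IMMUTA_INIT_ADDITIONAL_JARS_URI={jar_path}"


jar = "dbfs:/path/to/code.jar"  # may also be an S3 or ABFS path the cluster can read
params = scala_submit_parameters("org.example.Main", jar, ["arg1"])

print(json.dumps(params))        # JSON array form of the parameters
print(additional_jars_env(jar))  # goes in the Environment Variables section
```

Keeping the JAR path in one variable makes it harder for the parameters and the environment variable to drift apart.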

Caveats

The user mapping works differently from notebooks because spark-submit clusters are not configured with access to the Databricks SCIM API. The cluster tags are read to get the cluster creator and match that user to an Immuta user.
Privileged users (Databricks Admins and Whitelisted Users) must be tied to an Immuta user and given access through Immuta to access data through spark-submit jobs because the setting immuta.spark.acl.assume.not.privileged="true" is used.
Currently, generating an API key invalidates the previous key. This can cause issues if a user is using multiple clusters in parallel, since each cluster will generate a new API key for that Immuta user. To avoid these issues, manually generate the API key in Immuta and set the immuta.api.key on all the clusters, or use a specified job user for the submit job.
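
The API key caveat can be pictured with a toy model. The Python sketch below is not Immuta code: the ApiKeyStore class and its methods are hypothetical and only mimic the behavior described above, where generating a key for a user invalidates that user's previous key.

```python
import itertools


class ApiKeyStore:
    """Toy model: each user has at most one valid API key at a time."""

    def __init__(self):
        self._current = {}
        self._counter = itertools.count(1)

    def generate(self, user):
        key = f"key-{next(self._counter)}"
        self._current[user] = key  # the previous key for this user stops being valid
        return key

    def is_valid(self, user, key):
        return self._current.get(user) == key


store = ApiKeyStore()

# Two clusters start in parallel for the same Immuta user; each generates a key.
key_cluster_a = store.generate("alice")
key_cluster_b = store.generate("alice")
a_still_works = store.is_valid("alice", key_cluster_a)
b_works = store.is_valid("alice", key_cluster_b)

# Workaround: generate one key manually and configure it on every cluster.
shared_key = store.generate("alice")
all_clusters_work = all(store.is_valid("alice", k) for k in [shared_key, shared_key])
```

In this model, cluster A loses access as soon as cluster B generates its key, which is why sharing one manually generated key (or using a dedicated job user) avoids the problem.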

Project UDFs Cache Settings

This page outlines the configuration for setting up project UDFs, which allow users to set their current project in Immuta through Spark. For details about the specific functions available and how to use them, see the project UDFs guide.

Use Project UDFs in Databricks

Currently, caches are not all invalidated outside of Databricks because Immuta caches information pertaining to a user's current project in the NameNode plugin and in Vulcan. Consequently, this feature should only be used in Databricks.

Web Service and On-Cluster Caches

Immuta caches a mapping of user accounts and users' current projects in the Immuta Web Service and on-cluster. When users change their project with UDFs instead of the Immuta UI, Immuta invalidates all the caches on-cluster (so that everything changes immediately) and the cluster submits a request to change the project context to a web worker. Immediately after that request, another call is made to a web worker to refresh the current project.

To allow use of project UDFs in Spark jobs, raise the cache timeouts on-cluster and lower the cache timeouts for the Immuta Web Service. Otherwise, caching could cause inconsistencies among the requests and calls to multiple web workers when users try to change their project contexts.

Recommended Configuration

1 - Lower Web Service Cache Timeout

Click the App Settings icon in the left sidebar and scroll to the HDFS Cache Settings section.
Lower the Cache TTL of HDFS user names (ms) to 0.
Click Save.

2 - Raise Cache Timeout On-Cluster

In the Spark environment variables section, set the IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS and IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS to high values (like 10000).

Note: These caches will be invalidated on cluster when a user calls immuta.set_current_project, so they can effectively be cached permanently on cluster to avoid periodically reaching out to the web service.

Blocking UDFs

Reference Guides

Cluster Policies

Python & SQL

Audience: System Administrators
Content Summary: This page describes the Python & SQL cluster policy.

Performance

This is the most performant policy configuration.

In this configuration, Immuta is able to rely on Databricks-native security controls, reducing overhead. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes. This Immuta cluster configuration relies on Py4J security being enabled.

Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs are unfortunately not supported with Py4J security enabled. Users will also be unable to use the Databricks Connect client library. Additionally, only Python and SQL are available as supported languages.

For full details on Databricks’ best practices in configuring clusters, please read their documentation.

Python & SQL & R

Audience: System Administrators
Content Summary: This page describes the Python & SQL & R cluster policy.

Additional Overhead

In relation to the Python & SQL cluster policy, this configuration trades some additional overhead for added support of the R language.

In this configuration, you are able to rely on the Databricks-native security controls. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes.

Like the Python & SQL configuration, Py4j security is enabled for the Python & SQL & R configuration. However, because R has been added, Immuta enables the SecurityManager, in addition to Py4j security, to provide more security guarantees. For example, by default all actions in R execute as the root user; among other things, this permits access to the entire filesystem (including sensitive configuration data), and, without iptables restrictions, a user may freely access the cluster’s cloud storage credentials. To address these security issues, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user with limited filesystem and network access and installs the Immuta SecurityManager, which prevents users from bypassing policies and protects against the above vulnerabilities from within the JVM.

Consequently, the cost of introducing R is that the SecurityManager incurs a small increase in performance overhead; however, average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

For full details on Databricks’ best practices in configuring clusters, please read their documentation.

Scala

Audience: System Administrators
Content Summary: This page describes the Scala cluster policy.

Scala Clusters

This configuration is for Scala-only clusters.

Where Scala language support is needed, this configuration can be used in the Custom .

According to Databricks’ cluster type support documentation, Scala clusters are intended for single users. However, nothing inherently prevents a Scala cluster from being configured for multiple users. Even with the Immuta SecurityManager enabled, there are limitations to user isolation within a Scala job.

For a secure configuration, it is recommended that clusters intended for Scala workloads are limited to Scala jobs only and are made homogeneous through the use of project equalization or externally via convention/cluster ACLs. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

For full details on Databricks’ best practices in configuring clusters, please read their documentation.

Sparklyr

Audience: System Administrators
Content Summary: This page describes the sparklyr cluster policy.

Single-User Clusters Recommended

Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.

Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).

Single-User Clusters: Credential Passthrough (required on Databricks) allows a single-user cluster to be created. This setting automatically configures the cluster to assume the role of the attached user when reading from storage (S3). Because Immuta requires that raw data is readable by the cluster, the instance profile associated with the cluster should be used rather than a role assigned to the attached user.
Multi-User Clusters: Because Immuta cannot guarantee user isolation in a multi-user sparklyr cluster, it is not recommended to deploy a multi-user cluster. To force all users to act under the same set of attributes, groups, and purposes with respect to their data access and eliminate the risk of a data leak, all sparklyr multi-user clusters must be equalized either by convention (all users able to attach to the cluster have the same level of data access in Immuta) or by configuration (detailed below).

Single-User Cluster Configuration

1 - Enable sparklyr

In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:

This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.

2 - Set Up a sparklyr Connection in Databricks

Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.
Set up a sparklyr connection:
Pass the connection object to execute queries:

3 - Configure a Single-User Cluster

Add the following items to the Spark Config section of the cluster:

The trustedFileSystems setting is required to allow Immuta’s wrapper FileSystem (used in conjunction with the ImmutaSecurityManager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster’s instance profile for data access, rather than a role associated with the attached user.

Multi-User Cluster Configuration

Immuta Discourages Deploying Multi-User Clusters with sparklyr Configuration

It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.

The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta native workspaces.

Add the following environment variables to the Environment Variables section of your cluster configuration:
Add the following items to the Spark Config section:

Limitations

Immuta’s integration with sparklyr does not currently support

spark-submit jobs,
UDFs, or
Databricks Runtimes 5, 6, or 7.

Databricks Change Data Feed

Audience: Databricks Users
Content Summary: This page describes Immuta's support of Databricks Change Data Feed (CDF).

Overview

CDF shows the row-level changes between versions of a Delta table. The changes displayed include row data and metadata that indicates whether the row was inserted, deleted, or updated.

Immuta does not support applying policies to the changed data, and the CDF cannot be read for data source tables if the user does not have access to the raw data in Databricks. However, the CDF can be read if the querying user is allowed to read the raw data and one of the following statements is true:

the table is in the current workspace,
the table is in a scratch path,
non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting, or
non-Immuta reads are enabled AND the table is not part of an Immuta data source.
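
The conditions above combine as one required check plus four alternatives. The following Python sketch encodes that logic as a plain predicate so the AND/OR structure is explicit. It is an illustration of the rule as stated on this page, not Immuta's implementation, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CdfReadContext:
    """Facts about the querying user and table (hypothetical field names)."""
    user_can_read_raw_data: bool
    table_in_current_workspace: bool = False
    table_in_scratch_path: bool = False
    non_immuta_reads_enabled: bool = False
    intersects_workspace_user_not_acting_under: bool = False
    table_is_immuta_data_source: bool = True


def can_read_cdf(ctx: CdfReadContext) -> bool:
    # Access to the raw data is always required.
    if not ctx.user_can_read_raw_data:
        return False
    # At least one of the four listed statements must also hold.
    return (
        ctx.table_in_current_workspace
        or ctx.table_in_scratch_path
        or (ctx.non_immuta_reads_enabled
            and not ctx.intersects_workspace_user_not_acting_under)
        or (ctx.non_immuta_reads_enabled
            and not ctx.table_is_immuta_data_source)
    )


# A table in a scratch path is readable when the user can read the raw data.
print(can_read_cdf(CdfReadContext(True, table_in_scratch_path=True)))
# Without raw data access, no other condition helps.
print(can_read_cdf(CdfReadContext(False, table_in_current_workspace=True)))
```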

Configure Change Data Feed

There are no configuration changes necessary to use this feature.

Limitation

Immuta does not support reading changes in .

Databricks Libraries

Audience: Databricks Administrators
Content Summary: This page provides an overview of Immuta's Databricks trusted libraries feature and support of notebook-scoped libraries on machine learning clusters.

Databricks Libraries and Immuta's Security Manager

The Immuta security manager blocks users from executing code that could allow them to gain access to sensitive data by only allowing select code paths to access sensitive files and methods. These select code paths provide Immuta's code access to sensitive resources while blocking end users from these sensitive resources directly.

Similarly, when users install third-party libraries, those libraries will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

Databricks Trusted Libraries

The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. An administrator can specify an installed library as "trusted," which will enable that library's code to bypass the Immuta security manager. Contact your Immuta support professional for custom security configurations for your libraries.

This feature does not impact Immuta's ability to apply policies; trusting a library only allows code through what previously would have been blocked by the security manager.

Security Vulnerability

Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether this risk applies to your environment or use case.

Support

Databricks Libraries API

Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.

The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:

Library source is Upload, DBFS or DBFS/S3 and the Library Type is Jar.
Library source is Maven.

Limitations

Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either
- waiting for library installation to complete before running any third-party library commands or
- executing a Spark query. This will force Immuta to wait for any trusted Immuta libraries to complete installation before proceeding.
When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or using the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code paths that include a class from a transitive dependency but do not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency of a file-util library, the user will not be able to directly use the file-util library to read a sensitive file that is normally protected by the Immuta security manager.
In many cases, it is not a problem if dependent libraries aren't trusted because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if the dependent library needs to be trusted, there is a workaround:
1. Add the transitive dependency jar paths to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable. In the driver log4j logs, Databricks outputs the source jar locations when it installs transitive dependencies. In the cluster driver logs, look for a log message similar to the following:
2. In the above example, where slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster.
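
The transitive dependency limitation above can be pictured with a simplified mental model: a code path is treated as trusted when at least one class on it belongs to a trusted library. The Python sketch below only illustrates that reading of this page. It is not how the Immuta security manager is implemented, and the function and package names are hypothetical.

```python
def code_path_is_trusted(call_stack, trusted_packages):
    """Return True if any class on the call stack belongs to a trusted package."""
    return any(
        cls == pkg or cls.startswith(pkg + ".")
        for cls in call_stack
        for pkg in trusted_packages
    )


trusted = ["com.example.trustedlib"]  # hypothetical trusted third-party library

# The trusted library calls down into its transitive dependency.
via_trusted_library = ["com.example.trustedlib.Reader", "org.example.fileutil.Files"]
# User code calls the transitive dependency directly.
direct_use = ["org.example.fileutil.Files"]

print(code_path_is_trusted(via_trusted_library, trusted))  # stays trusted
print(code_path_is_trusted(direct_use, trusted))           # can still be blocked

# Workaround: also trust the transitive dependency itself.
trusted_with_dependency = trusted + ["org.example.fileutil"]
print(code_path_is_trusted(direct_use, trusted_with_dependency))
```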

Troubleshooting

In case of failure, check the driver logs for details. Some possible causes of failure include

One of the Immuta configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.
For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.
Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
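
Before restarting a cluster, it can help to check the configured value for malformed Maven entries. The following Python sketch checks entries against the maven:/group.id:artifact-id:version format given above. The function names are hypothetical, and the set of characters accepted in each part is an assumption made for this sketch rather than a rule from Immuta or Maven.

```python
import re

# maven:/group.id:artifact-id:version, per the format given above.
# The accepted character set for each part is an assumption for this sketch.
_PART = r"[A-Za-z0-9_.\-]+"
_MAVEN_URI = re.compile(rf"maven:/{_PART}:{_PART}:{_PART}")


def is_valid_maven_uri(uri: str) -> bool:
    """Return True if the whole string matches the expected Maven URI shape."""
    return _MAVEN_URI.fullmatch(uri) is not None


def malformed_maven_entries(value: str) -> list:
    """Return Maven-style entries in a comma-delimited value that look malformed."""
    problems = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry.startswith("maven:") and not is_valid_maven_uri(entry):
            problems.append(entry)
    return problems


configured = "maven:/org.slf4j:slf4j-api:1.7.25,maven:org.slf4j:slf4j-api,dbfs:/path/to/lib.jar"
print(malformed_maven_entries(configured))
```

Entries that do not start with maven: (such as DBFS paths) are left alone, since their format depends on the Source field of the installed library.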

Configuration

Notebook-Scoped Libraries on Machine Learning Clusters

Configuration

No additional configuration is needed to enable this feature. Users only need to be running on clusters with DBR 8+.

Ephemeral Overrides

Audience: System Administrators
Content Summary: This page describes ephemeral overrides for Databricks data sources.

Best Practices: Ephemeral Overrides

Disable ephemeral overrides for clusters when using multiple workspaces and dedicate a single cluster to serve queries from Immuta in a single workspace.
If you use multiple E2 workspaces without disabling ephemeral overrides, avoid applying the where user row-level policy to data sources.

Overview

In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.

Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations and queries that the user runs through the Query Editor.

When a user runs a Spark job in Databricks, Immuta plugins automatically submit ephemeral overrides for that user to Immuta for all applicable data sources, so that the current cluster is used as compute for all subsequent metadata operations that user runs against those data sources.

Example Query and Ephemeral Override Request

A user runs a query on cluster B.
The Immuta plugins on the cluster check if there is a source in the Metastore with a matching database, table name, and location for its underlying data. Note: If tables are dynamic or change over time, users can disable the comparison of the location of the underlying data by setting immuta.ephemeral.table.path.check.enabled to false; disabling allows users to avoid keeping the relevant data sources in Immuta up-to-date (which would require API calls and automation).
The Immuta plugins on the cluster detect that the user is subscribed to data sources 1, 2, and 3 and that data sources 1 and 3 are both present in the Metastore for cluster B, so the plugins submit ephemeral override requests for data sources 1 and 3 to override their connections with the HTTP path from cluster B.
Since data source 2 is not present in the Metastore, it is marked as a JDBC source.

If the user attempts to query data source 2 and they have not enabled JDBC sources, they will be presented with an error message telling them to do so:

com.immuta.spark.exceptions.ImmutaConfigurationException: This query plan will cause data to be pulled over JDBC. This spark context is not configured to allow this. To enable JDBC set immuta.enable.jdbc=true in the spark context hadoop configuration.
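
The example above can be sketched as a small decision routine. The Python code below is a simplified model of the described flow, not the Immuta plugin's code: the function names, the data shapes, and the sample tables and HTTP path are hypothetical. The check_location flag stands in for immuta.ephemeral.table.path.check.enabled.

```python
def _match_key(entry, check_location):
    """Key used for matching: database and table, plus location when enabled."""
    database, table, location = entry
    return (database, table, location) if check_location else (database, table)


def plan_overrides(subscribed, metastore_tables, cluster_http_path, check_location=True):
    """Split subscribed data sources into override requests and JDBC sources.

    subscribed: data source id -> (database, table, location)
    metastore_tables: (database, table, location) tuples present on the cluster
    """
    present = {_match_key(t, check_location) for t in metastore_tables}
    overrides, jdbc_sources = {}, []
    for source_id in sorted(subscribed):
        if _match_key(subscribed[source_id], check_location) in present:
            overrides[source_id] = cluster_http_path  # override connection with this cluster
        else:
            jdbc_sources.append(source_id)            # not in the Metastore: JDBC source
    return overrides, jdbc_sources


subscribed = {
    1: ("sales", "orders", "dbfs:/data/orders"),
    2: ("hr", "people", "dbfs:/data/people"),
    3: ("ops", "events", "dbfs:/data/events"),
}
metastore_cluster_b = [
    ("sales", "orders", "dbfs:/data/orders"),
    ("ops", "events", "dbfs:/data/events"),
]

overrides, jdbc_sources = plan_overrides(subscribed, metastore_cluster_b, "http-path-of-cluster-b")
print(overrides)     # data sources 1 and 3 point at cluster B
print(jdbc_sources)  # data source 2 is treated as a JDBC source
```

With check_location=False, a table whose underlying path has changed would still match on database and table name alone.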

Immuta Operations that Use Ephemeral Overrides

Ephemeral overrides are enabled by default because Immuta must be aware of a cluster that is running to serve metadata queries. The operations that use the ephemeral overrides include

Visibility checks on the data source for a particular user. These checks assess how to apply row-level policies for specific users.
Stats collection triggered by a specific user.
Validating a custom WHERE clause policy against a data source. When owners or governors create custom WHERE clause policies, Immuta uses compute resources to validate the SQL in the policy. In this case, the ephemeral overrides for the user writing the policy are used to contact a cluster for SQL validation.
High Cardinality Column detection. Certain advanced policy types (e.g., minimization and randomized response) in Immuta require a High Cardinality Column, and that column is computed on data source creation. It can be recomputed on demand and, if so, will use the ephemeral overrides for the user requesting computation.

However, ephemeral overrides can be problematic in environments that have a dedicated cluster to handle maintenance activities, since ephemeral overrides can cause these operations to execute on a different cluster than the dedicated one.

Configure Overrides in Immuta-Enabled Clusters

To reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up,

direct all clusters' HTTP paths for overrides to a cluster dedicated for metadata queries or
disable overrides completely.

Disable Ephemeral Overrides

To disable ephemeral overrides, set immuta.ephemeral.host.override in spark-defaults.conf to false.

Py4j Security Error

Audience: Data Users and System Administrators
Content Summary: This page provides an explanation and solution for this common Databricks error.

Py4j Security Error Details

Error Message: py4j.security.Py4JSecurityException: Constructor <> is not whitelisted
Explanation: This error indicates you are being blocked by Py4j security rather than the Immuta Security Manager. Py4j security is strict and generally ends up blocking many ML libraries.
Solution: Turn off Py4j security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the environment variables section. Additionally, because there are limitations to the security mechanisms Immuta employs on-cluster when Py4j security is disabled, ensure that all users on the cluster have the same level of access to data, as users could theoretically see (policy-enforced) data that other users have queried.

Scala Cluster Security Details

Audience: System Administrators
Content Summary: It is most secure to leverage an equalized project when working in a Scala cluster; however, it is not required to limit Scala to equalized projects. This document outlines security recommendations for Scala clusters and discusses the security risks involved when equalized projects are not used.

Language Support

R and Scala are both supported, but require advanced configuration; work with your Immuta support professional to use these languages.

Recommendations

There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta’s SecurityManager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it's impossible to distinguish which user’s data is contained in each file/block. If you are concerned about this vulnerability, Immuta suggests that Scala clusters

be limited to Scala jobs only.
use project equalization, which forces all users to act under the same set of attributes, groups, and purposes with respect to their data access.

Context for Security: Why Project Equalization is Recommended

When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the leaf level of the physical Spark plan, so a policy such as "Mask using hashing the column social_security_number for everyone" would be implemented as an expression on a project node right above the FileSourceScanExec/LeafExec node at the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently, from ending up in SPARK_LOCAL_DIR.

This policy implementation coupled with an equalized project guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.

Configuration for Requiring Equalized Projects with Scala

To require that Scala clusters be used in equalized projects and avoid the risk described above, change the immuta.spark.require.equalization value to true in your Immuta configuration file when you spin up Scala clusters:

Once this configuration is complete, users on the cluster will need to switch to an Immuta equalized project before running a job. (Remember that when working under an Immuta Project, only tables within that project can be seen.) Once the first job is run using that equalized project, all subsequent jobs, no matter the user, must also be run under that same equalized project. If you need to change a cluster's project, you must restart the cluster.