Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.
Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to the target columns in write operations, allowing data stewards to understand how sensitive data moves from ancestor tables to target tables so that they can
trace data back to its source to validate the integrity of dashboards and reports,
identify who performed write operations to meet compliance requirements,
evaluate data quality and pinpoint points of failure, and
tag sensitive data on source tables without having to tag columns on their descendant tables.
However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.
An application administrator enables the feature on the Immuta app settings page.
Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.
A data owner creates a new data source (or adds a new column to a Snowflake table) that initiates a job that applies all tags for each column from its ancestor columns.
A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
An audit record is created that includes which tags were applied and from which columns those tags originated.
The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.
To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.
Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.
Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2
If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.
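For instance, lineage like this could be produced by CTAS statements such as the following (a minimal sketch; the table definitions are illustrative):

```sql
-- CTAS write operations are recorded in Snowflake ACCESS_HISTORY and
-- establish column lineage from CUSTOMER to its descendants.
CREATE TABLE customer_2 AS SELECT * FROM customer;
CREATE TABLE customer_3 AS SELECT * FROM customer_2;
```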
After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.
By default all tags are propagated, but these tags can be filtered on the app settings page or using the Immuta API.
Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.
Customer: source table
Customer 2: descendant of Customer
Customer 3: descendant of Customer 2
Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.
Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.
However, the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. The only way a tag will be removed from descendant data sources is if no other ancestor of the descendant still prescribes the tag.
If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.
Sensitive data discovery will still run on data sources and can be manually triggered. Tags applied through sensitive data discovery will propagate as tags added through lineage to descendant Immuta data sources.
Immuta audit records include Snowflake lineage tag events when a tag is added or removed.
The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:
Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.
Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.
The native lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.
There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.
Immuta does not ingest lineage information for views.
Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes (see the sketch below).
Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.
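As a sketch (with illustrative table names), rather than renaming or swapping, a descendant table can be rebuilt under its existing name so the write is captured as a CTAS lineage event:

```sql
-- Recreate the table under the same name instead of using RENAME or SWAP,
-- so Snowflake records the write as a lineage event in ACCESS_HISTORY.
CREATE OR REPLACE TABLE customer_2 AS
  SELECT * FROM customer;
```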
The warehouse you select when configuring the Snowflake integration uses compute resources to set up the integration, register data sources, orchestrate policies, and run jobs like sensitive data discovery. Snowflake credit charges are based on the warehouse size and the amount of time the warehouse is active, not on the number of queries run.
This document prescribes how and when to adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.
In general, increase the size of and number of clusters for the warehouse to handle heavy workloads and multiple queries. Workloads are typically lighter after data sources are onboarded and policies are established in Immuta, so compute resources can be reduced after those workloads complete.
The Snowflake integration uses warehouse compute resources to sync policies created in Immuta to the Snowflake objects registered as data sources and, if enabled, to run sensitive data discovery and schema monitoring. Follow the guidelines below to adjust the warehouse size and scale according to your needs.
Enable auto-suspend and auto-resume to optimize resource use in Snowflake. In the Snowflake UI, the lowest auto-suspend setting is 5 minutes. However, through a SQL statement you can set auto_suspend to 61 seconds (since the minimum uptime for a warehouse is 60 seconds). For example:
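The sketch below assumes the warehouse is named IMMUTA_WH; substitute your own warehouse name.

```sql
-- Suspend just above the 60-second minimum billing window and resume
-- automatically when a query arrives.
ALTER WAREHOUSE IMMUTA_WH SET
  AUTO_SUSPEND = 61
  AUTO_RESUME = TRUE;
```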
If sensitive data discovery is enabled, it uses compute resources for each registered table. Consider disabling sensitive data discovery when registering data sources if you have an external catalog available or a tagging strategy in place.
Register data before creating global policies. By default, Immuta does not apply a subscription policy on registered data (unless an existing global policy applies to it), which allows Immuta to only pull metadata instead of also applying policies when data sources are created. Registering data before policies are created reduces the workload and the Snowflake compute resources needed.
Begin onboarding with a small dataset of tables, and then review and monitor query performance in the Snowflake Query Monitor. Adjust the virtual warehouse accordingly to handle heavier loads.
Schema monitoring uses the compute warehouse that was employed during the initial ingestion. If you expect a low number of new tables or minimal changes to the table structure, consider scaling down the warehouse size.
Resize the warehouse after data sources are registered and policies are established. For example:
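A sketch, again assuming the warehouse is named IMMUTA_WH:

```sql
-- Scale back down once the heavy onboarding workload has completed.
ALTER WAREHOUSE IMMUTA_WH SET WAREHOUSE_SIZE = 'XSMALL';
```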
For more details and guidance about warehouse sizing, see the Snowflake Warehouse Considerations documentation.
Even after your integration is configured, data sources are registered, and policies are established, changes to those data sources or policies may initiate heavy workloads. Follow the guidelines below to adjust your warehouse size and scale according to your needs.
Review your Snowflake query history to identify query performance and bottlenecks.
Check how many credits queries have consumed:
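The sketch below uses the SNOWFLAKE.ACCOUNT_USAGE share to sum recent credit consumption per warehouse; adjust the time window to your needs.

```sql
-- Credit consumption per warehouse over the last 7 days.
SELECT
  warehouse_name,
  SUM(credits_used) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```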
After reviewing query performance and cost, implement the strategies above to adjust your warehouse.
Snowflake Enterprise Edition required
In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.
Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.
When an administrator configures the Snowflake integration with Immuta, Immuta creates an IMMUTA database and schemas (immuta_procedures, immuta_policies, and immuta_functions) within Snowflake to contain policy definitions and user entitlements. Immuta then creates a system role and gives that system account the privileges required to orchestrate policies in Snowflake and maintain state between Snowflake and Immuta. See the Snowflake privileges section for a list of privileges, the user they must be granted to, and an explanation of why they must be granted.
An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
Immuta creates a database in Snowflake that contains Immuta policy definitions and user entitlements.
A data owner registers Snowflake tables in Immuta as data sources.
If Snowflake tag ingestion was enabled during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.
A data owner, data governor, or administrator creates or changes a policy or a user's attributes change in Immuta.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
Immuta manages and applies Snowflake governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
If Snowflake table grants is not enabled, the Snowflake object owner or a user with the global MANAGE GRANTS privilege grants the SELECT privilege on relevant Snowflake tables to users. Note: Although users are GRANTed access, they will not see data if they are not subscribed to the table via Immuta-authored policies.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.
When Immuta users create policies, they are then pushed into the Immuta database within Snowflake; there, the Immuta system account orchestrates Snowflake row access policies and column masking policies directly onto Snowflake tables. Changes in Immuta policies, user attributes, or data sources trigger webhooks that keep the Snowflake policies up-to-date.
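The objects Immuta orchestrates are standard Snowflake governance primitives. As a generic illustration only (the policy names and bodies below are placeholders, not what Immuta actually generates):

```sql
-- Generic Snowflake masking and row access policy syntax; names and logic
-- are illustrative, not the objects the Immuta system account creates.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'ANALYST' THEN val ELSE '***MASKED***' END;

ALTER TABLE customer MODIFY COLUMN email SET MASKING POLICY email_mask;

CREATE ROW ACCESS POLICY region_filter AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN' OR region = 'US';

ALTER TABLE customer ADD ROW ACCESS POLICY region_filter ON (region);
```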
For a user to query Immuta-protected data, they must meet two qualifications:
They must be subscribed to the Immuta data source.
They must be granted SELECT access on the table by the Snowflake object owner or automatically via the Snowflake table grants feature.
After a user has met these qualifications they can query Snowflake tables directly.
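When table grants is not enabled, that SELECT privilege is granted manually in Snowflake; for example (the table and role names below are placeholders):

```sql
-- Manual grant, needed only when Immuta table grants is not enabled.
GRANT SELECT ON TABLE analytics.public.customer TO ROLE analyst;
```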
See the integration support matrix on the Data policy types reference guide for a list of supported data policy types in Snowflake.
When a user applies a masking policy to a Snowflake data source, Immuta truncates masked values to align with Snowflake column length (VARCHAR(X) types) and precision (NUMBER(X,Y) types) requirements.
Consider these columns in a data source that have the following masking policies applied:
Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone
Querying this data source in Snowflake would return the following values:
Row 1: Column A = 5w4502, Column B = REDAC, Column C = NULL, Column D = 990
Row 2: Column A = 6e3611, Column B = REDAC, Column C = NULL, Column D = 750
Row 3: Column A = 9s7934, Column B = REDAC, Column C = NULL, Column D = 380
Hashing collisions
Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.
For more details about Snowflake column length and precision requirements, see the Snowflake behavior change release documentation.
When a policy is applied to a column, Immuta uses Snowflake memoizable functions to cache the result of the called function. Then, when a user queries a column that has that policy applied to it, Immuta uses that cached result to dramatically improve query performance.
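As a generic illustration of the Snowflake feature (not Immuta's actual function definitions), a memoizable SQL UDF caches its result so repeated calls, such as a policy evaluated for every row, do not recompute the underlying lookup:

```sql
-- Illustrative memoizable SQL UDF; the name, return type, and body are
-- placeholders. Snowflake caches the result of the call.
CREATE OR REPLACE FUNCTION allowed_user_names()
  RETURNS ARRAY
  MEMOIZABLE
  AS $$ SELECT ARRAY_AGG(user_name) FROM entitlements $$;
```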
The privileges the Snowflake integration requires align with the principle of least privilege. The list below describes each privilege required in Snowflake, the user or role it must be granted to, and why it is needed. The references to IMMUTA_DB, IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE can be replaced with the names you chose for your Immuta database, warehouse, and impersonation role when setting up the integration.
Privilege: CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
Granted to: Setup user
Feature: All
Rationale: The setup script this user runs creates an Immuta database in the customer Snowflake account where all Immuta-managed objects (UDFs, masking policies, row access policies, and user entitlements) will be written and stored.

Privilege: CREATE ROLE ON ACCOUNT WITH GRANT OPTION
Granted to: Setup user
Feature: All
Rationale: The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized.

Privilege: CREATE USER ON ACCOUNT WITH GRANT OPTION
Granted to: Setup user
Feature: All
Rationale: The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration.

Privilege: MANAGE GRANTS ON ACCOUNT
Granted to: Setup user
Feature: All
Rationale: The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. All privileges documented here are granted to the IMMUTA_SYSTEM_ACCOUNT user by this setup user.

Privilege: OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Impersonation
Rationale: If impersonation is enabled, Immuta must be able to manage the Snowflake role used for impersonation, which is created when the setup script runs.

Privileges: ALL PRIVILEGES ON DATABASE IMMUTA_DB; ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB; USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: All
Rationale: The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed.

Privilege: USAGE ON WAREHOUSE IMMUTA_WH
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: All
Rationale: To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML and require compute.

Privilege: IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Audit

Privilege: APPLY TAG ON ACCOUNT
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Tag ingestion

Privileges: APPLY MASKING POLICY ON ACCOUNT; APPLY ROW ACCESS POLICY ON ACCOUNT
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Snowflake integration with governance features enabled

Privilege: MANAGE GRANTS ON ACCOUNT
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Table grants
Rationale: Immuta must be able to MANAGE GRANTS on objects throughout the customer's Snowflake account.

Privilege: CREATE ROLE ON ACCOUNT
Granted to: IMMUTA_SYSTEM_ACCOUNT user
Feature: Table grants
Rationale: When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in the customer's Snowflake account.

Privileges: USAGE ON DATABASE IMMUTA_DB; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM; SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE
Granted to: PUBLIC role
Feature: All
Rationale: Immuta has stored procedures and functions that are used for policy enforcement and do not expose or contain any sensitive information. These objects must be accessible by all users to facilitate the use and creation of policies or views to enforce Immuta policies in Snowflake.

Privilege: SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST
Granted to: PUBLIC role
Feature: All