Skip to content

You are viewing documentation for Immuta version 2021.5.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Snowflake Integration

Audience: System Administrators, Data Owners, and Data Users

Content Summary: This page describes the Snowflake integration, through which Immuta applies policies directly in Snowflake. Users can use the Snowflake Web UI and their existing BI tools to query protected data natively in Snowflake.

There are two integration options:

See the Snowflake integration page for a tutorial on enabling Snowflake through the App Settings page.

Snowflake Integration Using Snowflake Governance Features

Snowflake Enterprise Edition Required

This integration requires the Snowflake Enterprise Edition.

In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

Like with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and reduce role explosion significantly.

Data Flow

  1. An Immuta Application Administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.
  2. Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
  3. A Data Owner registers Snowflake tables in Immuta as data sources. See Best Practices for Registering Data Sources.
  4. If a Snowflake catalog was registered during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.
  5. A Data Owner, Data Governor, or Administrator creates or changes a policy or a user's attributes change in Immuta.
  6. The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.
  7. Immuta manages and applies Snowflake Governance column and row access policies to Snowflake tables that are registered as Immuta data sources.
  8. A Snowflake object owner or user with the global MANAGE GRANTS privilege grants SELECT privilege on relevant Snowflake tables to users. Note: Although they are GRANTed access, if they are not subscribed to the table via Immuta-authored policies, they will not see data.
  9. A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Snowflake Integration

Registering Data Sources

Best Practice

Use a dedicated Snowflake role to register Snowflake tables as Immuta data sources. Then, include this role in the Excepted Roles/Users list.

Registering Snowflake data sources using a dedicated Snowflake role for which no policies will apply ensures that your integration works with the following use cases:

  • Snowflake Project Workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an Excepted Role so that policies applied to the backing tables are not applied to the project workspace views.

  • Using Views and Tables within Immuta: Because this integration uses Snowflake Governance Policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, those views must be registered in Immuta by an Excepted Role; otherwise, the backing table’s policies will be applied to that view.

Excepted Roles/Users

Excepted Roles and Users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Consequently, roles and users added to this list should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that will check if a user is acting under one of the listed user names or roles. If a user is, then the policy will not be acted on the queried table. If the user is not, then the policy will be executed like normal. Immuta does not distinguish between role and user name, so if you have a role and user with the exact same name, both the user and any user acting under that role will have full access to the data sources and no policies will be enforced for them.

Snowflake Integration Migration

Migration Troubleshooting
  • If multiple Snowflake integrations are enabled, they will all migrate together. If one fails, they will all revert to the Snowflake Standard integration.
  • If an error occurs during migration and the integration cannot be reverted, the integration must be disabled and re-enabled.

You can migrate from a Snowflake integration without Snowflake Governance Features to a Snowflake integration with Snowflake Governance Features on the App Settings page. Once prompted, Immuta will migrate the integration, allowing users to seamlessly transition workloads from the legacy Immuta views to the direct Snowflake tables.

After the migration is complete, Immuta views will still exist for pre-existing Snowflake data sources to support existing workflows. However, disabling the Immuta data source will drop the Immuta view, and, if the data source is re-enabled, the view will not be recreated.

You can migrate back to a Snowflake integration without Snowflake Governance Features from the Snowflake integration using Governance Features if any issues occur. However, this process is only intended to resolve any issues that occur during migration and regain utility of Immuta. Please consult your Immuta professional for assistance.

Access must be revoked.

Access to the Snowflake tables must be revoked when migrating from the Snowflake Enterprise to the Snowflake Standard integration to prevent users from having access to the raw tables.

Caveats

  • Immuta policies that rely on a masked column as input cannot be natively queried in Snowflake. These policies will present a message upon creation and in the health status of any affected data sources. To avoid any data leaks, more strict masking will be enforce until the policies are changed.

    • Additionally, if there is any other error in generating or applying policies natively in Snowflake, the data source will be locked and only users on the Excepted Roles/Users List and the credentials used to create the data source will be able to access the data.
  • Users are unable to rollback from the Snowflake Integration using Governance Features to the not using Governance Features if Snowflake SQL-backed data sources exist. Before trying to rollback, edit the data sources to be Snowflake tables or views.

  • Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.

  • Migration must be done using the credentials and credential method (automatic or bootstrap) used to install the integration.

Snowflake Integration Without Snowflake Governance Features

In this model, all enforcement is done by creating views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. All policy-enforced views are accessible through the PUBLIC role and access controls are applied in the view, allowing customers to leverage Immuta's powerful set of attribute-based policies. Additionally, users can continue using roles to enforce compute-based policies through "warehouse" roles, without needing to grant each of those roles access to the underlying data or create multiple views of the data for each specific business unit.

Sync Views and Data Sources

This integration leverages webhooks to keep Snowflake views up-to-date with the corresponding Immuta data sources. Whenever a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the native Snowflake view.

The SQL that makes up all views includes a join to the secure view: immuta_system.user_profile. This view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. This secure view is readable by all users and will only display the data that corresponds to the user executing the query.

Note: The immuta_system.profile table is updated through webhooks whenever a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.

Secure and Non-Secure Views

When creating a native Snowflake data source, users have the option to use a regular view (traditional database view) or a secure view; however, according to Snowflake's documentation , "the Snowflake query optimizer, when evaluating secure views, bypasses certain optimizations used for regular views. This may result in some impact on query performance for secure views." To use the data source with both Snowflake and Snowflake Workspaces, secure views are necessary. Note: If HIPAA compliance is required, secure views must be used.

Snowflake Native View

Non-Secure View Policy Implications

When using a non-secure view, certain policies may leak sensitive information. In addition to the concerns outlined here, there is also a risk of someone exploiting the query optimizer to discover that a row exists in a table that has been excluded by row-level policies. This attack is mentioned here in the Snowflake documentation.

Policies that will not leak sensitive information

  • masking by making NULL, using a constant, or by rounding (date/numeric)
  • minimization row-level policies
  • date-based row-level policies
  • k-anonymization masking policies

Policies that could leak sensitive information

  • masking using a regex will show the regex being applied. In general this should be safe, but if you have a regex policy that removes a specific selector to redact (e.g., a regex of /123-45-6789/g to specifically remove a single SSN from a column), then someone would be able to identify columns with that value.
  • in conditional masking and custom WHERE clauses including “Right To Be Forgotten,” the custom SQL will be visible, so for a policy like "only show rows where COUNTRY NOT IN(‘UK’, ‘AUS’)," users will know that it’s possible there is data in that table containing those values.

Policies that will leak potentially sensitive information

These policies leak information sensitive to Immuta, but in most cases would require an attacker to reverse the algorithm. In general these policies should be used with secure views:

  • masking using hashing will include the salt used
  • numeric and categorical local differential privacy will include the salt used
  • reversible masking will include both a key and an IV
  • format preserving masking will include a tweak, key, an alphabet range, prefix, pad to length, and checksum id if used

Policy Enforcement

The data sources themselves have all the Data policies included in the SQL through a series of CASE statements that determine which view of the data a user will see. Row-level policies are applied as top-level WHERE clauses, and usage policies (purpose-based or subscription-level) are applied as WHERE clauses against the user_profile JOIN. The access_check function allows Immuta to throw custom errors similar to the Query Engine when a user lacks access to a data source because they are not subscribed to the data source, they are operating under the wrong project, or they cannot view any data because of policies enforced on the data source.

Snowflake Database Structure

By default, all native views are created within the immuta database, which is accessible by the PUBLIC role, so users acting under any Snowflake role can connect. All views within the database have the SELECT permission granted to the PUBLIC role as well, and access is enforced by the access_check function built into the individual views. Consequently, there is no need for users to manage any role-based access to any of the database objects managed by Immuta.

Limitations

  • Immuta is unable to create a corresponding view in Snowflake for data sources

    • that have a differential privacy policy applied,
    • with an external policy handler, or
    • that are using the Advanced Rules DSL.
  • Certain interpolation functions can also block the creation of a native view, specifically @interpolatedComparison() and @iam.

Multiple Snowflake Instances

A user can configure multiple integrations of Snowflake to a single Immuta instance and use them dynamically or with workspaces.

Caveats

  • There can only be one native connection per host.
  • The host of the data source must match the host of the native connection for the native view to be created.
  • Projects can only be configured to use one Snowflake host.

Native Query Audit

Once this feature has been enabled on the App Settings page with the Snowflake native integration, Immuta will run a query against Snowflake to retrieve the query histories. These histories provide audit records for queries against Snowflake native data sources that are queried natively in Snowflake.

This process will happen automatically every 24 hours, or can be manually prompted at any time from the Immuta Audit page. When manually prompted, it will only search for new queries that were created since the latest native query that has been audited. The job is run in the background, so the new queries will not be immediately available.

For details about the contents of these audits, see the Native Query Audit Logs page.

Prompt Native Query Audit

To manually prompt the native query audit, click Native Query Audit on the Audit page:

Native Query Button: Audit Page

Alternatively, the schedule for the automatic job can be changed to fit your needs. See instructions for changing the frequency of the automatic job on the App Settings Tutorial page.

Caveats

  • The scheduled and manual jobs that query Snowflake are run in the background. The audit records will not update immediately.
  • If you are relying on the scheduled job, any audit records for queries run in the day will not appear until the next day, at the earliest. In some cases, they could appear another day later.
  • This feature is only available with Snowflake Enterprise or higher.

Snowflake External Catalog

When configuring a native Snowflake integration, you can add Snowflake as an external catalog as well. With this feature enabled, Immuta will automatically ingest Snowflake Object Tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

The Snowflake tags' Key and Value pairs will be reflected in Immuta as two levels: the Key will be the top level and the Value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.

To add Snowflake as an external catalog, follow one of the tutorials below:

Caveats

Snowflake has some natural data latency. If you manually refresh the Governance page to see all tags created globally, users can experience a delay of up to two hours. However, if you run schema detection or a health check to find where those tags are applied, the delay will not occur because Immuta will only refresh tags for those specific tables.

Enable Snowflake

Snowflake is enabled through the App Settings page.

Snowflake Configuration

Once Snowflake has been enabled on an instance, all future Snowflake data sources will also be created natively within the immuta database of the linked Snowflake instance. In addition to creating views, Immuta will also periodically sync user metadata to a system table within the Snowflake instance.