Subscription policies manage table-level access to data. The how-to and reference guides in this section detail how to implement subscription policies to meet your compliance requirements and describe the types of subscription policies available.
Subscription policy: Author a subscription policy to manage access to data sources.
ABAC subscription policy: Author a subscription policy that uses attribute-based access controls to automate data access decisions.
Advanced DSL builder: Author a subscription policy using the advanced special functions.
Restricted subscription policy: As a data owner, author a single subscription policy that can apply to multiple data sources you own.
Clone, activate, or stage a global policy: Clone, activate, or stage a global policy.
Subscription policies: This reference guide describes the design and behavior of Immuta subscription policies.
Subscription policy access types: This reference guide describes read and write access policies and how they translate to privileges in the remote data platform.
Advanced use of special functions: This reference guide defines the advanced functions available when building subscription policies.
This guide demonstrates how to build attribute-based access control (ABAC) subscription policies using the policy builder in the Immuta UI. To build more complex policies than the builder allows, follow the Advanced rules DSL policy guide.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Subscription Policies tab. Click Add Subscription Policy and complete the Enter Name field.
Local policy: Navigate to a specific data source and click the Policies tab. Click Add Subscription Policy and select New Local Subscription Policy.
Select Allow users with specific groups/attributes.
Choose the condition that will drive the policy: when user is a member of a group or possesses attribute.
Use the subsequent dropdown to choose the group or attribute for your condition. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the subscription policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
For global policies: Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Note: To make this option selected by default, see the app settings page.
For global policies: Click the dropdown menu beneath Where should this policy be applied and select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
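The and/or conjunction behavior described in the steps above can be sketched in a few lines of Python. This is only an illustration of the boolean logic, not Immuta's implementation: the User class and the in_group, has_attribute, and meets_policy helpers are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical stand-in for an Immuta user's groups and attributes."""
    groups: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)  # key -> set of values


def in_group(name):
    """Condition: user is a member of the named group."""
    return lambda user: name in user.groups


def has_attribute(key, value):
    """Condition: user possesses the attribute key/value pair."""
    return lambda user: value in user.attributes.get(key, set())


def meets_policy(user, conditions, conjunction="or"):
    """Combine conditions with 'and' (all must apply) or 'or' (one must apply)."""
    if conjunction not in ("and", "or"):
        raise ValueError("conjunction must be 'and' or 'or'")
    results = [condition(user) for condition in conditions]
    return all(results) if conjunction == "and" else any(results)


analyst = User(groups={"Analytics"}, attributes={"Office Location": {"Ohio"}})
conditions = [in_group("HR"), has_attribute("Office Location", "Ohio")]

print(meets_policy(analyst, conditions, "or"))   # one condition applies
print(meets_policy(analyst, conditions, "and"))  # not all conditions apply
```

With or, the user above qualifies because the attribute condition applies; with and, the user does not qualify because the group condition fails.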
When you have multiple global ABAC subscription policies to enforce, create separate global ABAC subscription policies, and then Immuta will use boolean logic to merge all the relevant policies on the tables they map to.
This page details how users can create more complex policies using functions and variables in the Advanced DSL policy builder than the Subscription Policy builder allows.
For instructions on writing Global Subscription Policies, see the following tutorial.
Navigate to the App Settings Page.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Enhanced Subscription Policy Variables checkbox.
Click Save.
Navigate to the Policies Page.
Select Subscription Policies and click + Add Subscription Policy.
Choose a name for your policy and select how the policy should grant access.
Select Create using Advanced DSL.
Select the rules for your policy from the Advanced DSL options. For example, a policy built with @hasTagsAsAttribute('Department', 'dataSource') would subscribe users to any data source carrying a tag that matches one of their attribute values. Users with the attribute Department.Marketing would therefore be subscribed to data sources with the tag Marketing.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Select where this policy should be applied: On data sources, When selected by data owners, or On all data sources.
If you select On data sources, the options include tagged, with columns tagged, with column names spelled like, in server, and created between.
Click Create Policy.
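The @hasTagsAsAttribute example in the steps above can be pictured with a small Python sketch. The function below is a hypothetical model written for this page, not the Immuta function itself, and it assumes that a single tag matching any of the user's values for the attribute is enough to subscribe them (the guide does not say whether one match or all tags are required).

```python
def has_tags_as_attribute(user_attributes, attribute_key, data_source_tags):
    """Return True when any data source tag equals one of the user's
    values for attribute_key (assumption: one match is sufficient)."""
    values = user_attributes.get(attribute_key, set())
    return bool(set(values) & set(data_source_tags))


# A user holding the attribute Department.Marketing
marketing_user = {"Department": {"Marketing"}}

print(has_tags_as_attribute(marketing_user, "Department", ["Marketing", "PII"]))
print(has_tags_as_attribute(marketing_user, "Department", ["Finance"]))
```

The first call models a data source tagged Marketing, to which the user would be subscribed; the second models a data source without a matching tag.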
Subscription policies manage access to tables; they can be defined one of four ways:
Anyone: Users will automatically be granted access (least restricted).
Anyone who asks (and is approved): Users will need to request access and be granted permission by the configured approvers (moderately restricted).
Users with specific groups or attributes: Only users with the specified groups/attributes will be able to see the data source and subscribe (moderately restricted). This restriction type is referred to as an attribute-based access control (ABAC) global subscription policy throughout this page.
Individual users you select: The data source will not appear in search results; data owners must manually add/remove users (most restricted).
Immuta almost always recommends the ABAC global subscription policies, as described in our two Secure policy use cases:
See Author a global subscription policy for a tutorial.
Immuta offers two types of subscription policies to manage read and write access in a single system:
Read access policies manage who can view data from the specified data sources.
Write access policies manage who can view and modify data in the specified data sources.
See the Subscription policy access types guide for details.
When building an ABAC global subscription policy, the users that match the condition of the policy will automatically be subscribed to the data source by default without taking any action once the policy is activated.
Should user attributes or groups change, the change immediately affects a user's subscription status if it results in them gaining or losing subscription access to the table in question. That is the power of these types of policies: you future-proof your policies, so instead of altering them you simply need to represent the correct metadata on your users, and the policies will react.
Also, by default, user access is reflected appropriately in the Immuta user interface. If a user does not have access to a particular table, that data source will not be visible to them.
The above default behavior can be altered using the below settings in the subscription policy builder:
Allow Data Source Discovery: Configures visibility so that data sources the user cannot subscribe to are still discoverable in Immuta.
Require Manual Subscription: Rather than being automatically subscribed to the table, users must click the subscribe button in Immuta to subscribe to it. This can be valuable if data consumers don't want thousands of accessible tables listed in their data platform, only the ones they care about.
Request Approval to Access: Selecting this option allows users to request access to a data source and be manually approved by a specified user, even if the requesting user does not meet the group or attribute conditions in the policy. This setting requires Allow Data Source Discovery to be selected as well; otherwise, users could never discover the data source to request the override. See Global subscription policy merging for a tutorial.
For instructions on creating ABAC subscription policies, see the ABAC subscription policy guide.
In some cases, multiple ABAC global policies may apply to a single data source and can allow delegation of policy to many different policy builders. Rather than allowing the two policies to conflict, Immuta combines the conditions of the subscription policies.
Combining of global subscription policies only occurs with ABAC subscription policies. See the next section for how to deal with non-ABAC subscription policy conflicts.
In the ABAC subscription policy builder, the below option determines how to merge subscription policies:
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Consider the following global subscription policies potentially created by completely different users:
Policy 1: (Always Required)
Allow users to subscribe to the data source when user is a member of group HR; otherwise, allow users to subscribe when approved by an Owner of the data source.
Policy 2: (Shared Responsibility)
Allow users to subscribe to the data source when user is a member of group Analytics; otherwise, allow users to subscribe when approved by anyone with permission Governance.
Policy 3: (Shared Responsibility)
Allow users to subscribe to the data source when user has attribute Office Location Ohio; otherwise, allow users to subscribe when approved by anyone with permission Audit.
If a data source exists where all of these policies apply, the subscription policies are combined, which will result in a policy like this:
Combined policy:
Allow users to subscribe to the data source when user (@isInGroups('HR')) AND ((@isInGroups('Analytics')) OR (@hasAttribute('Office Location', 'Ohio')))
Otherwise
Allow users to subscribe when approved by ( anyone with permission Owner (of this data source) )
AND
( ( anyone with permission GOVERNANCE ) OR ( anyone with permission AUDIT ) )
Note that all the policies which have Always Required must have a manual override (approve by) selected for there to be any approved by in the final merged policy.
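The merge in the example above can be sketched in Python to make the boolean structure concrete. This is an illustrative model only: the Policy tuple and merged_condition function are hypothetical names, users are represented as plain dictionaries, and the sketch covers just the automatic subscription condition, not the merged approver logic.

```python
from typing import Callable, NamedTuple


class Policy(NamedTuple):
    """Hypothetical model of a global ABAC subscription policy."""
    name: str
    condition: Callable[[dict], bool]
    always_required: bool  # False models Share Responsibility


def merged_condition(policies):
    """AND together the Always Required conditions, then AND the result
    with the OR of the Share Responsibility conditions (if any exist)."""
    required = [p.condition for p in policies if p.always_required]
    shared = [p.condition for p in policies if not p.always_required]

    def check(user):
        meets_required = all(cond(user) for cond in required)
        meets_shared = any(cond(user) for cond in shared) if shared else True
        return meets_required and meets_shared

    return check


policies = [
    Policy("Policy 1", lambda u: "HR" in u["groups"], True),
    Policy("Policy 2", lambda u: "Analytics" in u["groups"], False),
    Policy("Policy 3", lambda u: u["attributes"].get("Office Location") == "Ohio", False),
]
can_subscribe = merged_condition(policies)

hr_analyst = {"groups": {"HR", "Analytics"}, "attributes": {}}
analyst_only = {"groups": {"Analytics"}, "attributes": {"Office Location": "Ohio"}}

print(can_subscribe(hr_analyst))    # in HR and in Analytics
print(can_subscribe(analyst_only))  # not in HR, so the required condition fails
```

The user who is in both HR and Analytics satisfies the combined condition; the user outside HR does not, even though both Share Responsibility conditions apply to them.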
Once enabled on a data source, combined global subscription policies can be edited and disabled by data owners.
Find a real example of the power behind merging subscription policies in the Automate data access control decisions use case.
If you stray outside ABAC subscription policies, multiple global subscription policies may apply to a single data source and not merge.
More specifically, if two or more global subscription policies of the restriction levels listed below apply to the same data source, they will conflict (even if an ABAC subscription policy is also present):
Anyone
Anyone who asks (and is approved)
Individual users you select
When such conflicts occur, data owners can manually choose which policy will apply. To do this, the data owner must:
Disable the applied global subscription policy in the policies tab on a data source.
Provide a reason the global policy should be disabled.
Select which conflicting global subscription policy they want to apply.
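The conflict rule described in this section can be summarized with a short Python sketch. It is a hypothetical model for illustration: the conflicting_policies function and the (name, restriction level) pairs are assumptions of this example, not an Immuta API.

```python
# Restriction levels that do not merge, as listed in this section
NON_MERGING_LEVELS = {
    "Anyone",
    "Anyone who asks (and is approved)",
    "Individual users you select",
}


def conflicting_policies(policies):
    """Given (policy name, restriction level) pairs that apply to one data
    source, return the names that conflict: two or more non-ABAC policies."""
    candidates = [name for name, level in policies if level in NON_MERGING_LEVELS]
    return candidates if len(candidates) >= 2 else []


applied = [
    ("Open access", "Anyone"),
    ("Approval needed", "Anyone who asks (and is approved)"),
    ("HR only", "Users with specific groups or attributes"),
]
print(conflicting_policies(applied))
```

In this example the two non-ABAC policies are reported as conflicting even though an ABAC policy also applies, which is the situation a data owner would resolve manually with the steps above.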
Users can create more complex policies using functions and variables in the advanced DSL policy builder than the subscription policy builder allows.
After an application admin has enabled enhanced subscription policy variables, data governors and owners can create global subscription policies using all the functions and variables outlined below.
The above table provides some basic guidance; for more information, refer to Advanced use of special functions.
See the Advanced DSL tutorial for details on how to build a policy that includes advanced DSL.
By default, Immuta does not apply a subscription policy on registered data (unless an existing global policy applies to it).
You can disable this behavior, but there are very few scenarios in which it is recommended to disable. It is best to instead have subscription policies active on standby that will be applied as you add tags to your data sources that drive those subscription policies. This is because global policies that match registered data sources will apply to data sources, no matter which subscription policy is enabled by default.
Deprecation notice
The ability to configure the behavior of the default subscription policy has been deprecated. Once this configuration setting is removed from the app settings page, Immuta will not apply a subscription policy to registered data sources unless an existing global policy applies to them. To set an "Allow individually selected users" subscription policy on all data sources, create a global subscription policy with that condition that applies to all data sources or apply a local subscription policy to individual data sources.
There are two settings available as the default subscription policy: none or allow individually selected users.
None: The default. If this option is selected as the default subscription policy, a data source will have no subscription policy applied to it if
it is a new data source and no global policy matches it.
a data owner or governor removes an existing global subscription policy from the data source.
Once a global subscription policy matches or a data owner applies a local subscription policy to a data source, that policy will restrict users’ access to the table.
Allow individually selected users: If this option is selected, data owners have to manually add users as subscribers to the data source in Immuta for those users to query the underlying table.
Changing the default subscription policy setting only affects new data sources; existing data sources (and those in the process of being registered when the setting is changed) are unaffected. For example, if an Immuta data source’s subscription policy restricts access to members of the Marketing group before the feature is enabled, that existing subscription policy will still apply to that table in the underlying data platform; only users who are members of the Marketing group will be able to access that data.
For instructions on changing the default subscription policy setting, see the manage default subscription policy page.
Even if there is no subscription policy on a table, data owners and governors can manage data policies on data sources without affecting users’ access to the registered data sources. This can be powerful if you manage table access outside of Immuta but want to manage data policies in Immuta.
If no subscription policy is applied to a data source, users can only subscribe as data source owners; they cannot be added as regular subscribers. To add regular subscribers, a data owner or governor must apply a subscription policy to the data source.
Immuta policies restrict access to data and apply to data sources at the local or global level:
Local policies are authored against specific tables, one at a time.
Global policies target tags instead of specific tables, allowing you to build a single policy that impacts a large percentage of your data rather than building separate local policies for each table.
Consider the following local and global policy examples:
Local policy example: Mask using hashing the values in the columns ssn, last_name, and home_address.
Global policy example: Mask using hashing values in columns tagged PII on all data sources.
In this scenario, the local policy would mask the sensitive columns specified in the data policy on the single data source it was created for. If only using local policies, a data owner or governor would have to write that policy for every data source on which they wanted to mask sensitive data. The second policy would mask any column tagged PII on all data sources that had the PII tag applied to a column. Because this global policy automatically applies to those qualifying data sources, that policy only needs to be written once.
Consequently, global policies are the best practice for using Immuta: they provide the most scalability and manageability of access control.
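To make the difference concrete, the global policy example above can be modeled as one function that follows the PII tag wherever it appears. This Python sketch is illustrative only: the function names are hypothetical, and SHA-256 is an assumption standing in for whatever hashing Immuta applies in a given data platform.

```python
import hashlib


def hash_value(value):
    """Mask a value by hashing it (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def mask_tagged_columns(row, column_tags, tag="PII"):
    """Hash every column in the row whose tags include the target tag."""
    return {
        column: hash_value(value) if tag in column_tags.get(column, set()) else value
        for column, value in row.items()
    }


# Two different data sources; the same policy applies wherever the tag is found
customers_row = {"ssn": "123-45-6789", "city": "Columbus"}
customers_tags = {"ssn": {"PII"}}

employees_row = {"last_name": "Doe", "team": "Analytics"}
employees_tags = {"last_name": {"PII"}}

print(mask_tagged_columns(customers_row, customers_tags))
print(mask_tagged_columns(employees_row, employees_tags))
```

A local policy would instead name the columns of one table explicitly, so the equivalent logic would have to be rewritten for each data source.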
For details about subscription and data policy types, see the subscription policies overview or data policies overview.
Best practice: access controls
In most cases, the goal is to share as much data as possible while still being compliant with privacy regulations. Immuta recommends a scale of wide subscription policies and specific data policies to give as much access as possible.
Global policies can be authored by users with GOVERNANCE permission or data owners. Data owners' global policies will only attach to data sources they own (also called restricted global policies), even if the tags their policies target go beyond their data sources.
When building global policies at scale, never reference individual users. The goal is to instead reference metadata about users. This way, as users gain or lose that metadata, access logic is automatically updated. This future proofs policies.
User metadata are values connected to specific Immuta users and are used in policies to gain access to tables (with subscription policies) or data (with data policies). These attributes fall into three categories:
attributes: Attributes are key/value pairs that can be assigned to users.
groups: Groups are similar to roles in a data platform; users can be assigned to one or many groups.
group attributes: It is possible to assign attributes to groups. This is a shortcut for assigning a specific attribute to a large set of users: all the users in the group will be assigned the attribute.
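The relationship between these three categories can be sketched in Python. The effective_attributes function below is a hypothetical helper written for this illustration; it assumes that a user's effective attributes are the union of their own attributes and those assigned to their groups.

```python
def effective_attributes(user_attributes, user_groups, group_attributes):
    """Merge a user's own attributes with the attributes of their groups.

    user_attributes:  dict of key -> set of values assigned to the user
    user_groups:      iterable of group names the user belongs to
    group_attributes: dict of group name -> dict of key -> set of values
    """
    merged = {key: set(values) for key, values in user_attributes.items()}
    for group in user_groups:
        for key, values in group_attributes.get(group, {}).items():
            merged.setdefault(key, set()).update(values)
    return merged


group_attributes = {"Analytics": {"Department": {"Marketing"}}}
attrs = effective_attributes(
    user_attributes={"Office Location": {"Ohio"}},
    user_groups=["Analytics"],
    group_attributes=group_attributes,
)
print(attrs)
```

Here the user picks up Department.Marketing through membership in the Analytics group without the attribute being assigned to them individually.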
The table below outlines the various states of global policies in Immuta.
Note: Policies that contain the circumstance When selected by data owners cannot be staged.
Data owners who are not governors can write restricted global policies for data sources that they own. With this feature, data owners have higher-level policy controls and can write and enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies on data sources.
No special configuration is necessary: once a user is a data owner, the global policy builder is available to them, and they can build global policies just like any user with GOVERNANCE permission. The only difference is that their global policies are restricted to the data sources they own.
Data ownership can be assigned, but it is also automatically granted to the user that registered the data source with Immuta.
All data source subscribers can access a data source's policies on the policies tab of the data source, but only data owners and governors can edit them. From there, users can view existing policies or, although not recommended, apply new local policies to the data source with the following features:
Apply Existing Policies Button: By clicking this button in the top right corner of the policies tab, users can search for and apply existing policies to the data source from other data sources or global policies.
Subscription Policy Builder: In this section, users can determine who may access the data source. If a subscription policy has already been set by a global policy, a notification and a Disable button appear at the bottom of this section. Users can click the Disable button to make changes to the subscription policy.
Data Policy Builder: In this section, users can create policies to enforce privacy controls on the data source and see a list of data policies currently applied to the data source.
All changes made to policies by data owners or governors appear in a collapsible Activity panel on the right side of the screen.
The information recorded in the activity panel includes when the data source was created, the name and type of the policy, when the policy was applied or changed, and if the policy is in conflict on the data source. Additionally, global policy changes are identified by the governance icon; all other updates are labeled by the data sources icon.
Immuta allows you to define policies at different levels of your data stack.
First are subscription policies, which are commonly termed table access grants or table-level access. Subscription policies control access to your tables. Immuta calls them subscription policies because they are not always an access grant but could also be the result of a data consumer finding the data, requesting access, and then being subscribed to it via an Immuta policy you have in place.
Second are data policies, which control access more granularly inside a table. For example, Immuta can help you build policies to redact rows, mask columns, or even mask cells.
While it is possible to build policies one table at a time using Immuta, there isn't much value in doing so. These are termed local policies in Immuta.
To build policy at scale, you must use global policies. Global policies allow you to build policies that reference tags rather than physical tables or columns. So instead of building a policy like this: mask column name in table customers, you can instead build a policy such as mask columns tagged name anywhere you see the name tag.
These global policies will then seek out the name tag, wherever found, and apply the policy, no matter the physical location of the tables that contain names. It's important to understand that Immuta supports tag-based global policies for more than just masking. Both subscription and row-level policies can be authored as global policies targeting tags instead of physical tables and columns.
How you get the tags on the tables and columns is outlined in the Automate data access control decisions use case.
There are many guides found in this section, but an efficient approach to learning how to author secure policy would be to first read the two Immuta use cases specific to secure:
And then to focus on the complex topics around how applying policy at scale is managed in Immuta, specifically
Overview on how to author policies at scale
Overview of subscription policies and data policies
Full reference guide for all data policies
Details on how to minimize policy downtime if there's a large amount of change due to data engineering in your data platform(s)
Details on how subscription policy conflicts and data policy conflicts are managed
Data policies determine what users see when they query data in a table they have access to.
This guide provides a general overview of data policies and their behavior.
: Certify policies, exempt users from policies, and view policy diffs on a data source.
: This guide describes all the data policies available in Immuta.
: This guide describes the types of masking policies available and when to use each.
: Row-level policies compare data values with user metadata at query-time to determine whether or not the querying user should have access to the individual rows of data.
: This guide describes the custom functions you can use to extend the PostgreSQL WHERE syntax.
: In some cases, two conflicting global masking policies apply to a single data source. This guide describes how Immuta handles those conflicts.
: When building a global data policy, governors can create custom certifications that must be acknowledged by data owners when the policy is applied to data sources.
: These policies reduce conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization.
Private preview
Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.
Immuta offers two types of subscription policies to manage read and write access in a single system:
Read access policies manage who can read data.
Write access policies manage who can modify data.
Both of these access types can be enforced at any of the restriction levels outlined in the .
The table below illustrates the access types supported by each integration.
Integration | Read access policies | Write access policies |
---|
To create a read or write access policy, see the .
Once a read or write access policy is enforced on an Immuta data source, it translates to the relevant privileges on the table, view, or object in the remote platform. The sections below detail how these access types are enforced for each integration.
The Snowflake integration supports read and write access subscription policies. However, when applying read and write access policies to Snowflake data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register Snowflake views as Immuta data sources and apply read and write policies to them, but when a write policy is applied to a view only the SELECT privilege will take effect in Snowflake, as views are read-only objects.
Users can register any object stored in Snowflake’s information_schema.tables view as an Immuta data source. The table below outlines the Snowflake privileges Immuta issues when read and write policies are applied to various object types in Snowflake. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and database for any object that access is granted to for a particular user.
The Databricks Unity Catalog integration supports read and write access subscription policies. When users create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks for every user affected by that subscription policy.
Users can register any object stored in Databricks Unity Catalog’s information_schema.tables view as an Immuta data source. However, when applying read and write access policies to these data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register foreign tables as Immuta data sources and apply read and write policies to them, but only a read policy will take effect in Databricks and allow users to SELECT those tables. If a write policy is applied, Immuta will not issue SELECT or MODIFY privileges in Databricks.
The table below outlines the Databricks privileges Immuta issues when read and write policies are applied to various object types in Databricks Unity Catalog. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and catalog for any object that access is granted to for a particular user.
The Databricks Spark integration supports read access subscription policies. When a read access policy is applied to a data source, Immuta modifies the logical plan that Spark builds when a user queries data to enforce policies that apply to that user. If the user is subscribed to the data source, the user is granted SELECT on the object in Databricks. If the user does not have read access to the object, they are denied access.
The Starburst (Trino) integration supports read and write access subscription policies. In the Starburst (Trino) integration's default configuration, the following access values grant read and write access to Starburst (Trino) data when a user is granted access through a subscription policy:
READ: When a user is granted read access to a data source, they can SELECT on tables or views and SHOW on tables, views, or columns in Starburst (Trino). This setting is enabled by default when you configure the Starburst (Trino) integration.
WRITE: In its default setting, the Starburst (Trino) integration's write access value controls the authorization of SQL operations that perform data modification (such as INSERT, UPDATE, DELETE, MERGE, and TRUNCATE). When users are granted write access to a data source through a subscription policy, they can INSERT, UPDATE, DELETE, MERGE, and TRUNCATE on tables and REFRESH on materialized views. This setting is enabled by default when you configure the Starburst (Trino) integration.
Because Starburst (Trino) can govern certain table modification operations (like ALTER) separately from data modification operations (like INSERT), Immuta allows users to specify what modification operations are permitted on data in Starburst (Trino). Administrators can allow table modification operations (such as ALTER and DROP tables) to be authorized as write operations through advanced configuration in the Immuta web service or Starburst (Trino) cluster with the following access values:
OWN: When mapped via advanced configuration to Immuta write policies, users who are granted write access to Starburst (Trino) data can ALTER and DROP tables and SET comments and properties on a data source.
CREATE: When this privilege is granted on Starburst (Trino) data, an Immuta user can create catalogs, schemas, tables, or views on a Starburst (Trino) cluster. CREATE is a Starburst (Trino) privilege that is not controlled by Immuta policies, and this property can only be set in the access-control.properties file on the Starburst (Trino) cluster.
Administrators can customize table and data modification settings in one or both of the following places; however, the access-control.properties file overrides the settings configured in the Immuta web service:
Immuta web service: Configuring write policies in the Immuta web service allows all Starburst (Trino) clusters targeting that Immuta tenant to receive the same write policy configuration for Immuta data sources. This configuration only affects tables or views registered as Immuta data sources. Use the option below to control how unregistered data is affected.
Immuta web service access grants mapping
Customizing read and write access in the Immuta web service affects operations on all Starburst (Trino) data registered as Immuta data sources in that Immuta tenant. This configuration method should be used when all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to the Immuta web service. Example configurations are provided below. Contact your Immuta representative to customize the mapping of read or write access policies for your Immuta tenant.
Default configuration
The default setting maps WRITE to READ and WRITE permissions and maps READ to READ. Both the READ and WRITE permissions should always include READ.
In this example, if a user is granted write access to a data source through a subscription policy, that user can perform data modification operations (INSERT, UPDATE, MERGE, etc.) on the data.
Custom configuration
This configuration example maps WRITE to READ, WRITE, and OWN permissions and maps READ to READ. Both READ and WRITE permissions should always include READ.
In this example, if a user gets write access to a data source through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.
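The difference between the default and custom mappings can be sketched in a few lines of Python. This is an illustrative model only: the dictionary layout, the function names, and the pairing of READ with SELECT are assumptions made for the sketch and are not Immuta's configuration syntax. The operation names per access value come from the text above.

```python
# Illustrative model only. The data layout and function names are
# assumptions for this sketch, not Immuta's configuration format.

# Example SQL operations per Starburst (Trino) access value.
# Pairing READ with SELECT is an assumption of this sketch.
OPERATIONS_BY_ACCESS_VALUE = {
    "READ": {"SELECT"},
    "WRITE": {"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE"},
    "OWN": {"ALTER", "DROP"},
}

# Immuta policy access type -> Starburst (Trino) access values.
DEFAULT_MAPPING = {"READ": ["READ"], "WRITE": ["READ", "WRITE"]}
CUSTOM_MAPPING = {"READ": ["READ"], "WRITE": ["READ", "WRITE", "OWN"]}


def allowed_operations(policy_access, mapping):
    """Return the SQL operations authorized for an Immuta access type."""
    operations = set()
    for access_value in mapping[policy_access]:
        operations |= OPERATIONS_BY_ACCESS_VALUE[access_value]
    return operations


def includes_read(mapping):
    """Check that both READ and WRITE mappings include READ."""
    return all("READ" in values for values in mapping.values())


default_write = allowed_operations("WRITE", DEFAULT_MAPPING)
custom_write = allowed_operations("WRITE", CUSTOM_MAPPING)
```

With the default mapping, a user with write access can run INSERT but not ALTER; with the custom mapping, the same user can run both.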
Starburst (Trino) cluster access grants mapping
The Starburst (Trino) integration can also be configured to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a specific Starburst (Trino) cluster.
Two properties customize the behavior of read or write access for all Immuta users on that Starburst cluster:
immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. For these permissions to apply, the user must be subscribed in Immuta and not be an administrator (who gets all permissions).
immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. This is the only property that allows the CREATE permission, since CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).
Default configuration
By default, Immuta allows READ and WRITE operations to be authorized on data registered in Immuta, while all operations are permitted for data sources that are not registered in Immuta.
Custom configuration
In this example, the configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta, and all operations are permitted on data that is not registered in Immuta. If a user gets write access to data registered in Immuta through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.
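As a rough illustration of how the two cluster properties divide responsibility, the following Python sketch picks the set of operations that may be authorized for an object. The property names come from the text above; the function, its parameters, and the representation of values as Python sets are assumptions for this sketch and say nothing about the syntax of the properties file.

```python
# Illustrative model only. Property names are taken from the text; the
# set-based values and the function below are assumptions of this sketch.

ALL_OPERATIONS = {"READ", "WRITE", "OWN", "CREATE"}

REGISTERED = "immuta.allowed.immuta.datasource.operations"
UNREGISTERED = "immuta.allowed.non.immuta.datasource.operations"

# Default: READ and WRITE on registered data, everything on unregistered data.
DEFAULT_PROPERTIES = {REGISTERED: {"READ", "WRITE"}, UNREGISTERED: ALL_OPERATIONS}

# Custom example: table modification (OWN) is also allowed on registered data.
CUSTOM_PROPERTIES = {REGISTERED: {"READ", "WRITE", "OWN"}, UNREGISTERED: ALL_OPERATIONS}


def operations_that_may_be_authorized(properties, registered, subscribed=True, is_admin=False):
    """Return the operations that may be authorized for one object and user."""
    if is_admin:
        # Administrators get all permissions.
        return set(ALL_OPERATIONS)
    if not registered:
        return set(properties[UNREGISTERED])
    if not subscribed:
        # Registered objects require the user to be subscribed in Immuta.
        return set()
    return set(properties[REGISTERED])


default_registered = operations_that_may_be_authorized(DEFAULT_PROPERTIES, registered=True)
custom_registered = operations_that_may_be_authorized(CUSTOM_PROPERTIES, registered=True)
```

The result is only an upper bound for registered data: which of those operations a given user actually receives still depends on the read or write access granted by the subscription policy.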
The Redshift integration supports read access subscription policies. Immuta grants the SELECT Redshift privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a data source is created, Immuta creates a corresponding dynamic view of the table with a join to a secure view that contains all Immuta users, their entitlements, their projects, and a list of the tables they have access to. When a read policy is created or updated (or when a user's entitlements change, they switch projects, or when their data source access is approved or revoked), Immuta updates the secure view to grant or revoke users' access to the data source. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, zero rows are returned when they attempt to query the view.
The Azure Synapse Analytics integration supports read access subscription policies. Immuta grants the SELECT privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a read policy is created or removed (or when a user's entitlements change, they switch projects, or when their data source access is approved or revoked), Immuta updates the view that contains the users' entitlements, projects, and a list of tables they have access to in order to grant or revoke their access to the dynamic view. Users' read access is enforced through an access check function in each individual view. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, they receive an "Access denied: you are not subscribed to the data source" error when they attempt to query the view.
The Google BigQuery integration supports read access subscription policies. In this integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table, and read access controls are applied in the view. After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push the Immuta data sources as views into a mirrored dataset of the original table. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.
READ and READWRITE access levels
With the exception of the Starburst integration, users can only modify existing data when they are granted write access to data; they cannot create new tables or delete tables.
Write actions are not currently captured in audit logs.
Best practice: write global policies
Build global subscription policies using attributes and Discovered tags instead of writing local policies to manage data access. This practice prevents you from having to write or rewrite single policies for every data source added to Immuta and from manually approving data access.
Private preview
Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.
At least one of the following permissions is required to manage write policies:
CREATE_DATA_SOURCE Immuta permission (to create local write policies)
GOVERNANCE Immuta permission (to create local or global write policies)
MANAGE_POLICIES domain permission (to create global write policies)
, , or integration
(for Snowflake integrations)
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Subscription Policies tab. Click Add Subscription Policy and complete the Enter Name field.
Local policy: Navigate to a specific data source and click the Policies tab. Click Add Subscription Policy and select New Local Subscription Policy.
Select the access type:
Read Access: Control who can view the data source.
Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply:
Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
Allow anyone who asks (and is approved):
Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
Note: You can add more than one approving party by selecting + Add Another Approver.
Allow users with specific groups/attributes: See the ABAC subscription policy guide for instructions.
Allow individually selected users
For global policies: From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.
Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.
Private preview
Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.
At least one of the following permissions is required to manage write policies:
CREATE_DATA_SOURCE Immuta permission (to create local write policies)
GOVERNANCE Immuta permission (to create local or global write policies)
MANAGE_POLICIES domain permission (to create global write policies)
, , or integration
(for Snowflake integrations)
Click Policies in the left sidebar and select Subscription Policies.
Click Add Policy and complete the Enter Name field.
Select the access type:
Read Access: Control who can view the data source.
Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply to your data sources:
Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
Allow anyone who asks (and is approved):
Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
Note: You can add more than one approving party by selecting + Add Another Approver.
Allow users with specific groups/attributes:
Choose the condition that will drive the policy: when user is a member of a group or possesses attribute. Note: To build more complex policies than the builder allows, follow the Advanced rules DSL policy guide.
Use the subsequent dropdown to choose the group or attribute for your condition. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the subscription policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Note: To make this option selected by default, see .
Allow individually selected users
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Opt to complete the Enter Rationale for Policy (Optional) field.
Click Create Policy, and then click Activate Policy or Stage Policy.
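The two merge options described in the steps above (Always Required and Share Responsibility) can be sketched as boolean combinations. This is a simplified model that assumes every policy on the data source uses the same merge option; the group name, attribute, and helper functions are hypothetical and exist only for illustration.

```python
# Simplified sketch: assumes all policies on a data source share one merge
# option. Group names, attributes, and helpers are hypothetical.

def in_group(group):
    """Build a policy condition: user is a member of the given group."""
    return lambda user: group in user["groups"]


def has_attribute(key, value):
    """Build a policy condition: user possesses the given attribute value."""
    return lambda user: value in user["attributes"].get(key, [])


def merged_access(user, policies, merge_option):
    """Combine the results of several global subscription policies."""
    results = [policy(user) for policy in policies]
    if merge_option == "Always Required":
        return all(results)  # conditions combined with AND
    if merge_option == "Share Responsibility":
        return any(results)  # conditions combined with OR
    raise ValueError(f"Unknown merge option: {merge_option}")


policies = [in_group("analysts"), has_attribute("Department", "Finance")]
user = {"groups": ["analysts"], "attributes": {"Department": ["Marketing"]}}

always_required = merged_access(user, policies, "Always Required")
share_responsibility = merged_access(user, policies, "Share Responsibility")
```

Here the user belongs to the group but lacks the attribute, so access is denied under Always Required and granted under Share Responsibility.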
Best practice: write global policies
Build global policies with tags instead of writing local policies to manage data access. This practice will prevent you from having to write or rewrite single policies for every data source added to Immuta.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Mask from the first dropdown menu.
Select columns tagged, columns with any tag, columns with no tags, all columns, or columns with names spelled like.
Select a masking type:
: Enter a constant in the field that appears next to the masking type dropdown.
:
Enter a regular expression and replacement value in the fields that appear next to the masking type dropdown.
From the next dropdown, choose to make the regex Case Insensitive and/or Global.
:
Select using fingerprint or specifying the bucket from the subsequent dropdown menu.
If specifying the bucket, select the Bucket Type and then enter the bucket size.
Note: If you choose by rounding as your masking type, the statistics of the data fingerprint will autogenerate the bucket size when the policy is applied to a data source.
: Select either using fingerprint or requiring group size of at least and enter a group size in the subsequent dropdown menu.
: Enter the custom function native to the underlying database.
Note: The function must be valid for the data type of the column. If it is not, the default masking type will be applied to the column.
Select everyone except, everyone, or everyone who to continue the condition.
everyone except: In the subsequent dropdown menus, choose is a member of group, possesses attribute, or is acting under purpose. Complete the condition with the subsequent dropdown menus.
for everyone who: Complete the Otherwise clause. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied and select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
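To illustrate what the Case Insensitive and Global modifiers mean for the regex masking type described in the steps above, here is a minimal Python sketch. The actual masking runs in the data platform with its own regex dialect, so the function below is a hypothetical stand-in that only demonstrates the two modifiers.

```python
import re


def mask_with_regex(value, pattern, replacement, case_insensitive=False, global_match=False):
    """Replace matches of pattern in value, honoring the two modifiers."""
    flags = re.IGNORECASE if case_insensitive else 0
    # count=0 tells re.sub to replace every match; count=1 replaces only the first.
    count = 0 if global_match else 1
    return re.sub(pattern, replacement, value, count=count, flags=flags)


first_only = mask_with_regex("ab-AB-ab", "ab", "X")
global_only = mask_with_regex("ab-AB-ab", "ab", "X", global_match=True)
both = mask_with_regex("ab-AB-ab", "ab", "X", case_insensitive=True, global_match=True)
```

Without either modifier only the first exact-case match is replaced; Global replaces every match, and Case Insensitive widens what counts as a match.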
Click Add Certification in the data policy builder.
Enter a Certification Label and Certification Text in the corresponding fields of the dialog that appears.
Click Save.
When executing transforms in your data platform, new tables and views are constantly being created, columns added, and data changed (transform DDL). This constant transformation can cause latency between the DDL and Immuta policies discovering, adapting, and attaching to those changes, which can result in data leaks. This policy latency is referred to as policy downtime.
The goal is to have as little policy downtime as possible. However, because Immuta is separate from data platforms and those data platforms do not currently have a webhook or eventing service, Immuta does not receive alerts of DDL events. This causes policy downtime.
This page describes the appropriate steps to minimize policy downtime as you execute transforms using dbt or any other transform tool and links to tutorials that will help you complete these steps.
Required:
Schema monitoring: This feature detects destructively recreated tables (from CREATE OR REPLACE statements) even if the table schema wasn't changed. Enable schema monitoring when you register your data sources.
Recommended (if you are using Snowflake):
: This feature implements Immuta subscription policies as table GRANTS in Snowflake rather than Snowflake row access policies. Note this feature may not be automatically enabled if you were an Immuta customer before January 2023; see to enable.
: This feature removes unnecessary Snowflake row access policies when Immuta project workspaces or impersonation are disabled, which improves the query performance for data consumers.
To benefit from the scalability and manageability provided by Immuta, you should author all Immuta policies as global policies. Global policies are built at the semantic layer using tags rather than referencing individual tables in each policy. When using global policies, as soon as a new tag is discovered by Immuta, any applicable policy will automatically be applied. This is the most efficient approach for reducing policy downtime.
There are three different approaches for tagging in Immuta:
Auto-tagging: This approach uses sensitive data discovery (SDD) to automatically tag data.
: This approach pulls in the tags from an external catalog. Immuta supports Snowflake, Databricks Unity Catalog, Alation, and Collibra to pull in external tags.
: This approach requires a user to create and manually apply tags to all data sources using the Immuta API or UI.
Note that there is added complexity with manually tagging new columns with Alation, Collibra, or Immuta. These catalogs can only tag columns that are registered in Immuta. If you have a new column, you must wait until schema detection runs and detects that new column; only then can the column be manually tagged. This issue does not occur when manually tagging with Snowflake or Databricks Unity Catalog, because they are already aware of the column, or when using SDD, because it runs after schema monitoring.
Access to and registration of views created from Immuta-protected tables only need to be taken into consideration if you are using both data and subscription policies.
Views will have existing data policies (row-level security, masking) enforced on them that exist on the backing tables by nature of how views work (the query is passed down to the backing tables). So when you tag and register a view with Immuta, you are re-applying the same data policies on the view that already exist on the backing tables, assuming the tags that drive the data policies are the same on the view’s columns.
If you do not want this behavior or its possible negative performance consequences, then Immuta recommends the following based on how you are tagging data:
For auto-tagging, place your incremental views in a separate database that is not being monitored by Immuta. Do not register them with Immuta, and schema monitoring will not detect them from the separate database.
For either manually tagging option, do not tag view columns.
Using either option, the views will only be accessible to the person who created them. The views will not have any subscription policies applied to give other users access because the views are either not registered in Immuta or there are no tags. To give other users access to the data in the view, they should subscribe to the table at the end of the transform pipeline.
However, if you do want to share the views using subscription policies, you should ensure that the tags that drive the subscription policies exist on the view and that those tags are not shared with tags that drive your data policies. It is possible to target subscription policies on all tables or tables from a specific database rather than using tags.
Policy is enforced on READ. Therefore, if you run a transform that creates a new table, the data in that new table will represent the policy-enforced data.
For example, if the credit_card_number column is masked for Steve, on read, the real credit card numbers will be dynamically masked. If Steve then copies them into a new table via the transform, he is physically loading masked credit card numbers into that table. Now if another user, Jane, is allowed to see credit card numbers and queries the table, her query will not show the credit card numbers. This is because credit card numbers are already masked in that table. This problem only exists for tables, not views, since tables have the data physically copied into them.
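The Steve and Jane scenario can be reproduced with a small Python simulation. Everything in it (the in-memory table, the MASKED placeholder, the set of users allowed to see card numbers) is hypothetical and only mimics read-time enforcement; it is not how Immuta or the data platform implements masking.

```python
# Hypothetical simulation of read-time masking; not Immuta's implementation.

SOURCE_TABLE = [{"customer": "c1", "credit_card_number": "4111111111111111"}]
USERS_ALLOWED_TO_SEE_CARDS = {"jane"}


def query(table, user):
    """Return the rows of a table with the masking policy applied on read."""
    rows = []
    for row in table:
        row = dict(row)  # copy so the stored data is never modified
        if user not in USERS_ALLOWED_TO_SEE_CARDS:
            row["credit_card_number"] = "MASKED"
        rows.append(row)
    return rows


# A table transform run by Steve physically stores what Steve can read.
steve_table = query(SOURCE_TABLE, "steve")

# A view stores no data; it re-reads the source as whoever queries it.
def steve_view(user):
    return query(SOURCE_TABLE, user)


jane_reads_table = query(steve_table, "jane")
jane_reads_view = steve_view("jane")
```

Jane sees masked values in the copied table even though she is allowed to see card numbers, while the view still returns the real values to her because the policy is evaluated against the backing table at read time.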
To address this situation, you can do one of the following:
Use views for all transforms.
Ensure the users who are executing the transforms always have a higher level of access than the users who will consume the results of the transforms. Or, if this is not possible,
Set up a dev environment for creating the transformation code; then, when ready for production, have a promotion process to execute those production transformations using a system account free of all policies. Once the jobs execute as that system account, Immuta will discover, tag, and apply the appropriate policy.
Data downtime refers to the techniques you can use to hide data after transformations until Immuta policies have had a chance to synchronize. It makes data inaccessible; however, it is preferred to the data leaks that could occur while waiting for policies to sync.
Whenever DDL occurs, it can result in policy downtime, such as in the following examples:
A new column is added to a table that needs to be masked from users that have access to that table (potential data leak).
A new table is created in a space where other users have read access on future tables (potential data leak).
A tag that drives a policy is updated, deleted, or newly added with no other changes to the schema or table. This is a limitation of how Immuta can discover changes - it is too inefficient to search for tag changes, so schema changes are what drives Immuta to take action.
Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:
Use CREATE OR REPLACE for all DDL, including altering tags, so that access is always revoked.
Without these best practices, you are making unintentional policy decisions in Snowflake that may conflict with your organization's actual policies enforced by Immuta.
For example, if the CREATE OR REPLACE added a new column that contains sensitive data, and the user specifies COPY GRANTS, that column is opened to users, causing a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.
Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:
Without these best practices, you are making unintentional policy decisions in Unity Catalog that may conflict with your organization's actual policies enforced by Immuta.
For example, if you GRANT SELECT on a schema, and then someone writes a new table with sensitive data into that schema, it could cause a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.
When schema monitoring is run globally, it will detect the following:
Any new table
Any new view
Changes to the object type backing an Immuta data source (for example, when a TABLE changes to a VIEW); when an object type changes, Immuta reapplies existing policies to the data source.
Any existing table destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any existing view destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any dropped table
Any new column
Any dropped column
Any column type change (which can impact policy application)
Any tag created, updated, or deleted (but only if the schema changed; otherwise tag changes alone are detected with Immuta’s health check)
Then, if any of the above is detected, for those tables or views, Immuta will complete the following:
Synchronize the existing policy back to the table or view to reduce data downtime
If SDD is enabled, execute SDD on any new columns or tables
If an external catalog is configured, execute a tag synchronization
Synchronize the final updated policy based on the SDD results and tag synchronization
See the image below for an illustration of this process with Snowflake.
The two options for running schema monitoring are described in the sections below. You can implement them together or separately.
If the data platform supports custom UDFs and external functions, you can wrap the /dataSource/detectRemoteChanges endpoint with one. Then, as your transform jobs complete, you can use SQL to call this UDF or external function to tell Immuta to execute schema monitoring. The endpoint is wrapped in a UDF or external function because dbt and transform jobs always compile to SQL, and the best way to trigger schema monitoring immediately after the table is created (after the transform job completes) is to execute more SQL in the same job.
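As a rough idea of the HTTP call such a wrapper would make, the Python sketch below builds (but does not send) a request to the endpoint. Only the endpoint path comes from the text above. The HTTP method, the bearer-token Authorization header, the empty JSON body, and the example host name are all assumptions; confirm them against the Immuta API reference before relying on them.

```python
import urllib.request


def build_schema_monitoring_request(base_url, api_key, method="PUT"):
    """Build a request that would ask Immuta to run schema monitoring.

    The method, header format, and empty JSON body are assumptions of
    this sketch, not confirmed details of the Immuta API.
    """
    url = base_url.rstrip("/") + "/dataSource/detectRemoteChanges"
    return urllib.request.Request(
        url,
        data=b"{}",
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical tenant URL and key, used only to show the call shape.
request = build_schema_monitoring_request("https://immuta.example.com/", "example-key")
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

A UDF or external function in the data platform would perform the equivalent call so that the last SQL statement of a transform job can trigger it.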
Consult your Immuta professional for a custom UDF compatible with Snowflake or Databricks Unity Catalog.
The default schedule for Immuta to run schema monitoring is every night at 12:30 a.m. UTC. However, this schedule can be updated by changing some advanced configuration. The processing time for schema monitoring is dependent on the number of tables and columns changed in your data environment. If you want to change the schedule to run more frequently than daily, Immuta recommends you test the runtime (with a representative set of DDL changes) before making the configuration change.
Consult your Immuta professional to update the schema monitoring schedule, if desired.
There are some use cases where you want all users to have access to all tables, but want to mask sensitive data within those tables. While you could do this using just data policies, Immuta recommends you still utilize subscription policies to ensure users are granted access.
Click the Policies icon in the left sidebar and navigate to the Data Policies or Subscription Policies tab.
Click the dropdown menu in the Actions column of a policy and select Clone.
Open the dropdown menu and click Edit in the Global Policy Builder. Then make your changes using the dropdown menus.
Click Create Policy, select Activate Policy, and then click Confirm.
The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.
Note: If a cloned policy contains custom certifications, the certifications will also be cloned.
Click the Policies icon in the left sidebar and navigate to the Data Policies tab.
Click the dropdown menu in the Actions column of one of the templated policies and select Activate. Note: If data governors decide to stage an active policy, they select Stage from this dropdown menu.
The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.
Click the dropdown menu in the Actions column of a policy and select Stage. Note: If data governors decide to make a staged policy active, they select Activate from this dropdown menu.
Click Confirm in the dialog that appears.
Requirement and prerequisite:
CREATE_DATA_SOURCE or GOVERNANCE Immuta permission
A
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Limit usage to purpose(s) in the first dropdown menu.
In the next field, select a specific purpose that you would like to restrict usage of this data source to or ANY PURPOSE. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Select for everyone or for everyone except. If you select for everyone except, you must select conditions that will drive the policy such as group, purpose, or attribute.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.
Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.
Click Policies in the left sidebar and select Data Policies.
Click Add Policy and complete the Enter Name field.
Select how the policy should protect the data. Click a link below for instructions on building that specific data policy:
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Click Create Policy, and then click Activate Policy or Stage Policy.
There are several different special functions available for building subscription policies. Some of these functions, listed below, are narrowly focused on orchestrated RBAC use cases. Orchestrated RBAC is when an organization has many roles that represent access and, rather than switching to the ABAC model provided by Immuta, uses these special functions to orchestrate its existing roles through Immuta.
Specifically, the functions that enable orchestrated RBAC are:
@hostname
@database
@schema
@table
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')
Policy: @hasAttribute('SpecialAccess', '@hostname.@database.*')
User: has the attribute SpecialAccess with the value us-east-1-snowflake.default.*
The user would be subscribed to all the data sources in the default database. Note that this has nothing to do with tags; it is based purely on the physical name of the host, database, schema, and table in the native data platform. Also note that the user attribute contains an asterisk (*) to denote everything under the default database hierarchy. Asterisks are supported only for the infrastructure special functions:
@hostname
@database
@schema
@table
Because these functions describe infrastructure, Immuta can assume a 4-level hierarchy (hostname.database.schema.table), and an asterisk can be placed between any two objects in that hierarchy to represent any object, such as us-east-1-snowflake.*.hr. That would give the user access to any schema named hr in host us-east-1-snowflake, no matter the database.
However, that is not possible when using the tag-based special functions:
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')
Lastly, the asterisk represents any object, but cannot be used for a concatenated wildcard like so: snowfl*.tpc.*.*
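The asterisk rules above can be modeled with a short sketch. It only reproduces the matching outcomes described in this section and is not how Immuta evaluates policies internally. The function name `matches_infrastructure_path` is hypothetical, and two simplifications are assumed: object names contain no dots, and a pattern shorter than four levels covers everything beneath it.

```python
def matches_infrastructure_path(pattern: str, path: str) -> bool:
    """Compare a user attribute value against hostname.database.schema.table."""
    pattern_parts = pattern.split(".")
    path_parts = path.split(".")
    # A pattern cannot be more specific than the object it is compared with.
    if len(pattern_parts) > len(path_parts):
        return False
    for expected, actual in zip(pattern_parts, path_parts):
        # "*" stands for any whole object; anything else must match exactly,
        # so a partial wildcard such as "snowfl*" never matches "snowflake".
        if expected != "*" and expected != actual:
            return False
    return True
```

For example, `us-east-1-snowflake.*.hr` matches the hr schema in any database on that host, while `snowfl*.tpc.*.*` matches nothing because the asterisk is not treated as a partial-name wildcard.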
Policy: @hasTagAsAttribute('PersonalData', 'dataSource')
User: has the attribute key PersonalData with the values Discovered.Person Name and Discovered.Entity
Data source 1: tagged Discovered.Country, Discovered.Passport, Discovered.Person Name
Data source 2: tagged Discovered.State, Discovered.Postal Code, Discovered.Entity.Social Security Number
Data source 3: tagged Discovered.State, Discovered.Passport
The user would be subscribed to data sources 1 and 2, but not to data source 3. This is because access moves from left to right in the hierarchy based on what the user possesses (the wildcard asterisk is implied).
So if a user had the more specific attribute key PersonalData with the value Discovered.Entity.Social Security Number, they would only get access to data source 2, because a user attribute grants access only when it matches the tag or sits further left in the hierarchy (in this case, it matches Discovered.Entity.Social Security Number).
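The left-to-right matching described above can be sketched in a few lines. This is an illustration of the rule as stated here, not Immuta's code; the helper names `attribute_covers_tag` and `is_subscribed` are hypothetical.

```python
from typing import Iterable


def attribute_covers_tag(attribute_value: str, tag: str) -> bool:
    """True when the attribute value equals the tag or is a parent of it."""
    # The trailing dot keeps "Discovered.Entity" from matching "Discovered.EntityX".
    return tag == attribute_value or tag.startswith(attribute_value + ".")


def is_subscribed(user_values: Iterable[str], data_source_tags: Iterable[str]) -> bool:
    """A single matching attribute value and tag pair is enough to subscribe the user."""
    tags = list(data_source_tags)
    return any(attribute_covers_tag(value, tag) for value in user_values for tag in tags)
```

Applied to the example, the values Discovered.Person Name and Discovered.Entity subscribe the user to data sources 1 and 2 but not 3. A user holding only Discovered.Entity.Social Security Number matches data source 2 alone, and would not match a data source tagged just Discovered.Entity, because matching runs in one direction only.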
The below table provides more examples:
This could be helpful for use cases with a policy like the following:
If a user has the attribute "Allowed_Domain.Domain A", they get access to generic data that is part of domain A.
If a user has the attribute "Badge_Allowed.Badge X", they should gain access to the generic data plus any additional data tagged "Badge X" (only in domain A, because they only have "Data Domain A General Access").
In this case it can be two separate subscription policies, such as
Policy 1: @hasTagAsAttribute('Allowed_Domain', 'dataSource')
this would limit to the domains where they are allowed to see generic data.
Policy 2: @hasTagAsAttribute('Badge_Allowed', 'dataSource')
this would limit to the badges they are allowed to see.
Then, when the data sources are tagged with table tags that represent access, only policy 1 will apply if the table has only the domain tag. If the table has both a domain tag and a badge tag, both policies will be applied and merged by Immuta.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Only show data by time from the first dropdown.
Select where data is more recent than or older than from the next dropdown, and then enter the number of minutes, hours, days, or years that you would like to restrict the data source to. Note that unlike many other policies, there is no field to select a column to drive the policy. This type of policy will be driven by the data source's event-time column, which is selected at data source creation.
Choose for everyone, everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
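The effect of a time-based policy built with the steps above can be sketched as a filter on the event-time column. This is only a model of the outcome, under the assumption that rows are plain dictionaries; the function name `visible_rows` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def visible_rows(
    rows: List[Dict],
    event_time_column: str,
    window: timedelta,
    now: datetime,
    mode: str = "more recent than",
) -> List[Dict]:
    """Keep rows whose event time falls on the permitted side of the cutoff."""
    cutoff = now - window
    if mode == "more recent than":
        return [row for row in rows if row[event_time_column] > cutoff]
    if mode == "older than":
        return [row for row in rows if row[event_time_column] < cutoff]
    raise ValueError("mode must be 'more recent than' or 'older than'")
```

Note that the column is passed in once for the whole data source, mirroring the fact that the policy is driven by the event-time column selected at data source creation rather than by a column chosen in the policy.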
The Immuta Policy Builder allows you to use custom functions that reference important Immuta metadata from within your where clause. These custom functions are utilities that help you create policies more easily. To include them in your policy queries, choose where in the sub-action drop-down menu of the Immuta Policy Builder.
@attributeValuesContains() function
This function returns true for a given row if the provided column evaluates to an attribute value for which the querying user has a corresponding attribute value. This function requires two arguments and accepts no more than three arguments.
# | Parameter | Type | Required | Description |
---|
@columnTagged() function
This function returns the column name with the specified tag.
If this function is used in a Global Policy and the tag doesn't exist on a data source, the policy will not be applied.
# | Parameter | Type | Required | Description |
---|
@groupsContains() function
This function returns true for a given row if the provided column evaluates to a group to which the querying user belongs. This function requires at least one argument.
@hasAttribute() function
This function returns a boolean indicating if the current user has the specified attribute name and value combination. If the specified attribute name or attribute value has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@iam function
This function returns the IAM ID for the current user.
Parameters: None.
@isInGroups() function
This function returns a boolean indicating if the current user is a member of all of the specified groups. If any of the specified groups has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@isUsingPurpose() function
This function returns a boolean indicating if the current user is using the specified purpose. If the specified purpose has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@purposesContains() function
This function returns true for a given row if the provided column evaluates to a purpose under which the querying user is currently acting. This function requires at least one argument and accepts no more than two arguments.
@username function
This function returns the current user's user name.
Parameters: None.
Snowflake object (information_schema.tables.table_type) | Read policy applied | Write policy applied |
---|
Databricks Unity Catalog object (information_schema.tables.table_type) | Read policy applied | Write policy applied |
---|
Administrators can customize write access configuration to grant additional Starburst (Trino) table modification privileges. See below for an overview and example configurations.
Starburst (Trino) cluster: Configuring write policies using the configuration file on a Starburst (Trino) cluster allows access to be broadly customized for Immuta users on that cluster. This configuration file takes precedence over write policies passed from the Immuta web service. Use this option if all Immuta users should have the same level of access to data regardless of the configuration in the Immuta web service.
The default configuration and an example of a custom configuration are provided below. See the for guidance on configuring these properties in your Starburst cluster.
The Amazon S3 integration supports read and write access subscription policies. Users can apply read and write access policies to data in S3 to restrict what prefixes, buckets, or objects users can access or modify. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.
To query a data source they are subscribed to, users request temporary credentials from their Access Grants instance. These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant, scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant, and vends these temporary credentials to the requester. If no grant exists for the user, they receive an Access denied error.
Immuta read policies translate to the READ access level in S3 Access Grants, and Immuta write policies translate to the READWRITE access level. The table below outlines the Amazon S3 actions granted on an S3 data source when users meet the restrictions specified in an Immuta read or write access subscription policy that is applied to the data source. See the AWS documentation for more details about grants, access levels, and actions.
Immuta policy | S3 Access Grants access level | Amazon S3 action |
---|
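The grant evaluation flow described above can be modeled locally with a small simulation. It does not call AWS and is not the S3 Access Grants API: the `Grant` class and `request_access` function are hypothetical, prefix matching is reduced to a string comparison, and it assumes that a READWRITE grant also satisfies a READ request.

```python
from dataclasses import dataclass
from typing import Dict, List

# Mapping stated in this guide: Immuta policy type -> S3 Access Grants level.
IMMUTA_TO_S3_LEVEL = {"read": "READ", "write": "READWRITE"}


@dataclass(frozen=True)
class Grant:
    grantee: str   # user the grant was created for
    location: str  # S3 prefix, bucket, or object, e.g. "s3://bucket/prefix/"
    level: str     # "READ" or "READWRITE"


def request_access(grants: List[Grant], grantee: str, target: str, permission: str) -> Dict[str, str]:
    """Return a description of the scoped session, or raise if no grant matches."""
    for grant in grants:
        # Assumption: READWRITE grants cover READ requests as well.
        covers_level = grant.level == permission or (
            grant.level == "READWRITE" and permission == "READ"
        )
        if grant.grantee == grantee and target.startswith(grant.location) and covers_level:
            # The session is scoped to the location named by the matching grant.
            return {"scope": grant.location, "permission": permission}
    raise PermissionError("Access denied")
```

A user covered only by an Immuta read policy would therefore obtain READ-scoped credentials for the granted prefix and be denied a READWRITE request.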
Read and write access can also be granted manually by a data owner. See the for details.
When you have multiple global ABAC subscription policies to enforce, author each one as a separate global policy; Immuta will then merge the policies that apply to the same data source.
This step is optional, but data governors can add certifications that outline acknowledgements or . For example, data governors could add a custom certification that states that data owners must verify that tags have been added correctly to their data sources before certifying the policy.
Using this approach, Immuta can automatically tag your data after it is registered by schema monitoring using sensitive data discovery (SDD). SDD is made of algorithms to discover and tag the data most important to you and your organization's policies. Once customized and deployed, any time Immuta discovers a new table or column through schema monitoring, SDD will run and automatically tag the new columns without the need for any manual intervention. This is the recommended option because once SDD is customized for your organization, it will eliminate the human error associated with manually tagging and is more proactive than manual tagging, further reducing policy downtime.
and customized before registering any data with Immuta. Contact your Immuta support professional to enable this feature flag for you before enabling SDD on the app settings page.
Using this approach, you will rely on humans to tag. Those tags will be stored in the data platform (Snowflake, Databricks Unity Catalog) or catalog (Alation, Collibra). Then they will be synchronized back to Immuta. If using this option, Immuta recommends storing the tags in the data platform because the write of the tags to Snowflake or Databricks Unity Catalog can be , removing burden from manually tagging on every run. API calls to Alation and Collibra are also possible, but are not accessible over dbt. Manually tagging through the Alation or Collibra UI will negatively impact .
Using this approach, you will rely on humans to tag, but the tags will be stored directly in Immuta. This can be done or through the However, manually tagging through the Immuta UI will negatively impact .
When registering tables with Immuta, you must register each database or catalog with . Schema monitoring means that you do not need to individually register tables but rather make Immuta aware of databases, and then Immuta will periodically scan that database for changes and register any new changes for you. .
An existing table or view is recreated with the CREATE OR REPLACE statement. This will drop all policies, resulting in users losing all access. There is one exception: with Databricks Unity Catalog, the grants on the table are not dropped, which means masking and row filtering policies are dropped but the table access is not, making policy downtime even more critical to manage.
Do not when executing a CREATE OR REPLACE
statement.
Do not use .
Do not grant to catalogs or schemas, because those (future is the problematic piece).
There is no way to stop Databricks Unity Catalog from carrying over table grants on CREATE OR REPLACE
statements, so ensure you as quickly as possible if you have row filters or masking policies on that table.
As discussed above, data platforms do not currently have webhooks or eventing service, so Immuta does not receive alerts of DDL events. Schema monitoring will run every 24 hours by default to detect changes, but schema monitoring should also be run across your databases after you make changes to them. and the payload can be scoped down run schema monitoring on a specific database or schema or column detection on a specific table.
Apply "New" tags to all tables and columns not previously registered in Immuta and lock them down with the
Subscription policies allow for Immuta to have a state to move table access into post-data-downtime to realize policy uptime. Without subscription policies, when Immuta synchronizes policy, users will continue to not have access to tables because there is no subscription policy granting them access. If you want all users to have access to all tables, use a in Immuta for all your tables. This will ensure users are granted access back to the tables after data downtime.
The policy will display as Pending
until it is removed from all data sources it applies to. See the for details.
: To restrict access to data and associate your data source with a purpose, create a project and add the purpose and relevant data sources to the project.
This is because Immuta cannot rely on a 4-level hierarchy always being the case. For example, *.Age
could mean many things in a tag hierarchy. However it does support using parent attributes to apply to child attributes as described in .
User Attributes | Data Source Tags | Subscribed? | Notes |
---|
It is also possible to build subscription policies separately that use these special functions and have them on data sources.
While this approach is extremely powerful, in many cases, it will continue to leave you dealing with policy complexity associated with RBAC. Read the use case for more details, specifically guide.
Variable/Function | Description | Example |
---|---|---|
@database | Users who have an attribute key that matches a database will be subscribed to the data source(s) within the database. | @hasAttribute('SpecialAccess', '@hostname.@database.*'): If a user had the attribute SpecialAccess: us-east-1-snowflake.default.*, they would get subscribed to all the data sources in the default database. |
@hasAttribute('Attribute Name', 'Attribute Value') | Users who have the specified attribute are subscribed to the data source. | @hasAttribute('Occupation', 'Manager'): Any user who has the attribute key Occupation with the value Manager will be subscribed to the data source(s). |
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column') | Users who have an attribute key that matches a tag on a data source or column will be subscribed to that data source. | @hasTagAsAttribute('PersonalData', 'dataSource'): Users who have the attribute key PersonalData with the values Discovered.Passport, Discovered.Entity would be subscribed to Data Source 1, which is tagged [Discovered.Passport], and Data Source 2, which is tagged [Discovered.Entity]. However, they would not be subscribed to Data Source 3, which is tagged [Discovered.Country]. |
@hasTagAsGroup('dataSource' or 'column') | Users who are members of a group that matches a tag on a data source or column (respectively) will be subscribed to that data source. | @hasTagAsGroup('dataSource'): If Data Source 1 has the tags NewHire and Interns applied, users who are members of the groups NewHire or Interns would be subscribed to Data Source 1. |
@hostname | Users who have an attribute key that matches a hostname will be subscribed to the data source(s) with that hostname. | @hasAttribute('SpecialAccess', '@hostname.*'): If a user had the attribute SpecialAccess: us-east-1-snowflake.*, they would get subscribed to all the data sources with the us-east-1-snowflake hostname. |
@iam | Users who sign in with the IAM with the specified ID (the ID that displays on the App Settings page) will be subscribed to the data source. | @iam == 'oktaSamlIAM': Any user whose IAM ID is oktaSamlIAM can be subscribed to the data source. |
@isInGroups('List', 'of', 'Groups') | Users who are members of the specified group(s) can be subscribed to the data source. | @isInGroups('finance','marketing','newhire'): Users who are members of the groups finance, marketing, or newhire can be subscribed to the data source. |
@schema | Users who have an attribute key that matches this schema will be subscribed to the data source(s) under that schema. | @hasAttribute('SpecialAccess', '@hostname.@database.@schema'): If a user had the attribute SpecialAccess: us-east-1-snowflake.default.public.*, they would get subscribed to all the data sources under the public schema. |
@table | Users who have an attribute key that matches this table will be subscribed to the data source(s). | @hasAttribute('SpecialAccess', '@hostname.@database.@schema.@table'): If a user had the attribute SpecialAccess: us-east-1-snowflake.default.public.credit_transactions, they would get subscribed to the credit_transactions data source. |
Policy state | Enforcement | Description |
---|---|---|
Active policies | Enforced | If policies are edited in this state, the changes will be immediately enforced on data sources when the changes are saved. |
Deleted policies | Not enforced | Once a policy has been deleted, it cannot be recovered or reactivated. |
Disabled policies | Not enforced | Data owners or governors can place a policy in this state at the local level for a specific data source. Although this is similar to the staged policy state, this policy will still be enforced on other data sources after it is disabled for a specific data source. |
Pending policies | Not enforced | When Immuta pushes a new policy or a policy change to the remote platform, the policy state displays as Pending until the change is applied to all relevant data sources in the remote platform. For example, if an active global policy is staged, the policy state is Pending until that policy is removed from all data sources it applied to when it was active. If a policy's conditions are updated to apply to data sources tagged PII, it will display as Pending until it is enforced on all data sources tagged PII and removed from data sources without that tag. Use this state to understand the amount of time your policies take to be applied to or removed from data in your remote platform. See the Data engineering with limited policy downtime guide for strategies to address policy latencies. |
Staged policies | Not enforced | This state is useful when regularly editing and reviewing policies. This state also allows you to lift a policy's enforcement without deleting the policy so that it can easily be re-enforced. See Clone, Activate, or Stage a Global Policy for a tutorial. |
Read |
|
|
Write |
|
|
'PersonalData': [ | ['Discovered.Identifier Indirect', | Yes | Exact match on |
'PersonalData': ['Discovered.Entity'] | ['Discovered.PHI', 'Discovered.Entity.Age'] | Yes | User attribute 'Discovered.Entity' is a hierarchical parent of data source tag 'Discovered.Entity.Age' |
'Access': [ | ['Discovered.Identifier Indirect', | No | The policy is written to only match values under the 'PersonalData' attribute key. Not 'Access'. |
'PersonalData': ['Discovered'] | ['Discovered.Entity.Age'] | Yes | User attribute 'Discovered' is a hierarchical parent of data source tag 'Discovered.Entity.Age' |
'PersonalData': ['Discovered.Entity.Social Security Number'] | ['Discovered.Entity'] | No | Hierarchical matching only happens in one direction (user attribute contains data source tag). In this case, the user attribute is considered hierarchical child of the data source tags. |
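@attributeValuesContains() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Attribute Name | String | Required | The name of the attribute to retrieve values for |
2 | Column Name/SQL Expression | String | Required | The column that contains the value to match the attribute key against |
3 | Placeholder | String | Optional | A placeholder in case the list of values is empty |
@columnTagged() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Tag Name | String | Required | The name of the tag |
@groupsContains() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Column Name/SQL Expression | String | Required | The column that contains the value to match the group against |
2 | Placeholder | String | Optional | A placeholder in case the list of values is empty |
@hasAttribute() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Attribute Name | String | Required | The name of the attribute |
2 | Attribute Value | String | Required | The value to correspond with the attribute name |
@isInGroups() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Group names | Array (String) | Required | A list of group names, e.g. |
@isUsingPurpose() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Purpose | String | Required | The name of the purpose to check the user against |
@purposesContains() parameters:
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1 | Column Name/SQL Expression | String/Expression | Required | The column that contains the value to match the purpose against |
2 | Placeholder | String | Optional | A placeholder in case the list of values is empty |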
To manage and apply existing policies to data sources, a user must have either the CREATE_DATA_SOURCE Immuta permission or be manually assigned the owner role on a data source.
After a policy with a certification requirement is applied to a data source, data owners will receive a notification indicating that they need to certify the policy.
Navigate to the Policies tab of the affected data source, and review the policy in the Data Policies section.
Click Certify Policy.
In the Policy Certification modal, click Sign and Certify.
Once you have a data policy in effect, you can view the changes in your policies by clicking the Policy Diff button in the data policies section on a data source's policies tab.
The Policy Diff button displays previous policies and the current policy applied to the data source.
Immuta row-level policies compare data values with user metadata at query-time to determine whether or not the querying user should have access to the individual rows of data.
The values contained in one or many columns in the table in question (or a separate joined table) need to be referenced by the policy for its logic to take effect.
For example, consider the policy below:
Only show rows where user is a member of a group that matches the value in the column tagged Department.
The data values (the values in the column tagged Department) are matched against the user attribute (their groups) to determine whether or not rows will be visible to the user accessing the data.
The policy targets columns tagged Department; this means that this policy can be applied globally across all tables and data platforms that have that tag with this single policy rather than having to build a separate policy for individual tables and columns.
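The outcome of the example policy above can be sketched as a simple filter. This is an illustration only, assuming rows are dictionaries and that the column tagged Department has already been resolved to a column name; the function name `filter_rows_by_group` is hypothetical.

```python
from typing import Dict, List, Set


def filter_rows_by_group(rows: List[Dict], department_column: str, user_groups: Set[str]) -> List[Dict]:
    """Keep only rows whose department value matches one of the user's groups."""
    return [row for row in rows if row[department_column] in user_groups]
```

A user in the group finance querying a table with finance and hr rows would see only the finance rows, and a user with no matching group would see no rows at all.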
It is also possible to use custom functions in custom WHERE row-level policies for more complex use cases.
These wrap Immuta context into free-form SQL logic for the row-level policy. That context can be things like the attributes (@attributeValuesContains()) or groups (@groupsContains()) possessed by the user or the username (@username), injected into the SQL at runtime.
Avoid referencing explicit column names in custom functions and instead use the @columnTagged('tag name') function in SQL. In doing so, you can avoid having to reference the physical database world with the custom SQL policies and instead continue to target the metadata/tag world.
Private preview
This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
Orchestrated masking policies (OMP) reduce conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization. Furthermore, OMP fosters distributed data stewardship, empowering policy authors who share responsibility of a data set to protect it while allowing data consumers acting under various roles or purposes to access the data.
When multiple masking policies apply to a column, Immuta combines the exception conditions of the masking policy so that data subscribers can access the data when they satisfy one of those exception conditions. Multiple masking policies will be enforced on a column if the following conditions are true:
Policies use the same masking type.
Policies use the for everyone except condition.
Databricks Spark or Starburst (Trino) integration
OMP supports the following masking types:
Constant
Hashing
Format preserving masking
Null
Regex
Rounding
Governors can apply policies to all columns in a data source or target specific columns with tags or a regular expression. Without orchestrated masking policies enabled, when multiple global policies apply to the same columns, Immuta could only apply one of those policies.
Consider the following example to examine how policies behaved when one tag is used in two different policies:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under the purpose Marketing.
For columns tagged email, only one of these policies is enforced. The Mask PII Global Policy 2 is not applied to the data source, so Immuta is not enforcing the masking policy properly for users who should be able to see emails because they are acting under the Marketing purpose.
Consider the following example where multiple masking policies apply to columns that have multiple tags, resulting in one policy applying:
Global Policy 3: Mask using hashing the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 4: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.
If a column is tagged Employee Data and HR Data, Immuta will only apply one of the policies.
With orchestrated masking policies, Immuta applies multiple global masking policies that apply to a single column by combining the policy exceptions with OR. For these policies to combine, the masking type must be identical and the policy must use the for everyone except condition.
In the following example, both of these policies will apply to the data source:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under the purpose Marketing.
Users acting under the purpose Marketing or Email Campaign will be able to see emails in the clear.
However, in the following example, only one of these policies will apply to the data source because one masks using a constant and the other masks using hashing:
Global Policy 5: Mask using the constant REDACTED the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 6: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.
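The combination rule illustrated by these examples can be sketched as follows. This is a simplified model, not Immuta's implementation: the `MaskingPolicy` class and `combined_exceptions` function are hypothetical, exceptions are reduced to purposes, and the function returns None when the policies cannot be combined rather than deciding which single policy applies.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class MaskingPolicy:
    name: str
    masking_type: str                    # e.g. "hashing", "constant"
    condition: str                       # e.g. "for everyone except"
    exception_purposes: FrozenSet[str]   # purposes exempt from masking


def combined_exceptions(policies: List[MaskingPolicy]) -> Optional[Set[str]]:
    """OR the exceptions together when every policy is eligible to combine."""
    if not policies:
        return None
    same_type = len({policy.masking_type for policy in policies}) == 1
    all_except = all(policy.condition == "for everyone except" for policy in policies)
    if not (same_type and all_except):
        return None  # not combinable; only one of the policies would apply
    merged: Set[str] = set()
    for policy in policies:
        merged |= policy.exception_purposes
    return merged
```

For Mask PII Global Policies 1 and 2 this yields the purposes Email Campaign and Marketing, so a user acting under either one sees the value in the clear. For Global Policies 5 and 6 the masking types differ, so nothing is combined.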
No UI enhancements were made in this release. Multiple masking policies applied to the same column are visible on a data source, but there is no indication that the exceptions are combined with OR.
Masking types must match exactly for the policies to be combined. For example, both policies must mask using rounding.
Existing policies will not automatically migrate to the new policy logic when you enable the feature. To re-compute existing policies with the new logic, you must manually trigger global policy changes by staging and re-enabling each policy.
When building a global data policy, governors can create custom certifications, which must then be acknowledged by data owners when the policy is applied to data sources.
For example, data governors could add a custom certification that states that data owners must verify that tags have been added correctly to their data sources before certifying the policy.
When a global data policy with a custom certification is cloned, the certification is also cloned. If the user who clones the policy and custom certification is not a governor, the policy will only be applied to data sources that user owns.
In some cases, two conflicting global masking policies apply to a single data source. When this happens, the policy containing a tag deeper in the hierarchy will apply to the data source to resolve the conflict.
Consider the following global data policies created by a data governor:
Data policy 1: Mask columns tagged PII by making null for everyone on data sources with columns tagged PII
Data policy 2: Mask columns tagged PII.SSN using hashing for everyone on data sources with columns tagged PII.SSN
If a data owner creates a data source and applies the PII.SSN tag to a column, both of these global masking policies will apply to the column with that tag. Instead of having a conflict, the policy containing a deeper tag in the hierarchy will apply.
In this example, data policy 2 will be applied to the data source because PII.SSN is deeper and thus considered more specific than PII. If data owners wanted to use data policy 1 on the data source instead, they would need to disable data policy 2.
Should two or more masking policies target the same column and have the same hierarchy depth, the policy that was authored first will win out. This is a conservative approach that avoids the original policy being changed unexpectedly.
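The two tie-breaking rules above (deepest tag first, then earliest authored) can be expressed as a single sort key. The following Python sketch is illustrative only; `GlobalMaskingPolicy`, `tag_depth`, `resolve_conflict`, and the `created_order` field are hypothetical names, and the sketch assumes tag levels are separated by a period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalMaskingPolicy:
    name: str
    tag: str            # hierarchical tag, levels separated by "."
    created_order: int  # lower number means the policy was authored earlier


def tag_depth(tag):
    """Number of levels in a hierarchical tag, e.g. 'PII.SSN' has depth 2."""
    return len(tag.split("."))


def resolve_conflict(policies):
    """Pick the policy with the deepest tag; ties go to the policy authored first."""
    return min(policies, key=lambda p: (-tag_depth(p.tag), p.created_order))


policy_1 = GlobalMaskingPolicy("Data policy 1", "PII", created_order=1)
policy_2 = GlobalMaskingPolicy("Data policy 2", "PII.SSN", created_order=2)
winner = resolve_conflict([policy_1, policy_2])
```

Here `winner` is data policy 2, because PII.SSN is deeper than PII even though data policy 1 was authored first.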
Similar to masking policies, it is possible for two or more row-level policies to target the same table. When this occurs, all row-level policies will be applied and AND'ed together, meaning the user will need to meet all in some capacity to see any rows in the table at all.
To OR separate row-level policies together, build them into a single Immuta policy together with an OR.
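As a rough illustration of the AND behavior, the Python sketch below treats each row-level policy as a predicate over a row and a user. The predicates, column names, and user fields are hypothetical examples, not Immuta objects.

```python
def row_visible(row, user, policies):
    """A row is returned only if every row-level policy admits it (AND semantics)."""
    return all(policy(row, user) for policy in policies)


# Two hypothetical, separately authored row-level policies on the same table.
def department_policy(row, user):
    return row["department"] in user["groups"]


def region_policy(row, user):
    return row["region"] == user["region"]


user = {"groups": {"Finance"}, "region": "EU"}
rows = [
    {"department": "Finance", "region": "EU"},
    {"department": "Finance", "region": "US"},
    {"department": "HR", "region": "EU"},
]
visible = [r for r in rows if row_visible(r, user, [department_policy, region_policy])]
```

Only the first row satisfies both predicates. Getting OR semantics would require a single policy whose predicate combines the two conditions with `or`.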
When masking columns, the type of the column matters. For example, it is not possible to hash a numeric column, because the hash would render the number as a string.
Many data platforms make the user account for this by building separate data policies for every column type that could exist now or in the future, which is quite onerous.
Instead, Immuta has intelligent fallbacks. An intelligent fallback occurs when a masking type targets a column type that is incompatible with the masking type. In this case, Immuta will fall back to the most appropriate masking type which retains the level of privacy or better required by the previous type.
For example, if a hashing masking type hits a numeric type, it would intelligently fall back to nulling the column instead, since nulls are allowed in numeric types.
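A minimal Python sketch of the fallback idea follows. Only the hashing-on-numeric case comes from the text above; the `COMPATIBLE_COLUMN_TYPES` table and the `effective_masking_type` function are hypothetical, and the real compatibility rules depend on the integration.

```python
# Hypothetical compatibility table: the column types each masking type can target.
COMPATIBLE_COLUMN_TYPES = {
    "hashing": {"text"},
    "null": {"text", "numeric"},
}


def effective_masking_type(requested, column_type):
    """Return the requested masking type, or fall back to nulling if incompatible."""
    if column_type in COMPATIBLE_COLUMN_TYPES.get(requested, set()):
        return requested
    # Nulling keeps at least the privacy level of the requested type.
    return "null"
```

For instance, `effective_masking_type("hashing", "numeric")` returns `"null"`, while the same request against a text column keeps `"hashing"`.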
Sometimes a global data policy will target a table and the policy cannot be applied as written. This can happen for several reasons, but the most common is that the row-level policy logic is not relevant to the table in question.
For example, consider the following policy:
@attributeValuesContains('Attribute Name', 'SOME_COLUMN')
If SOME_COLUMN does not exist in the table, the row-level policy will not work (this is why it is always recommended to use the @columnTagged('tag name') function instead of hard coding column names).
In the case where an error such as this occurs with a global data policy, the lockout policy will kick in. The lockout policy is a row-level policy that blocks any rows from returning for any users. This may seem extreme, but since Immuta does not know how to apply the policy, the lockout policy avoids data leaks until the policy is edited to work correctly.
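The lockout behavior can be pictured with a short Python sketch. This is an illustrative model under the assumption that the only failure is a missing column; `apply_row_policy` and its parameters are hypothetical names.

```python
def apply_row_policy(columns, rows, policy_column, predicate):
    """Filter rows with the policy predicate, or lock out if the policy cannot apply."""
    if policy_column not in columns:
        # Lockout: the policy cannot be evaluated, so no rows are returned to anyone.
        return []
    return [row for row in rows if predicate(row[policy_column])]


columns = ["ID", "REGION"]
rows = [{"ID": 1, "REGION": "EU"}, {"ID": 2, "REGION": "US"}]

filtered = apply_row_policy(columns, rows, "REGION", lambda value: value == "EU")
locked_out = apply_row_policy(columns, rows, "SOME_COLUMN", lambda value: value == "EU")
```

The first call filters normally, while the second returns no rows because SOME_COLUMN does not exist in the table.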
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Minimize data source from the first dropdown.
Complete the enter percentage field to limit the amount of data returned at query-time.
Select for everyone except from the next dropdown menu to continue the condition. Additional options include for everyone and for everyone who.
Use the next field to choose the attribute, group, or purpose that you will match values against.
Notes:
If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the far right of the Policy Builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Opt to complete the Enter Rationale for Policy (Optional), and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select the Only show rows action from the first dropdown.
Choose one of the following policy conditions:
Where user
Choose the condition that will drive the policy from the next dropdown: is a member of a group or possesses an attribute.
Use the next field to choose the attribute, group, or purpose that you will match values against.
Use the next dropdown menu to choose the tag that will drive this policy. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the far right of the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Where the value in the column tagged
Select the tag from the next dropdown menu.
From the subsequent dropdown, choose is or is not in the list, and then enter a list of comma-separated values.
Where
Enter a valid SQL WHERE clause in the subsequent field. When you place your cursor in this field, a tooltip details valid input and the column names of your data source. See Custom WHERE Clause Functions for more information about specific functions.
Never
The never condition blocks all access to the data source.
Choose the condition that will drive the policy from the next dropdown: for everyone, for everyone except, or for everyone who.
Select the condition that will further define the policy: is a member of group, is acting under a purpose, or possesses attribute.
Use the next field to choose the group, purpose, or attribute that you will match values against.
Choose for everyone, everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Object type | Read access | Write access |
---|---|---|
Table | SELECT | SELECT, INSERT, UPDATE, DELETE, TRUNCATE |
View | SELECT | |
Materialized view | SELECT | |
External table | SELECT | |
Event table | SELECT | |
Iceberg table | SELECT | |
Dynamic table | SELECT | |
Data object from an incoming Data Share | | |
Object type | Read access | Write access |
---|---|---|
Table | SELECT | SELECT, MODIFY |
View | SELECT | |
Materialized view | SELECT | |
Streaming table | SELECT | |
External table | SELECT | |
Foreign table | SELECT | |
Data object from incoming Delta Share | | |
Masking policies hide values in data, providing various levels of utility while still preserving privacy. Immuta offers column masking and cell-level masking.
Column masking policies allow you to hide the data in a column. However, there are several different approaches for masking data that allow you to make tradeoffs between privacy (how far you go with masking) vs utility (how much you want the masked data to be useful to the data consumer).
As with all Immuta policy types, it is recommended that you use global policies when authoring masking policies to manage policies at scale. When using global policies, tagging your data with metadata becomes critical and is described in detail in the Compliantly open more sensitive data for ML and analytics use case.
Categorical Randomized Response: Categorical values are randomized by replacing a value with some non-zero probability. Not all values are randomized, and the consumer of the data is not told which values are randomized and which ones remain unchanged. Values are replaced by selecting a different value uniformly at random from among all other values. If a randomized response policy were applied to a “state” column, a person’s residency could flip from Maryland to Virginia, which would provide ambiguity to the actual state of residency. This policy is appropriate when obscuring sensitive values such as medical diagnosis or survey responses.
Custom Function: This function uses SQL functions native to the underlying database to transform the values in a column. This can be used in numerous use cases, but notional examples include top-coding to some upper limit, a custom hash function, and string manipulation.
K-Anonymization: Masking through k-anonymization is a distinct policy that can operate over multiple attributes. A k-anonymization policy applies rounding and NULL masking policies over multiple columns so that the columns contain at least “K” records, where K is a positive integer. As a result, attributes will only be disclosed when there is a sufficient number of observations. This policy is appropriate to apply over indirect identifiers, such as zip code, gender, or age. Generally, each of these identifiers is not uniquely linked to an individual, but when combined with other identifiers can be associated with a single person. Applying k-anonymization to these attributes provides the anonymity of crowds so that individual rows are made indistinct from each other, reducing the re-identification risk by making it unclear which record corresponds to a specific person. Immuta requires that you opt in to use this masking policy type. To enable k-anonymization for your account, contact your Immuta representative. Immuta supports k-anonymization of text, numeric, and time-based data types.
Mask with Format Preserving Masking: This function masks using a reversible function but does so in a way that the underlying structure of a value is preserved. This means the length and type of a value are maintained. This is appropriate when the masked value should appear in the same format as the underlying value. Examples of this would include social security numbers and credit card numbers where Mask with Format Preserving Masking would return masked values in a format consistent with credit cards or social security numbers, respectively. There is larger overhead with this masking type, and it should really only be used when format is critically valuable, such as situations when an engineer is building an application where downstream systems validate content. In almost all analytical use cases, format should not matter.
Mask with Reversibility: This function masks in a way that allows an authorized user to “unmask” a value and reveal the underlying value. Masking with Reversibility is appropriate when there is a need to obscure a value while allowing an authorized user to recover the underlying value. All of the same use cases and caveats that apply to Replace with Hashing apply to this function. Reversibly masked fields can leak the length of their contents, so it is important to consider whether or not this may be an attack vector for applications involving its use.
Randomized Response: This function randomizes the displayed value to make the true value uncertain, but maintains some analytic utility. The randomization is applied differently to both categorical and quantitative values. In both cases, the noise can be increased to enhance privacy or reduced to preserve more analytic value.
Datetime and Numeric Randomized Response: Numeric and datetime randomized response apply a tunable, unbiased noise to the nominal value. This noise can obscure the underlying value, but the impact of the noise is reduced in aggregate. This masking type can be applied to sensitive numerical attributes, such as salary, age, or treatment dates.
Replace with Constant: This function replaces any value in a column with a specified value. The underlying data will appear to be a constant. This masking carries the same privacy and utility guarantees as Replace with NULL. Apply this policy to strings that require a specific repeated value.
Replace with Hashing: This function masks the values with an irreversible hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value. This is appropriate for cases where the underlying value is sensitive, but there is a need to segment the population. Such attributes could be addresses, time segments, or countries. It is important to note that hashing is susceptible to inference attacks based on prior knowledge of the population distribution. For example, if “state” is hashed, and the dataset is a sample across the United States, then an adversary could assume that the most frequently occurring hash value is California. As such, it's most secure to use the hashing mask on attributes that are evenly distributed across a population.
Replace with Null: This function replaces any value in a column with NULL. This removes any identifiability from the column and removes all utility of the data. Apply this policy to numeric or text attributes that have a high re-identification risk, but little analytic value (names and personal identifiers).
Replace with REGEX: This function uses a regular expression to replace all or a portion of an attribute. REGEX replacement allows for some groupings to be maintained, while providing greater ambiguity to the disclosed value. This masking technique is useful when the underlying data has some consistent structure, the unmasked underlying data represents some re-identification risk, and a regular expression can be used to mask the underlying data to be less identifiable.
Rounding: Immuta’s rounding policy reduces, rounds, or truncates numeric or datetime values to a fixed precision. This policy is appropriate when it is important to maintain analytic value of a quantity, but not at its native precision.
Date/Time Rounding: This policy truncates the precision of a datetime value to a user-defined precision. `minute`, `hour`, `day`, `month`, and `year` are the supported precisions.
Numeric Rounding: This policy maps the nominal value to the ceiling of some specified bandwidth. Immuta has a recommended bandwidth based on the Freedman-Diaconis rule.
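The two rounding variants can be sketched in Python. This is an illustration of the described behavior, not Immuta's implementation: the function names are hypothetical, and the bandwidth helper uses the standard Freedman-Diaconis formula (2 × IQR / n^(1/3)), which may differ in detail from how Immuta derives its recommendation.

```python
import math
import statistics
from datetime import datetime


def round_datetime(value, precision):
    """Truncate a datetime to the given precision."""
    if precision == "minute":
        return value.replace(second=0, microsecond=0)
    if precision == "hour":
        return value.replace(minute=0, second=0, microsecond=0)
    if precision == "day":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    if precision == "month":
        return value.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if precision == "year":
        return value.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported precision: {precision}")


def round_to_ceiling(value, bandwidth):
    """Map a numeric value to the ceiling of its band of width `bandwidth`."""
    return math.ceil(value / bandwidth) * bandwidth


def freedman_diaconis_bandwidth(values):
    """Standard Freedman-Diaconis width: 2 * IQR / cube root of the sample size."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)
```

For example, `round_to_ceiling(37, 10)` yields 40, and truncating 2024-05-17 13:45:30 to `"hour"` yields 2024-05-17 13:00:00.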
Building a cell masking policy is done in the same manner as building a regular masking policy. The primary difference is that a where clause is injected when you select who the policy should apply to.
For example, a regular masking policy looks like the following:
Mask columns tagged Discovered.Entity.Social Security Number using hashing for everyone except members of group admins
The cells can be conditionally masked by changing the for to a where:
Mask columns tagged Discovered.Entity.Social Security Number using hashing where country_of_residence = 'US' for everyone except members of group admins
That policy will check the country_of_residence column in the table, and if the value is US the cell will be masked; otherwise the data will be presented in the clear as usual.
When referencing columns in custom SQL, it is recommended that you not use the physical column name as shown in the example above. Instead, use the @columnTagged('tag name') function. This will allow you to target the policy on any table with a country_of_residence column no matter how that column is spelled on the physical table. For example, you would change the policy to the following:
Mask columns tagged Discovered.Entity.Social Security Number using hashing where @columnTagged('country') = 'US' for everyone except members of group admins
This example policy targets the column with the tag country in the policy logic dynamically.
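To make the cell-level behavior concrete, the Python sketch below applies the same decision to individual rows. It is only a model of the policy logic: the `ssn` key, the sample rows, and the function names are hypothetical, and plain SHA-256 stands in for whatever hashing the data platform applies.

```python
import hashlib


def hash_value(value):
    """Stand-in for the hashing mask: a SHA-256 hex digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_ssn_cell(row, user_groups):
    """Hash the SSN cell only where country_of_residence is US and the user is not exempt."""
    if "admins" in user_groups:
        return row["ssn"]  # exception: members of group admins see the value
    if row["country_of_residence"] == "US":
        return hash_value(row["ssn"])
    return row["ssn"]  # condition not met, so the cell stays in the clear


us_row = {"ssn": "123-45-6789", "country_of_residence": "US"}
ca_row = {"ssn": "987-65-4321", "country_of_residence": "CA"}
```

A user outside the admins group sees a hash for `us_row` and the clear value for `ca_row`, while an admin sees both in the clear.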
The masking functions described above can be implemented in a variety of use cases. Use the table below to determine the circumstance under which a function should be used.
Applicable to Numeric Data: The masking function can be applied to numeric values.
Column-Value Determinism: Repeated values in the same column are masked with the same output.
Introduces NULLs: The masking function may, under normal or irregular circumstances, return NULL values.
Performance: How performant the masking function will be (10/10 being the best).
Preserves Appearance: The output masked value resembles the valid column values. For example, a masking function would output phone numbers when given phone numbers. Here, NULL values are not counted against this property.
Preserves Averages: The average of the masked values (avg(mask(v))) will be near the average of the values in the clear (avg(v)).
Suitable for De-Identification: The masking function can be used to obscure record identifiers, hiding data subject identities and preventing future linking against other identified data.
Provides Deniability of Record Content: A (possibly identified) person can plausibly attribute the appearance of the value to the masking function. This is a desirable property of masking functions that retain analytic utility, as such functions must necessarily leak information about the original value. Fields masked with these functions provide strong protections against value inference attacks.
Preserves Equality and Grouping: Each value will be masked to the same value consistently without colliding with others. Therefore, equal values remain equal under masking while unequal values remain unequal, preserving equality. This implies that counting statistics are also preserved.
Preserves Message Length: The length of the masked value is equal to the length of the original value.
Preserves Range Statistics: The number of data values falling in a particular range is preserved. For strings, this can be interpreted as the number of strings falling between any two values by alphabetical order.
Preserves Value Locality: The output will remain near the input, which may be important for analytic purposes.
Reversible: Qualified individuals can reveal the original input value.
Masking policy support by integration
Since global policies can apply masking policies across multiple different databases at once, if an unsupported masking policy is applied to a column, Immuta will revert to NULLing that column.
See the integration support matrix for an outline of masking policies supported by each integration.
Data policies manage what users see when they query data in a table they have access to.
There are three different ways to restrict data with data policies:
Row-level: Filter rows from certain users at query time.
Column masking: Mask values in a column at query time.
Cell masking: Mask specific cells in a column based on separate values in the same row at query time.
When applying a data policy, it will always be enforced for all users, following the principle of least privilege, unless optional exceptions are added to policies. Data policy exceptions are built using any of the following conditions, which can be mixed with boolean logic:
If the user is a member of a group (or several groups)
If the user possesses a particular attribute (or several attributes)
If the user is acting under a purpose (or several purposes) for which the data is allowed to be used
Data policy exceptions are very similar to subscription policies from this perspective. With subscription policies, nobody has access to a newly created table until someone says otherwise with a subscription policy (as long as you follow best practices for newly created tables and views). Similarly, when a masking policy is set on a column or a row-level policy on a table, it applies to everyone until someone says otherwise with an exception to the data policy.
If user metadata is stored in a table in the same data platform where a policy is enforced, it is not necessary to move that user metadata into Immuta. Instead, it can be referenced directly using custom WHERE functions in data policies.
Below is an example row-level policy that leverages a lookup table to dynamically drive access to rows in a table:
CREDIT_CARD_NUMBER | TRANSACTION_LOCATION | TRANSACTION_TIME | ACCESS_LEVEL |
---|---|---|---|
The final column in the table, ACCESS_LEVEL, defines who can see that row of data.
Now consider the following hierarchy:
In this diagram, there are 11 different access levels (AL) to data and the tree defines access. For example, if a user has Vegetables, they get access levels 2, 3, 4, 9, 10, and 11. If a user has Pear, they only get access level 8. In other words, a user with Vegetables would see the first row of the above table, a user with Pear would see the second row of the above table, and a user with Food would see both rows of the table.
Taking the example further, that hierarchy tree is represented as a table in the data platform that we wish to use to drive the row-level policy:
That hierarchy lookup table can be referenced in the row-level policy as user metadata like this:
@columnTagged('access_level') IN (SELECT ACCESS_LEVEL from [lookup table] where @attributeValuesContains('user_level', 'ROOT'))
Walking through the policy step-by-step:
@columnTagged('access_level'): This allows us to target multiple tables with an ACCESS_LEVEL column that needs protecting with a single policy. Simply tag all the ACCESS_LEVEL columns with the access_level tag and this policy would apply to all of them.
IN (SELECT ACCESS_LEVEL from [lookup table]: This is selecting the matching ACCESS_LEVEL from the lookup table to use as the IN clause for filtering the actual business table.
where @attributeValuesContains('user_level', 'ROOT'): This is comparing the user's attribute user_level to the value in the ROOT column, and if there's a match, that ACCESS_LEVEL is used for filtering in the previous step. See the custom WHERE documentation for more details on these functions.
So, you can then add metadata to your users in Immuta, such as Vegetables or Pear, and that will result in them seeing the appropriate rows in the business table in question.
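The lookup-driven filter can be modeled in a few lines of Python. The data here is hypothetical: the lookup rows follow the access levels named in the text for Vegetables and Pear, Food is assumed to cover all 11 levels, and the business rows use placeholder values with assumed access levels 2 and 8.

```python
# Hypothetical lookup table: one (ROOT, ACCESS_LEVEL) pair per row.
LOOKUP_TABLE = (
    [{"ROOT": "Vegetables", "ACCESS_LEVEL": level} for level in (2, 3, 4, 9, 10, 11)]
    + [{"ROOT": "Pear", "ACCESS_LEVEL": 8}]
    + [{"ROOT": "Food", "ACCESS_LEVEL": level} for level in range(1, 12)]  # assumption
)

# Hypothetical business table rows; only ACCESS_LEVEL matters for the filter.
BUSINESS_TABLE = [
    {"CREDIT_CARD_NUMBER": "row-1", "ACCESS_LEVEL": 2},
    {"CREDIT_CARD_NUMBER": "row-2", "ACCESS_LEVEL": 8},
]


def allowed_levels(user_levels, lookup):
    """Subquery: access levels whose ROOT matches one of the user's attribute values."""
    return {row["ACCESS_LEVEL"] for row in lookup if row["ROOT"] in user_levels}


def visible_rows(rows, user_levels, lookup):
    """Outer filter: keep rows whose ACCESS_LEVEL is IN the allowed set."""
    allowed = allowed_levels(user_levels, lookup)
    return [row for row in rows if row["ACCESS_LEVEL"] in allowed]
```

Under these assumptions, a user with Vegetables sees only the first row, a user with Pear sees only the second, and a user with Food sees both.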
The above example used a row-level policy, but it could instead do cell masking using the same technique:
Mask columns tagged Credit Card Number using hashing where @columnTagged('access_level') NOT IN (SELECT ACCESS_LEVEL from [lookup table] where @attributeValuesContains('user_level', 'ROOT'))
In this case, the credit card number will be masked if the access_level is not found for the user for that row.
Even if not using a lookup table, the power of the @columnTagged('tag name') function is apparent for applying your masking or row-level policies at scale.
There are several ways to mask data, and choosing the correct masking type has various tradeoffs. It's important to understand those tradeoffs when choosing masking types.
How data policy conflicts and intelligent fallbacks are managed
Once a user is subscribed to a data source, the data policies that are applied to that data source determine what data the user sees.
For all data policies, you must establish the conditions for which they will be enforced. Immuta allows you to append multiple conditions to the data. Those conditions are based on user attributes and groups (which can come from multiple identity management systems and be applied as conditions in the same policy), or purposes they are acting under through Immuta projects.
Conditions can be directed as exclusionary or inclusionary, depending on the policy that's being enforced:
exclusionary condition example: Mask using hashing values in columns tagged PII on all data sources for everyone except users in the group AUDIT.
inclusionary condition example: Only show rows where user is a member of a group that matches the value in the column tagged Department.
The table below outlines the policy types supported by each integration. Details about each of these policies are included in the policy types section.
*Supported with Caveats:
On Databricks data sources, joins will not be allowed on data protected with replace with NULL/constant policies.
On Trino data sources, the Immuta function @iam for WHERE clause policies can block the creation of views.
For all policies except purpose-based restriction policies, inclusionary logic allows governors to vary policy actions with an Otherwise clause.
For example, governors could mask values using hashing for users acting under a specified purpose while masking those same values by making null for everyone else who accesses the data.
This variation can be created by selecting for everyone who when available from the condition dropdown menus and then completing the Otherwise clause.
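The effect of an Otherwise clause can be sketched in Python as a two-branch decision. This is an illustrative model only; the `mask_with_otherwise` function and the Fraud Analysis purpose are hypothetical, and plain SHA-256 stands in for the hashing mask.

```python
import hashlib


def mask_with_otherwise(value, user_purposes, purpose="Fraud Analysis"):
    """Hash for users acting under the purpose; otherwise make the value null."""
    if purpose in user_purposes:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return None  # Otherwise clause: everyone else sees NULL
```

A user acting under the named purpose receives a consistent hash they can count or group on, while every other user receives no value at all.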
Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.
Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a governor could write the following global policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
This hierarchy allows you to create this as a single purpose instead of creating separate purposes, which must then each be added to policies as they evolve.
Now, any user acting under the purpose or sub-purpose of Research, whether Research.Marketing or Research.Onboarding, will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. Furthermore, if new projects with new Research purposes are added, for example, the relevant global policy will automatically be enforced.
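The hierarchy check can be sketched in Python as a prefix match on period-separated purpose names. The `meets_purpose` function is a hypothetical helper for illustration, not an Immuta API.

```python
def meets_purpose(required, acting_under):
    """True if the user's purpose is the required purpose or one of its sub-purposes."""
    return acting_under == required or acting_under.startswith(required + ".")
```

With the policy above, `meets_purpose("Research", "Research.Marketing")` is true, while an unrelated purpose such as `"Marketing"` on its own does not meet the restriction. Matching on `required + "."` rather than the bare name avoids treating a purpose like `"Researching"` as a sub-purpose.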
Refer to the data governor policy guide for a tutorial on purpose-based restrictions on data.
Masking policies hide values in data, providing various levels of utility while still preserving privacy.
This policy masks the values with an irreversible sha256 hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value.
Hashed values are different across data sources, so you cannot join on hashed values unless you enable masked joins on data sources within a project. Immuta prevents joins on hashed values to protect against link attacks where two data owners may have exposed data with the same masked column (a quasi-identifier), but their data combined by that masked value could result in a sensitive data leak.
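One way to picture why hashed values are consistent within a data source but not joinable across data sources is a per-source salt. The Python sketch below is an assumption made for illustration, not a description of Immuta's internals; `mask_with_hash` and the salt values are hypothetical.

```python
import hashlib


def mask_with_hash(value, data_source_salt):
    """SHA-256 of the salted value: stable within a data source, different across them."""
    salted = (data_source_salt + value).encode("utf-8")
    return hashlib.sha256(salted).hexdigest()


# Hypothetical salts, one per data source.
hash_in_source_a = mask_with_hash("alice@example.com", "salt-for-source-a")
hash_in_source_b = mask_with_hash("alice@example.com", "salt-for-source-b")
```

The same email always produces the same digest inside one data source, so counts and grouping still work, but the two digests above differ, so the column cannot be used to link the two data sources.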
This policy makes values null, removing any utility of the data the policy applies to.
With this policy, you can replace the values with the same constant value you choose, such as 'Redacted', removing any utility of that data.
This policy is similar to replacing with a constant, but it provides more utility because you can retain portions of the true value. When authoring the policy in Immuta, the regex and the replacement value do not need to be in single or double quotes.
The following regex rule would mask the final digits of an IP address:
Mask using a regex \d+$ the value in the columns ip_address for everyone.
In this case, the regular expression \d+$ breaks down as follows:
\d matches a digit (equal to [0-9])
+ is a quantifier that matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
This ensures we capture the last digit(s) after the last . in the IP address. We then can enter the replacement for what we captured, which in this case is XXX. So the outcome of the policy would look like this: 164.16.13.XXX
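The same pattern and replacement can be tried out locally with Python's `re` module. This only demonstrates the regular expression itself; the data platform's own regex engine applies the policy at query time and may differ in flags and syntax. The input address is a made-up example.

```python
import re


def mask_last_octet(ip_address):
    """Replace the trailing run of digits with XXX."""
    return re.sub(r"\d+$", "XXX", ip_address)


masked = mask_last_octet("164.16.13.27")  # "164.16.13.XXX"
```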
This regex rule applies masking to telephone numbers variably depending on the presence of a dash (implying a prefix), space, or only digits:
Mask using a regex (\+?\d{0,3}[-\s]?)?\d{4} the value in the column tagged Discovered...Telephone Number for everyone.
The image below illustrates authoring a regex global policy that will apply to Databricks Unity Catalog data sources:
Databricks Unity Catalog integration regexp_replace function
The Databricks Unity Catalog integration uses Spark’s built-in regexp_replace function. That Databricks function currently only supports global pattern flags set as global (g) and case-sensitive. Regex will not work on this platform unless these settings are appropriately configured.
This is a technique to hide precision from numeric values while providing more utility than simply hashing. For example, you could remove precision from a geospatial coordinate. You can also use this type of policy to remove precision from dates and times by rounding to the nearest hour, day, month, or year.
Deprecation notice
Support for unmask requests has been deprecated.
This option masks the values using hashing, but allows users to submit an unmasking request to users who meet the exceptions of the policy.
Note: The user receiving the unmasking request must send the unmasked value to the requester.
With Reversible Masking, the raw values are switched out with consistent values to allow analysis without revealing the underlying sensitive data. The direct identifier is replaced with a token that can still be tracked or counted.
This option masks the value, but preserves the length and type of the value.
This option also allows users to submit an unmasking request to users who meet the exceptions of the policy.
Preserving the data format is important if the format has some relevance to the analysis at hand. For example, you may need to retain the integer column type, or the first 6 digits of a 12-digit number may have an important meaning.
This option uses functions native to the underlying database to transform the column. Single quotes enclosing the regex and escaping special characters are required. The following example masks telephone numbers variably depending on the presence of a dash (implying a prefix), space, or only digits:
The image below illustrates authoring a global policy using this custom function:
Limitations
The masking functions are executed against the remote database directly. A poorly written function could lead to poor quality results, data leaks, and performance hits.
Using custom functions can result in changes to the original data type. In order to prevent query errors you must ensure that you cast this result back to the original type.
The function must be valid for the data type of the selected column. If it is not:
Local policies will error and show a message that the function is not valid.
Global policies will error and change to the default masking type (hashing for text and NULL for all others).
For all of the policies above, both at the local and global policy levels, you can conditionally mask the value based on a value in another column. This allows you to build a policy that looks something like: "Mask bank account number where country = 'USA'" instead of blindly stating you want bank account masked always.
Note: When building conditional masking policies with custom SQL statements, avoid using a column that is masked using randomized response in the SQL statement, as this can lead to different behavior depending on your data platform and may produce results that are unexpected.
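The conditional rule "Mask bank account number where country = 'USA'" can be sketched as follows. This is only an illustration of the logic: the row layout, the `mask_where` helper, and the use of SHA-256 as the masking function are assumptions, not the way any particular data platform enforces the policy.

```python
import hashlib


def hash_value(value: str) -> str:
    """Stand-in masking function: a SHA-256 hex digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_where(rows, column, condition):
    """Return copies of the rows with `column` masked only where `condition` holds."""
    masked = []
    for row in rows:
        new_row = dict(row)
        if condition(row):
            new_row[column] = hash_value(str(row[column]))
        masked.append(new_row)
    return masked


# Hypothetical sample rows.
rows = [
    {"bank_account": "111-222", "country": "USA"},
    {"bank_account": "333-444", "country": "France"},
]
result = mask_where(rows, "bank_account", lambda r: r["country"] == "USA")
```

Only the first row's account number is hashed; the second row is returned unchanged because its country does not satisfy the condition.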
Sample data is processed during computation of k-anonymization policies
When a k-anonymization policy is applied to a data source, the columns targeted by the policy are queried under a fingerprinting process that generates rules enforcing k-anonymity. The results of this query, which may contain data that is subject to regulatory constraints such as GDPR or HIPAA, are stored in Immuta's metadata database.
The location of the metadata database depends on your deployment:
Self-managed Immuta deployment: The metadata database is located in the server where you have your external metadata database deployed.
SaaS Immuta deployment: The metadata database is located in the AWS global segment you have chosen to deploy Immuta.
To ensure this process does not violate your organization's data localization regulations, you need to first activate this masking policy type before you can use it in your Immuta tenant. To enable k-anonymization for your account, contact your Immuta representative.
K-anonymity is measured by grouping records in a data source that contain the same values for a common set of quasi identifiers (QIs) - publicly known attributes (such as postal codes, dates of birth, or gender) that are consistently, but ambiguously, associated with an individual.
The k-anonymity of a data source is defined as the number of records within the least populated cohort, which means that the QIs of any single record cannot be distinguished from those of at least k - 1 other records. In this way, a record with QIs cannot be uniquely associated with any one individual in a data source, provided k is greater than 1.
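The definition above can be expressed in a few lines: group the records by their quasi-identifier values and take the size of the smallest group. The records and column names below are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the least populated cohort (0 for an empty data set)."""
    cohorts = Counter(
        tuple(record[qi] for qi in quasi_identifiers) for record in records
    )
    return min(cohorts.values()) if cohorts else 0


# Hypothetical records with two quasi-identifiers.
records = [
    {"zip_code": "75002", "gender": "Male"},
    {"zip_code": "75002", "gender": "Male"},
    {"zip_code": "75002", "gender": "Male"},
    {"zip_code": "75002", "gender": "Female"},
    {"zip_code": "75002", "gender": "Female"},
]
k_before = k_anonymity(records, ["zip_code", "gender"])

# A single record with a unique combination drops k to 1.
records_with_outlier = records + [{"zip_code": "10001", "gender": "Female"}]
k_after = k_anonymity(records_with_outlier, ["zip_code", "gender"])
```

Here `k_before` is 2 because the smallest cohort has two records, and adding one unique record lowers `k_after` to 1, meaning that record could be singled out.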
In Immuta, masking with k-anonymization examines pairs of values across columns and hides groups that do not appear at least the specified number of times (k). For example, if one column contains street numbers and another contains street names, the group 123, "Main Street" probably would appear frequently while the group 123, "Diamondback Drive" probably would show up much less. Since the second group appears infrequently, the values could potentially identify someone, so this group would be masked.
After the fingerprint service identifies columns with a low number of distinct values, users will only be able to select those columns when building the policy. Users can either use a minimum group size (k) given by the fingerprint or manually select the value of k.
Note: The default cardinality cutoff for columns to qualify for k-anonymization is 500.
Masking multiple columns with k-anonymization
Governors can write global data policies using k-anonymization in the global data policy builder.
When this global policy is applied to data sources, it will mask all columns matching the specified tag.
Applying k-anonymization over disjoint sets of columns in separate policies does not guarantee k-anonymization over their union.
If you select multiple columns to mask with k-anonymization in the same policy, the policy is driven by how many times these values appear together. If the groups appear fewer than k times, they will be masked.
For example, if Policy A
Policy A: Mask with k-anonymization the values in the columns gender and state requiring a group size of at least 2 for everyone
was applied to this data source
the values would be masked like this:
Note: Selecting many columns to mask with k-anonymization increases the processing that must occur to calculate the policy, so saving the policy may take time.
However, if you select to mask the same columns with k-anonymization in separate policies, Policy C and Policy D,
Policy C: Mask with k-anonymization the values in the column gender requiring a group size of at least 2 for everyone
Policy D: Mask with k-anonymization the values in the column state requiring a group size of at least 2 for everyone
the values in the columns will be masked separately instead of as groups. Therefore, the values in that same data source would be masked like this:
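The difference between masking columns together and masking them in separate policies can be reproduced with a small sketch. The `mask_groups` helper and the five gender/state rows are illustrative assumptions; Immuta derives its masking rules from a fingerprint of the data source rather than from code like this.

```python
from collections import Counter


def mask_groups(rows, columns, k):
    """Null out `columns` in rows whose combination of values appears fewer than k times."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    masked = []
    for row in rows:
        new_row = dict(row)
        if counts[tuple(row[c] for c in columns)] < k:
            for c in columns:
                new_row[c] = None
        masked.append(new_row)
    return masked


# Hypothetical sample rows.
rows = [
    {"gender": "Female", "state": "Ohio"},
    {"gender": "Female", "state": "Florida"},
    {"gender": "Female", "state": "Florida"},
    {"gender": "Female", "state": "Arkansas"},
    {"gender": "Male", "state": "Florida"},
]

# One policy over both columns: groups are (gender, state) pairs.
together = mask_groups(rows, ["gender", "state"], k=2)

# Two policies, one per column: each column is evaluated on its own.
separately = mask_groups(mask_groups(rows, ["gender"], k=2), ["state"], k=2)
```

Masked together, only the two (Female, Florida) rows keep their values. Masked separately, Female and Florida each survive wherever they occur, which is why separate policies do not guarantee k-anonymity over the combined columns.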
This policy masks data by slightly randomizing the values in a column, preserving the utility of the data while preventing outsiders from inferring content of specific records.
For example, if an analyst wanted to publish data from a health survey she conducted, she could remove direct identifiers and apply k-anonymization to indirect identifiers to make it difficult to single out individuals. However, consider these survey participants, a cohort of male welders who share the same zip code:
All members of this cohort have indicated substance abuse, sensitive personal information that could have damaging consequences, and, even though direct identifiers have been removed and k-anonymization has been applied, outsiders could infer substance abuse for an individual if they knew a male welder in this zip code.
In this scenario, using randomized response would change some of the Y's in substance_abuse to N's and vice versa; consequently, outsiders couldn't be sure of the displayed value of substance_abuse given in any individual row, as they wouldn't know which rows had changed.
How the randomization works
Immuta applies a random number generator (RNG) that is seeded with some fixed attributes of the data source, column, backing technology, and the value of the high cardinality column, an approach that simulates cached randomness without having to actually cache anything.
For string data, the random number generator essentially flips a biased coin. If the coin comes up as tails, which it does with the frequency of the replacement rate configured in the policy, then the value is changed to any other possible value in the column, selected uniformly at random from among those values. If the coin comes up as heads, the true value is released.
For numeric data, Immuta uses the RNG to add a random shift from a 0-centered Laplace distribution with the standard deviation specified in the policy configuration. For most purposes, knowing the distribution is not important, but the net effect is that on average the reported values should be the true value plus or minus the specified deviation value.
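The mechanism described above can be sketched as follows. This is an approximation under stated assumptions, not Immuta's implementation: the seed is built by hashing the listed attributes with SHA-256, the helper names are hypothetical, and the Laplace shift is generated as the difference of two exponential draws with scale equal to the standard deviation divided by the square root of 2.

```python
import hashlib
import random


def seeded_rng(*parts: str) -> random.Random:
    """Build an RNG seeded from fixed attributes, so the same inputs always
    produce the same 'random' outcome without caching anything."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def randomize_string(true_value, domain, replacement_rate, rng):
    """Flip a biased coin; on 'tails' return a different value from the domain."""
    if rng.random() < replacement_rate:
        others = [v for v in domain if v != true_value]
        if others:
            return rng.choice(others)
    return true_value


def randomize_number(true_value, std_dev, rng):
    """Add a 0-centered Laplace shift with the given standard deviation."""
    scale = std_dev / 2 ** 0.5  # Laplace variance is 2 * scale**2
    if scale == 0:
        return true_value
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise


# Hypothetical seed attributes: data source, column, technology, row key.
seed = ("health_survey", "substance_abuse", "snowflake", "880d0096")
always_flipped = randomize_string("Y", ["Y", "N"], 1.0, seeded_rng(*seed))
never_flipped = randomize_string("Y", ["Y", "N"], 0.0, seeded_rng(*seed))
shifted_a = randomize_number(40.0, 2.0, seeded_rng(*seed))
shifted_b = randomize_number(40.0, 2.0, seeded_rng(*seed))
```

Because the generator is reseeded from the same attributes on every call, `shifted_a` and `shifted_b` are identical, which is the "cached randomness" effect: a user cannot average away the noise by repeating the query.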
Preserving data utility
Using randomized response doesn't destroy the data because data is only randomized slightly; aggregate utility can be preserved because analysts know how and what proportion of the values will change. Through this technique, values can be interpreted as hints, signals, or suggestions of the truth, but it is much harder to reason about individual rows.
Additionally, randomized response gives deniability of record content, not dataset participation, so individual rows can be displayed.
In some cases, you may want several different masking policies applied to the same column through Otherwise policies. To build these policies, select everyone who instead of everyone or everyone except. After you specify who the masking policy applies to, select how it applies to everyone else in the Otherwise condition.
You can add and remove tags in Otherwise conditions for global policies (unlike local policy Otherwise conditions), as illustrated above; however, all tags or regular expressions included in the initial everyone who rule must be included in an everyone or everyone except rule in the additional clauses.
Feature limitations
Masking struct and array columns is only available for Databricks data sources.
Immuta only supports Parquet and Delta table types.
Spark supports a class of data types called complex types, which can represent multiple data values in a single column. Immuta supports masking fields within array and struct columns:
array: an ordered collection of elements
struct: a collection of elements that are primitive or complex types
Without this feature enabled, the struct and array columns of a data source default to jsonb in the Data Dictionary, and the masking policies that users can apply to jsonb columns are limited. For example, if a user wanted to mask PII inside the column patient in the image below, they would have to apply null masking to the entire column or use a custom function instead of just masking name or address.
After Complex Data Types is enabled on the App Settings page, the column type for struct columns for new data sources will display as struct in the Data Dictionary. (For data sources that are already in Immuta, users can edit the data source and change the column types for the appropriate columns from jsonb to struct.) Once struct fields are available, they can be searched, tagged, and used in masking policies. For example, a user could tag name, ssn, and street as PII instead of the entire patient column.
After a global or local policy masks the columns containing PII, users who do not meet the exception specified in the policy will see these values masked:
Note: Immuta uses the > delimiter to indicate that a field is nested instead of the . delimiter, since field and column names could include the . character.
Caveats
Struct columns with many fields
If users have struct columns with many fields, they will need to either create the data source against a cluster running Spark 3 or add spark.debug.maxToStringFields 1000 to their Spark 2 cluster's configuration.
To get column information about a data source, Immuta executes a DESCRIBE call for the table. In this call, Spark returns a simple string representation of the schema for each column in the table. For the patient column above, the simple string would look like this:
struct<name:string,ssn:string,age:int,address:struct<city:string,state:string,zipCode:string,street:text>>
Immuta then parses this string into the following format for the data source's dictionary:
However, if the struct contains more than 25 fields, Spark truncates the string, causing the parser to fail and fall back to jsonb. Immuta will attempt to avoid this failure by increasing the number of fields allowed in the server-side property setting, maxToStringFields; however, this only works with clusters on a Spark 3 runtime. The maxToStringFields configuration in Spark 2 cannot be set through the ODBC driver and can only be set through the Spark configuration on the cluster with spark.debug.maxToStringFields 1000 on cluster startup.
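To illustrate why a truncated string breaks parsing, here is a minimal sketch of a parser for the simple string shown for the patient column. It is not Immuta's parser: the function names are hypothetical, only struct types are handled, and joining nested field names with > simply follows the delimiter convention noted earlier.

```python
def split_top_level(body: str) -> list:
    """Split a field list on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return [p for p in parts if p]


def parse_type(type_str: str):
    """Return a nested dict for struct types, or the type name for anything else."""
    type_str = type_str.strip()
    if type_str.startswith("struct<") and type_str.endswith(">"):
        fields = {}
        for field in split_top_level(type_str[len("struct<"):-1]):
            name, _, field_type = field.partition(":")
            fields[name.strip()] = parse_type(field_type)
        return fields
    return type_str


def flatten(column: str, parsed, sep: str = ">") -> list:
    """List every leaf field as (path, type), joining nested names with `sep`."""
    if isinstance(parsed, dict):
        leaves = []
        for name, sub in parsed.items():
            leaves.extend(flatten(f"{column}{sep}{name}", sub, sep))
        return leaves
    return [(column, parsed)]


schema = (
    "struct<name:string,ssn:string,age:int,"
    "address:struct<city:string,state:string,zipCode:string,street:text>>"
)
parsed = parse_type(schema)
fields = flatten("patient", parsed)
```

A parser of this kind relies on every field appearing in the string with balanced angle brackets, so a string that Spark has cut short no longer describes the full struct and cannot be turned into individual fields.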
These policies hide entire rows or objects of data based on the policy being enforced; some of these policies require the data to be tagged as well.
Note: When building row-level policies with custom SQL statements, avoid using a column that is masked using randomized response in the SQL statement, as this can lead to different behavior depending on whether you’re using Spark or Snowflake and may produce unexpected results.
These policies match a user attribute with a row/object/file attribute to determine if that row/object/file should be visible. This process uses a direct string match, so the user attribute would have to match exactly the data attribute in order to see that row of data.
For example, to restrict access to insurance claims data to the state in which the user's home office is located, you could build a policy such as this:
Only show rows where user possesses an attribute in Office Location that matches the value in the column State for everyone except when user is a member of group Legal.
In this case, the Office Location is retrieved by the identity management system as a user attribute or group. If the user's attribute (Office Location) was Missouri, rows containing the value Missouri in the State column in the data source would be the only rows visible to that user.
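The matching logic of this example policy can be sketched as a simple filter. The user and row structures below are hypothetical; the sketch only mirrors the exact string match and the group exception described above.

```python
def visible_rows(rows, user, attribute="Office Location", column="State",
                 exempt_group="Legal"):
    """Return the rows a user may see under the attribute-match policy."""
    # Members of the exempt group are not subject to the restriction.
    if exempt_group in user.get("groups", []):
        return list(rows)
    allowed = set(user.get("attributes", {}).get(attribute, []))
    # Direct string match: the attribute must equal the column value exactly.
    return [row for row in rows if row[column] in allowed]


# Hypothetical claims rows and users.
claims = [
    {"claim_id": 1, "State": "Missouri"},
    {"claim_id": 2, "State": "Ohio"},
    {"claim_id": 3, "State": "Missouri"},
]
analyst = {"groups": [], "attributes": {"Office Location": ["Missouri"]}}
lawyer = {"groups": ["Legal"], "attributes": {}}
mismatched = {"groups": [], "attributes": {"Office Location": ["missouri"]}}

analyst_rows = visible_rows(claims, analyst)
lawyer_rows = visible_rows(claims, lawyer)
mismatched_rows = visible_rows(claims, mismatched)
```

The last user sees nothing because `missouri` does not exactly match `Missouri`, which shows why attribute values and column values need to use the same spelling and case.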
This policy can be thought of as a table "view" created automatically for the user based on the condition of the policy. For example, in the policy below, users who are not members of the Admins group will only see taxi rides where passenger_count < 2.
Only show rows where public.us.taxis.passenger_count < 2 for everyone except when user is a member of group Admins.
You can put any valid SQL WHERE clause in the policy. See the Custom WHERE clause functions for a list of custom functions.
WHERE clause policy requirement
All columns referenced in the policy must have fully qualified names. Any column names that are unqualified (just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
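The "view" analogy can be made concrete with an in-memory SQLite database standing in for the data platform. The table contents, the view name `taxis_non_admin`, and the `query_rides` helper are assumptions for illustration; the way Immuta enforces the policy on a real platform may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxis (ride_id INTEGER, passenger_count INTEGER)")
conn.executemany(
    "INSERT INTO taxis VALUES (?, ?)",
    [(1, 1), (2, 3), (3, 1), (4, 2)],
)

# The policy's WHERE clause becomes the body of a view for restricted users.
conn.execute(
    "CREATE VIEW taxis_non_admin AS "
    "SELECT * FROM taxis WHERE passenger_count < 2"
)


def query_rides(is_admin: bool):
    """Admins read the table directly; everyone else reads the filtered view."""
    source = "taxis" if is_admin else "taxis_non_admin"
    sql = f"SELECT ride_id, passenger_count FROM {source} ORDER BY ride_id"
    return conn.execute(sql).fetchall()


non_admin_rides = query_rides(is_admin=False)
admin_rides = query_rides(is_admin=True)
```

Non-admin users only ever receive the rides with fewer than two passengers, while members of the exempt group see every row.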
These policies restrict access to rows/objects/files that fall within the time restrictions set in the policy. If a data source has time-based restriction policies, queries run against the data source by a user will only return rows/blobs with a date in their event-time column/attribute from within a certain range.
The time window is based on the event time you select when creating the data source. This value will come from a date/time column in relational sources.
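A minimal sketch of a time-based restriction, assuming the policy limits users to rows whose event time falls within a trailing window. The column name `event_time`, the 30-day window, and the helper name are hypothetical.

```python
from datetime import datetime, timedelta


def rows_in_window(rows, event_time_column, window, now):
    """Keep only rows whose event time lies within `window` before `now`."""
    cutoff = now - window
    return [row for row in rows if cutoff <= row[event_time_column] <= now]


# Hypothetical rows with an event-time column.
now = datetime(2024, 6, 30, 12, 0, 0)
rows = [
    {"id": 1, "event_time": datetime(2024, 6, 29, 9, 0, 0)},
    {"id": 2, "event_time": datetime(2024, 5, 1, 9, 0, 0)},
    {"id": 3, "event_time": datetime(2024, 6, 10, 9, 0, 0)},
]
recent = rows_in_window(rows, "event_time", timedelta(days=30), now)
```

Passing `now` explicitly keeps the sketch deterministic; a live system would evaluate the window relative to the time of the query.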
These policies return a limited percentage of the data, which is randomly sampled at query time, but the sample is the same for all users. For example, you could limit certain users to only 10% of the data. Immuta uses a hashing policy to return approximately 10% of the data, and the data returned will always be the same; however, the exact number of rows exposed depends on the distribution of high cardinality columns in the database and the hashing type available. Additionally, Immuta will adjust the data exposed when new rows are added or removed.
Best practice: row count
Immuta recommends you use a table with over 1,000 rows for the best results when using a data minimization policy.
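Hash-based sampling of this kind can be sketched as follows: each row is kept or dropped based on a hash of a high cardinality column, so every user sees the same subset and the subset size is only approximately the requested percentage. The use of SHA-256, the bucket count, and the helper names are assumptions; the hashing available on a given data platform may differ.

```python
import hashlib


def in_sample(key: str, percent: float) -> bool:
    """Deterministically decide whether a key falls in the sampled share."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


def minimize(rows, key_column, percent):
    """Return the rows whose key hashes into the requested percentage."""
    return [row for row in rows if in_sample(str(row[key_column]), percent)]


# Hypothetical table with a high cardinality id column.
rows = [{"id": i} for i in range(10_000)]
sample = minimize(rows, "id", 10)
sample_again = minimize(rows, "id", 10)
```

The two calls return identical rows, and the sample size is close to, but not exactly, 10% of the table. On very small tables the deviation from the target percentage can be large, which is consistent with the row count recommendation above.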
Public preview: This feature is currently in public preview and available to all accounts.
If a global masking policy applies to a column, you can still use that masked column in a global row-level policy.
Consider the following policy examples:
Masking policy: Mask values in columns tagged Country for everyone except users in group Admin.
Row-level policy: Only show rows where user possesses an attribute in OfficeLocation that matches the value in column tagged Country for everyone.
Both of these policies use the Country tag to restrict access. Therefore, the masking policy and the row-level policy would apply to data source columns with the tag Country for users who are not in the Admin group.
Limitations
This feature is only available for Snowflake and Databricks Unity Catalog integrations.
This feature is only supported for global data policies, not local data policies.
This policy pairs with schema monitoring to mask newly added columns to data sources until data owners review and approve these changes from the requests tab of their profile page.
When this policy is activated by a governor, it will automatically be enforced on data sources that have the New tag applied to them.
To learn how to activate this policy, navigate to the tutorial.
Write access is controlled through and
View-based integrations are read-only
ACCESS_LEVEL | ROOT |
---|---|
Gender | State |
---|---|
Gender | State |
---|---|
Gender | State |
---|---|
participant_id | zip_code | gender | occupation | substance_abuse |
---|---|---|---|---|
0123456789
Lewes, DE
00:07:34
4
9876543210
College Park, MD
09:16:08
8
1
Food
2
Food
3
Food
4
Food
5
Food
6
Food
7
Food
8
Food
9
Food
10
Food
11
Food
2
Vegetables
3
Vegetables
4
Vegetables
9
Vegetables
10
Vegetables
11
Vegetables
5
Fruits
6
Fruits
7
Fruits
8
Fruits
4
Carrots
9
Leafy
10
Leafy
11
Leafy
5
Orange
6
Orange
7
Orange
8
Pear
10
Lettuce
11
Lettuce
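Data source before masking:
Gender | State |
---|---|
Female | Ohio |
Female | Florida |
Female | Florida |
Female | Arkansas |
Male | Florida |
Masked by Policy A:
Gender | State |
---|---|
Null | Null |
Female | Florida |
Female | Florida |
Null | Null |
Null | Null |
Masked by Policy C and Policy D:
Gender | State |
---|---|
Female | Null |
Female | Florida |
Female | Florida |
Female | Null |
Null | Florida |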
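participant_id | zip_code | gender | occupation | substance_abuse |
---|---|---|---|---|
... | ... | ... | ... | ... |
880d0096 | 75002 | Male | Welder | Y |
f267334b | 75002 | Male | Welder | Y |
bfdb43db | 75002 | Male | Welder | Y |
260930ce | 75002 | Male | Welder | Y |
046dc7fb | 75002 | Male | Welder | Y |
... | ... | ... | ... | ... |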