
Secure Your Data

Immuta allows you to secure your data through various access control policies you configure to mitigate compliance and security risks.

Getting started

This section provides use cases to guide you through implementing Immuta Secure.

Introduction

This section provides a conceptual overview of Immuta Secure and its benefits.

Authoring policies in Secure

This section provides how-to and reference guides for authoring policies to enforce data access controls.

The how-to and reference guides in this section illustrate how to use domains. Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups.

Projects and purpose-based access control

Projects combine users and data sources under a common purpose, which can then be used to restrict access to data and streamline collaboration.

Data consumers

The guides in this section illustrate how to subscribe to data sources in Immuta, run health jobs, and complete other actions as a data source subscriber.

Getting Started with Secure

Select your use case

Immuta allows you to secure your data through various access control policies you configure to mitigate compliance and security risks.

Prerequisites

Data platform integration configured in Immuta
Users and data sources have been registered in Immuta

Use cases

Immuta Secure allows you to build complex access control policies in a simple, scalable manner. However, there are many different ways customers think about access control, so Secure is extremely flexible. Because of this, Immuta has broken up Secure into common use cases that customers follow to speed up their onboarding process. Choose the use case below that best fits your goals. If none of them fit, contact your customer support representative for a more personalized onboarding experience:

Automate data access control decisions: This is the most common use case. It walks you through how to build table access control policies in a scalable manner and clarifies how to think about table access and adjust existing paradigms.
Compliantly open more sensitive data for ML and analytics: This is an approach where every user has access to every table, yet you mask the sensitive columns appropriately using data policies. You can apply some of the concepts from the automate data access control decisions use case to how you think about masking policy rules.
Federated governance for data mesh and self-serve data access: Data mesh is a concept where you delegate operational control of data products to data producers across your organization. This requires delegation of policy management as well, which is the goal of this use case. This use case also covers a generic data shopping experience where users can discover and subscribe to data.

The use cases are a starting point for your Secure journey. You don’t need to follow them exactly, but they give you a conceptual framework for thinking through your own use cases, so we encourage you to read them all.

Automate Data Access Control Decisions

Who is this for?

This guide is intended for users who want to build table access control policies in a scalable manner using Immuta.

Prerequisite

It's best to start with the Monitor and secure sensitive data platform query activity use case prior to this use case because it takes you through the basic configuration of Immuta.

Goals

This use case is the most common across customers. With it, you solve the problem of entitlement of users to data based on metadata about the user (attributes and groups) and metadata about the data (tags).

This use case is unique to Immuta, because rather than coupling an access decision with a role, you are able to instead decouple access decisions from user and data metadata. This is powerful because it means when metadata about the users or metadata about the data changes, the access for a user or set of users may also change - it is a dynamic decision.

Decoupling access decisions from metadata also eliminates the classic problem of role explosion. In a world where policy decisions are coupled to a role, you must manage a new role for every permutation of access, causing an explosion of roles to manage. Instead, with this use case you decouple that logic from the user metadata and data metadata, so real-time, dynamic decisions are possible. Immuta’s method of building policies allows for these many permutations in a clear, concise manner through the decoupling of policy logic from the user and data metadata.
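The arithmetic behind role explosion can be sketched in a few lines. This is a worst-case illustration only: the fact names and values below are hypothetical, and it assumes every combination of facts needs its own role, which real deployments may not.

```python
from itertools import product

# Hypothetical user facts; the values are illustrative, not taken from Immuta.
employment = ["employee", "contractor"]
countries = ["US", "France", "Japan"]
teams = ["Legal", "Finance", "Sales", "HR"]

# Role-based worst case: one role per combination of facts that affects access.
roles_needed = len(list(product(employment, countries, teams)))

# Attribute-based: one policy per kind of fact, evaluated against user metadata.
policies_needed = len([employment, countries, teams])

print(roles_needed, policies_needed)
```

Adding a fourth country adds eight more roles in the role-based count, while the attribute-based count stays at three.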

This use case also eliminates the need for a human in the loop to approve data access. If you can clearly describe why an approver would approve an access request, that reasoning can instead be expressed as an Immuta subscription policy, as you’ll see below. Removing humans from this process increases the speed to access data, makes access decisions consistent, and removes error and favoritism.

Want to learn more? Check out this ABAC 101 blog on the methodology.

Table vs column access

Lastly, this use case is primarily focused on automating table grants, termed subscription policies in Immuta. We recommend you also read the Compliantly open more sensitive data for ML and analytics use case to learn about column masking, which will allow you to enforce more granular controls so that you can open more data. It's common to see customers mix the automate data access control decisions use case with the compliantly open more sensitive data for ML and analytics use case, so it's recommended to read both.

Business value

Following this use case will reap huge operational cost savings. You will have to manage far fewer data policies; those policies will be more granular, accurate, and easily authored; and you will be able to prove compliance more easily. Furthermore, instead of an explosion of incomprehensible roles, the users and data will have meaningful metadata matched with clear policies.

Quantified benefits:

75x fewer policy changes required to accomplish the same use cases
70% reduction in dedicated resources to data access management and policy administration
Onboarding new employees two weeks faster and provisioning new data one week faster
5% to 15% increase in client contract values to improve revenue

Unquantified benefits:

Improved compliance standard and security posture
Enhanced employee satisfaction
Better customer experiences

More details on the business value can be found in these reports:

GIGAOM: ABAC vs RBAC: The Advantage of Attribute-Based Access Control over Role-Based Access Control (Immuta is an OT-ABAC approach)
Forrester: The Total Economic Impact Of Immuta

Configuration steps

Follow these steps to learn more about and start using Immuta Secure to automate data access control decisions:

Complete the Monitor and secure sensitive data platform query activity use case.
Read the overview of the two access control paths: orchestrated RBAC and ABAC. These are the two different approaches (or mix) you can take to managing policy in Secure and their tradeoffs.
Manage user metadata. This step is critical to building scalable policy and understanding the considerations around how and what to capture. Tag your users with attributes and groups that are meaningful for Immuta global policies.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful.
Author policy. In this step, you will define your global data policy logic. Optionally test and deploy policy.

The Two Paths: Orchestrated RBAC and ABAC

When following this use case, you need to decide which path you fall in. This decision drives how you will manage user and data metadata as well as policies, so it's a critical decision.

If you aren’t sure, you should strive for ABAC. While it may seem more complicated to get started, in the long run it will provide you powerful flexibility and scalability of policy management. In this method, you tag your users and data with facts and prescribe policies that leverage those facts to make real-time decisions.

Orchestrated RBAC puts more strain on managing access decisions outside of your access logic (Immuta) because you need all access decisions in a single attribute on the user. Because of this, it more closely resembles the role explosion problem, and if you incorrectly select this path you will end up there over time. Orchestrated RBAC is tag-orchestrated RBAC and is supported by Immuta (in fact, many customers stick to this because of the benefits of the tag-orchestration).

Orchestrated RBAC: one-to-one

Do you have many permutations of static access with no overlap?

This method is for organizations whose access decision typically depends on a single variable:

If you have x, you have access to everything tagged y.

Furthermore, the access decision to objects tagged y never strays beyond x. In other words, there’s only ever one way to get access to objects tagged y - a 1:1 relationship. A good real-world example of orchestrated RBAC is

You must have signed data use agreement x to have access to data y.

ABAC: many-to-many

Do you have a lot of variables at play to determine access?

This method is for organizations that may have many differing x’s for access to objects y. Furthermore, x and y may actually be many different variables, such as

If you have a, b, and c you get access to objects tagged x, y, and z.

If you have d, you get access to objects tagged x.

Notice in this example that access to objects tagged x can happen through different decisions.

ABAC is also important when you have federated governance in play, where many different people are making policy logic decisions and that logic may stack on the same objects. A good real world example of ABAC is

You must reside in the US and be a full-time employee to see data tagged US and Highly Sensitive.
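The difference between the two paths can be sketched as two small decision functions. This is a conceptual illustration, not Immuta's implementation; the names RBAC_GRANTS, ABAC_RULES, rbac_allows, and abac_allows are hypothetical, and the letters mirror the placeholders used above.

```python
# Orchestrated RBAC: a single user value maps to a single tag (one-to-one).
RBAC_GRANTS = {"x": "y"}


def rbac_allows(user_value, table_tag):
    """Access depends on one variable: having x grants everything tagged y."""
    return RBAC_GRANTS.get(user_value) == table_tag


# ABAC: several rules, each pairing required user facts with the tags they unlock.
ABAC_RULES = [
    ({"a", "b", "c"}, {"x", "y", "z"}),
    ({"d"}, {"x"}),
]


def abac_allows(user_facts, table_tag):
    """Access is granted if any rule's required facts are all present."""
    return any(
        required <= user_facts and table_tag in granted
        for required, granted in ABAC_RULES
    )
```

With these rules, objects tagged x are reachable either by holding a, b, and c together or by holding d alone, which is the many-to-many relationship described above.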

Managing Data Metadata

Prerequisites

Your schema metadata is registered using either of the Detect use cases:

Monitor and secure sensitive data platform query activity use case (Snowflake only)
General Immuta configuration use case (if not using Snowflake)

Data tags for Detect

You should have already done some data tagging while configuring Immuta in the Detect getting started. That guide focuses on understanding where compliance issues may exist in your data platform and may not have fully covered all tags required for policy. Please read on to see if there's more work to be done with data tagging.

Considerations

Now that we’ve enriched facts about our users, let’s focus on the second point on the policy triangle diagram: the data tags. Just like you need user metadata, you need metadata on your data (tags) in order to decouple policy logic from referencing physical tables or columns. You must choose between the orchestrated RBAC and ABAC methods of data access:

Orchestrated RBAC method: tag data sources at the table level
ABAC method: tag data at the table and column level

While it is possible to target policies using both table- and column-level tags, for ABAC it’s more common to target column tags because they represent more granularly what is in the table. Just like user metadata needs to be facts about your users, the data metadata must be facts about the data. The tags on your tables should not contain any policy logic.

Fact-based column tags are descriptive (recommended):

Column ssn has column tag social security number
Column f_name has column tag name
Column dob has column tags date and date of birth

Logic-based column tags require subjective decisions (not recommended):

Column ssn has column tag PII
Column f_name has column tag sensitive
Column dob has column tag indirect identifier

But can't I get policy authoring scalability by tagging things with higher level classifications, like PII, so I can build broader policies? This is what Immuta’s classification frameworks are for.

Entity tags are facts about the contents of individual columns in isolation. Entity tags are what we listed above: social security number, name, date, and date of birth. Entity tags do not attempt to contextualize column contents with neighboring columns' contents. Instead, categorization and classification tags describe the sensitive contents of a table with the context of all its columns, which is what is listed in the logic-based tags above, things like PII, sensitive, and indirect identifier.

For example, under the HIPAA framework a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a single column-by-column basis, it can’t reason whether or not a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. But categorization tagging will tag it PHI if it detects patient identity information in the other columns of the table.

Additionally, entity tagging does not indicate how sensitive the data is, but categorization tags carry a sensitivity level, the classification tag. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly-listed phone number of a company might not be considered sensitive.

Contextual tags are really what you should target with policy where possible. This provides a way to create higher level objects for more scalable and generic policy. Rather than building a policy like “allow access to tables with columns tagged person name and phone number,” it would be much easier to build it like “allow access to tables with columns tagged PII.”

In short, you must tag your entities, and then rely on a classification framework (provided by Immuta or customized by you) to provide the higher level context, also as tags. Remember, the owners of the tables (those who created them) can tag the data with facts about what is in the columns without having to understand the higher level implications of those tags (categorization and classification). This allows better separation of duty.
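The relationship between entity tags and higher-level tags can be sketched as a small rule function. The rules here are hypothetical stand-ins for a classification framework, not Immuta's actual frameworks; IDENTITY_ENTITIES, classify_table, and the "procedure code" entity tag are assumptions made for illustration.

```python
# Entity tags that (in this sketch) identify a person.
IDENTITY_ENTITIES = {"name", "social security number", "date of birth"}


def classify_table(column_entity_tags):
    """Derive higher-level tags per column from entity tags plus table context.

    column_entity_tags maps a column name to its set of entity tags.
    """
    # Table-wide context: does any column identify a person?
    has_identity = any(
        tags & IDENTITY_ENTITIES for tags in column_entity_tags.values()
    )
    derived = {}
    for column, tags in column_entity_tags.items():
        higher = set()
        if tags & IDENTITY_ENTITIES:
            higher.add("PII")
        # Procedure codes only count as PHI when patients are identifiable.
        if "procedure code" in tags and has_identity:
            higher.add("PHI")
        derived[column] = higher
    return derived


with_patients = classify_table(
    {"f_name": {"name"}, "proc": {"procedure code"}}
)
codes_only = classify_table({"proc": {"procedure code"}})
```

The same proc column is tagged PHI in the first table and left untagged in the second, because the decision depends on neighboring columns rather than on the column in isolation.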

For orchestrated RBAC, the data tags are no longer facts about your data; they are instead a single variable that determines access. As such, they should be table-level tags (which also reduces the amount of processing Immuta must do).

Applying data tags

There are several options for doing this, and if you are following along with the use cases for Detect getting started, you may have already accomplished the recommended option 1.

Immuta Discover's sensitive data discovery (SDD): This is the most powerful option. Immuta is able to discover your sensitive data, and you are able to extend what types of entities are discovered to those specific to your business. SDD can run completely within your data platform, with no data leaving at all for Immuta to analyze. SDD is more relevant for the ABAC approach because the tags are facts about the data.
Tags from an external source: You may have already done all the work tagging your data in some external catalog, such as Collibra, Alation, or your own homegrown tool. If so, Immuta can pull those tags in and use them. Out of the box Immuta supports Alation, Collibra, and Snowflake tags, and for anything else you can build a Custom REST Catalog Interface. But remember, just like user metadata, these should represent facts about your data and not policy decisions.
Manually tag: Just like with user metadata, you are able to manually tag tables and columns in Immuta from within the UI, using the Immuta API, or when registering the data, either during initial registration or subsequent tables discovered in the future through schema monitoring.

Data tag hierarchy

Just as hierarchy has an impact with user metadata, so does data tag hierarchy. We discussed the matching of user metadata to data metadata in the Managing user metadata guide. However, there are even simpler approaches that can leverage data tag hierarchy beyond matching. This will be covered in more detail in the Author policy guide, but is important to understand as you think through data tagging.

As a quick example, it is possible to tag your data with Cars and then also tag that same data with more specific tags (in the hierarchy) such as Cars.Nissan.Xterra. Then, when you build policies, you could allow access to tables tagged Cars to administrators, but only those tagged Cars.Nissan.Xterra to suv_inspectors. This will result in two separate policies landing on the same table, and the beauty of Immuta is that it will handle the conflict of those two separate policies. This provides a large amount of scalability because you have to manage far fewer policies.

Imagine if you didn’t have this capability. You would have to include administrator access in every policy you created for the different vehicle makes, and if that policy needed to evolve, such as granting all cars to more than administrators, it would be an enormous effort to make that change. With Immuta, it’s one policy change.
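The Cars example can be sketched as follows. It assumes a policy that targets a parent tag also applies to data carrying any descendant tag in a dot-delimited hierarchy; that matching rule, along with the names tag_matches, POLICIES, and groups_with_access, is an assumption for illustration and says nothing about how Immuta resolves the resulting policies internally.

```python
def tag_matches(policy_tag, data_tag):
    """True when data_tag equals policy_tag or sits below it in the hierarchy."""
    return data_tag == policy_tag or data_tag.startswith(policy_tag + ".")


# Two policies only: one broad, one specific.
POLICIES = {
    "Cars": "administrators",
    "Cars.Nissan.Xterra": "suv_inspectors",
}


def groups_with_access(table_tags):
    """Collect the groups whose policy lands on a table with these tags."""
    return {
        group
        for policy_tag, group in POLICIES.items()
        if any(tag_matches(policy_tag, tag) for tag in table_tags)
    }
```

A table tagged Cars.Nissan.Xterra picks up both policies, while a table tagged with some other make only picks up the administrators policy, so widening access to all cars means editing the single Cars policy.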

Author Policy

Now we are ready to focus on the third point of the triangle for data access decisions: access control policies, and, more specifically, how you can author them with Immuta now that you have the foundation with your data and user metadata in place.

It’s important to understand that you only need a starting point to begin onboarding data access control use cases - you do not have to have every user perfectly associated with metadata for all use cases, nor do you need all data tagged perfectly for all use cases. Start small on a focused access control use case, and grow from there.

The policy authoring approaches are broken into the two paths you should choose between discussed previously.

Path 1: Orchestrated RBAC policy authoring

With orchestrated RBAC you have established one-to-one relationships with how your users are tagged (attribute or group) and what that single tag explicitly gives them access to. Remember, the user and data metadata should be facts and not reflect policy decisions veiled in a single fact; otherwise, you are setting yourself up for role explosion and should be using ABAC instead.

Since orchestrated RBAC is all about one-to-one matching of user metadata to data metadata, Immuta has special functions in our subscription policies for managing this:

@hasTagAsAttribute('Attribute Name', 'dataSource' | 'column')
@hasTagAsGroup('dataSource' | 'column')

The first argument of @hasTagAsAttribute is the attribute key whose values will be matched against tags; @hasTagAsGroup takes no key because the group names themselves are matched. The remaining argument specifies whether the values are matched against data source tags or column tags.

Understanding this, let’s use a simple example before we get into a hierarchical example. For this simple example, let’s say there are Strictly Confidential tables and Public tables. Everyone has access to Public, and only special people have access to Strictly Confidential. Let’s also pretend there aren’t other decisions involved with someone getting access to Strictly Confidential (there probably are, but we’ll talk more about this in the ABAC path). It is just a single fact: you do or you don’t have that fact associated with you.

Then we could tag our users with their access under the Access attribute key:

Bob: Access: Strictly Confidential
Steve: Access: Public

Then we could also tag our tables with the same metadata:

Table 1: Strictly Confidential
Table 2: Public

Now we build a policy like so:

And ensure we target all tables with the policy in the builder:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  advanced: '@hasTagAsAttribute(''Access'', ''dataSource'')'
  allowDiscovery: false
  automaticSubscription: true
  shareResponsibility: false
  type: entitlements
name: Access
policyKey: Access
staged: false
template: false
type: subscription

Since this policy applies to all data sources, you only need a single policy in Immuta. When you change user metadata or data metadata, the access to those tables will automatically be updated by Immuta through the matching algorithm (user metadata matches the data metadata).
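The matching behavior can be pictured with a short sketch that reuses the Bob and Steve example. It is a simplified emulation using exact matches only (no hierarchy), not Immuta's algorithm; USERS, TABLES, and subscribed_tables are hypothetical names.

```python
# User metadata: attribute key -> list of values.
USERS = {
    "Bob": {"Access": ["Strictly Confidential"]},
    "Steve": {"Access": ["Public"]},
}

# Data metadata: table -> set of data source tags.
TABLES = {
    "Table 1": {"Strictly Confidential"},
    "Table 2": {"Public"},
}


def subscribed_tables(user, attribute_key="Access"):
    """Tables whose tags overlap the user's values under attribute_key."""
    values = set(USERS[user].get(attribute_key, []))
    return sorted(name for name, tags in TABLES.items() if values & tags)


before = subscribed_tables("Bob")

# Changing user metadata changes access without touching the policy.
USERS["Bob"]["Access"].append("Public")
after = subscribed_tables("Bob")
```

Here before is ["Table 1"] and after is ["Table 1", "Table 2"]: the single policy stayed the same and only the metadata moved.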

We could also use column tags instead, by tagging the columns appropriately:

Table 1 --> Column 2: Strictly Confidential
This tag change would require the following alteration to the function: @hasTagAsAttribute('Access', 'column') instead of @hasTagAsAttribute('Access', 'dataSource').

Finally, we could use groups instead of attributes:

Bob: Strictly Confidential
Steve: Public
This change would require the following alteration to the function: @hasTagAsGroup('column') instead of @hasTagAsAttribute('Access', 'column').

In the above case, we are using column tags again, but it could (and probably should) be data source tags. The reason is that orchestrated RBAC is not using facts about what is in your data source (the table) to determine access. Instead, it's using the single variable as a tag, and because of that, the single variable is better represented as a table tag. This also reduces the amount of processing Immuta has to do behind the scenes.

We can get more complicated with a hierarchy as well, but the approach remains the same: you need a single policy and you use the hierarchy to solve who should gain access to what by moving from most restrictive to least restrictive in your hierarchy tree on both the users and data metadata:

Groups approach:

Bob: Strictly Confidential
Steve: Strictly Confidential.Confidential.Internal.Public

In this case, Steve would only gain access to tables or tables with columns tagged Strictly Confidential.Confidential.Internal.Public, but Bob would have access to tables or tables with columns tagged Strictly Confidential, Strictly Confidential.Confidential, Strictly Confidential.Confidential.Internal, and Strictly Confidential.Confidential.Internal.Public.
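The outcome described for Bob and Steve can be reproduced with a prefix check over the hierarchy. This is a sketch of the stated result, not of Immuta's internals; TAGS, USER_GROUPS, and accessible_tags are hypothetical names.

```python
# The four levels of the hierarchy, from most to least restrictive.
TAGS = [
    "Strictly Confidential",
    "Strictly Confidential.Confidential",
    "Strictly Confidential.Confidential.Internal",
    "Strictly Confidential.Confidential.Internal.Public",
]

USER_GROUPS = {
    "Bob": "Strictly Confidential",
    "Steve": "Strictly Confidential.Confidential.Internal.Public",
}


def accessible_tags(user):
    """Tags equal to the user's group or nested beneath it."""
    group = USER_GROUPS[user]
    return [t for t in TAGS if t == group or t.startswith(group + ".")]
```

Bob's group sits at the top of the tree, so every tag falls beneath it; Steve's group is the deepest leaf, so only that one tag matches.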

Orchestrated RBAC one-to-one matching can be powerful, but can also be a slippery slope to role explosion. You can quickly fall into rolling-up many decisions into a single group or attribute tied to the user (like Strictly Confidential) instead of prescriptively describing the real root reason of why someone should have access to Strictly Confidential in your policy engine. This is the power behind ABAC, and using the ABAC method of access can give you ultimate scalability and evolvability of policy with ease.

Path 2: ABAC policy authoring

Let’s say that having access to Strictly Confidential data isn’t a single simple fact attached to a user like we described in the orchestrated RBAC section (it likely isn't). Instead, let’s try to boil down why you would give someone access to Strictly Confidential. Let’s say that to have access to Strictly Confidential, someone should be

an employee (not contractor)
in the US
part of the Legal team

Those are clearly facts about users, more so than Strictly Confidential. If you can get to the root of the question like this, you should, because Immuta allows you to pull in that information about users to drive policy.

Why should you bother getting to the root?

Well, consider what we just learned about how to author policy with orchestrated RBAC above. Now let’s say your organization changes their mind and now wants the policy for access to Strictly Confidential to be this:

be an employee (not contractor)
be in the US or France
be part of the Legal team

With orchestrated RBAC this would be a nightmare, unless you already have a process to handle it. You would have to manually figure out all the users that meet the new criteria (in France) and move them in and out of Strictly Confidential. This is why orchestrated RBAC should always use a single fact that gains a user access.

However, if you are on the ABAC path, it’s a single simple policy logic change because of our powerful triangle that decouples policy logic from user and data metadata.

Let’s walk through an example:

The best approach is to build individual policies for each segment of logic associated with why someone should have access to Strictly Confidential information. Coming back to the original logic above, that would mean 3 (much more understandable) separate policies:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Employees
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Employee Access
policyKey: Employee Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    attributes:
      - name: Country
        value: US
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access
policyKey: Country Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Legal Team
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Legal Team Access
policyKey: Legal Team Access
staged: false
template: false
type: subscription

Each policy targets tables tagged Strictly Confidential (they could alternatively target columns with that tag):

When those policies converge on a single table tagged Strictly Confidential, Immuta merges them together appropriately, demonstrating its conflict resolution:

And whether merged with an AND or an OR is prescribed by this setting in the policy builder for each of the 3 individual policies:

If either side of the policies is Always Required, the result is an AND; otherwise, if both sides agree to Share Responsibility, the result is an OR. In this case, each of the 3 policies chose Always Required, which is why the three policies were AND’ed together on merge. Note that this merging behavior is what lets different users manage policies without risk of conflicts.
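The merge rule can be sketched as a small evaluator. The pairwise behavior follows the rule stated above; how three or more mixed policies group together (all Always Required conditions AND'ed, Share Responsibility conditions OR'ed among themselves) is an assumption of this sketch. Policy and merged_allows are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    name: str
    condition: Callable[[Dict], bool]
    share_responsibility: bool = False  # False means Always Required


def merged_allows(policies: List[Policy], user: Dict) -> bool:
    """Evaluate the merged result of every policy landing on one table."""
    required = [p for p in policies if not p.share_responsibility]
    shared = [p for p in policies if p.share_responsibility]
    # Always Required conditions must all hold (AND).
    required_ok = all(p.condition(user) for p in required)
    # Share Responsibility conditions need only one to hold (OR).
    shared_ok = any(p.condition(user) for p in shared) if shared else True
    return required_ok and shared_ok


POLICIES = [
    Policy("Employee Access", lambda u: "Employees" in u["groups"]),
    Policy("Country Access", lambda u: u["country"] == "US"),
    Policy("Legal Team Access", lambda u: "Legal Team" in u["groups"]),
]

us_lawyer = {"groups": {"Employees", "Legal Team"}, "country": "US"}
fr_lawyer = {"groups": {"Employees", "Legal Team"}, "country": "France"}
```

With all three policies Always Required, us_lawyer passes and fr_lawyer fails on the country condition alone, which is the single place the business rule would need to change.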

Now let’s move into our nightmare (without Immuta) scenario where the business has changed the definition of who should have access to Strictly Confidential data by adding France along with the other existing three requirements.

With Immuta, all you’ve got to do is change that single Country Access subscription policy to include France, and voila, you’re done:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    attributes:
      - name: Country
        value: US
      - name: Country
        value: France
    operator: any
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access Two
policyKey: Country Access
staged: false
template: false
type: subscription

Which results in this merged policy on all the Strictly Confidential tagged tables:

As you can see, ABAC is much more scalable and evolvable. Policy management becomes a breeze if you are able to capture facts about your users and facts about your data, and decouple policy decisions from them.
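If you generate these payloads from a script, the France change amounts to one extra list element. The sketch below only builds a dictionary shaped like the Country Access payloads shown above; the helper name country_access_payload is hypothetical, and submitting the payload to Immuta is deliberately left out.

```python
import json


def country_access_payload(countries):
    """Build a Country Access subscription policy body for the given countries."""
    return {
        "actions": {
            "allowDiscovery": False,
            "automaticSubscription": True,
            "entitlements": {
                "attributes": [
                    {"name": "Country", "value": country} for country in countries
                ],
                # Mirrors the payloads above: "all" for one value, "any" for several.
                "operator": "any" if len(countries) > 1 else "all",
            },
            "shareResponsibility": False,
            "type": "entitlements",
        },
        "circumstanceOperator": "any",
        "circumstances": [
            {"columnTag": "Strictly Confidential", "type": "columnTags"}
        ],
        "name": "Country Access",
        "policyKey": "Country Access",
        "staged": False,
        "template": False,
        "type": "subscription",
    }


original = country_access_payload(["US"])
updated = country_access_payload(["US", "France"])
print(json.dumps(updated["actions"]["entitlements"], indent=2))
```

Keeping policyKey unchanged between the two versions reflects the example above, where the updated payload reuses the Country Access key.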

Also be aware that this auto-merging logic enables you to combine orchestrated RBAC and ABAC policies.

Manual overrides

Last but not least are manual overrides. While this use case is all about automating access decisions, there may be cases where you want to give a user access even if they don’t meet the automated criteria.

This is possible through this setting in each policy:

In this scenario, users marked as Owner of the registered data source in Immuta have the power to approve this override request by the user. There are ways to specify different approvers as well, or multiple approvers.

Also notice the grayed out Allow Data Source Discovery: normally, if you do not meet the criteria of a policy, the data source is not even visible to you in the Immuta UI or the data platform. However, if you have this checked, it is visible in the Immuta UI even if you don’t meet the policy. This must be checked in order to request approval to access. Otherwise, users would never find the data source to ask for approval in the Immuta UI.

Be aware that all policies that land on the table must have this same setting turned on in order to allow the manual override.

Managing User Metadata

You may have read the Automate data access control decisions use case already. If so you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage user metadata with this particular use case, you should use ABAC.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use tag lineage described in the next topic), especially in a data ecosystem with constant change. This means that your users will need to have facts about them that drive policy decisions (ABAC) rather than single variables that drive access (as in orchestrated-RBAC).

Understanding that, please read the ABAC section in the automate data access control decisions use case's Managing user metadata guide.

Managing Data Products

A data product is a valuable output derived from data. It is designed to provide insights, support decision-making, and drive business value. Data products are created to meet business needs or solve problems within a domain or organization. These products can take various forms. In most cases, it is a collection of tables and views, which is what is supported by Immuta.

Data product metadata registration

The use case covers how to register your data metadata, but it's worth discussing in more depth here because you have multiple data product owners.

As a first step, all the tables/views on the data platform(s) must be onboarded in Immuta. With the (which is the Immuta default), onboarding objects in Immuta will not make any changes to existing accesses already in place.

Immuta only starts taking control when the first policy has been applied. We suggest that one team registers all objects from your data platform(s). Alternatively, different domain owners can register their own domain data. It is recommended to use a system account to do the data registration.

Regardless of which user registered the data, it's critical that is enabled to ensure future tables and views are discovered by Immuta and automatically registered to the proper domain.

Be aware that registering your data product metadata with Immuta does not make it available to users. It simply makes Immuta aware of its existence. To allow users to see it and subscribe, you would manage policies on it, discussed in the guide next.

Who will do this?

Any team member that has the necessary user credentials in the underlying platform to read the data.

Managing Data Metadata

You may have read the use case already. If so you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.

This is because you want your data product owners to tag data with facts - what they have intimate knowledge of because they built the data product - and not have to be knowledgeable about all policies across your organization. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts, what is in the column, which is why ABAC makes much more sense.

Understanding that, read the automate data access control decisions use case's guide.

Tags in a federated governance world

It is important to distinguish between tag definition and tag application. While tag definition (e.g., a tag called “Business Unit” with the values “Finance”, “Marketing”, “Sales”) should be strongly governed to guarantee consistency and coherence, tag application can be fully decentralized, meaning every domain or data owner can apply tag values (from the centrally governed list) to their data. There needs to be a process in place for data owners to request the definition of a new tag in case they identify any gaps.

Monitoring data products

It is important to leverage Immuta Discover's sensitive data discovery (SDD) to monitor your data products. This allows you to uncover if and when sensitive data may be leaked unknowingly by a data product and mitigate that leak quickly.

The use case covers this in great detail and is highly recommended for a data mesh environment.

Apply Federated Governance

In the distributed domains of a data mesh architecture, data governance and access control must be applied vertically (locally) within specific domains or data products and horizontally (globally). Global policies should be authored and applied in line with the ecosystem’s most generic and all-encompassing principles, regardless of the data’s domain (e.g., mask all PII data). Localized domain- or product-level policies should be fine-grained and applicable to only context-specific purposes or use cases (e.g., only show rows in the Sales table where the country value matches the user's office location). In Immuta we distinguish between subscription policies and data policies.

Subscription policies

It is possible to build Immuta subscription policies across all data products (horizontal), as shown in the diagram above, and then have those policies merged with additional subscription policies authored at the domain level (vertical). When this occurs, those subscription policies are merged as prescribed by the two or more policies being merged.

Whether the requirements for access are merged with an AND or an OR is prescribed by this setting in the policy builder for each of the individual policies:

Always Required = AND
Share Responsibility = OR
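
The merge behavior can be sketched in a few lines of Python. This is an illustrative model only, not Immuta's implementation: the SubscriptionPolicy class, the is_subscribed function, and the rule that a user must meet every "Always Required" policy plus at least one "Share Responsibility" policy (when any exist) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

User = Dict[str, object]


@dataclass
class SubscriptionPolicy:
    """Hypothetical stand-in for a subscription policy (not an Immuta type)."""
    name: str
    condition: Callable[[User], bool]
    always_required: bool  # True = "Always Required" (AND); False = "Share Responsibility" (OR)


def is_subscribed(user: User, policies: List[SubscriptionPolicy]) -> bool:
    """Assumed merge rule: all required policies AND any one shared policy."""
    required = [p for p in policies if p.always_required]
    shared = [p for p in policies if not p.always_required]
    meets_required = all(p.condition(user) for p in required)
    # With no shared policies, only the required ones decide the outcome.
    meets_shared = any(p.condition(user) for p in shared) if shared else True
    return meets_required and meets_shared


# One horizontal (global) policy merged with two vertical (domain) policies.
policies = [
    SubscriptionPolicy("privacy training", lambda u: "privacy_training" in u["groups"], True),
    SubscriptionPolicy("sales domain", lambda u: u["department"] == "Sales", False),
    SubscriptionPolicy("finance domain", lambda u: u["department"] == "Finance", False),
]

alice = {"groups": {"privacy_training"}, "department": "Sales"}
bob = {"groups": set(), "department": "Finance"}

alice_subscribed = is_subscribed(alice, policies)  # meets the global policy and one domain policy
bob_subscribed = is_subscribed(bob, policies)      # fails the "Always Required" global policy
```

In this sketch, Alice is subscribed and Bob is not, because the global policy is ANDed with whatever the domain policies decide.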

Subscription policies for a data product "shopping" experience

The options you choose when building subscription policies affect what a user can discover and, if desired, "put in their shopping cart" to use.

Allow Data Source Discovery: Normally, if a user does not meet the subscription policy, that data source is hidden from them in the Immuta UI. Should you check this option when building your subscription policy, the inverse is true: anyone can see this data source. This is important if you want users to understand if the data product exists, even if they don't have access.
Require Manual Subscription: Even if the user meets the policy, they are not automatically subscribed; they have to discover the data source and subscribe themselves. If they meet the policy, their subscription is then granted with no further intervention. This is important if you want the users to maintain the list of data products they see in the data platform rather than all data products they have access to.
Request Approval to Access: This allows the user to request access, even if they don't meet the policy. Rules determine which user can manually override the policy to let them in.

Data policies

Once a user is subscribed to a data source via Immuta or has a pre-existing direct access on the underlying data platform, the data policies that are applied to that data source determine what data the user sees. Data policy types include masking, row-level, and other privacy-enhancing techniques.

An exemplary three-step approach to managing data policies would be:

Update the global data policies as new sensitive data is potentially released by data products and discovered using Immuta Detect.

Strategy

Data mesh is a higher level use case that pulls from concepts learned across the other use cases:

Introduction

Immuta Secure is the final piece of the puzzle: Now that you understand where sensitive data lives (via Discover) and can monitor activity against that data (via Detect), you can mitigate risk using Immuta Secure.

In short, Immuta Secure enables the management and delivery of trusted data at scale.

Challenge and goals

Managing access control in your data platform typically starts off easy, but over time becomes a house of cards. This concept is termed role explosion and is a result of having to keep up with every permutation of access across your organization. Once this occurs, it becomes difficult to evolve policies for fear of breaking existing access or because of a lack of understanding across your extensive role list.

Secure allows you to apply engineering principles to how you manage data access, giving your team the agility to lower time-to-data across your organization while meeting your stringent and granular compliance requirements. Immuta allows massively scalable, evolvable, and understandable automation around data policies; creates stability and repeatability around how those policies are maintained; allows distributed stewardship across your organization, but provides consistency of enforcement across your data ecosystem no matter your compute or data warehouse; and fosters more availability of data through the use of highly granular data controls.

How does it work?

Each of the guides below explains Secure principles in detail:

: A scalable and evolvable data management system allows you to make changes that impact thousands of tables at once, accurately. It also allows you to evolve your policies over time with minor changes (or no changes at all) through policy logic.
: Immuta can present policies in a natural language form that is easily understood and provide an audit history of changes to create a trust-and-verify environment. This allows you to prove policy is being implemented correctly to business leaders concerned with compliance and risk, and your business can meet audit obligations to external parties or customers.
: Immuta enables fine-grained data ownership and controls over organizational domains, allowing a data mesh environment for sharing data - embracing the ubiquity of your organization. You can enable different parts of your organization to manage their data policies in a self-serve manner without involving you in every step, and you can make data available across the organization without the need to centralize both the data and authority over the data. This frees your organization to share more data more quickly.
: With inconsistency comes complexity, both for your team and the downstream analysts trying to read data. That complexity from inconsistency removes all value of separating policy from compute. Immuta provides complete consistency so that you can build a policy once, in a single location, and have it enforced scalably and consistently across all your data platforms.
: Because of these highly granular decisions at the access control level, you can increase data access by over 50% in some cases when using Immuta because friction between compliance and data access is reduced.

Scalability and Evolvability

ABAC vs RBAC

Do you find yourself spending too much time managing roles and defining permissions in your system? When there are new requests for data, or a policy change, does this cause you to spend an inordinate amount of time to make those changes? Scalability and evolvability will completely remove this burden. When you have a scalable and evolvable data policy management system, it allows you to make changes that impact hundreds if not thousands of tables at once, accurately. It also allows you to evolve your policies over time with minor changes or no changes at all, through future-proof policy logic.

Lack of scalability and evolvability is rooted in the fact that you are attempting to apply a coarse role-based access control (RBAC) model to your modern data architecture. Using Apache Ranger, a well-known legacy RBAC system built for Hadoop, as an example, independent research has shown the explosion of management required to do the most basic of tasks with an RBAC system: Apache Ranger Evaluation for Cloud Migration and Adoption Readiness.

In a scalable solution such as Immuta, that count of policy changes required will remain extremely low, providing the scalability and evolvability. GigaOm researched this exactly, comparing Immuta’s ABAC model to what they called Ranger’s RBAC with Object Tagging (OT-RBAC) model and showed a 75 times increase in policy management with Ranger.

https://gigaom.com/report/cloud-data-security/

Value to you: You have more time to spend on the complex tasks you should be spending time on and you don’t fear making a policy change.
Value to the business: Policies can be easily enforced and evolved, allowing the business to be more agile and decrease time-to-data across your organization and avoid errors.

Separating policy definition from role definition

When building access control into our database platforms, the concept of role-based access control (RBAC) is familiar. Roles both define who is in them and determine what those members get access to. A good way to think about this is that roles conflate the who and the what: who is in them and what they have access to (but lack the why).

In contrast, attribute-based access control (ABAC) allows you to decouple your roles from what they have access to, essentially separating the what and why from the who, which also allows you to explicitly explain the “why” in the policy. This gives you an incredible amount of scalability and understandability in policy building. Note that this does not necessarily mean you have to throw away your roles; you can make them more powerful and likely scale them back significantly.

If you remember this picture and article from the start of this introduction, most of the Ranger, Snowflake, Databricks, etc. access control scalability issues are rooted in the fact that they use an RBAC model rather than an ABAC model.

Example: Building row-level security with an ABAC model

Consider that you have a table which contains a transaction_country column and you have data localization needs that require you to limit specific countries to specific users.

With a classic RBAC approach, you would need to create a role for every permutation of country access. Remember that it's not necessarily just a role per country, because some users may need access to more than one country. Every time a new permutation of country combination is required, a new role must be managed to represent that access.

With Immuta's ABAC approach, since Immuta is able to decouple policy logic from users, you can simply assign users countries and Immuta will filter appropriately on the fly. This can be done with a single policy in Immuta which references the user country metadata. If you add a new user with a never before seen combination of countries, in the RBAC model, you would have to remember to create a new role and policy for them to see data. In the ABAC model it will “just work” since everything is dynamic - future proofing your policies.
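
A minimal Python sketch of the contrast, under stated assumptions: rbac_roles_needed counts the worst case of one role per non-empty combination of countries, and filter_rows stands in for a single dynamic policy that compares each row's transaction_country with the countries assigned to the user. The function names, the sample rows, and the user record are hypothetical and only illustrate the idea; they are not Immuta APIs.

```python
from itertools import combinations


def rbac_roles_needed(countries):
    """Worst case for RBAC: one role per non-empty combination of countries."""
    return sum(
        1
        for size in range(1, len(countries) + 1)
        for _ in combinations(countries, size)
    )


def filter_rows(rows, user):
    """Single ABAC-style policy: show rows whose country is in the user's attributes."""
    allowed = user["countries"]
    return [row for row in rows if row["transaction_country"] in allowed]


transactions = [
    {"id": 1, "transaction_country": "US"},
    {"id": 2, "transaction_country": "JP"},
    {"id": 3, "transaction_country": "DE"},
]

# A new user with a never-before-seen combination needs no new role or policy.
new_user = {"name": "dana", "countries": {"JP", "DE"}}
visible_ids = [row["id"] for row in filter_rows(transactions, new_user)]

roles_for_three = rbac_roles_needed(["US", "JP", "DE"])  # 2**3 - 1 combinations
```

With three countries the worst-case role count is already 7, and it roughly doubles with each country added, while the filtering function stays unchanged.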

For more discussion about this model, see the Role-Based Access Control vs. Attribute-Based Access Control — Explained blog or the NIST article on ABAC, Guide to Attribute Based Access Control (ABAC) Definition and Considerations.

Policy boolean logic

The only way to support AND boolean logic with a role-based model (RBAC) is by creating a new role that conflates the two or more roles you want to AND together.

For example, a governor wants users to only see certain data if they have security awareness training and have consumer privacy training. It would be natural to assume you need both separately as metadata attached to users to drive the policy. However, when you build policies in a role based model, it assumes roles are either OR’ed together in the policy logic or you can only act under one role at a time, and because of this, you will have to create a single role to represent this combination of requirements “users with security awareness training AND consumer privacy training.” This is completely silly and unmanageable - you need to account for every possible combination relevant to a policy, and you have no way of knowing that ahead of time.

With Immuta and its ABAC model, you are able to keep user attributes as meaningful separate facts about the users and then use boolean logic to combine those facts in policy logic. As an example, consider the country filtering policy described in the prior section: you could build the filtering, as described, but additionally add an exception such as "do this filtering for everyone except members of group security awareness training and members of group consumer privacy training" without the need to create a new role that represents those combined.
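
As a rough illustration, the exception above can be expressed as boolean logic over separate user facts. The sketch below is not Immuta policy syntax; is_exempt, visible_rows, and the user dictionaries are hypothetical, and the group names are taken from the example in the text.

```python
def is_exempt(user):
    """Exception applies only when both facts about the user are true (AND)."""
    groups = user["groups"]
    return (
        "security awareness training" in groups
        and "consumer privacy training" in groups
    )


def visible_rows(rows, user):
    """Filter by country for everyone except users who meet the exception."""
    if is_exempt(user):
        return list(rows)
    return [r for r in rows if r["transaction_country"] in user["countries"]]


rows = [
    {"id": 1, "transaction_country": "US"},
    {"id": 2, "transaction_country": "JP"},
]

trained = {
    "groups": {"security awareness training", "consumer privacy training"},
    "countries": {"US"},
}
half_trained = {"groups": {"security awareness training"}, "countries": {"US"}}

trained_ids = [r["id"] for r in visible_rows(rows, trained)]
half_trained_ids = [r["id"] for r in visible_rows(rows, half_trained)]
```

Each group stays a separate, meaningful fact about the user; no combined "security AND privacy" role has to exist for the policy to work.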

Exception-based policy authoring

This next section draws on an analogy: Imagine you are planning your wedding reception. It’s a rather posh affair, so you have a bouncer checking people at the door.

Do you tell your bouncer who’s allowed in? (exception-based) Or, do you tell the bouncer who to keep out? (rejection-based)

The answer to that question should be obvious, but many policy engines allow both exception- and rejection-based policy authoring, which causes a conflict nightmare. Exception-based policy authoring in our wedding analogy means the bouncer has a list of who should be let into the reception. This will always be a shorter list of users/roles if following the principle of least privilege, which is the idea that any user, program, or process should have only the bare minimum privileges necessary to perform its function - you can’t go to the wedding unless invited. This aligns with the concept of privacy by design, the foundation of the CPRA and GDPR, which states “Privacy as the default setting.”

What this means in practice is that you should define what should be hidden from everyone, and then slowly peel back exceptions as needed.

How could your data leak if it wasn’t exception based?

What if you built two policies:

Mask Person Name using hashing for everyone who possesses attribute Department HR.
Mask Person Name using constant REDACTED for everyone who possesses attribute Department Analytics.

Now, some user comes along who is in Department Finance - guess what, they will see the Person Name columns in the clear because they were not accounted for, just like the bouncer would let them into your wedding because you didn’t think ahead of time to add them to your deny list.
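
The leak can be reproduced with a short Python sketch. Both functions are hypothetical illustrations rather than Immuta behavior: rejection_based lists only who to mask, while exception_based ends with a catch-all default so that unanticipated departments are still masked.

```python
import hashlib


def hash_value(value):
    """Consistent one-way mask using SHA-256."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def rejection_based(name, department):
    """Rules only name who gets masked; everyone else falls through."""
    if department == "HR":
        return hash_value(name)
    if department == "Analytics":
        return "REDACTED"
    return name  # unanticipated department: value is returned in the clear


def exception_based(name, department):
    """Masked by default; only explicitly listed exceptions differ."""
    if department == "HR":
        return hash_value(name)
    return "REDACTED"  # catch-all covers Analytics and anyone unanticipated


leaked = rejection_based("Ada Lovelace", "Finance")
protected = exception_based("Ada Lovelace", "Finance")
```

Here leaked holds the real name, while protected holds the constant REDACTED, even though Finance was never mentioned in either set of rules.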

There are two main issues with allowing bi-directional policies, which is why Immuta only allows exception-based policies, aligning to the industry standard of least privileged access:

Ripe for data leaks: Rejection-based policies are extremely dangerous, which is why Immuta does not allow them except with a catch-all OTHERWISE statement at the end. Again, this is because if a new role/attribute comes along that you haven’t accounted for, that data will be leaked. It is impossible to anticipate every user/attribute/group that could exist ahead of time, just as it’s impossible to anticipate every person off the street who could try to enter your posh wedding and would have to be on your deny list.
Ripe for conflicts and confusion: Tools that specifically allow both rejection-based and exception-based policy building create a conflict disaster. Let’s walk through a simple example, and imagine how it would look with hundreds of these policies:
- Policy 1: mask name for everyone who is member of group A
- Policy 2: mask name for everyone except members of group B

What happens if someone is in both groups A and B? The policy will have to fall back on policy ordering to avoid this conflict, which requires users to understand all other policies before building their policy and it is nearly impossible to understand what a single policy does without looking at all policies.

Hierarchical tag-based policy definitions

While many platforms support the concept of object tagging / sensitive data tagging, very few truly support hierarchical tag structures.

First, a quick overview of what hierarchical tag structure means:

This would be a flat tag structure:

SUV
Subaru
Truck
Jeep
Gladiator
Outback

Each tag stands on its own and is not associated with any other tag; there’s no relationship between Jeep and Gladiator or between Subaru and Outback.

A hierarchical tagging structure establishes these relationships:

SUV.Subaru.Outback
Truck.Jeep.Gladiator

Support for a tagging hierarchy is more than just supporting the tag structure itself. More importantly, policy enforcement should respect the hierarchy as well. Let’s run through a quick contrived example; you want the following policies:

Mask by making null any SUV data
Mask using hashing any Outback data

With a flat structure, if you build those policies they will be in conflict with one another. To avoid that problem you would have to order which policies take precedence, which can get extremely complex when you have many policies.

Instead, if your policy engine truly supports a tagging hierarchy, like Immuta does, it will recognize that Outback is more specific than SUV, and have that policy take precedence.

Mask by making null any SUV data
Mask using hashing any SUV.Subaru.Outback data

Policies are applied correctly without any need for complex ordering of policies.
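
One way to picture hierarchy-aware enforcement is "the rule attached to the deepest matching tag wins." The Python sketch below assumes dot-separated tag paths and a hypothetical most_specific_rule helper; it illustrates the precedence idea only and does not describe how Immuta resolves policies internally. The Forester tag is invented for the example.

```python
def most_specific_rule(column_tag, rules):
    """Return the action of the rule whose tag is the deepest ancestor of column_tag."""
    parts = column_tag.split(".")
    best_action = None
    best_depth = 0
    for rule_tag, action in rules.items():
        rule_parts = rule_tag.split(".")
        # A rule applies when its tag path is a prefix of the column's tag path.
        applies = parts[: len(rule_parts)] == rule_parts
        if applies and len(rule_parts) > best_depth:
            best_action, best_depth = action, len(rule_parts)
    return best_action


rules = {
    "SUV": "null",                 # Mask by making null any SUV data
    "SUV.Subaru.Outback": "hash",  # Mask using hashing any SUV.Subaru.Outback data
}

outback_action = most_specific_rule("SUV.Subaru.Outback", rules)
forester_action = most_specific_rule("SUV.Subaru.Forester", rules)  # hypothetical tag
gladiator_action = most_specific_rule("Truck.Jeep.Gladiator", rules)
```

The Outback column gets hashing, the other SUV column falls back to the broader null rule, and the Truck column matches no rule, all without any explicit policy ordering.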

Yes, this does put some work on the business to correctly build specificity, or depth, into their tagging hierarchy. This is not necessarily easy; however, the logic will have to live somewhere, and having it in the tagging hierarchy rather than policy order again allows you to separate policy definition from data definition. This provides you scalability, evolvability, understandability, and, most importantly, correctness because policy conflicts can be caught at policy-authoring-time.

Subscription policies: benefits of attribute-based table GRANTs

There are a myriad of techniques and processes companies use to determine what users should have access to which tables. Some customers have had 7 people responding to an email chain for approval before a DBA runs a table GRANT statement, for example. Manual approvals are sometimes necessary, of course, but there’s a lot of power and consistency in establishing objective criteria for gaining access to a table rather than subjective human approvals.

Let’s take the “7 people approve with an email chain” example. Ask the question, “Why do any of you 7 say yes to the user gaining access?” If it’s objective criteria, you can completely automate this process. For example, if the approver says, “I approve them because they are in group x and work in the US,” that is user metadata that could allow the user to automatically gain access to the tables, either ahead of time or when requested. This removes a huge burden from your organization and avoids mistakes.

Being objective is always better than subjective: it increases accuracy, removes bias, eliminates errors, and proves compliance. If you can be objective and prescriptive about who should gain access to what tables - you should.

The anti-pattern is manual approvals. Although there are some regulatory requirements for this, if there’s any possible way to switch to objective approvals, you should do it. With subjective human-driven approvals, there is bias, larger chance for errors, and no consistency - this makes it very difficult to prove compliance and is simply passing the buck (and risk) to the approvers and wasting their valuable time.

One could argue that it’s subjective or biased to assign a user the Country.JP attribute. This is not true, because, remember, data policy is separated from user metadata. The act of giving a user the Country.JP attribute is simply defining that user - it is a fact about that user, there is no implied access given to the user from this act and that attribute will be objective - e.g., you know if they are in Japan or not.

The approach where an access decision is conflated with a role or group is common practice. So not only do you end up with manual approval flows, but you also end up with role explosion from so many roles to meet every combination of access.

Availability of Data

Example of anonymizing a column rather than blocking it

By having highly granular controls coupled with anonymization techniques, more data than ever can be at the fingertips of your analysts and data scientists (in some cases, up to 50% more).

Why is that?

Let’s start with a simple example and get more complex. Obviously, if you can’t do row- and column-level controls and are limited to only GRANTing access to tables, you are either over-sharing or under-sharing. In most cases, it’s under-sharing: there are rows and columns in that table the users can see, just not all of them, but they are blocked completely from the table.

That example was obvious, but it can get a little more complex. If you have column-level controls, now you can give them access to the table, but you can completely hide a column from a user by making all the values in it null, for example. Thus, they’ve lost all data/utility from that column, but at least they can get to the other columns.

That masked column can be more useful, though. If you hash the values in that column instead, utility is gained because the hash is consistent - you can track and group by the values, but can’t know exactly what they are.

But you can make that masked column even more useful! If you use something like k-anonymization instead of hashing, they can know many of the values, but not all of them, gaining almost complete utility from that column. As your anonymization techniques become more advanced, you gain utility from the data while preserving privacy. These are termed privacy enhancing technologies (PETs) and Immuta places them at your fingertips.

This is why advanced anonymization techniques can get significantly more data into your analysts' hands.
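
The difference in utility between nulling and hashing can be seen in a small Python sketch. The mask_null and mask_hash helpers, the salt value, and the sample email addresses are assumptions for illustration; they are not Immuta masking functions.

```python
import hashlib
from collections import Counter


def mask_null(value):
    """Column is fully hidden: every value becomes None."""
    return None


def mask_hash(value, salt="demo-salt"):
    """Consistent hash: equal inputs produce equal outputs, so grouping still works."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


emails = ["a@example.com", "b@example.com", "a@example.com"]

# Group-by on the nulled column collapses everything into one bucket.
nulled_groups = Counter(mask_null(e) for e in emails)

# Group-by on the hashed column preserves the distribution without revealing values.
hashed_groups = Counter(mask_hash(e) for e in emails)
group_sizes = sorted(hashed_groups.values())
```

After nulling, all three rows fall into a single group; after hashing, an analyst can still see that one value occurs twice and another once, without learning either address.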

Using k-anonymization to mask columns

While columns like first_name, last_name, email, and social security number can certainly be directly identifying, columns like gender and race may not seem identifying on the surface, but they can be. Imagine if there are very few Tongan men in a data set...in fact, for the sake of this example, let's say there’s only one. So if I know of a Tongan man in that company, I can easily run a query like this and figure out that person’s salary without using their name, email, or social security number:

select salary from [table] where race = 'Tongan' and gender = 'Male';

This is the challenge with indirect identifiers. It comes down to how much your adversary, the person trying to break privacy, knows externally, which is unknowable to you. In this case, all they had to know was the person was Tongan and a man (and there happens to be only one of them in the data) to figure out their salary, sensitive information. Let's also pretend the result of that query was a salary of 106072. This is called a linkage attack and is specifically called out in privacy regulations as something you must contend with, for example, from GDPR:

Article 4(1): "Personal data" means any information relating to an identified or identifiable natural person ("data subject"); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person.

Almost any useful column with many unique values will be a candidate for indirectly identifying an individual, but also be an important column for your analysis. So if you completely hide every possible indirectly identifying column, your data is left useless.

You can solve this problem with PETs. Take note of two things by querying the data:

If you only search for “Tongan” alone (no Male), there are several Tongan women, so this linkage attack no longer works: select salary, gender from [table] where race = 'Tongan';
There are no null values in the gender or race columns.

Now let's say you apply the k-anonymization masking policy using Immuta.

Then you run this query again to find the Tongan man's salary: select salary from immuta_fake_hr_data where race = 'Tongan' and gender = 'Male';

You get no results.

Now you run this query ignoring the gender: select salary, gender from immuta_fake_hr_data where race = 'Tongan';

Only the women are returned.

The linkage attack was successfully averted. Remember, from our queries prior to the policy, the salary was 106072, so let’s run a query with that: select race, gender from immuta_fake_hr_data where salary = 106072;

There he is! But race will be suppressed (NULL) so this linkage attack will not work. It was also smart enough to not suppress gender because that did not contribute to the attack; suppressing race alone averts the attack. This is the magic of k-anonymization: it provides as much utility as possible while preserving privacy by suppressing values that appear so infrequently (along with other values in that row) that they could lead to a linkage attack.
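
The behavior described above can be approximated with a deliberately simplified suppression routine in Python. This is a sketch, not Immuta's k-anonymization algorithm: the k_anonymize function, the greedy "suppress the rarest value first" rule, the choice of k = 2, and the sample rows (other than the 106072 salary from the text) are all assumptions made for illustration.

```python
from collections import Counter


def k_anonymize(rows, quasi_identifiers, k):
    """Suppress values in rows whose quasi-identifier combination appears fewer than k times."""
    combo_counts = Counter(tuple(r[c] for c in quasi_identifiers) for r in rows)
    value_counts = {c: Counter(r[c] for r in rows) for c in quasi_identifiers}
    masked = []
    for row in rows:
        out = dict(row)
        if combo_counts[tuple(row[c] for c in quasi_identifiers)] < k:
            kept = list(quasi_identifiers)
            # Suppress the rarest value first; stop once the values that remain
            # visible are shared by at least k rows.
            for col in sorted(quasi_identifiers, key=lambda c: value_counts[c][row[c]]):
                out[col] = None
                kept.remove(col)
                matches = sum(all(r[c] == row[c] for c in kept) for r in rows)
                if matches >= k:
                    break
        masked.append(out)
    return masked


hr_data = [
    {"race": "Tongan", "gender": "Male", "salary": 106072},
    {"race": "Tongan", "gender": "Female", "salary": 98000},
    {"race": "Tongan", "gender": "Female", "salary": 101500},
    {"race": "Samoan", "gender": "Male", "salary": 87000},
    {"race": "Samoan", "gender": "Male", "salary": 91000},
    {"race": "Samoan", "gender": "Male", "salary": 93000},
]

masked = k_anonymize(hr_data, ["race", "gender"], k=2)

tongan_men = [r for r in masked if r["race"] == "Tongan" and r["gender"] == "Male"]
tongan_any = [r for r in masked if r["race"] == "Tongan"]
by_salary = [r for r in masked if r["salary"] == 106072]
```

In this sketch the query for Tongan men returns nothing, the query for Tongan alone returns only the two women, and the lookup by salary finds the row with race suppressed and gender left intact, mirroring the three queries walked through above.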

Cell-level security

Cell-level security is not exactly an advanced privacy enhancing technology (PET) as in the example above, but it does provide impressive granular controls within a column for common use cases.

What is cell-level security?

If you have values in a column that should sometimes be masked, but not always, that is masking at the cell-level, meaning the intersection of a row with a column. What drives whether that cell should be masked or not is some other value (or set of values) in the rest of the row shared with that column (or a joined row from another table).

For example, a user wants to mask the credit card numbers but only when the transaction amount is greater than $500. This allows you to drive masking in a highly granular manner based on other data in your tables.

This technique is also possible using Immuta, and you can leverage tags on columns to drive which column in the row should be looked at to mask the cell in question, providing further scalability.
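
A minimal sketch of the credit card example in Python, assuming hypothetical column names (credit_card_number, transaction_amount) and a hypothetical mask_cells helper. It only illustrates the idea that another value in the same row decides whether a given cell is masked.

```python
def mask_cells(rows, threshold=500):
    """Mask the card number only in rows where the amount exceeds the threshold."""
    masked = []
    for row in rows:
        out = dict(row)
        if row["transaction_amount"] > threshold:
            out["credit_card_number"] = None
        masked.append(out)
    return masked


payments = [
    {"credit_card_number": "4111111111111111", "transaction_amount": 120},
    {"credit_card_number": "4111111111111111", "transaction_amount": 950},
]

result = mask_cells(payments)
```

The same column ends up visible in the first row and masked in the second, because the decision is made per cell rather than per column.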

Data Engineering with Limited Policy Downtime

When executing transforms in your data platform, new tables and views are constantly being created, columns added, data changed - transform DDL. This constant transformation can cause latency between the DDL and Immuta policies discovering, adapting, and attaching to those changes, which can result in data leaks. This policy latency is referred to as policy downtime.

The goal is to have as little policy downtime as possible. However, because Immuta is separate from data platforms and those data platforms do not currently have webhooks or an eventing service, Immuta does not receive alerts of DDL events. This causes policy downtime.

This page describes the appropriate steps to minimize policy downtime as you execute transforms using dbt or any other transform tool and links to tutorials that will help you complete these steps.

Prerequisites

Required:

: This feature detects destructively recreated tables (from CREATE OR REPLACE statements) even if the table schema wasn’t changed. Enable schema monitoring when you register your data sources.

Recommended (if you are using Snowflake):

: This feature implements Immuta subscription policies as table GRANTS in Snowflake rather than Snowflake row access policies. Note this feature may not be automatically enabled if you were an Immuta customer before January 2023; see to enable.
: This feature removes unnecessary Snowflake row access policies when Immuta project workspaces or impersonation are disabled, which improves the query performance for data consumers.

Step 1: Create global policies and prepare tags for data sources

To benefit from the scalability and manageability provided by Immuta, you should author all Immuta policies as global policies. Global policies are built at the semantic layer using tags rather than having to reference individual tables in each policy. When using global policies, as soon as a new tag is discovered by Immuta, any applicable policy will automatically be applied. This is the most efficient approach for reducing policy downtime.

There are three different approaches for tagging in Immuta:

Auto-tagging (recommended): This approach uses sensitive data discovery (SDD) to automatically tag data.
Manually tagging with an external catalog: This approach pulls in the tags from an external catalog. Immuta supports Snowflake, Databricks Unity Catalog, Alation, and Collibra to pull in external tags.
Manually tagging in Immuta: This approach requires a user to create and manually apply tags to all data sources using the Immuta API or UI.

Note that there is added complexity with manually tagging new columns with Alation, Collibra, or Immuta. These listed catalogs can only tag columns that are registered in Immuta. If you have a new column, you must wait until after schema monitoring runs and detects that new column. Then that column must be manually tagged. This issue is not present when manually tagging with Snowflake or Databricks Unity Catalog, because they are already aware of the column, or when using SDD, because it runs after schema monitoring.

Auto-tagging (recommended)

Manually tagging with an external catalog

Manually tagging in Immuta

Step 2: Register your data in Immuta

Step 3: Consider the result and user making transformations

Views vs tables

Access to and registration of views created from Immuta-protected tables only need to be taken into consideration if you are using both data and subscription policies.

Views will have existing data policies (row-level security, masking) enforced on them that exist on the backing tables by nature of how views work (the query is passed down to the backing tables). So when you tag and register a view with Immuta, you are re-applying the same data policies on the view that already exist on the backing tables, assuming the tags that drive the data policies are the same on the view’s columns.

If you do not want this behavior or its possible negative performance consequences, then Immuta recommends the following based on how you are tagging data:

For auto-tagging, place your incremental views in a separate database that is not being monitored by Immuta. Do not register them with Immuta, and schema monitoring will not detect them from the separate database.
For either manually tagging option, do not tag view columns.

Using either option, the views will only be accessible to the person who created them. The views will not have any subscription policies applied to give other users access because the views are either not registered in Immuta or there are no tags. To give other users access to the data in the view, they should subscribe to the table at the end of the transform pipeline.

However, if you do want to share the views using subscription policies, you should ensure that the tags that drive the subscription policies exist on the view and that those tags are not shared with tags that drive your data policies. It is possible to target subscription policies on all tables or tables from a specific database rather than using tags.

Access level of job executor

Policy is enforced on READ. Therefore, if you run a transform that creates a new table, the data in that new table will represent the policy-enforced data.

For example, if the credit_card_number column is masked for Steve, on read, the real credit card numbers will be dynamically masked. If Steve then copies them into a new table via the transform, he is physically loading masked credit card numbers into that table. Now if another user, Jane, is allowed to see credit card numbers and queries the table, her query will not show the credit card numbers. This is because credit card numbers are already masked in that table. This problem only exists for tables, not views, since tables have the data physically copied into them.
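The Steve and Jane scenario above can be simulated with a minimal sketch. It is not Immuta code: the row data, the `MASKED_COLUMNS` lookup, and the `read` helper are hypothetical stand-ins that only illustrate why a table materialized by a restricted user keeps masked values while a view does not.

```python
# Hypothetical illustration: policy is applied when data is read, so a table
# materialized by a restricted user stores the already-masked values.

RAW_ORDERS = [
    {"order_id": 1, "credit_card_number": "4111111111111111"},
    {"order_id": 2, "credit_card_number": "5500005555555559"},
]

# Assumption: Steve's policy masks the column; Jane's policy does not.
MASKED_COLUMNS = {"steve": {"credit_card_number"}, "jane": set()}


def read(rows, user):
    """Return rows as `user` would see them, masking values on read."""
    masked = MASKED_COLUMNS[user]
    return [
        {col: ("MASKED" if col in masked else val) for col, val in row.items()}
        for row in rows
    ]


# Table transform: Steve's masked output is physically copied into a new table.
orders_copy_table = read(RAW_ORDERS, "steve")


# View transform: nothing is copied; each query is passed to the backing table.
def orders_view(user):
    return read(RAW_ORDERS, user)


jane_from_table = read(orders_copy_table, "jane")
jane_from_view = orders_view("jane")

print(jane_from_table[0]["credit_card_number"])  # MASKED
print(jane_from_view[0]["credit_card_number"])   # 4111111111111111
```

Jane is allowed to see the column in both cases, but only the view returns the real values because the table already contains what Steve was permitted to read.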

To address this situation, you can do one of the following:

Use views for all transforms.
Ensure the users who are executing the transforms always have a higher level of access than the users who will consume the results of the transforms.
If that is not possible, set up a dev environment for creating the transformation code; then, when ready for production, have a promotion process to execute those production transformations using a system account free of all policies. Once the jobs execute as that system account, Immuta will discover, tag, and apply the appropriate policy.

Step 4: Force data downtime

Data downtime refers to the techniques you can use to hide data after transformations until Immuta policies have had a chance to synchronize. It makes data inaccessible; however, it is preferred to the data leaks that could occur while waiting for policies to sync.

Whenever DDL occurs, it can result in policy downtime, such as in the following examples:

A new column is added to a table that needs to be masked from users that have access to that table (potential data leak).
A new table is created in a space where other users have read access on future tables (potential data leak).
A tag that drives a policy is updated, deleted, or newly added with no other changes to the schema or table. This is a limitation of how Immuta discovers changes: searching for tag changes is too inefficient, so schema changes are what drive Immuta to take action.

Best practices for Snowflake

Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:

Use CREATE OR REPLACE for all DDL, including altering tags, so that access is always revoked.

Without these best practices, you are making unintentional policy decisions in Snowflake that may conflict with your organization's actual policies enforced by Immuta.

For example, if the CREATE OR REPLACE added a new column that contains sensitive data, and the user specified COPY GRANTS, it opens that column to users, causing a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.

Best practices for Databricks Unity Catalog

Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:

Use CREATE OR REPLACE for all DDL, including altering tags, so that access is always revoked.

Without these best practices, you are making unintentional policy decisions in Unity Catalog that may conflict with your organization's actual policies enforced by Immuta.

For example, if you GRANT SELECT on a schema, and then someone writes a new table with sensitive data into that schema, it could cause a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.

Step 5: Initiate policy uptime

When schema monitoring is run globally, it will detect the following:

Any new table
Any new view
Changes to the object type backing an Immuta data source (for example, when a TABLE changes to a VIEW); when an object type changes, Immuta reapplies existing policies to the data source.
Any existing table destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any existing view destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any dropped table
Any new column
Any dropped column
Any column type change (which can impact policy application)
Any tag created, updated, or deleted (but only if the schema changed; otherwise tag changes alone are detected with Immuta’s health check)

Then, if any of the above is detected, for those tables or views, Immuta will complete the following:

Synchronize the existing policy back to the table or view to reduce data downtime
If SDD is enabled, execute SDD on any new columns or tables
If an external catalog is configured, execute a tag synchronization
Synchronize the final updated policy based on the SDD results and tag synchronization
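The detection and follow-up steps above can be sketched as a comparison of two schema snapshots. This is only an illustration under stated assumptions: the snapshot shape, `detect_changes`, and `follow_up_steps` are hypothetical names, and the sketch covers table and column changes only. It does not model object type changes, tag changes, or objects recreated with CREATE OR REPLACE without schema changes.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: {table_name: {column_name: column_type}}
Snapshot = Dict[str, Dict[str, str]]


def detect_changes(before: Snapshot, after: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (change, table, column) tuples; column is '' for table-level changes."""
    changes = []
    for table in sorted(after.keys() - before.keys()):
        changes.append(("new table", table, ""))
    for table in sorted(before.keys() - after.keys()):
        changes.append(("dropped table", table, ""))
    for table in sorted(before.keys() & after.keys()):
        old, new = before[table], after[table]
        for col in sorted(new.keys() - old.keys()):
            changes.append(("new column", table, col))
        for col in sorted(old.keys() - new.keys()):
            changes.append(("dropped column", table, col))
        for col in sorted(old.keys() & new.keys()):
            if old[col] != new[col]:
                changes.append(("column type change", table, col))
    return changes


def follow_up_steps(sdd_enabled: bool, external_catalog: bool) -> List[str]:
    """List the follow-up actions in the order described in this section."""
    steps = ["synchronize existing policy"]
    if sdd_enabled:
        steps.append("execute SDD on new columns or tables")
    if external_catalog:
        steps.append("execute tag synchronization")
    steps.append("synchronize final updated policy")
    return steps


before = {"orders": {"id": "NUMBER", "amount": "NUMBER"}, "legacy": {"id": "NUMBER"}}
after = {
    "orders": {"id": "NUMBER", "amount": "VARCHAR", "ssn": "VARCHAR"},
    "customers": {"id": "NUMBER"},
}

changes = detect_changes(before, after)
for change in changes:
    print(change)
if changes:
    print(follow_up_steps(sdd_enabled=True, external_catalog=False))
```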

See the image below for an illustration of this process with Snowflake.

The two options for running schema monitoring are described in the sections below. You can implement them together or separately.

Alert Immuta through the API or a custom function

If the data platform supports custom UDFs and external functions, you can wrap the /dataSource/detectRemoteChanges endpoint with one. Then, as your transform jobs complete, you can use SQL to call this UDF or external function to tell Immuta to execute schema monitoring. The endpoint is wrapped in a UDF or external function because dbt and transform jobs always compile to SQL, and the best way to run schema monitoring immediately after the table is created (after the transform job completes) is to execute more SQL in the same job.
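Outside of a UDF, the same call could be prepared from a job's post-step script. The sketch below only builds the HTTP request and does not send it. Only the endpoint path comes from this guide; the POST method, the bearer-token header, the empty JSON body, and the `immuta.example.com` host are assumptions, so check the Immuta API reference for the actual contract before using anything like this.

```python
import json
import urllib.request


def build_detect_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a request to the schema monitoring endpoint."""
    url = base_url.rstrip("/") + "/dataSource/detectRemoteChanges"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # assumption: empty JSON body
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: auth scheme
            "Content-Type": "application/json",
        },
        method="POST",  # assumption: HTTP method
    )


# Hypothetical tenant URL and placeholder key.
request = build_detect_request("https://immuta.example.com/", "REPLACE_ME")
print(request.get_method(), request.full_url)

# To send it at the end of a transform job:
# urllib.request.urlopen(request, timeout=30)
```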

Consult your Immuta professional for a custom UDF compatible with Snowflake or Databricks Unity Catalog.

Periodic schema monitoring

By default, Immuta runs schema monitoring every night at 12:30 a.m. UTC. However, this schedule can be updated through advanced configuration. The processing time for schema monitoring depends on the number of tables and columns changed in your data environment. If you want to change the schedule to run more frequently than daily, Immuta recommends you test the runtime (with a representative set of DDL changes) before making the configuration change.

Consult your Immuta professional to update the schema monitoring schedule, if desired.

Recommended Immuta policy types

There are some use cases where you want all users to have access to all tables, but want to mask sensitive data within those tables. While you could do this using just data policies, Immuta recommends you still utilize subscription policies to ensure users are granted access.

How-to Guides

Author a Subscription Policy

Best practice: write global policies

Build global subscription policies using attributes and Discovered tags instead of writing local policies to manage data access. This practice prevents you from having to write or rewrite single policies for every data source added to Immuta and from manually approving data access.

Write access policy requirements

Private preview

Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

At least one of the following permissions is required to manage write policies:
- CREATE_DATA_SOURCE Immuta permission (to create local write policies)
- GOVERNANCE Immuta permission (to create local or global write policies)
- MANAGE_POLICIES domain permission (to create global write policies)
Databricks Unity Catalog, Snowflake, or Starburst (Trino) integration
Snowflake table grants enabled (for Snowflake integrations)

Enable write access policies

Once support for this feature has been enabled in your Immuta tenant, complete the following steps:

Navigate to the App Settings page.
Scroll to the Preview Features section.
Click the Enable Write Policies checkbox and Save your changes.

Create a policy

Determine your policy scope:
- Global policy: Click the Policies page icon in the left sidebar and select the Subscription Policies tab. Click Add Subscription Policy and complete the Enter Name field.
- Local policy: Navigate to a specific data source and click the Policies tab. Click Add Subscription Policy and select New Local Subscription Policy.
Select the access type:
- Read Access: Control who can view the data source.
- Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply:
- Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
- Allow anyone who asks (and is approved):
  1. Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
    Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
  2. Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
    Note: You can add more than one approving party by selecting + Add Another Approver.
- Allow users with specific groups/attributes: See the ABAC subscription policy guide for instructions.
- Allow individually selected users
For global policies: From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.

Manually grant access

Read and write access can also be granted manually by a data owner. See the Manage data source members guide for details.

Additional global ABAC subscription policies

When you have multiple global ABAC subscription policies to enforce, create separate global ABAC subscription policies, and then Immuta will use boolean logic to merge all the relevant policies on the tables they map to.

Subscription Policies Advanced DSL Guide

This page details how users can use functions and variables in the Advanced DSL policy builder to create more complex policies than the Subscription Policy builder allows.

For instructions on writing Global Subscription Policies, see the following tutorial.

Enabling Enhanced Subscription Policy Variables (Public Preview)

Navigate to the App Settings Page.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Enhanced Subscription Policy Variables checkbox.
Click Save.

Create and Edit Global Subscription Policies Using Advanced DSL

Navigate to the Policies Page.
Select Subscription Policies and click + Add Subscription Policy.
Choose a name for your policy and select how the policy should grant access.
Select Create using Advanced DSL.
Select the rules for your policy from the Advanced DSL options. For example, creating a @hasTagAsAttribute('Department', 'dataSource') policy would subscribe all users who have an attribute that matches a tag on a data source to that data source. So users with the attribute Department.Marketing would be subscribed to data sources with the tag Marketing.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
- Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
- Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Select where this policy should be applied: On data sources, When selected by data owners, or On all data sources.
- If you select On data sources, the options include tagged, with columns tagged, with column names spelled like, in server, and created between.
Click Create Policy.
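The two merge options in the steps above can be pictured with a minimal sketch. It assumes every policy on the data source uses the same merge option and does not model how Immuta handles a mix of both; `merge_policies`, the mode strings, and the sample conditions are hypothetical names.

```python
from typing import Callable, Dict, List, Set

User = Dict[str, Set[str]]
Condition = Callable[[User], bool]


def merge_policies(conditions: List[Condition], mode: str) -> Condition:
    """Combine policy conditions with AND or OR, depending on the merge option."""
    if mode == "always_required":
        return lambda user: all(cond(user) for cond in conditions)
    if mode == "share_responsibility":
        return lambda user: any(cond(user) for cond in conditions)
    raise ValueError(f"unknown merge mode: {mode!r}")


# Two hypothetical global policies that apply to the same data source.
def in_marketing(user: User) -> bool:
    return "Marketing" in user.get("Department", set())


def us_based(user: User) -> bool:
    return "US" in user.get("Country", set())


user = {"Department": {"Marketing"}, "Country": {"DE"}}

always_required = merge_policies([in_marketing, us_based], "always_required")
share_responsibility = merge_policies([in_marketing, us_based], "share_responsibility")

print(always_required(user))       # False: the user does not meet both policies
print(share_responsibility(user))  # True: the user meets at least one policy
```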

Author a Restricted Subscription Policy

Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.

Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.

Write access policy requirements

Private preview

Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

At least one of the following permissions is required to manage write policies:
- CREATE_DATA_SOURCE Immuta permission (to create local write policies)
- GOVERNANCE Immuta permission (to create local or global write policies)
- MANAGE_POLICIES domain permission (to create global write policies)
Databricks Unity Catalog, Snowflake, or Starburst (Trino) integration
Snowflake table grants enabled (for Snowflake integrations)

Enable write access policies

Once support for this feature has been enabled in your Immuta tenant, complete the following steps:

Navigate to the App Settings page.
Scroll to the Preview Features section.
Click the Enable Write Policies checkbox and Save your changes.

Create a restricted subscription policy

Click the Policies icon in the left sidebar and select Subscription Policies.
Click Add Policy and complete the Enter Name field.
Select the access type:
- Read Access: Control who can view the data source.
- Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply to your data sources:
- Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
- Allow anyone who asks (and is approved):
  1. Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
    Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
  2. Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
    Note: You can add more than one approving party by selecting + Add Another Approver.
- Allow users with specific groups/attributes:
  2. Use the subsequent dropdown to choose the group or attribute for your condition. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the subscription policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
  3. Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
  4. If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
  5. If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
  6. Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
    Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
    Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
- Allow individually selected users
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Opt to complete the Enter Rationale for Policy (Optional) field.
Click Create Policy, and then click Activate Policy or Stage Policy.

Clone, Activate, or Stage a Global Policy

Clone a policy

Click the Policies icon in the left sidebar and navigate to the Data Policies or Subscription Policies tab.
Click the dropdown menu in the Actions column of a policy and select Clone.
Open the dropdown menu and click Edit in the Global Policy Builder. Then make your changes using the dropdown menus.
Click Create Policy, select Activate Policy, and then click Confirm.

The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.

Note: If a cloned policy contains custom certifications, the certifications will also be cloned.

Activate a templated policy

Click the Policies icon in the left sidebar and navigate to the Data Policies tab.
Click the dropdown menu in the Actions column of one of the templated policies and select Activate. Note: If data governors decide to stage an active policy, they select Stage from this dropdown menu.

The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.

Stage the policy

Click the dropdown menu in the Actions column of a policy and select Stage. Note: If data governors decide to make a staged policy active, they select Activate from this dropdown menu.
Click Confirm in the dialog that appears.

Reference Guides

Subscription Policy Access Types

Private preview

Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

Immuta offers two types of subscription policies to manage read and write access in a single system:

Read access policies manage who can read data.
Write access policies manage who can modify data.

Both of these access types can be enforced at any of the restriction levels outlined in the .

The table below illustrates the access types supported by each integration.

Integration | Read access policies | Write access policies

To create a read or write access policy, see the .

Policy enforcement

Once a read or write access policy is enforced on an Immuta data source, it translates to the relevant privileges on the table, view, or object in the remote platform. The sections below detail how these access types are enforced for each integration.

Granting Snowflake privileges

The Snowflake integration supports read and write access subscription policies. However, when applying read and write access policies to Snowflake data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register Snowflake views as Immuta data sources and apply read and write policies to them, but when a write policy is applied to a view only the SELECT privilege will take effect in Snowflake, as views are read-only objects.

Users can register any object stored in Snowflake’s information_schema.tables view as an Immuta data source. The table below outlines the Snowflake privileges Immuta issues when read and write policies are applied to various object types in Snowflake. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and database for any object that access is granted to for a particular user.

Granting Databricks Unity Catalog privileges

The Databricks Unity Catalog integration supports read and write access subscription policies. When users create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANTS or REVOKES against the catalog, schema, or table in Databricks for every user affected by that subscription policy.

Users can register any object stored in Databricks Unity Catalog’s information_schema.tables view as an Immuta data source. However, when applying read and write access policies to these data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register federated tables as Immuta data sources and apply read and write policies to them, but only a read policy will take effect in Databricks and allow users to SELECT those tables. If a write policy is applied, Immuta will not issue SELECT or MODIFY privileges in Databricks.

The table below outlines the Databricks privileges Immuta issues when read and write policies are applied to various object types in Databricks Unity Catalog. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and catalog for any object that access is granted to for a particular user.

Granting Databricks Spark privileges

The Databricks Spark integration supports read access subscription policies. When a read access policy is applied to a data source, Immuta modifies the logical plan that Spark builds when a user queries data to enforce policies that apply to that user. If the user is subscribed to the data source, the user is granted SELECT on the object in Databricks. If the user does not have read access to the object, they are denied access.

Granting Starburst (Trino) privileges

The Starburst (Trino) integration supports read and write access subscription policies. In the Starburst (Trino) integration's default configuration, the following access values grant read and write access to Starburst (Trino) data when a user is granted access through a subscription policy:

READ: When a user is granted read access to a data source, they can SELECT on tables or views and SHOW on tables, views, or columns in Starburst (Trino). This setting is enabled by default when you configure the Starburst (Trino) integration.
WRITE: In its default setting, the Starburst (Trino) integration's write access value controls the authorization of SQL operations that perform data modification (such as INSERT, UPDATE, DELETE, MERGE, and TRUNCATE). When users are granted write access to a data source through a subscription policy, they can INSERT, UPDATE, DELETE, MERGE, and TRUNCATE on tables and REFRESH on materialized views. This setting is enabled by default when you configure the Starburst (Trino) integration.

Custom configuration

Because Starburst (Trino) can govern certain table modification operations (like ALTER) separately from data modification operations (like INSERT), Immuta allows users to specify what modification operations are permitted on data in Starburst (Trino). Administrators can allow table modification operations (such as ALTER and DROP tables) to be authorized as write operations through advanced configuration in the Immuta web service or Starburst (Trino) cluster with the following access values:

OWN: When mapped via advanced configuration to Immuta write policies, users who are granted write access to Starburst (Trino) data can ALTER and DROP tables and SET comments and properties on a data source.
CREATE: When this privilege is granted on Starburst (Trino) data, an Immuta user can create catalogs, schemas, tables, or views on a Starburst (Trino) cluster. CREATE is a Starburst (Trino) privilege that is not controlled by Immuta policies, and this property can only be set in the access-control.properties file on the Starburst (Trino) cluster.

Administrators can customize table and data modification settings in one or both of the following places; however, the access-control.properties file overrides the settings configured in the Immuta web service:

Immuta web service access grants mapping

Customizing read and write access in the Immuta web service affects operations on all Starburst (Trino) data registered as Immuta data sources in that Immuta tenant. This configuration method should be used when all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to the Immuta web service. Example configurations are provided below. Contact your Immuta representative to customize the mapping of read or write access policies for your Immuta tenant.

Default configuration

The default setting shown below maps WRITE to READ and WRITE permissions and maps READ to READ. Both the READ and WRITE permissions should always include READ.

In this example, if a user is granted write access to a data source through a subscription policy, that user can perform data modification operations (INSERT, UPDATE, MERGE, etc.) on the data.

accessGrantMapping:
  WRITE: ['READ', 'WRITE']
  READ: ['READ']

Custom configuration

The following configuration example maps WRITE to READ, WRITE, and OWN permissions and maps READ to READ. Both READ and WRITE permissions should always include READ.

In this example, if a user gets write access to a data source through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.

accessGrantMapping:
  WRITE: ['READ', 'WRITE', 'OWN']
  READ: ['READ']
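The effect of the two mappings above can be sketched by resolving a policy's access type to the SQL operations it authorizes. The operation lists are taken from the access value descriptions in this section; the `OPERATIONS` table and the `allowed_operations` helper are hypothetical names used for illustration and are not part of Immuta.

```python
# SQL operations per access value, as described in this section.
OPERATIONS = {
    "READ": {"SELECT", "SHOW"},
    "WRITE": {"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE"},
    "OWN": {"ALTER", "DROP", "SET"},
}

# The two accessGrantMapping examples, expressed as dictionaries.
DEFAULT_MAPPING = {"WRITE": ["READ", "WRITE"], "READ": ["READ"]}
CUSTOM_MAPPING = {"WRITE": ["READ", "WRITE", "OWN"], "READ": ["READ"]}


def allowed_operations(policy_access: str, mapping: dict) -> set:
    """Resolve a policy access type (READ or WRITE) to the operations it authorizes."""
    operations = set()
    for access_value in mapping[policy_access]:
        operations |= OPERATIONS[access_value]
    return operations


print("ALTER" in allowed_operations("WRITE", DEFAULT_MAPPING))  # False
print("ALTER" in allowed_operations("WRITE", CUSTOM_MAPPING))   # True
print(sorted(allowed_operations("READ", CUSTOM_MAPPING)))       # ['SELECT', 'SHOW']
```

Under the custom mapping, only write policies gain the table modification operations; read policies resolve to the same operations in both configurations.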

Starburst (Trino) cluster access grants mapping

The Starburst (Trino) integration can also be configured to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a specific Starburst (Trino) cluster.

Two properties customize the behavior of read or write access for all Immuta users on that Starburst (Trino) cluster:

immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. For these permissions to apply, the user must be subscribed in Immuta and not be an administrator (who gets all permissions).
immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. This is the only property that allows the CREATE permission, since CREATE is enforced on new objects that do not exist in Starburst (Trino) or Immuta yet (such as a new table being created with CREATE TABLE).

Default configuration

By default, Immuta allows READ and WRITE operations to be authorized on data registered in Immuta, while all operations are permitted for data sources that are not registered in Immuta.

immuta.allowed.immuta.datasource.operations=READ,WRITE
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN

Custom configuration

In the example below, the configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta and all operations are permitted on data that is not registered in Immuta. If a user gets write access to data registered in Immuta through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.

immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
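To see how the two properties interact, the sketch below parses the example settings and checks which access values can be authorized for registered and unregistered objects. It is a simplified illustration: `parse_properties` and `is_authorizable` are hypothetical helpers, and the check ignores the additional requirement that the user be subscribed in Immuta for registered data sources.

```python
REGISTERED_KEY = "immuta.allowed.immuta.datasource.operations"
UNREGISTERED_KEY = "immuta.allowed.non.immuta.datasource.operations"

DEFAULT_CONFIG = """
immuta.allowed.immuta.datasource.operations=READ,WRITE
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
"""

CUSTOM_CONFIG = """
immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
"""


def parse_properties(text: str) -> dict:
    """Parse key=value lines into {key: set of comma-separated values}."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = {v.strip() for v in value.split(",") if v.strip()}
    return props


def is_authorizable(props: dict, access_value: str, registered: bool) -> bool:
    """Check whether an access value is allowed for the given kind of object."""
    key = REGISTERED_KEY if registered else UNREGISTERED_KEY
    return access_value in props.get(key, set())


default_props = parse_properties(DEFAULT_CONFIG)
custom_props = parse_properties(CUSTOM_CONFIG)

print(is_authorizable(default_props, "OWN", registered=True))    # False
print(is_authorizable(custom_props, "OWN", registered=True))     # True
print(is_authorizable(custom_props, "CREATE", registered=True))  # False
```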

Granting Redshift privileges

The Redshift integration supports read access subscription policies. Immuta grants the SELECT Redshift privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a data source is created, Immuta creates a corresponding dynamic view of the table with a join to a secure view that contains all Immuta users, their entitlements, their projects, and a list of the tables they have access to. When a read policy is created or updated (or when a user's entitlements change, they switch projects, or when their data source access is approved or revoked), Immuta updates the secure view to grant or revoke users' access to the data source. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, zero rows are returned when they attempt to query the view.

Granting Azure Synapse Analytics privileges

The Azure Synapse Analytics integration supports read access subscription policies. Immuta grants the SELECT privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a read policy is created or removed (or when a user's entitlements change, they switch projects, or when their data source access is approved or revoked), Immuta updates the view that contains the users' entitlements, projects, and list of accessible tables to grant or revoke their access to the dynamic view. Users' read access is enforced through an access check function in each individual view. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, they receive an Access denied: you are not subscribed to the data source error when they attempt to query the view.

Granting Google BigQuery privileges

The Google BigQuery integration supports read access subscription policies. In this integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table, and read access controls are applied in the view. After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push the Immuta data sources as views into a mirrored dataset of the original table. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.

Granting Amazon S3 privileges

S3 Access Grants `READ` and `READWRITE` access levels

Write access policy limitations

With the exception of the Starburst (Trino) integration, users can only modify existing data when they are granted write access to data; they cannot create new tables or delete tables.
Write actions are not currently captured in audit logs.

Advanced Use of Special Functions

There are several different special functions available for building subscription policies. Some of these functions, listed below, are narrowly focused on orchestrated RBAC use cases. Orchestrated RBAC is when an organization has many roles that represent access, and rather than switching to the ABAC model provided by Immuta, they use these special functions to orchestrate existing roles using Immuta.

Specifically, the functions to enable orchestrated-RBAC are:

@hostname
@database
@schema
@table
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')

Example 1

Policy:

@hasAttribute('SpecialAccess', '@hostname.@database.*')

User:

has the attribute SpecialAccess with the value us-east-1-snowflake.default.*

The user would be subscribed to all the data sources in the default database. Note that this has nothing to do with tags; it is based purely on the physical name of the host, database, schema, and table in the remote data platform. Also note that the user attribute contains an asterisk (*) to denote everything under the default database hierarchy. Asterisks are supported only for the infrastructure special functions:

@hostname
@database
@schema
@table

Because these functions describe the infrastructure, Immuta can assume a 4-level hierarchy (hostname.database.schema.table), and an asterisk can be placed at any level of that hierarchy to represent any object, such as us-east-1-snowflake.*.hr. That value would give the user access to any schema named hr in host us-east-1-snowflake, regardless of the database.

However, that is not possible when using the tag-based special functions:

@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')

Lastly, the asterisk represents any object, but cannot be used for a concatenated wildcard like so: snowfl*.tpc.*.*
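
The wildcard behavior described above can be pictured with a small Python model. This is only an illustrative sketch, not Immuta's implementation: the function name `attribute_grants_access` is hypothetical, and it assumes object names contain no dots and that a value shorter than four levels covers everything beneath it.

```python
def attribute_grants_access(attribute_value: str, data_source_path: str) -> bool:
    """Return True if a dotted attribute value covers a dotted data source path."""
    pattern_parts = attribute_value.split(".")
    path_parts = data_source_path.split(".")
    # A value that is more specific than the path cannot cover it.
    if len(pattern_parts) > len(path_parts):
        return False
    # "*" stands in for exactly one whole object name; any other segment
    # (including a partial wildcard such as "snowfl*") must match literally.
    return all(p == "*" or p == actual for p, actual in zip(pattern_parts, path_parts))


# Everything under the default database
print(attribute_grants_access("us-east-1-snowflake.default.*",
                              "us-east-1-snowflake.default.public.orders"))   # True
# Any schema named hr, regardless of the database
print(attribute_grants_access("us-east-1-snowflake.*.hr",
                              "us-east-1-snowflake.analytics.hr.employees"))  # True
# Concatenated wildcards are not expanded
print(attribute_grants_access("snowfl*.tpc.*.*", "snowflake.tpc.sf1.orders"))  # False
```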

Example 2

Policy:

@hasTagAsAttribute('PersonalData', 'dataSource')

User:

has the attribute key PersonalData with the values
Discovered.PII
Discovered.Entity

Data source 1:

tagged:
Discovered.Identifier Indirect
Discovered.PHI
Discovered.PII

Data source 2:

tagged:
Discovered.Identifier Direct
Discovered.PCI
Discovered.Entity.Social Security Number

Data source 3:

tagged:
Discovered.Identifier Direct
Discovered.PHI

The user would be subscribed to data sources 1 and 2, but the user would not be subscribed to data source 3. This is because access moves from left to right in the hierarchy based on what the user possesses (the wildcard asterisk is implied).

So if a user had a more specific attribute key PersonalData with the values Discovered.Entity.Social Security Number, they would only get access to hypothetical data source 2, because their attribute is further left or matches (in this case matches) Discovered.Entity.Social Security Number.
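
Example 2 can be traced with a short Python model of the left-to-right matching. This is a sketch under the assumption that an attribute value matches a tag when it equals the tag or is a dotted prefix of it; the helper names are hypothetical and do not reflect Immuta's internal code.

```python
def tag_matches(attribute_value: str, tag: str) -> bool:
    # The attribute value must equal the tag or sit further left in its hierarchy.
    return tag == attribute_value or tag.startswith(attribute_value + ".")


def is_subscribed(attribute_values, data_source_tags) -> bool:
    # One matching value/tag pair is enough to subscribe the user.
    return any(tag_matches(v, t) for v in attribute_values for t in data_source_tags)


data_sources = {
    "Data source 1": ["Discovered.Identifier Indirect", "Discovered.PHI", "Discovered.PII"],
    "Data source 2": ["Discovered.Identifier Direct", "Discovered.PCI",
                      "Discovered.Entity.Social Security Number"],
    "Data source 3": ["Discovered.Identifier Direct", "Discovered.PHI"],
}

for values in (["Discovered.PII", "Discovered.Entity"],
               ["Discovered.Entity.Social Security Number"]):
    subscribed = [name for name, tags in data_sources.items() if is_subscribed(values, tags)]
    print(values, "->", subscribed)
```

With the first set of values, the model subscribes the user to data sources 1 and 2; with the more specific value, only data source 2.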

The below table provides more examples:

Merging special functions

Subscription policies that use special functions can be merged, which is helpful for use cases like the following:

If a user has the attribute "Allowed_Domain.Domain A", they get access to generic data that is part of domain A.
If a user has the attribute "Badge_Allowed.Badge X", they should gain access to both the generic data and any additional data that has been tagged "Badge X" (only in domain A, because they only have "Data Domain A General Access").

In this case, it can be two separate subscription policies, such as

Policy 1: @hasTagAsAttribute('Allowed_Domain', 'dataSource'). This would limit access to the domains where users are allowed to see generic data.

Policy 2: @hasTagAsAttribute('Badge_Allowed', 'dataSource'). This would limit access to the badges users are allowed to see.

Then, when the data sources are tagged with table tags that represent access, if the table only has the domain tag, only policy 1 will apply; however, if it has a domain tag and a badge tag, both policies will be applied and merged successfully by Immuta.

Use with caution

How-to Guides

Author a Masking Data Policy

Best practice: write global policies

Build global policies with tags instead of writing local policies to manage data access. This practice will prevent you from having to write or rewrite single policies for every data source added to Immuta.

Determine your policy scope:
- Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
- Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Mask from the first dropdown menu.
Select columns tagged, columns with any tag, columns with no tags, all columns, or columns with names spelled like.
Select a masking type:
- Constant: Enter a constant in the field that appears next to the masking type dropdown.
- Regex:
  1. Enter a regular expression and replacement value in the fields that appear next to the masking type dropdown.
  2. From the next dropdown, choose to make the regex Case Insensitive and/or Global.
- Rounding: Select the Bucket Type and then enter the bucket size.
- K-Anonymization: Select either using fingerprint or requiring group size of at least and enter a group size in the subsequent dropdown menu.
- Custom function: Enter the custom function native to the underlying database.
  Note: The function must be valid for the data type of the column. If it is not, the default masking type will be applied to the column.
Select for everyone except, for everyone, or for everyone who to continue the condition.
- for everyone except: In the subsequent dropdown menus, choose is a member of group, possesses attribute, or is acting under purpose. Complete the condition with the subsequent dropdown menus.
- for everyone who: Complete the Otherwise clause. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied and select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
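
To picture what the Case Insensitive and Global options of the regex masking type change, the following Python sketch applies a pattern and replacement to a single value. It is only an analogy: Immuta evaluates the regular expression in the underlying data platform, and the function name `regex_mask` and its parameters are hypothetical.

```python
import re


def regex_mask(value, pattern, replacement, case_insensitive=False, global_replace=False):
    """Replace matches of pattern in value, mimicking the two regex options."""
    flags = re.IGNORECASE if case_insensitive else 0
    # count=0 replaces every match (Global); count=1 replaces only the first match.
    count = 0 if global_replace else 1
    return re.sub(pattern, replacement, value, count=count, flags=flags)


print(regex_mask("ab-AB-ab", "ab", "xx"))                                # xx-AB-ab
print(regex_mask("ab-AB-ab", "ab", "xx", global_replace=True))           # xx-AB-xx
print(regex_mask("ab-AB-ab", "ab", "xx", case_insensitive=True,
                 global_replace=True))                                   # xx-xx-xx
```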

Create a custom certification for a global policy

Click Add Certification in the data policy builder.
Enter a Certification Label and Certification Text in the corresponding fields of the dialog that appears.
Click Save.

Author a Purpose-Based Restriction Policy

Requirement and prerequisite:

CREATE_DATA_SOURCE or GOVERNANCE Immuta permission
A purpose has been created

Build the policy

Determine your policy scope:
- Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
- Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Limit usage to purpose(s) in the first dropdown menu.
In the next field, select a specific purpose that you would like to restrict usage of this data source to or ANY PURPOSE. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Select for everyone or for everyone except. If you select for everyone except, you must select conditions that will drive the policy such as group, purpose, or attribute.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.

How-to guides

Create a project: To restrict access to data and associate your data source with a purpose, create a project and add the purpose and relevant data sources to the project.
Manage project purposes

Conceptual guide

Why use projects?

Author a Restricted Data Policy

Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.

Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.

Click Policies in the left sidebar and select Data Policies.
Click Add Policy and complete the Enter Name field.
Select how the policy should protect the data. Click a link below for instructions on building that specific data policy:
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Click Create Policy, and then click Activate Policy or Stage Policy.

Author a Row-Level Policy

Determine your policy scope:
- Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
- Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select the Only show rows action from the first dropdown.
Choose one of the following policy conditions:
- Where user
  1. Choose the condition that will drive the policy from the next dropdown: is a member of a group or possesses an attribute.
  2. Use the next field to choose the attribute, group, or purpose that you will match values against.
  3. Use the next dropdown menu to choose the tag that will drive this policy. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the far right of the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
- Where the value in the column tagged
  1. Select the tag from the next dropdown menu.
  2. From the subsequent dropdown, choose is or is not in the list, and then enter a list of comma-separated values.
- Where
  1. Enter a valid SQL WHERE clause in the subsequent field. When you place your cursor in this field, a tooltip details valid input and the column names of your data source. See Custom WHERE Clause Functions for more information about specific functions.
- Never
  The never condition blocks all access to the data source.
  1. Choose the condition that will drive the policy from the next dropdown: for everyone, for everyone except, or for everyone who.
  2. Select the condition that will further define the policy: is a member of group, is acting under a purpose, or possesses attribute.
  3. Use the next field to choose the group, purpose, or attribute that you will match values against.
Choose for everyone, for everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.

Author a Time-Based Restriction Policy

Determine your policy scope:
- Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
- Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Only show data by time from the first dropdown.
Select where data is more recent than or older than from the next dropdown, and then enter the number of minutes, hours, days, or years that you would like to restrict the data source to. Note that unlike many other policies, there is no field to select a column to drive the policy. This type of policy will be driven by the data source's event-time column, which is selected at data source creation.
Choose for everyone, for everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
- tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
- with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
- in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
- created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
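
The effect of a time-based restriction can be sketched as a filter on the data source's event-time column. The Python below is a minimal illustration, assuming rows are dictionaries and that the cutoff itself is excluded; the function and parameter names are hypothetical and are not part of Immuta.

```python
from datetime import datetime, timedelta


def filter_by_event_time(rows, event_time_column, window, now, keep="more recent than"):
    """Keep rows whose event time is newer (or older) than now minus the window."""
    cutoff = now - window
    if keep == "more recent than":
        return [row for row in rows if row[event_time_column] > cutoff]
    return [row for row in rows if row[event_time_column] < cutoff]


rows = [
    {"id": 1, "event_time": datetime(2024, 1, 9)},
    {"id": 2, "event_time": datetime(2023, 12, 1)},
]
now = datetime(2024, 1, 10)

# Only show data that is more recent than 7 days
print(filter_by_event_time(rows, "event_time", timedelta(days=7), now))
# Only show data that is older than 7 days
print(filter_by_event_time(rows, "event_time", timedelta(days=7), now, keep="older than"))
```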

Certifications, Exemptions, and Diffs

Required permissions

To manage and apply existing policies to data sources, a user must either have the CREATE_DATA_SOURCE Immuta permission or be manually assigned the owner role on a data source.

Certify global policies

After a policy with a certification requirement is applied to a data source, data owners will receive a notification indicating that they need to certify the policy.

Navigate to the Policies tab of the affected data source, and review the policy in the Data Policies section.
Click Certify Policy.
In the Policy Certification modal, click Sign and Certify.

Add policy exemptions

Once policy exemptions are enabled on the app settings page, data owners can exempt users from policies on a per-data-source basis, allowing those users to see all the data regardless of the global or local policies applied. Note: By default, policy exemptions are disabled in Immuta.

Select a data source and click the Policies tab.
In the Data Policies menu, click Add Exemptions. This button will only be visible if policy exemptions have been enabled.
Enter the names of the users or groups to exempt from your policies.
Click Create to finish your exemption policy.
Click Save All to apply the policy to your data source.

View policy diffs

Once you have a data policy in effect, you can view the changes in your policies by clicking the Policy Diff button in the data policies section on a data source's policies tab.

The Policy Diff button displays previous policies and the current policy applied to the data source.

External Masking Interface

Deprecation notice: Support for this feature has been deprecated.

Use Deterministic IVs/Salt

Use deterministic IVs/salt to ensure the same value is masked consistently throughout the data, as Immuta always pushes down the masked version of the literal when the querying user is exempt from the policy.
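
To see why determinism matters, consider the sketch below. It uses a keyed hash (HMAC-SHA256 from the Python standard library) purely to illustrate that a fixed key yields the same masked output for the same input, so an equality predicate built from a masked literal still matches data masked at rest. The key and function name are hypothetical, and a keyed hash is one-way: a service that must also support UNMASK would need a reversible deterministic scheme instead.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-key"  # hypothetical key; keep it fixed and private


def deterministic_mask(value: str) -> str:
    """Return the same masked token every time for the same input value."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


stored = [deterministic_mask(v) for v in ["123-4567-890", "098-4567-321"]]
literal = deterministic_mask("123-4567-890")
# The masked literal finds the matching masked row.
print(literal in stored)
```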

Authentication

Immuta can make requests to your External Masking service with one of two authentication methods:

Username and password authentication: Immuta can send requests with a username and a password in the Authorization HTTP header. In this case, your service will need to be able to parse a Basic Authorization Header and validate the credentials sent with it.
PKI Certificate: Immuta can send requests using a CA certificate, a certificate, and a key.

Alternatively, Immuta can make unauthenticated requests to your REST masking service. This is recommended only if you have other security measures in place (e.g., if the service is in an isolated network that's reachable only by your Immuta environment).
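
For the username and password option, a service has to parse and validate a Basic Authorization header. The following Python sketch shows one way to do that with the standard library. The expected credentials and the function name `is_authorized` are hypothetical placeholders; a real service would load credentials from secure configuration.

```python
import base64
import binascii
import hmac

EXPECTED_USERNAME = "immuta"     # hypothetical credentials for illustration only
EXPECTED_PASSWORD = "change-me"


def is_authorized(authorization_header):
    """Validate an 'Authorization: Basic <base64(username:password)>' header value."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, separator, password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(username.encode("utf-8"), EXPECTED_USERNAME.encode("utf-8"))
    password_ok = hmac.compare_digest(password.encode("utf-8"), EXPECTED_PASSWORD.encode("utf-8"))
    return user_ok and password_ok


header = "Basic " + base64.b64encode(b"immuta:change-me").decode("ascii")
print(is_authorized(header))
```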

Endpoints

POST /

Description

The unmask action allows Immuta to build predicates that can be used to query data that is consistently masked at rest in the remote database; it does not dynamically mask data at query time.

To dynamically mask data, use Immuta’s standard masking policies.

This endpoint accepts a set of values and a directive to either mask or unmask them.

Request Body

Your service will need to parse and process the following body parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| action | Enum(MASK, UNMASK) | Either MASK or UNMASK the values |
| values | Map(string, String[]) | A map of columns containing an array of masked/unmasked values |

Below is an example request payload to mask values in the ssn and ccn columns:

{
  "action": "MASK",
  "values": {
    "ssn": ["123-4567-890", "098-4567-321"],
    "ccn": ["0000-1111-2222-3333", "5555-6666-7777-8888", "9999-0000-1111-2222"]
  }
}

Below is an example request payload to unmask values in the ssn and ccn columns:

{
  "action": "UNMASK",
  "values": {
    "ssn": ["AX6YYTURNHtyDkjm", "ue6AUyQYgWiTLAUT"],
    "ccn": ["oDpEKiFutdWG46A2", "Tdz3BEVjjfVLrW6x", "nVANLwkgoLUAe4NQ"]
  }
}

Response Body

Your service will need to return a map of values that corresponds to the columns and values that were specified in the request. It is important that your service returns the same column keys and that the position of each masked/unmasked value in your response corresponds to the masked/unmasked value from the request.

For example, the following request

curl --location --request POST 'https://your.external.mask.service/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "action": "MASK",
  "values": {
    "ssn": ["123-4567-890", "098-4567-321"],
    "ccn": ["0000-1111-2222-3333", "5555-6666-7777-8888", "9999-0000-1111-2222"]
  }
}'

could return the following body:

{
  "ssn": ["AX6YYTURNHtyDkjm", "ue6AUyQYgWiTLAUT"],
  "ccn": ["oDpEKiFutdWG46A2", "Tdz3BEVjjfVLrW6x", "nVANLwkgoLUAe4NQ"]
}

Notice that both ssn and ccn columns are present and that each of them contains the exact number of values specified in the request. Immuta will fail to validate responses to its request under the following circumstances:

The response contains column keys that were not present in the request.
The response is missing column keys that were present in the request.
The response doesn't contain the exact number of values for each of the corresponding column keys in the request.
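
The three failure conditions above can be checked before a response is returned. The Python sketch below is a hypothetical self-check a service could run on its own output; the function name `validate_response` is an assumption and is not part of Immuta.

```python
def validate_response(request_values, response_values):
    """Return a list of problems; an empty list means the response shape is valid."""
    errors = []
    extra = sorted(set(response_values) - set(request_values))
    missing = sorted(set(request_values) - set(response_values))
    if extra:
        errors.append(f"unexpected column keys: {extra}")
    if missing:
        errors.append(f"missing column keys: {missing}")
    # Each shared column must carry exactly as many values as were requested.
    for column in sorted(set(request_values) & set(response_values)):
        if len(request_values[column]) != len(response_values[column]):
            errors.append(f"wrong number of values for column {column!r}")
    return errors


request_values = {"ssn": ["123-4567-890", "098-4567-321"]}
print(validate_response(request_values, {"ssn": ["AX6YYTURNHtyDkjm", "ue6AUyQYgWiTLAUT"]}))
print(validate_response(request_values, {"ssn": ["AX6YYTURNHtyDkjm"], "ccn": []}))
```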

Examples

Below are some very simplistic implementation examples of a service with mask() and unmask() functions:

const mask = (value) => {
  // TODO: implement your custom masking logic and return a masked value
};
const unmask = (value) => {
  // TODO: implement your custom unmasking logic and return an unmasked value
};
// Use as a POST handler:
const externalMaskingHandler = async (request, response) => {
  const { action, values } = request.payload;
  const result = {};
  Object.keys(values).forEach(column => {
      result[column] = values[column].map(action === 'MASK' ? mask : unmask);
  });
  return result;
};

from flask import Flask, jsonify, request

# Flask needs the import name of the application module
app = Flask(__name__)

def mask(value):
    # TODO: implement your custom masking logic and return a masked value
    pass

def unmask(value):
    # TODO: implement your custom unmasking logic and return an unmasked value
    pass

@app.route("/path/to/my/services", methods=["POST"])
def external_masking_handler():
    payload = request.get_json()
    action = payload.get("action")
    values = payload.get("values")
    result = {}
    # Iterate over (column, values) pairs and preserve the order of each column's values
    for column, column_values in values.items():
        transform = mask if action == "MASK" else unmask
        result[column] = [transform(value) for value in column_values]
    return jsonify(result)


Reference Guides

Custom WHERE Clause Functions

Overview

The Immuta policy builder allows you to use custom functions that reference important Immuta metadata from within your WHERE clause. These custom functions can be seen as utilities that help you create policies more easily. Using the Immuta policy builder, you can include these functions in your policy queries by choosing where in the sub-action dropdown menu.

Custom Functions

The `@attributeValuesContains()` Function

This function returns true for a given row if the provided column evaluates to an attribute value for which the querying user has a corresponding attribute value. This function requires at least two arguments and accepts no more than three arguments.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Attribute Name | String | Required | The name of the attribute to retrieve values for |
| Column Name/SQL Expression | String | Required | The column that contains the value to match the attribute key against |
| Placeholder | String | Optional | A placeholder in case the list of values is empty |
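
The row-by-row behavior of @attributeValuesContains() can be modeled in a few lines of Python. This is a conceptual sketch only: in practice the function is resolved inside the data platform query, and the way the placeholder is used here (substituted when the user has no values for the attribute) is an assumption. All names in the sketch are hypothetical.

```python
def attribute_values_contains(user_attributes, attribute_name, row, column, placeholder=None):
    """Return True if the row's column value is one of the user's attribute values."""
    values = list(user_attributes.get(attribute_name, []))
    # Assumption: the placeholder stands in when the user has no values at all.
    if not values and placeholder is not None:
        values = [placeholder]
    return row[column] in values


user_attributes = {"Country": ["US", "CA"]}
rows = [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}]

visible = [row for row in rows
           if attribute_values_contains(user_attributes, "Country", row, "country")]
print(visible)  # only the row with country "US"
```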

The `@columnTagged()` Function

This function returns the column name with the specified tag.

If this function is used in a Global Policy and the tag doesn't exist on a data source, the policy will not be applied.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Tag Name | String | Required | The name of the tag |

The `@groupsContains()` Function

This function returns true for a given row if the provided column evaluates to a group to which the querying user belongs. This function requires at least one argument.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Column Name/SQL Expression | String | Required | The column that contains the value to match the group against |
| Placeholder | String | Optional | A placeholder in case the list of values is empty |

The `@hasAttribute()` Function

This function returns a boolean indicating if the current user has the specified attribute name and value combination. If the specified attribute name or attribute value has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Attribute Name | String | Required | The name of the attribute |
| Attribute Value | String | Required | The value to correspond with the attribute name |

The `@iam` Function

This function returns the IAM ID for the current user.

Parameters

None.

The `@isInGroups()` Function

This function returns a boolean indicating if the current user is a member of all of the specified groups. If any of the specified groups has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Group names | Array (String) | Required | A list of group names, e.g. groups('group_a', 'group_b', 'group_c') |

The `@isUsingPurpose()` Function

This function returns a boolean indicating if the current user is using the specified purpose. If the specified purpose has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Purpose | String | Required | The name of the purpose to check the user against |

The `@purposesContains()` Function

This function returns true for a given row if the provided column evaluates to a purpose under which the querying user is currently acting. This function requires at least one argument and accepts no more than two arguments.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Column Name/SQL Expression | String/Expression | Required | The column that contains the value to match the purpose against |
| Placeholder | String | Optional | A placeholder in case the list of values is empty |

The `@username` Function

This function returns the current user's username.

Parameters

None.

Data Policy Conflicts and Fallback

Masking policy conflicts

In some cases, two conflicting global masking policies apply to a single data source. When this happens, the policy containing a tag deeper in the hierarchy will apply to the data source to resolve the conflict.

Consider the following global data policies created by a data governor:

Data policy 1:

Mask columns tagged PII by making null for everyone on data sources with columns tagged PII

Data policy 2:

Mask columns tagged PII.SSN using hashing for everyone on data sources with columns tagged PII.SSN

If a data owner creates a data source and applies the PII.SSN tag to a column, both of these global masking policies will apply to the column with that tag. Instead of having a conflict, the policy containing a deeper tag in the hierarchy will apply.

In this example, data policy 2 will be applied to the data source because PII.SSN is deeper and thus considered more specific than PII. If data owners wanted to use data policy 1 on the data source instead, they would need to disable data policy 2.

Should two or more masking policies target the same column and have the same hierarchy depth, the policy that was authored first will win out. This is a conservative approach that avoids the original policy being changed unexpectedly.
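
The resolution rules described in this section (the deeper tag wins, and the earlier policy wins a tie) can be summarized in a short Python sketch. The `MaskingPolicy` class and its fields are hypothetical and only model the ordering; they do not reflect Immuta's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class MaskingPolicy:
    name: str
    tag: str            # tag the policy targets, e.g. "PII.SSN"
    created_order: int  # lower means authored earlier


def winning_policy(policies):
    """Pick the policy with the deepest tag; break ties by earliest authorship."""
    return min(policies, key=lambda p: (-len(p.tag.split(".")), p.created_order))


policy_1 = MaskingPolicy("Data policy 1", "PII", created_order=1)
policy_2 = MaskingPolicy("Data policy 2", "PII.SSN", created_order=2)
print(winning_policy([policy_1, policy_2]).name)  # Data policy 2
```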

Row-level policy conflicts

Similar to masking policies, it is possible for two or more row-level policies to target the same table. When this occurs, all row-level policies will be applied and AND'ed together, meaning the user will need to meet all in some capacity to see any rows in the table at all.

To OR separate row-level policies together, build them into a single Immuta policy together with an OR.
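
The difference between separate row-level policies (combined with AND) and a single policy built with OR can be sketched as follows. The predicates and row fields are hypothetical examples and are not drawn from any real data source.

```python
def visible_rows(rows, policies):
    """Separate row-level policies are AND'ed: a row must satisfy every policy."""
    return [row for row in rows if all(policy(row) for policy in policies)]


def in_us(row):
    return row["country"] == "US"


def in_sales(row):
    return row["department"] == "Sales"


def in_us_or_sales(row):
    # A single policy whose conditions are joined with OR
    return in_us(row) or in_sales(row)


rows = [
    {"id": 1, "country": "US", "department": "Sales"},
    {"id": 2, "country": "US", "department": "HR"},
    {"id": 3, "country": "FR", "department": "Sales"},
]

print([r["id"] for r in visible_rows(rows, [in_us, in_sales])])   # [1]
print([r["id"] for r in visible_rows(rows, [in_us_or_sales])])    # [1, 2, 3]
```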

Masking policy intelligent fallbacks

When masking columns, the type of the column matters. For example, it is not possible to hash a numeric column, because the hash would render the number as a string.

Many data platforms make the user account for this by building separate data policies for every column type that could exist now or in the future, which is quite onerous.

Instead, Immuta has intelligent fallbacks. An intelligent fallback occurs when a masking type targets a column type that is incompatible with the masking type. In this case, Immuta will fall back to the most appropriate masking type which retains the level of privacy or better required by the previous type.

For example, if a hashing masking type hits a numeric type, it would intelligently fall back to nulling the column instead, since nulls are allowed in numeric types.
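
The hashing example can be expressed as a tiny Python function. This sketch covers only the one fallback named in this section (hashing on a numeric column falls back to null); the set of numeric type names and the function name are hypothetical, and other fallback combinations are not modeled.

```python
NUMERIC_TYPES = {"int", "bigint", "decimal", "float", "double"}  # hypothetical type names


def effective_masking_type(requested_type: str, column_type: str) -> str:
    """Return the masking type that would actually be applied to the column."""
    # A hash would turn a number into a string, so fall back to null.
    if requested_type == "hashing" and column_type.lower() in NUMERIC_TYPES:
        return "null"
    return requested_type


print(effective_masking_type("hashing", "varchar"))  # hashing
print(effective_masking_type("hashing", "bigint"))   # null
```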

Lockout policies

Sometimes a global data policy will target a table and the policy cannot be applied as written. This can happen for several reasons, but the most common is that the row-level policy logic is not relevant to the table in question.

For example, with the following policy

@attributeValuesContains('Attribute Name', 'SOME_COLUMN')

If SOME_COLUMN does not exist in the table, the row-level policy will not work (this is why it is always recommended to use the @columnTagged('tag name') function instead of hard coding column names).

In the case where an error such as this occurs with a global data policy, the lockout policy will kick in. The lockout policy is a row-level policy that blocks any rows from returning for any users. This may seem extreme, but since Immuta does not know how to apply the policy, the lockout policy avoids data leaks until the policy is edited to work correctly.
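
The lockout behavior can be pictured with the following Python sketch: when the column a row-level policy depends on is missing, no rows are returned at all. The function and parameter names are hypothetical, and the model ignores everything except the missing-column case described above.

```python
def apply_row_policy(rows, table_columns, column, allowed_values):
    """Filter rows by column value, or lock out entirely if the column is missing."""
    if column not in table_columns:
        # Lockout: the policy cannot be applied as written, so return no rows.
        return []
    return [row for row in rows if row[column] in allowed_values]


rows = [{"id": 1, "region": "US"}, {"id": 2, "region": "EU"}]
columns = {"id", "region"}

print(apply_row_policy(rows, columns, "region", {"US"}))       # [{'id': 1, 'region': 'US'}]
print(apply_row_policy(rows, columns, "SOME_COLUMN", {"US"}))  # []
```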

Orchestrated Masking Policies

Private preview

This feature is only available to select accounts. Contact your Immuta representative to enable this feature.

Orchestrated masking policies (OMP) reduce conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization. Furthermore, OMP fosters distributed data stewardship, empowering policy authors who share responsibility of a data set to protect it while allowing data consumers acting under various roles or purposes to access the data.

When multiple masking policies apply to a column, Immuta combines the exception conditions of the masking policy so that data subscribers can access the data when they satisfy one of those exception conditions. Multiple masking policies will be enforced on a column if the following conditions are true:

Policies use the same masking type.
Policies use the for everyone except condition.

Requirements

Databricks Spark or Starburst (Trino) integration

Supported masking policy types

OMP supports the following masking types:

Constant
Hashing
Format preserving masking
Null
Regex
Rounding

Global policy logic

Previous policy logic

Governors can apply policies to all columns in a data source or target specific columns with tags or a regular expression. Without orchestrated masking policies enabled, when multiple global policies apply to the same columns, Immuta could only apply one of those policies.

Consider the following example to examine how policies behaved when one tag is used in two different policies:

Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under purpose Marketing.

For columns tagged email, only one of these policies is enforced. The Mask PII Global Policy 2 is not applied to the data source, so Immuta is not enforcing the masking policy properly for users who should be able to see emails because they are acting under the Marketing purpose.

Consider the following example where multiple masking policies apply to columns that have multiple tags, resulting in one policy applying:

Global Policy 3: Mask using hashing the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 4: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.

If a column is tagged Employee Data and HR Data, Immuta will only apply one of the policies.

Orchestrated masking policy logic

With orchestrated masking policies, Immuta applies multiple global masking policies that apply to a single column by combining the policy exceptions with OR. For these policies to combine, the masking type must be identical and the policy must use the for everyone except condition.

Consider the following example, in which both policies will apply to the data source:

Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under purpose Marketing.

Users acting under the purpose Marketing or Email Campaign will be able to see emails in the clear.

However, in the following example, only one of these policies will apply to the data source because one masks using a constant and the other masks using hashing:

Global Policy 5: Mask using the constant REDACTED the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 6: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.
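
The combination rule described above can be sketched in a few lines of Python. This is an illustration only, not Immuta's implementation: the MaskingPolicy class, its field names, and both helper functions are hypothetical. It groups policies that use the for everyone except condition by masking type and treats the exceptions within a group as combined with OR.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingPolicy:
    # Hypothetical model of a global masking policy (not an Immuta type).
    name: str
    masking_type: str        # e.g. "hashing" or "constant"
    condition: str           # e.g. "for everyone except"
    exception_purpose: str   # purpose that exempts a user from masking


def combinable_groups(policies):
    """Group policies that share a masking type and use 'for everyone except'."""
    groups = defaultdict(list)
    for policy in policies:
        if policy.condition == "for everyone except":
            groups[policy.masking_type].append(policy)
    return dict(groups)


def sees_clear_value(user_purposes, combined_policies):
    """Exceptions are combined with OR: meeting any one of them is enough."""
    return any(p.exception_purpose in user_purposes for p in combined_policies)


policy_1 = MaskingPolicy("Mask PII Global Policy 1", "hashing",
                         "for everyone except", "Email Campaign")
policy_2 = MaskingPolicy("Mask PII Global Policy 2", "hashing",
                         "for everyone except", "Marketing")
policy_5 = MaskingPolicy("Global Policy 5", "constant",
                         "for everyone except", "Retention Analysis")

groups = combinable_groups([policy_1, policy_2, policy_5])
```

Here the two hashing policies land in the same group, so a user acting under either Marketing or Email Campaign meets the combined exception, while the constant policy stays in a group of its own.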

Limitations

No UI enhancements were made in this release. Multiple masking policies applied to the same column are visible on a data source, but there is no indication that the exceptions are combined with OR.
Masking types must match exactly for the policies to be combined. For example, both policies must mask using rounding.
Existing policies will not automatically migrate to the new policy logic when you enable the feature. To re-compute existing policies with the new logic, you must manually trigger global policy changes by staging and re-enabling each policy.

Data Policy Types

Once a user is subscribed to a data source, the data policies that are applied to that data source determine what data the user sees.

For all data policies, you must establish the conditions under which they will be enforced. Immuta allows you to append multiple conditions to the data. Those conditions are based on user attributes and groups (which can come from multiple identity management systems and be applied as conditions in the same policy) or on the purposes users are acting under through Immuta projects.

Conditions can be directed as exclusionary or inclusionary, depending on the policy that's being enforced:

exclusionary condition example: Mask using hashing values in columns tagged PII on all data sources for everyone except users in the group AUDIT.
inclusionary condition example: Only show rows where user is a member of a group that matches the value in the column tagged Department.

Policy support

Integration support matrix

Certain policies are not supported, or supported with caveats*, depending on the integration:

*Supported with Caveats:

On Databricks data sources, joins will not be allowed on data protected with replace with NULL/constant policies.
Snowflake k-anonymization: This policy type is only supported if you are using the query engine, which is disabled by default. Reach out to your Immuta representative if you need to enable this policy type for your account.
Starburst (Trino):
- K-anonymization, randomized response, and format preserving masking are only supported if you are using the query engine, which is disabled by default. Reach out to your Immuta representative if you need to enable this policy type for your account.
- The Immuta function @iam for WHERE clause policies can block the creation of views.

Policy types

Inclusionary policies

For all policies except purpose-based restriction policies, inclusionary logic allows governors to vary policy actions with an Otherwise clause.

For example, governors could mask values using hashing for users acting under a specified purpose while masking those same values by making null for everyone else who accesses the data.

This variation can be created by selecting for everyone who when available from the condition dropdown menus and then completing the Otherwise clause.
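
As a rough illustration of an Otherwise clause, the sketch below hashes a value for users acting under a given purpose and returns null (None) for everyone else. The function name and the default purpose "Marketing" are hypothetical, and the hashing shown is plain SHA-256 rather than Immuta's actual masking function.

```python
import hashlib


def mask_with_otherwise(value, user_purposes, purpose="Marketing"):
    """Hash for users acting under `purpose`; otherwise make the value null."""
    if purpose in user_purposes:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return None  # the Otherwise clause: mask by making null
```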

Limit to purpose policies

Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.

Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.

For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a governor could write the following global policy:

Limit usage to purpose(s) Research for everyone on data sources tagged PHI.

This hierarchy allows you to create this as a single purpose instead of creating separate purposes, which must then each be added to policies as they evolve.

Now, any user acting under the purpose or sub-purpose of Research - whether Research.Marketing or Research.Onboarding - will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. Furthermore, if new projects with new Research purposes are added, for example, the relevant global policy will automatically be enforced.
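
The hierarchy check can be pictured as a prefix match on dot-separated purpose names, as in Research.Marketing. The helper below is a minimal sketch under that assumption; the function name is hypothetical and this is not how Immuta necessarily evaluates purposes internally.

```python
def meets_purpose(acting_purpose: str, required_purpose: str) -> bool:
    """True when the acting purpose is the required purpose or one of its sub-purposes."""
    return (
        acting_purpose == required_purpose
        or acting_purpose.startswith(required_purpose + ".")
    )
```

With this check, a policy written against Research keeps working when a sub-purpose such as Research.Onboarding is added later, because nothing in the policy names the sub-purposes individually.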

Please refer to the data governor policy guide for a tutorial on purpose-based restrictions on data.

Masking policies

Masking policies hide values in data, providing various levels of utility while still preserving privacy.

Hashing

This policy masks the values with an irreversible SHA-256 hash. The hash is consistent for the same value throughout the data source, so you can count or track specific values without knowing the true raw value.

Hashed values are different across data sources, so you cannot join on hashed values unless you enable masked joins on data sources within a project. Immuta prevents joins on hashed values to protect against link attacks where two data owners may have exposed data with the same masked column (a quasi-identifier), but their data combined by that masked value could result in a sensitive data leak.
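
The following sketch shows why a hash can be consistent within one data source yet useless for joining across data sources. It assumes a per-data-source salt, which is an illustrative choice and not a description of how Immuta derives its hashes; the function and parameter names are hypothetical.

```python
import hashlib


def mask_with_hash(value: str, data_source_salt: str) -> str:
    """Return a SHA-256 hex digest of the value, salted per data source (assumption)."""
    salted = data_source_salt + ":" + value
    return hashlib.sha256(salted.encode("utf-8")).hexdigest()
```

The same email hashed twice with the same salt yields the same digest, so counts and group-bys still work, but the digest differs under another data source's salt, so the masked columns do not line up in a join.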

Replace with NULL

This policy makes values null, removing any utility of the data the policy applies to.

Replace with constant

With this policy, you can replace the values with the same constant value you choose, such as 'Redacted', removing any utility of that data.

Regular expression (regex)

This policy is similar to replacing with a constant, but it provides more utility because you can retain portions of the true value. For example, the following regex rule would mask the final digits of an IP address:

Mask using a regex \d+$ the value in the columns ip_address for everyone.

In this case, the regular expression \d+$ breaks down as follows:

\d matches a digit (equal to [0-9]).
+ is a quantifier that matches between one and unlimited times, as many times as possible, giving back as needed (greedy).
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any).

This ensures we capture the last digit(s) after the last . in the IP address. We can then enter the replacement for what we captured, which in this case is XXX. The outcome of the policy would look like this: 164.16.13.XXX
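
The same regex and replacement can be tried outside Immuta with Python's re module. The function name and the sample input address are hypothetical; only the pattern \d+$ and the replacement XXX come from the rule above.

```python
import re


def mask_trailing_digits(value: str, replacement: str = "XXX") -> str:
    """Replace the run of digits at the end of the string with the replacement."""
    return re.sub(r"\d+$", replacement, value)


print(mask_trailing_digits("164.16.13.12"))  # prints 164.16.13.XXX
```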

Rounding

This is a technique to hide precision from numeric values while providing more utility than simply hashing. For example, you could remove precision from a geospatial coordinate. You can also use this type of policy to remove precision from dates and times by rounding to the nearest hour, day, month, or year.
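
As a small sketch of what removing precision means in practice, the helpers below round a coordinate to a fixed number of decimal places and round a timestamp to the nearest hour. Both function names are hypothetical, and the exact rounding rules Immuta applies on each data platform may differ.

```python
from datetime import datetime, timedelta


def round_coordinate(value: float, decimals: int = 1) -> float:
    """Drop precision from a numeric value such as a latitude or longitude."""
    return round(value, decimals)


def round_to_nearest_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest hour (half hours round up)."""
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)
```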

With reversibility

This option masks the values using hashing, but allows users to submit an unmasking request to users who meet the exceptions of the policy.

Note: The user receiving the unmasking request must send the unmasked value to the requester.

With Reversible Masking, the raw values are switched out with consistent values to allow analysis without revealing the underlying sensitive data. The direct identifier is replaced with a token that can still be tracked or counted.

With format preserving masking

This option masks the value, but preserves the length and type of the value.

This option also allows users to submit an unmasking request to users who meet the exceptions of the policy.

Preserving the data format is important if the format has some relevance to the analysis at hand. For example, you may need to retain the integer column type, or the first 6 digits of a 12-digit number may have an important meaning.

Custom function

This option uses functions native to the underlying database to transform the column.

Limitations

The masking functions are executed against the remote database directly. A poorly written function could lead to poor quality results, data leaks, and performance hits.
Using custom functions can result in changes to the original data type. In order to prevent query errors you must ensure that you cast this result back to the original type.
The function must be valid for the data type of the selected column. If it is not valid:
- Local policies will error and show a message that the function is not valid.
- Global policies will error and change to the default masking type (hashing for text and NULL for all others).

Conditionally masking

For all of the policies above, both at the local and global policy levels, you can conditionally mask the value based on a value in another column. This allows you to build a policy that looks something like "Mask bank account number where country = 'USA'" instead of masking the bank account number unconditionally.
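
A conditional mask of this kind can be pictured as a per-row check on the other column. The sketch below is illustrative only: the row layout, the column names bank_account_number and country, and the use of SHA-256 as the masking step are all assumptions.

```python
import hashlib


def conditionally_mask(row: dict) -> dict:
    """Mask bank_account_number only where country = 'USA'."""
    masked = dict(row)  # leave the caller's row untouched
    if row["country"] == "USA":
        raw = row["bank_account_number"].encode("utf-8")
        masked["bank_account_number"] = hashlib.sha256(raw).hexdigest()
    return masked
```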

Note: When building conditional masking policies with custom SQL statements, avoid using a column that is masked using randomized response in the SQL statement, as this can lead to different behavior depending on your data platform and may produce results that are unexpected.

With k-anonymization

Sample data is processed during computation of k-anonymization policies

When a k-anonymization policy is applied to a data source, the columns targeted by the policy are queried under a fingerprinting process that generates rules enforcing k-anonymity. The results of this query, which may contain data that is subject to regulatory constraints such as GDPR or HIPAA, are stored in Immuta's metadata database.

The location of the metadata database depends on your deployment:

Self-managed Immuta deployment: The metadata database is located in the server where you have your external metadata database deployed.
SaaS Immuta deployment: The metadata database is located in the AWS global segment you have chosen to deploy Immuta.

To ensure this process does not violate your organization's data localization regulations, you need to first activate this masking policy type before you can use it in your Immuta tenant. To enable k-anonymization for your account, see the k-anonymization section on the app settings how-to guide.

K-anonymity is measured by grouping records in a data source that contain the same values for a common set of quasi identifiers (QIs) - publicly known attributes (such as postal codes, dates of birth, or gender) that are consistently, but ambiguously, associated with an individual.

The k-anonymity of a data source is defined as the number of records within the least populated cohort, which means that the QIs of any single record cannot be distinguished from those of at least k - 1 other records. In this way, a record with QIs cannot be uniquely associated with any one individual in a data source, provided k is greater than 1.

In Immuta, masking with k-anonymization examines pairs of values across columns and hides groups that do not appear at least the specified number of times (k). For example, if one column contains street numbers and another contains street names, the group 123, "Main Street" probably would appear frequently while the group 123, "Diamondback Drive" probably would show up much less. Since the second group appears infrequently, the values could potentially identify someone, so this group would be masked.

After the fingerprint service identifies columns with a low number of distinct values, users will only be able to select those columns when building the policy. Users can either use a minimum group size (k) given by the fingerprint or manually select the value of k.

Note: The default cardinality cutoff for columns to qualify for k-anonymization is 500. For details about adjusting this setting, navigate to the App Settings Tutorial.

Masking multiple columns with k-anonymization

Governors can write global data policies using k-anonymization in the global data policy builder.

When this global policy is applied to data sources, it will mask all columns matching the specified tag.

Applying k-anonymization over disjoint sets of columns in separate policies does not guarantee k-anonymization over their union.

If you select multiple columns to mask with k-anonymization in the same policy, the policy is driven by how many times these values appear together. If the groups appear fewer than k times, they will be masked.

For example, if Policy A

Policy A: Mask with k-anonymization the values in the columns gender and state requiring a group size of at least 2 for everyone

was applied to this data source

| Gender | State |
| --- | --- |
| Female | Ohio |
| Female | Florida |
| Female | Florida |
| Female | Arkansas |
| Male | Florida |

the values would be masked like this:

| Gender | State |
| --- | --- |
| Null | Null |
| Female | Florida |
| Female | Florida |
| Null | Null |
| Null | Null |

Note: Selecting many columns to mask with k-anonymization increases the processing that must occur to calculate the policy, so saving the policy may take time.

However, if you choose to mask the same columns with k-anonymization in separate policies, Policy C and Policy D,

Policy C: Mask with k-anonymization the values in the column gender requiring a group size of at least 2 for everyone
Policy D: Mask with k-anonymization the values in the column state requiring a group size of at least 2 for everyone

the values in the columns will be masked separately instead of as groups. Therefore, the values in that same data source would be masked like this:

| Gender | State |
| --- | --- |
| Female | Null |
| Female | Florida |
| Female | Florida |
| Female | Null |
| Null | Florida |
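
The difference between masking columns together and masking them in separate policies can be reproduced with simple counting. The sketch below uses the five-row gender and state example from this section with k = 2; the function names are hypothetical, and the code only mirrors the grouping idea, not Immuta's fingerprint-based computation.

```python
from collections import Counter

ROWS = [
    ("Female", "Ohio"),
    ("Female", "Florida"),
    ("Female", "Florida"),
    ("Female", "Arkansas"),
    ("Male", "Florida"),
]


def mask_jointly(rows, k):
    """One policy over both columns: mask (gender, state) groups seen fewer than k times."""
    counts = Counter(rows)
    return [row if counts[row] >= k else (None, None) for row in rows]


def mask_separately(rows, k):
    """Separate policies: mask each column using only that column's value counts."""
    counts = [Counter(column) for column in zip(*rows)]
    return [
        tuple(value if counts[i][value] >= k else None for i, value in enumerate(row))
        for row in rows
    ]
```

In the joint case only the (Female, Florida) group survives, while in the separate case Female and Florida are each frequent enough on their own to stay visible. This is why separate policies do not guarantee k-anonymity over the combined columns.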

Using randomized response

This policy masks data by slightly randomizing the values in a column, preserving the utility of the data while preventing outsiders from inferring content of specific records.

For example, if an analyst wanted to publish data from a health survey she conducted, she could remove direct identifiers and apply k-anonymization to indirect identifiers to make it difficult to single out individuals. However, consider these survey participants, a cohort of male welders who share the same zip code:

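| participant_id | zip_code | gender | occupation | substance_abuse |
| --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... |
| 880d0096 | 75002 | Male | Welder | Y |
| f267334b | 75002 | Male | Welder | Y |
| bfdb43db | 75002 | Male | Welder | Y |
| 260930ce | 75002 | Male | Welder | Y |
| 046dc7fb | 75002 | Male | Welder | Y |
| ... | ... | ... | ... | ... |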

All members of this cohort have indicated substance abuse, sensitive personal information that could have damaging consequences, and, even though direct identifiers have been removed and k-anonymization has been applied, outsiders could infer substance abuse for an individual if they knew a male welder in this zip code.

In this scenario, using randomized response would change some of the Y's in substance_abuse to N's and vice versa; consequently, outsiders couldn't be sure of the displayed value of substance_abuse given in any individual row, as they wouldn't know which rows had changed.

How the randomization works

Immuta applies a random number generator (RNG) that is seeded with some fixed attributes of the data source, column, backing technology, and the value of the high cardinality column, an approach that simulates cached randomness without having to actually cache anything.

For string data, the random number generator essentially flips a biased coin. If the coin comes up as tails, which it does with the frequency of the replacement rate configured in the policy, then the value is changed to any other possible value in the column, selected uniformly at random from among those values. If the coin comes up as heads, the true value is released.

For numeric data, Immuta uses the RNG to add a random shift from a 0-centered Laplace distribution with the standard deviation specified in the policy configuration. For most purposes, knowing the distribution is not important, but the net effect is that on average the reported values should be the true value plus or minus the specified deviation value.
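
The mechanism described above can be approximated with Python's standard library. This is a sketch under stated assumptions, not Immuta's algorithm: the seed is derived here from a SHA-256 digest of a hypothetical data source name, column name, and high cardinality key value, and all function names are illustrative. The Laplace noise is drawn as the difference of two exponential samples, using the fact that a Laplace distribution with scale b has standard deviation b times the square root of 2.

```python
import hashlib
import math
import random


def seeded_rng(data_source: str, column: str, key_value: str) -> random.Random:
    """Seed from fixed attributes so repeated queries see the same randomization
    without anything being cached."""
    digest = hashlib.sha256(f"{data_source}|{column}|{key_value}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def randomize_string(value, domain, replacement_rate, rng):
    """Biased coin: with probability replacement_rate, release a different value
    chosen uniformly from the other values in the column."""
    if rng.random() < replacement_rate:
        others = [v for v in domain if v != value]
        return rng.choice(others) if others else value
    return value


def randomize_number(value, std_dev, rng):
    """Add zero-centered Laplace noise with the given standard deviation."""
    if std_dev == 0:
        return value
    scale = std_dev / math.sqrt(2)  # Laplace std dev = scale * sqrt(2)
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
```

Because the generator is rebuilt from the same inputs on every call, the same row returns the same randomized value each time, which is what prevents an observer from averaging out the noise by repeating a query.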

Preserving data utility

Using randomized response doesn't destroy the data because data is only randomized slightly; aggregate utility can be preserved because analysts know how and what proportion of the values will change. Through this technique, values can be interpreted as hints, signals, or suggestions of the truth, but it is much harder to reason about individual rows.

Additionally, randomized response gives deniability of record content, not dataset participation, so individual rows can be displayed.

Mixing masking policies on the same column

In some cases, you may want several different masking policies applied to the same column through Otherwise policies. To build these policies, select everyone who instead of everyone or everyone except. After you specify who the masking policy applies to, select how it applies to everyone else in the Otherwise condition.

You can add and remove tags in Otherwise conditions for global policies (unlike local policy Otherwise conditions), as illustrated above; however, all tags or regular expressions included in the initial everyone who rule must be included in an everyone or everyone except rule in the additional clauses.

Complex data types: masking fields within struct columns (public preview)

Feature limitations

Masking struct and array columns is only available for Databricks data sources.
Immuta only supports Parquet and Delta table types.

Spark supports a class of data types called complex types, which can represent multiple data values in a single column. Immuta supports masking fields within array and struct columns:

array: an ordered collection of elements
struct: a collection of elements that are primitive or complex types

Without this feature enabled, the struct and array columns of a data source default to jsonb in the Data Dictionary, and the masking policies that users can apply to jsonb columns are limited. For example, if a user wanted to mask PII inside the column patient in the image below, they would have to apply null masking to the entire column or use a custom function instead of just masking name or address.

After Complex Data Types is enabled on the App Settings page, the column type for struct columns for new data sources will display as struct in the Data Dictionary. (For data sources that are already in Immuta, users can edit the data source and change the column types for the appropriate columns from jsonb to struct.) Once struct fields are available, they can be searched, tagged, and used in masking policies. For example, a user could tag name, ssn, and street as PII instead of the entire patient column.

After a global or local policy masks the columns containing PII, users who do not meet the exception specified in the policy will see these values masked:

Note: Immuta uses the > delimiter to indicate that a field is nested instead of the . delimiter, since field and column names could include periods.

Caveats

Struct Columns with Many Fields

If users have struct columns with many fields, they will need to either

create the data source against a cluster running Spark 3 or
add spark.debug.maxToStringFields 1000 to their Spark 2 cluster's configuration.

To get column information about a data source, Immuta executes a DESCRIBE call for the table. In this call, Spark returns a simple string representation of the schema for each column in the table. For the patient column above, the simple string would look like this:

struct<name:string,ssn:string,age:int,address:struct<city:string,state:string,zipCode:string,street:string>>

Immuta then parses this string into the following format for the data source's dictionary:

{
  dataType: 'struct',
  children: [
    {
      name: 'name',
      dataType: 'text'
    },
    {
      name: 'ssn',
      dataType: 'text'
    },
    {
      name: 'age',
      dataType: 'integer'
    },
    {
      name: 'address',
      dataType: 'struct',
      children: [
        {
          name: 'city',
          dataType: 'text'
        },
        {
          name: 'state',
          dataType: 'text'
        },
        {
          name: 'zipCode',
          dataType: 'text'
        },
        {
          name: 'street',
          dataType: 'text'
        },
      ]
    }
  ]
}

However, if the struct contains more than 25 fields, Spark truncates the string, causing the parser to fail and fall back to jsonb. Immuta will attempt to avoid this failure by increasing the number of fields allowed in the server-side property setting, maxToStringFields; however, this only works with clusters on a Spark 3 runtime. The maxToStringFields configuration in Spark 2 cannot be set through the ODBC driver and can only be set through the Spark configuration on the cluster with spark.debug.maxToStringFields 1000 on cluster startup.
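
To make the parsing step concrete, here is a small recursive parser for the simple string format. It is a sketch, not Immuta's parser: the function names are hypothetical, and the type mapping covers only the two conversions visible in the example (string to text, int to integer). It assumes a complete, untruncated string; a string that Spark has truncated would not match the struct<...> shape and would be returned as an unrecognized type.

```python
# Assumption: only the type conversions shown in the example are mapped.
TYPE_NAMES = {"string": "text", "int": "integer"}


def split_top_level(body: str):
    """Split 'a:x,b:struct<c:y,d:z>' on commas that are not inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def parse_type(type_string: str) -> dict:
    """Turn a Spark simple string into a nested dataType/children dictionary."""
    if type_string.startswith("struct<") and type_string.endswith(">"):
        body = type_string[len("struct<"):-1]
        children = []
        for field in split_top_level(body):
            name, _, field_type = field.partition(":")
            children.append({"name": name, **parse_type(field_type)})
        return {"dataType": "struct", "children": children}
    return {"dataType": TYPE_NAMES.get(type_string, type_string)}
```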

External masking

Deprecation notice: Support for this feature has been deprecated.

This feature allows Immuta to unmask data that is masked at rest in a remote database using a customer-provided encryption or masking algorithm. To do so,

System Administrators build their own custom logic and security in an external REST service. Because Immuta always pushes down the masked version of the literal when the user is exempt from the policy, the organization should use deterministic IVs/salt to ensure the same value is masked consistently throughout the data.
System Administrators give Immuta access to the external REST service and configure tags that will be used by data owners to indicate that data is masked at rest in the remote database.
Data owners apply these tags to columns that are masked (with encryption or another algorithm) in the remote database.
Data owners or governors create data policies that allow Immuta to reach out to this external REST service to unmask data according to the specifications in the policy.

Immuta’s External Masking feature expects data to be masked at rest by an external tool consistently on a per-cell basis in the remote database. Immuta then provides policy-based unmasking (and additional masking on top of this using standard masking policies).

Unmasking process

Immuta will only unmask externally masked data if two conditions are met:

A masking policy is applied against that tagged column.
The querying user is exempt from that policy.

When a user who is exempt from the policy restrictions queries that masked column using a filter, Immuta converts the literal being queried using the external algorithm provided. Consider the following example:

The social_security_number column is masked on-ingest and has the tag externally_masked_data applied to it.
This masking policy is applied to the data source in Immuta: Mask using hashing the values in the column tagged externally_masked_data except for users who belong to the group view_masked_values.
The querying user belongs to the view_masked_values group.

When the user above runs the query select * from table A where social_security_number = 220869988, Immuta converts 220869988 to the masked value using the provided algorithm to query the database and return matching rows.

Use equality queries only

Queries against masked values on-ingest should be equality queries only. For example, if an exempt user ran a query like select * from table A where social_security_number > 220869988, the results may not make sense (depending on the algorithm used for masking the data).

Tutorials

To configure External Masking, see the App Settings Tutorial.
For an implementation guide, see the External Masking Interface.

Row-level security policies

These policies hide entire rows or objects of data based on the policy being enforced; some of these policies require the data to be tagged as well.

Note: When building row-level policies with custom SQL statements, avoid using a column that is masked using randomized response in the SQL statement, as this can lead to different behavior depending on whether you're using Spark or Snowflake and may produce results that are unexpected.

Matching

These policies match a user attribute with a row/object/file attribute to determine if that row/object/file should be visible. This process uses a direct string match, so the user attribute would have to match exactly the data attribute in order to see that row of data.

For example, to restrict access to insurance claims data to the state in which the user's home office is located, you could build a policy such as this:

Only show rows where user possesses an attribute in Office Location that matches the value in the column State for everyone except when user is a member of group Legal.

In this case, the Office Location is retrieved by the identity management system as a user attribute or group. If the user's attribute (Office Location) was Missouri, rows containing the value Missouri in the State column in the data source would be the only rows visible to that user.
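
The matching behavior in this example can be sketched as a row filter. The function name and the shapes of the user attribute and row dictionaries are hypothetical; the exact string comparison and the exception for the Legal group follow the policy text above.

```python
def visible_rows(rows, user_attributes, user_groups):
    """Return rows whose State exactly matches one of the user's Office Location
    values; members of the Legal group are exempt and see every row."""
    if "Legal" in user_groups:
        return list(rows)
    offices = set(user_attributes.get("Office Location", []))
    return [row for row in rows if row["State"] in offices]
```

Because the comparison is an exact string match, a user whose attribute is "missouri" or "MO" would see no rows where the State column holds "Missouri".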

WHERE clause policy

This policy can be thought of as a table "view" created automatically for the user based on the condition of the policy. For example, in the policy below, users who are not members of the Admins group will only see taxi rides where passenger_count < 2.

Only show rows where public.us.taxis.passenger_count <2 for everyone except when user is a member of group Admins.

You can put any valid SQL WHERE clause in the policy. See the Custom WHERE clause functions for a list of custom functions.
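
The view analogy can be tried locally with SQLite. This sketch is only an analogy: the in-memory taxis table, its sample rows, and the query_taxis helper are hypothetical, and Immuta enforces the policy in the data platform rather than by rewriting queries in application code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxis (ride_id INTEGER, passenger_count INTEGER)")
conn.executemany("INSERT INTO taxis VALUES (?, ?)", [(1, 1), (2, 3), (3, 0), (4, 2)])


def query_taxis(conn, user_groups):
    """Append the policy's WHERE clause unless the user is in the Admins group."""
    sql = "SELECT ride_id, passenger_count FROM taxis"
    if "Admins" not in user_groups:
        sql += " WHERE taxis.passenger_count < 2"
    sql += " ORDER BY ride_id"
    return conn.execute(sql).fetchall()
```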

WHERE clause policy requirement

All columns referenced in the policy must have fully qualified names. Any column names that are unqualified (just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).

Time-based restrictions

These policies restrict access to rows/objects/files that fall within the time restrictions set in the policy. If a data source has time-based restriction policies, queries run against the data source by a user will only return rows/blobs with a date in its event-time column/attribute from within a certain range.

The time window is based on the event time you select when creating the data source. This value will come from a date/time column in relational sources.
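
One way to picture a time-based restriction is as a filter on the event-time column. The sketch below assumes the window is expressed as "the last N days", which is an illustrative choice; the function name, the event_time column name, and the row layout are hypothetical.

```python
from datetime import datetime, timedelta


def within_window(rows, now, days, event_time_column="event_time"):
    """Keep rows whose event time falls within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [row for row in rows if cutoff <= row[event_time_column] <= now]
```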

Minimization

These policies return a limited percentage of the data, which is randomly sampled at query time, but the sample is the same for all users. For example, you could limit certain users to only 10% of the data. Immuta uses a hashing policy to return approximately 10% of the data, and the data returned will always be the same; however, the exact number of rows exposed depends on the distribution of high cardinality columns in the database and the hashing type available. Additionally, Immuta will adjust the data exposed when new rows are added or removed.
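
Hash-based sampling explains why the sample is both approximate and stable. The sketch below is an illustration under assumptions, not Immuta's implementation: it hashes a hypothetical high cardinality key with SHA-256 and keeps the row when the hash falls into the first `percent` of 100 buckets.

```python
import hashlib


def in_sample(key_value: str, percent: int) -> bool:
    """Deterministically decide whether a row belongs to the exposed sample."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Every user gets the same answer for the same key, so the sample never changes between queries, and the share of rows kept only approaches the target percentage when there are enough distinct keys, which is consistent with the row count recommendation in this section.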

Best practice: row count

Immuta recommends you use a table with over 1,000 rows for the best results when using a data minimization policy.

Masked columns as input for row-level policies

Public preview: This feature is currently in public preview and available to all accounts.

If a global masking policy applies to a column, you can still use that masked column in a global row-level policy.

Consider the following policy examples:

Masking policy: Mask values in columns tagged Country for everyone except users in group Admin.
Row-level policy: Only show rows where user possesses an attribute in OfficeLocation that matches the value in column tagged Country for everyone.

Both of these policies use the Country tag to restrict access. Therefore, the masking policy and the row-level policy would apply to data source columns with the tag Country for users who are not in the Admin group.

Limitations

This feature is only available for Snowflake and Databricks Unity Catalog integrations.
This feature is only supported for global data policies, not local data policies.

New column added policy

This policy pairs with schema monitoring to mask newly added columns to data sources until data owners review and approve these changes from the requests tab of their profile page.

When this policy is activated by a governor, it will automatically be enforced on data sources that have the New tag applied to them by sensitive data discovery.

To learn how to activate this policy, navigate to the tutorial.
