1 of 17

Getting Started with Secure

Select your use case

Immuta Secure allows you to secure your data through various access control policies you configure.

Prerequisites

Data platform integration configured in Immuta
Users and data sources have been registered in Immuta

Use cases

Immuta Secure allows you to build complex access control policies in a simple, scalable manner. However, there are many different ways customers think about access control, so Secure is extremely flexible. Because of this, Immuta has broken up Secure into common use cases that customers follow to speed up their onboarding process. Choose the use case below that best fits your goals. If none of them fit, contact your customer support representative for a more personalized onboarding experience:

Automate data access control decisions: This is the most common use case. It walks you through how to build table access control policies in a scalable manner and clarifies how to think about table access and adjust existing paradigms.
Compliantly open more sensitive data for ML and analytics: This is an approach where every user has access to every table, yet you mask the sensitive columns appropriately using data policies. You can apply some of the concepts from the automate data access control decisions use case to how you think about masking policy rules.
Federated governance for data mesh and self-serve data access: Data mesh is a concept where you delegate operational control of data products to data producers across your organization. This requires delegation of policy management as well, which is the goal of this use case. This use case also covers a generic data shopping experience where users can discover and subscribe to data.

The use cases are a starting point for your Secure journey - you don’t necessarily need to follow them exactly, but they should give you the conceptual framework for thinking through your own use cases, so it is encouraged to read them all.

Automate Data Access Control Decisions

Who is this for?

This guide is intended for users who want to build table access control policies in a scalable manner using Immuta.

Prerequisites

Data platform integration configured in Immuta
Users and data sources have been registered in Immuta

Goals

This use case is the most common across customers. With it, you solve the problem of entitlement of users to data based on metadata about the user (attributes and groups) and metadata about the data (tags).

This use case is unique to Immuta, because rather than coupling an access decision with a role, you are able to instead decouple access decisions from user and data metadata. This is powerful because it means when metadata about the users or metadata about the data changes, the access for a user or set of users may also change - it is a dynamic decision.

Decoupling access decisions from metadata also eliminates the classic problem of role explosion. In a world where policy decisions are coupled to a role, you must manage a new role for every permutation of access, causing an explosion of roles to manage. Instead, with this use case you decouple that logic from the user metadata and data metadata, so real-time, dynamic decisions are possible. Immuta’s method of building policies allows for these many permutations in a clear, concise manner through the decoupling of policy logic from the user and data metadata.

This use case also eliminates the need to have a human in-the-loop for approval to data access. If you can describe clearly why the approver would approve an access request - that can instead be expressed as an Immuta subscription policy, as you’ll see below. Removing humans from this process increases the speed to access data, makes access decisions consistent, and removes error and favoritism.

Want to learn more? Check out this ABAC 101 blog on the methodology.

Table vs column access

Lastly, this use case is primarily focused on automating table grants, termed subscription policies in Immuta. We recommend you also read the Compliantly open more sensitive data for ML and analytics use case to learn about column masking, which will allow you to enforce more granular controls so that you can open more data. It's common to see customers mix the automate data access control decisions use case with the compliantly open more sensitive data for ML and analytics use case, so it's recommended to read both.

Business value

Following this use case will reap huge operational cost savings. You will have to manage far fewer data policies; those policies will be more granular, accurate, and easily authored; and you will be able to prove compliance more easily. Furthermore, instead of an explosion of incomprehensible roles, the users and data will have meaningful metadata matched with clear policies.

Quantified benefits:

75x fewer policy changes required to accomplish the same use cases
70% reduction in dedicated resources to data access management and policy administration
Onboarding new employees two weeks faster and provisioning new data one week faster
5% to 15% increase in client contract values to improve revenue

Unquantified benefits:

Improved compliance standard and security posture
Enhanced employee satisfaction
Better customer experiences

More details on the business value can be found in these reports:

GIGAOM: ABAC vs RBAC: The Advantage of Attribute-Based Access Control over Role-Based Access Control (Immuta is an OT-ABAC approach)
Forrester: The Total Economic Impact Of Immuta

Configuration steps

Follow these steps to learn more about and start using Immuta to automate data access control decisions:

Complete the Monitor and secure sensitive data platform query activity use case.
Read the overview of the two access control paths: orchestrated RBAC and ABAC. These are the two different approaches (or mix) you can take to managing policy in Secure and their tradeoffs.
Manage user metadata. This step is critical to building scalable policy and understanding the considerations around how and what to capture. Tag your users with attributes and groups that are meaningful for Immuta global policies.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful.
Author policy. In this step, you will define your global data policy logic. Optionally test and deploy policy.

The Two Paths: Orchestrated RBAC and ABAC

When following this use case, you need to decide which path you fall in. This decision drives how you will manage user and data metadata as well as policies, so it's a critical decision.

If you aren’t sure, you should strive for . While it may seem more complicated to get started, in the long run it will provide you powerful flexibility and scalability of policy management. In this method, you tag your users and data with facts and prescribe policies that leverage those facts to make real-time decisions.

puts more strain on managing access decisions outside of your access logic (Immuta) because you need all access decisions in a single attribute on the user. Because of this, it more closely resembles the role explosion problem, and if you incorrectly select this path you will end up there over time. Orchestrated RBAC is tag-orchestrated RBAC and is supported by Immuta (in fact, many customers stick to this because of the benefits of the tag-orchestration).

Orchestrated RBAC: one-to-one

Do you have many permutations of static access with no overlap?

This method is for organizations whose access decision typically depends on a single variable:

If you have x, you have access to everything tagged y.

Furthermore, the access decision to objects tagged y never strays beyond x. In other words, there’s only ever one way to get access to objects tagged y - a 1:1 relationship. A good real-world example of orchestrated RBAC is

You must have signed data use agreement x to have access to data y.

ABAC: many-to-many

Do you have a lot of variables at play to determine access?

This method is for organizations that may have many differing x’s for access to objects y. Furthermore, x and y may actually be many different variables, such as

If you have a, b, and c you get access to objects tagged x, y, and z.

If you have d, you get access to objects tagged x.

Notice in this example that access to objects tagged x can happen through different decisions.

ABAC is also important when you have federated governance in play, where many different people are making policy logic decisions and that logic may stack on the same objects. A good real world example of ABAC is

you must reside in the US and be a full time employee to see data tagged US and Highly Sensitive.

Managing User Metadata

No matter if you choose orchestrated RBAC or ABAC as described in the prior guide, you must have metadata on your users. This can be done in Immuta, via your identity manager, or other custom sources.

Doing so allows you to build scalable policies that do not reference individual users and instead use metadata about them to target them with access decisions. In Immuta, user metadata is termed user attributes and groups.

Considerations

Referring back to our triangle of access decision, user metadata is the first point of that triangle. User metadata is made up of the following elements:

Attributes: key:value pairs tied to your users
Groups: a single value tied to your users - you can think of groups as containing users. Groups can also have their own attributes, a shortcut for assigning attributes to all members of a group. In either case, multiple users can be attached to the same attribute or group.

Referring back to the two paths: orchestrated RBAC and ABAC will help drive how you manage user metadata.

Fact-based user metadata (ABAC) allows you to decouple policy logic from user metadata, allowing highly scalable ABAC policy authoring:

Steve has attribute: country:USA
Sara has attribute: role:administrator
Stephanie has attribute: sales_region: Ohio, Michigan, Indiana
Sam is in group: developers
Group developers has attribute: organization:engineering

Logic-based user metadata (orchestrated-RBAC) couples user metadata with access logic:

Steve has attribute: access_to:USA
Sara has attribute: role:admin_functions
Stephanie has groups: Ohio_sales, Michigan_sales, Indiana_sales
Sam is in group: all_developer_data
Group developer has attribute: access_to:AWS_all

Typically these groups (most common) and attributes come from your identity management system. If they do, they also potentially contain access logic as part of them (examples in the logic-based user metadata list above). It’s not the end of the world if they do, though this means you are potentially set up for orchestrated RBAC. If each group from your identity management system represents a single static fact that provides access for a set of objects, with no overlap with other facts that provide access - then you can use orchestrated RBAC. A simple example to demonstrate groups ready for orchestrated RBAC is “you must have signed license agreement x to see data tagged x” - it’s a simple fact: you’ve signed license agreement x, that provides access to data tagged x. There are no other paths to data tagged x.

But the key to orchestrated RBAC is that it’s a 1:1 relationship with the group from your identity manager and the set of objects that group gives them access to. If this is not true, and there are multiple permutations of groups that have implied policy logic and that logic provides access to the same sets of objects, then you are in a situation with role explosion. This is a problem that Immuta can help you solve, and you should solve that problem with ABAC.

The one exception where you have the above scenario but may want to use orchestrated RBAC is if these multiple paths for access to the same objects are a true hierarchy. For example, data is tagged with Strictly Confidential, Confidential, Internal, and Public:

users assigned group Strictly Confidential can see data tagged Strictly Confidential, Confidential, Internal, and Public
users assigned group Confidential can see data tagged Confidential, Internal, and Public

In this case, you could use orchestrated RBAC even though there are multiple paths of access to the same objects because it is a one-to-one relationship represented in a hierarchy that contains all paths. How you actually write this policy will be described later.

Orchestrated RBAC: user attributes and groups

Since orchestrated RBAC is a one-to-one match (or a true hierarchy), you can use your groups as-is from your identity manager or other custom sources.

Without getting into the details of how the policy is authored (yet), it is possible to leverage a user attribute hierarchy to contribute to access decisions through hierarchical matching. For example, users with attributes that are either a hierarchical parent or an exact match to a given data tag can be subscribed to the data source via a hierarchical match. To better illustrate this behavior, see the matrix below:

User attributes

Data source Tags

Subscribed?

Notes

'Exercise': ['Gym.Treadmill', 'Sports.Football.Quarterback']

['Athletes.Performance', 'Sports.Football.Quarterback']

Yes

Exact match on 'Sports.Football.Quarterback'

'Exercise': ['Gym.Weightlifting', 'Sports.Football']

['Athletes.Performance', 'Sports.Football.Quarterback']

Yes

User attribute 'Sports.Football' is a hierarchical parent of data source tag 'Sports.Football.Quarterback'

'News Articles': ['Gym.Weightlifting', 'Sports.Football']

['Athletes.Performance', 'Sports.Football.Quarterback']

The policy is written to only match values under the 'Exercise' attribute key. Not 'News Articles'.

'Exercise': ['Sports']

['Sports.Baseball.Pitcher']

Yes

User attribute 'Sports' is a hierarchical parent of data source tag 'Sports.Baseball.Pitcher'

'Exercise': ['Gym.Gymnastics.Trapeze', 'Sports.Football.Quarterback']

['Gym.Gymnastics', 'Sports.Football']

Hierarchical matching only happens in one direction (user attribute contains data source tag). In this case, the user attributes are considered hierarchical children of the data source tags.

So you should take care organizing your user attributes with a hierarchy that will support the hierarchical matching.

Going back to our hierarchical example of Strictly Confidential, Confidential, Internal, and Public above, you would want to ensure that user attributes follow the same hierarchy. For example,

A user with access to all data: Classification: Strictly Confidential
A user with access to only Internal and Public: Classification: Strictly Confidential.Confidential.Internal

Remembering the last row in Figure 3, hierarchical matching only happens in a single direction → user attribute contains the data source tag. Since the user attribute is more specific (Strictly Confidential.Confidential.Internal) than data tagged Strictly Confidential and Strictly Confidential.Confidential, the user would only gain access to data tagged Strictly Confidential.Confidential.Internal and Strictly Confidential.Confidential.Internal.Public.

ABAC: user attributes and groups

To use ABAC, you need to divorce your mind from thinking that every permutation of access requires a new role. (We use the word role interchangeably with groups in this document, because many identity and access systems call groups roles, especially when they contain policy logic.)

Instead you need to focus on the facts about your users they must possess in order to grant them access to data. Ask yourself, why should someone have access to this data? It’s easiest to come at this from the compliance perspective and literally ask yourself that question for all the compliance rules that exist in your organization.

Why should someone have access to my PHI data? Example answers might be

If they are an analyst
If they have taken privacy training

So, at a minimum, you are going to need to attach attributes or groups to users representing if they are an analyst and if they have taken privacy training. Note you do not have to (nor should you) negate these attributes, meaning nobody should have a group not_analyst and no_privacy_training.

Repeat this process until you have a good starting point for the user attributes you need. This is a lot of upfront work and may lead to the perception that ABAC also suffers from attribute explosion. However, when attributes are well enumerated, there’s a natural limit to how many different relevant attributes will be present in an organization. This differs substantially from the world where every permutation of access is represented with a role, where role growth starts off slow, but increases linearly over time.

As an illustration, well-defined subject attributes tend to follow the red curve over time, while roles and poorly defined subject attributes tend to follow the blue and quickly become unwieldy.

Applying user metadata

Where do those attributes and groups come from? Really, wherever you want - Immuta has several interfaces to make this work, and you can combine them:

Option 1: Store them in your identity manager. If you have the power to do this, you should. It becomes the primary source of facts about your users instead of the source of policy decisions, which should not live in your identity manager. We recommend synchronizing this user metadata from your identity manager to Immuta over SCIM where possible. (This concept was covered in the Detect getting started.)

Option 2: Store them directly in Immuta. It is possible to manually add attributes and groups to users directly in the Immuta UI or through the API.

Option 3: Refer to them from some other external source of truth. For example, perhaps training courses are already stored in another system. If so, you can able to build an external user into endpoint to pull those groups (or attributes) into Immuta. However, there are some considerations with this approach tied to which IAM protocol you use. If your identity manager restricts you from using the approach in the “Consume attributes/groups from arbitrary sources” row in the IAM protocol table, you can instead periodically push the attributes from the source of truth into Immuta using our APIs.

All of these options assume you have already configured your identity manager in Immuta. This has to be done in order to create the user identities that these attributes and groups will be attached to.

Managing Data Metadata

Prerequisites

Your schema metadata is registered using either of the Detect use cases:

Monitor and secure sensitive data platform query activity use case (Snowflake only)
General Immuta configuration use case (if not using Snowflake)

Data tags for Detect

You should have already done some data tagging while configuring Immuta in the Detect getting started. That guide focuses on understanding where compliance issues may exist in your data platform and may not have fully covered all tags required for policy. Read on to see if there's more work to be done with data tagging.

Considerations

Now that we’ve enriched facts about our users, let’s focus on the second point on the policy triangle diagram: the data tags. Just like you need user metadata, you need metadata on your data (tags) in order to decouple policy logic from referencing physical tables or columns. You must choose between the orchestrated RBAC or ABAC method of data access:

Orchestrated RBAC method: tag data sources at the table level
ABAC method: tag data at the table and column level

While it is possible to target policies using both table- and column-level tags, for ABAC it’s more common to target column tags because they represent more granularly what is in the table. Just like user metadata needs to be facts about your users, the data metadata must be facts about the data. The tags on your tables should not contain any policy logic.

Fact-based column tags are descriptive (recommended):

Column ssn has column tag social security number
Column f_name has column tag name
Column dob has column tags date and date of birth

Logic-based column tags requires subjective decisions (not recommended):

Column ssn has column tag PII
Column f_name has column tag sensitive
Column dob has column tag indirect identifier

But can't I get policy authoring scalability by tagging things with higher level classifications, like PII, so I can build broader policies? This is what Immuta’s classification frameworks are for.

Entity tags are facts about the contents of individual columns in isolation. Entity tags are what we listed above: social security number, name, date, and data of birth. Entity tags do not attempt to contextualize column contents with neighboring columns' contents. Instead, categorization and classification tags describe the sensitive contents of a table with the context of all its columns, which is what is listed in the logic-based tags above, things like PII, sensitive, and indirect identifier.

For example, under the HIPAA framework a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a single column-by-column basis, it can’t reason whether or not a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. But categorization tagging will tag it PHI if it detects patient identity information in the other columns of the table.

Additionally, entity tagging does not indicate how sensitive the data is, but categorization tags carry a sensitivity level, the classification tag. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly-listed phone number of a company might not be considered sensitive.

Contextual tags are really what you should target with policy where possible. This provides a way to create higher level objects for more scalable and generic policy. Rather than building a policy like “allow access to tables with columns tagged person name and phone number,” it would be much easier to build it like “allow access to tables with columns tagged PII.”

In short, you must tag your entities, and then rely on a classification framework (provided by Immuta or customized by you) to provide the higher level context, also as tags. Remember, the owners of the tables (those who created them) can tag the data with facts about what is in the columns without having to understand the higher level implications of those tags (categorization and classification). This allows better separation of duty.

For orchestrated-RBAC, the data tags are no longer facts about your data, they are instead a single variable that determines access. As such, they should be table-level tags (which also improves the amount of processing Immuta must do).

Applying data tags

There are several options for doing this, and if you are following along with the use cases for Detect getting started, you may have already accomplished the recommended option 1.

Immuta Discover's sensitive data discovery (SDD): This is the most powerful option. Immuta is able to discover your sensitive data, and you are able to extend what types of entities are discovered to those specific to your business. SDD can run completely within your data platform, with no data leaving at all for Immuta to analyze. SDD is more relevant for the ABAC approach because the tags are facts about the data.
Tags from an external source: You may have already done all the work tagging your data in some external catalog, such as Collibra, Alation, or your own homegrown tool. If so, Immuta can pull those tags in and use them. Out of the box Immuta supports Alation, Collibra, and Snowflake tags, and for anything else you can build a Custom REST Catalog Interface. But remember, just like user metadata, these should represent facts about your data and not policy decisions.
Manually tag: Just like with user metadata, you are able to manually tag tables and columns in Immuta from within the UI, using the Immuta API, or when registering the data, either during initial registration or subsequent tables discovered in the future through schema monitoring.

Data tag hierarchy

Just like hierarchy has an impact with user metadata, so can data tag hierarchy. We discussed the matching of user metadata to data metadata in the Managing user metadata guide. However, there are even simpler approaches that can leverage data tag hierarchy beyond matching. This will be covered in more detail in the Author policy guide, but is important to understand as you think through data tagging.

As a quick example, it is possible to tag your data with Cars and then also tag that same data with more specific tags (in the hierarchy) such as Cars.Nissan.Xterra. Then, when you build policies, you could allow access to tables tagged Cars to administrators, but only those tagged Cars.Nissan.Xterra to suv_inspectors. This will result in two separate policies landing on the same table, and the beauty of Immuta is that it will handle the conflict of those two separate policies. This provides a large amount of scalability because you have to manage far fewer policies.

Imagine if you didn’t have this capability? You would have to include administrators access to every policy you created for the different vehicle makes - and if that policy needed to evolve, such as adding more than administrators to all cars, it would be an enormous effort to make that change. With Immuta, it’s one policy change.

Author Policy

Now we are ready to focus on the third point of the triangle for data access decisions: access control policies, and, more specifically, how you can author them with Immuta now that you have the foundation with your data and user metadata in place.

It’s important to understand that you only need a starting point to begin onboarding data access control use cases - you do not have to have every user perfectly associated with metadata for all use cases, nor do you need all data tagged perfectly for all use cases. Start small on a focused access control use case, and grow from there.

The policy authoring approaches are broken into the two paths you should choose between discussed previously.

Path 1: Orchestrated RBAC policy authoring

With orchestrated RBAC you have established one-to-one relationships with how your users are tagged (attribute or group) and what that single tag explicitly gives them access to. Remember, the user and data metadata should be facts and not reflect policy decisions veiled in a single fact; otherwise, you are setting yourself up for role explosion and should be using ABAC instead.

Since orchestrated RBAC is all about one-to-one matching of user metadata to data metadata, Immuta has special functions in our subscription policies for managing this:

@hasTagAsAttribute('Attribute Name', 'dataSource' | 'column')
@hasTagAsGroup('dataSource' | 'column')

The first argument for @hasTagAsAttribute is the attribute key whose value tags will be matched against (or for @hasTagAsGroup simply the group name), and the second argument specifies whether the attribute values are matched against data source tags or column tags.

Understanding this, let’s use a simple example before we get into a hierarchical example. For this simple example, let’s say there are Strictly Confidential tables and Public Tables. Everyone has access to Public, and only special people have access to Strictly Confidential. Let’s also pretend there aren’t other decisions involved with someone getting access to Strictly Confidential (there probably is, but we’ll talk more about this in the ABAC path). It is just a single fact: you do or you don’t have that fact associated with you.

Then we could tag our users with their access under the Access attribute key:

Bob: Access: Strictly Confidential
Steve: Access: Public

Then we could also tag our tables with the same metadata:

Table 1: Strictly Confidential
Table 2: Public

Now we build a policy like so:

And ensure we target all tables with the policy in the builder:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  advanced: '@hasTagAsAttribute(''Access'', ''dataSource'')'
  allowDiscovery: false
  automaticSubscription: true
  shareResponsibility: false
  type: entitlements
name: Access
policyKey: Access
staged: false
template: false
type: subscription

Since this policy applies to all data sources, you only need a single policy in Immuta. When you change user metadata or data metadata, the access to those tables will automatically be updated by Immuta through the matching algorithm (user metadata matches the data metadata).

We could also use column tags instead, by tagging the columns appropriately:

Table 1 --> Column 2: Strictly Confidential
This tag change would require the following alteration to the function: @hasTagAsAttribute('Access', 'column') instead of @hasTagAsAttribute('Access', 'dataSource').

Finally, we could use groups instead of attributes:

Bob: Strictly Confidential
Steve: Public
This change would require the following alteration to the function: @hasTagAsGroup('Access', 'column') instead of @hasTagAsAttribute('Access', 'column').

In the above case, we are using column tags again, but it could (and probably should) be data source tags. The reason is that orchestrated RBAC is not using facts about what is in your data source (the table) to determine access. Instead, it's using the single variable as a tag, and because of that, the single variable is better represented as a table tag. This also reduces the amount of processing Immuta has to do behind the scenes.

We can get more complicated with a hierarchy as well, but the approach remains the same: you need a single policy and you use the hierarchy to solve who should gain access to what by moving from most restrictive to least restrictive in your hierarchy tree on both the users and data metadata.

Groups approach:

Bob: Strictly Confidential
Steve: Strictly Confidential.Confidential.Internal.Public

In this case, Steve would only gain access to tables or tables with columns tagged Strictly Confidential.Confidential.Internal.Public, but Bob would have access to tables or tables with columns tagged Strictly Confidential, Strictly Confidential.Confidential, Strictly Confidential.Confidential.Internal, and Strictly Confidential.Confidential.Internal.Public.

Orchestrated RBAC one-to-one matching can be powerful, but can also be a slippery slope to role explosion. You can quickly fall into rolling-up many decisions into a single group or attribute tied to the user (like Strictly Confidential) instead of prescriptively describing the real root reason of why someone should have access to Strictly Confidential in your policy engine. This is the power behind ABAC, and using the ABAC method of access can give you ultimate scalability and evolvability of policy with ease.

Path 2: ABAC policy authoring

Let’s say that having access to Strictly Confidential data isn’t a single simple fact attached to a user like we described in the orchestrated RBAC section (it likely isn't). Instead, let’s try to boil down why you would give someone access to Strictly Confidential. Let’s say that to have access to Strictly Confidential, someone should be

an employee (not contractor)
in the US
part of the Legal team

Those are clearly facts about users, more so than Strictly Confidential. If you can get to the root of the question like this, you should, because Immuta allows you to pull in that information about users to drive policy.

Why should you bother getting to the root?

Well, consider what we just learned about how to author policy with orchestrated RBAC above. Now let’s say your organization changes their mind and now wants the policy for access to Strictly Confidential to be this:

be an employee (not contractor)
be in the US or France
be part of the Legal team

With orchestrated RBAC this would be a nightmare, unless you already have a process to handle it. You would have to manually figure out all the users that meet that new criteria (in France) and update them in and out of Strictly Confidential. This is why orchestrated RBAC should always use a single fact that gains a user access.

However, if you are on the ABAC path, it’s a single simple policy logic change because of our powerful triangle that decouples policy logic from user and metadata attributes.

Let’s walk through an example:

The best approach is to build individual policies for each segment of logic associated with why someone should have access to Strictly Confidential information. Coming back to the original logic above, that would mean 3 (much more understandable) separate policies:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Employees
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Employee Access
policyKey: Employee Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
      attributes:
        - name: Country
          value: US
      operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access
policyKey: Country Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Legal Team
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Legal Team Accesss
policyKey: Legal Team Accesss
staged: false
template: false
type: subscription

Each policy targets tables tagged Strictly Confidential (they could alternatively target columns tagged that):

When each of those policies converge on a single table tagged Strictly Confidential, they will be merged together appropriately by Immuta, demonstrating Immuta's conflict resolution:

And whether merged with an AND or an OR is prescribed by this setting in the policy builder for each of the 3 individual policies:

If either side of the policies are Always Required that will result in an AND; otherwise, if both sides agree to Share Responsibility it will result in an OR. In this case, each of the 3 policies chose Always Required, which is why the three policies were AND’ed together on merge. Note this merging behavior is relevant to how you can have different users manage policies without risk of conflicts.

Now let’s move into our nightmare (without Immuta) scenario where the business has changed the definition of who should have access to Strictly Confidential data by adding France along with the other existing three requirements.

With Immuta, all you’ve got to do is change that single Country Access subscription policy to include France, and voila, you’re done:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    attributes:
      - name: Country
        value: US
      - name: Country
        value: France
    operator: any
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access Two
policyKey: Country Access
staged: false
template: false
type: subscription

Which results in this merged policy on all the Strictly Confidential tagged tables:

As you can see, ABAC is much more scalable and evolvable. Policy management becomes a breeze if you are able to capture facts about your users and facts about your data, and decouple policy decisions from it.

Also be aware, this auto-merging logic Immuta provides enables you to also combine orchestrated RBAC and ABAC policies.

Manual overrides

Last but not least are manual overrides. While this use case is all about automating access decisions, there may be cases where you want to give a user access even if they don’t meet the automated criteria.

This is possible through this setting in each policy:

In this scenario, users marked as Owner of the registered data source in Immuta have the power to approve this override request by the user. There are ways to specify different approvers as well, or multiple approvers.

Also notice the grayed out Allow Data Source Discovery: normally, if you do not meet the criteria of a policy, the data source is not even visible to you in the Immuta UI or the data platform. However, if you have this checked, it is visible in the Immuta UI even if you don’t meet the policy. This must be checked in order to request approval to access. Otherwise, users would never find the data source to ask for approval in the Immuta UI.

Be aware that all policies that land on the table must have this same setting turned on in order to allow the manual override.

Test and Deploy Policy

Now that you have a sense of what policies you want to enforce, it is sometimes necessary to first test those policies before deploying them. This is important if you have existing users accessing the tables you are protecting and you want to understand the impact to those users before moving a policy to production. However, consider this step optional.

It's also important to remember that Immuta subscription policies are additive, meaning no existing SELECT grants on tables will be revoked by Immuta when a subscription policy is created. It is your responsibility to revoke all pre-Immuta SELECT grants once you are happy with Immuta's controls.

Use a single Immuta for testing policies

While it may seem wise to have a separate Immuta tenant for development and production mapped to separate development and production data platforms, that is not necessary nor recommended because there are too many variables to account for and keep up-to-date in your data platform and identity management system:

Many Immuta data policies are enforced with a heavy reliance on the actual data values. Take for example the following row-level policy: only show rows where user possesses an attribute in Work Location that matches the value in the column tagged Discovered.Entity.Location. This policy compares the user’s office location to the data in the column tagged Location, so if you were to test this policy against a development table with incorrect values, it is an invalid test that could lead to false positives or false negatives.
Similar to #1, if you are not using your real production users, just like having invalid data can get you an invalid test result, so could having invalid or mock user attributes or groups.
Policies can (and should) target tables using tags discovered by or (such as Alation or Collibra). For SDD, that means if using development data, the data needs to match the production data so it is discovered and tagged correctly. For external catalogs, that would mean you need your external catalog to have everything in development tagged exactly like it is in production.
Users can have attributes and groups from , so similar to #3, you would need to have that all synchronized correctly against your development user set as it is synchronized for your production user set.
Your development user set may also lack all relevant permutations of access that need to be tested (sets of attributes/groups relevant to the policy logic). These permutations are not knowable a priori because they are dependent on the policy logic. So you would have to create all permutations (or validate existing ones) every time you create a new policy.
Lastly, you have a development data environment to test data changes before moving to production. That means your development data environment needs the production policies in place. In other words, policies are part of what needs to be replicated consistently across development environments.

Be aware, this is not to suggest that you don’t need a development data platform environment; that could certainly be necessary for your transformation jobs/testing (which should include policy, per #6). However, for policy testing it is a bad approach to use non-prod data and non-prod users because of the complexity of replicating everything perfectly in development - by the time you’ve done all that, it matches what’s in production exactly.

Best practice for policy testing

Immuta recommends testing against clones of production data, in a separate database, using your production data platform and production user identities with a single Immuta tenant. If you believe you do have a perfectly matching development environment that covers 1-5 in the section above, you can use it (and should for #6), but we still recommend a single Immuta tenant because Immuta allows logical separation of development and production policy work without requiring physically separated Immuta tenants.

Logically separating your data platform

So how do you test policies without impacting production workloads if we are testing against production? You create a logical separation of development and production in your data platform. What this means is that rather than physically separating your data into completely separate accounts or workspaces (or whatever term your data warehouse may use), logically separate development from production using the mechanisms provided by the data platform - such as databases. This reduces the amount of replication management burden you need to undertake significantly, but does put more pressure on ensuring your policy controls are accurate - which is why you have Immuta!

Many of our customers already take this approach for development and testing in their data platform. Instead of having physically separate accounts or workspaces of their data platform, they create logical separation using databases and table/view clones within the same data platform instance. If you are already doing this, you can skip the rest of this section.

Follow the below recommendations on how to create logical separation of development data in your data platform using the following approaches:

Redshift: Create new views that are backed by the tables in question to a different development database and register them with Immuta. Since Immuta enforces policies in Redshift using views, creating new views will not be impacted by any existing policy on the tables that back the views.
Starburst (Trino): You should virtualize the tables as new tables to a different development catalog in Starburst (Trino) for testing the policy.
Azure Synapse Analytics: You should virtualize the tables as new tables to a different development database in Synapse for testing the policy.

If you are managing creation of tables in an automated way, it may be prudent to automatically create the clones/views as described above as part of that process. That way you don’t have to pick and choose tables to logically separate every time you need to do policy work - they are always logically separated already (and presumably discovered and registered by Immuta because they are registered for monitoring as well).

Append the database name to the names of cloned data sources: When you register the clones you’ve created with Immuta, ensure you append the database to your naming convention so you don’t hit any naming conflicts in Immuta.

Logically separating Immuta

If using a physically separate development data platform, ensure you have the Immuta data integration configured there as well.

If you are unable to leverage domains, we recommend adding an additional tag when the data is registered so it can be included when you describe where to target the policy. For example, tag the tables with policy 1 tag.

Ensuring development data is tagged correctly

If using SDD to tag data:

If using an external catalog:

If manually tagging:

Testing a new policy

Once the policy is activated (using either technique), it will only apply to the cloned development tables/views because the policy was scoped to that domain/tag, or because the user creating the policy is scoped to managing policies in that domain. At this point, you can invite users that will be impacted by the policy to test the change or your standard set of testers.

Once you have your testers set, you can give them all an attribute such as policy test: [policy in question] and use it to create a "subscription-testers" subscription policy. It should target only the tables/views in that domain per the steps described in the above paragraph in order to give them access to the development tables/views to test.

If you are testing a masking or row-level security policy, you don't need to do anything else; it's ready for your policy testers. However, if your ultimate goal is to test a subscription policy, you need to build it separately from the above "subscription-testers" subscription policy, but ensure that you select the Always Required option for "subscription-testers" subscription policy. That means when the "subscription-testers" subscription policy is merged with the real subscription policy you are testing, both policies will be considered and some of your test users may accurately be blocked from the table (which is part of the testing). For example, notice how these two policies (the "subscription-testers" and real policy being tested) were merged with an AND.

There's nothing stopping you from testing a set of changes (many subscription policies, many data policies) all together as well.

Once you’ve received feedback from your test users and are happy with the policy change, you can move on to the deployment of the policy. For example, add prod-domain, remove dev-domain, and then apply the policy. Or build the policy using a user that has manage policy in the production domain.

Testing an edit to an existing policy

The approach here is the same as above; however, before applying your new policy to the development domain, you should first locally disable the pre-existing global policy in the data source. This can be done by going into the development data source in the Immuta UI as the data owner and disabling the policy you plan to change in the policy tab of the data source. This must be done because the clone will have the pre-existing policy applied because it should have all matching tags.

Now you can create the new version of the policy in its place without any conflict.

Policy approvals

Sometimes you may have a requirement that multiple users approve a policy before it can be enabled for everyone. If you are in this situation, you may want to first consider doing this in git (or any source control tool) using our policy-as-code approach, so all approvals can happen in git.

Compliantly Open More Sensitive Data for ML and Analytics

Who is this for?

This guide is intended for users who want to open more data for access by creating more granular and powerful policies at the data layer.

Prerequisites

and have been registered in Immuta

Goals

Firstly, it's crucial to remember that just because a subscription policy, as described in the , grants a user access to data, it doesn’t mean that they should have access to all of that data. Often, organizations stop at just granting access without considering the nuances of what specific columns or rows should be accessible to different users. It's important to see the process all the way through by masking sensitive values that are not necessary for a user's role. This ensures that while users have the data access they need, sensitive information is appropriately protected.

Secondly, when considering in the context of , an interesting perspective emerges. A subscription policy could essentially be seen as mirroring the functionality of a global masking policy. This is because, like a global masking policy, a subscription policy can be used to mask or redact the entirety of a table. This interpretation underscores the potential of global data policies for comprehensive data protection.

One of the primary advantages is an easy and maintainable way to manage data leak risks without impeding data access, which means more data for ML and analytics. By focusing on global data policies, organizations can ensure that sensitive data, down to the row and column level, is appropriately protected, regardless of who has access to it. This means that while data remains broadly accessible for business operations and decision-making, the risk of data leaks is significantly reduced. This is because you can

be more specific with your policies as described above and
mask using advanced that allow you to get utility from data in a column while still preserving privacy in that same column.

However, it's important to note that this approach does not mean that you should never create subscription policies. Subscription policies still have their place in data governance. The key point here is that the primary focus shifts away from subscription policies and towards global data policies, which offer a more comprehensive and effective approach to data protection. This shift in focus allows for more nuanced control over data access, enhancing both data security and compliance.

When is this appropriate?

It's also fitting when you grant access to tables to everyone. In such cases, the focus is less on who has access and more on what they can access. Global data policies can help ensure that while data is broadly accessible, sensitive information is appropriately masked or redacted, maintaining compliance and security.

In essence, this use case is appropriate when you want to maintain or improve data accessibility while ensuring robust data protection, regardless of your current table grants.

When is this not appropriate?

Data sensitivity

If the existence of certain tables, schemas, or columns is considered sensitive information within your organization, this solution pattern may not be appropriate. Revealing the existence of certain data, even without granting access to the actual data, can pose a security risk in some contexts. In such cases, a more restrictive strategy may be required.

With this use case, users might have to navigate through a large number of tables to find the data they need. This could potentially hinder user experience, especially in large organizations with extensive data environments.

Configuration steps

Follow these steps to learn more about and start using Immuta to compliantly open more sensitive data for ML and analytics:

Managing User Metadata

You may have read the already. If so you are aware of the you must choose between: orchestrated-RBAC vs ABAC. To manage user metadata with this particular use case, you should use ABAC.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use described in the next topic), especially in a data ecosystem with constant change. This means that your users will need to have facts about them that drive policy decisions (ABAC) rather than single variables that drive access (as in orchestrated-RBAC).

Understanding that, read the ABAC section in the automate data access control decisions use case's guide.

Managing Data Metadata

Prerequisites

using either of the Detect use cases:

use case (Snowflake or Databricks)
use case (if not using Snowflake nor Databricks).

Enriching your data metadata with tags

You may have read the use case already. If so, you are aware of the you must choose between: orchestrated-RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use , described below), especially in an data ecosystem with constant change.

Understanding that, read the guide.

However, there are some considerations specific to this use case with regard to data metadata.

Automated data tagging

Automated data tagging with Immuta is recommended with this use case because it significantly reduces the manual effort involved in classifying and managing data. It ensures that data is consistently and accurately tagged, which is crucial for implementing effective data policies. Moreover, it allows for real-time tagging of data, ensuring that new or updated data is immediately protected by the appropriate policies. This is critical when all columns need to be considered for policy immediately, which is the case with this use case.

Schema monitoring enabled

While not directly related to data tagging, is a feature in Immuta that allows organizations to keep track of changes in their data environments. When enabled, Immuta actively monitors the data platform to find when new tables or columns are created or deleted. It then automatically registers or disables these tables in Immuta and updates the tags. This feature ensures that any global policies set in Immuta are applied to these newly created or updated data sources. It is assumed you are using schema monitoring when also using SDD. Without this feature, data will remain in while waiting for the policies to update.

Data tags are facts

As discussed above, with ABAC your data tags need to be facts. Otherwise, a human must be involved to bake access logic into your tags. As an example, it's easy for SDD to find an address, but it's harder for a human to decide if that address should be sensitive and who should have access to it - all defined with a single tag - for every column!

Tag lineage

Author Policy

In the use case, we covered at length. Subscription policies control table level access.

In this use case, we are focused one level deeper: columns and rows within a table that can be protected in a more granular manner with .

Global data policy: column masking

With Immuta, you are able to , or even mask cells within a given column per row using (also termed conditional masking).

This is important for this use case to granularly mask or grant unmasked access to specific columns to specific users. This granularity of policy allows more access to data because you no longer have to roll up coarse access decisions to the table level and can instead make them at the individual column, row, or cell level.

Using tags

The concept of masking columns has been mentioned a few times already, but as you already know per the guide, your masking policies will actually target tags instead of physical columns. This abstraction is powerful, because it allows you to build a single policy that may apply to many different columns across many different tables (or ).

Masking conflicts

If you build policy using tags, isn't there a good chance that multiple masking policies could land on the same column? Yes.

This can be avoided . Immuta supports tag hierarchy, so if the depth of a tag targeted by a policy is deeper than the conflicting policy, the deepest (the more specific one) wins. As an example, mask by making null columns tagged PII is less specific (depth of 1) than the policy mask using hasing columns tagged Discovered.name (depth of 2), so the hashing masking policy would apply to columns tagged Discovered.name and PII rather than the null one.

Principle of least privilege

Immuta meets principle of least privilege by following an . What this means is that if you mask a column, that mask will apply to everyone except [some list of exceptions]. This uni-directional approach avoids policy conflicts, makes change management easier, authoring policy less complex, and (most importantly) avoids data leaks.

Masking techniques

There are many different approaches you can take to masking a column. Some masks render the column completely useless to the querying user, such as nulling it, while other masking techniques can provide some level of utility from the column while at the same time maintaining a level of privacy and security. These advanced masking techniques are sometimes termed . Immuta provides a that allow for privacy-vs-utility trade-off decisions when authoring masking policies.

Dealing with data types

If you were to build masking policies natively in your data platform, they require that you build a masking policy per data type it could mask. This makes sense, because a varchar column type can't display numerics, or vice versa, for example. Furthermore, when building masking policies that target tags instead of physical columns, it is possible that policy may target many differing data types, or even target new unforeseen data types in the future when new columns appear.

Global data policy: row-level

You may hear this policy called row filter or row access policy. The idea is to redact rows at query time based on the user running the query.

Without this capability, you would need a transform process that segments your data across different tables and then manage access to those tables. This introduces extra compute costs and at some point, when dealing with large tables and many differing permutations of access, it may be impossible to maintain as those tables grow.

Using tags

Advanced functions

@groupsContains('SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any groups with a matching name (case sensitive).
@attributeValuesContains('Attribute Name', 'SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any attribute values under a specific key with a matching name (case sensitive).
@purposesContains('SQL Column or Expression') allows you to compare the value of a column to see if the user is acting under a matching purpose (case sensitive).
@columnTagged('Tag Name') allows you to target tags instead of physical columns when using one of the above functions.

Here's a simple example that targets a physical column COUNTRY comparing it to the querying user's country attribute key's values:

@attributeValuesContains('country', 'COUNTRY')

This could also be written to instead use the @columnTagged function instead of the physical column name:

@attributeValuesContains('country', @columnTagged('Discovered.Country'))

Allowing this policy to be reused across many different tables that may have the COUNTRY column name spelled differently.

Data policy temporary overrides

...for everyone except users acting under purpose [some legitimate purpose(s)]

Data sources with the policies they want to be excluded from and
Purposes

This can be made temporary by deleting the project once access is no longer needed or revoking approval for the project after the need for access is gone.

Subscription policies

After you've created global data policies as described above, how do you actually give access to the tables?

Federated Governance for Data Mesh and Self-Serve Data Access

Who is this for?

This guide is intended for users who wish to delegate control of data products within a data mesh without breaking their organization's security and compliance standards.

Prerequisites

Data platform integration configured in Immuta
Users and data sources have been registered in Immuta

Goals

This guide is designed for those interested in understanding how Immuta can effectively assist organizations in adopting core data mesh principles. By leveraging this document, readers will be empowered to implement the essential procedures for integrating Immuta within their organization's data mesh framework. It is recommended to read the ebook Data Security for Data Mesh Architectures before starting with this use case.

Even if you are not interested in data mesh, this use case will explain how you can create a self-serve "data shopping experience" in Immuta.

Configuration steps

Follow these steps to learn more about and start using Immuta to apply federated governance for data mesh and self-serve data access:

Complete the Monitor and secure sensitive data platform query activity use case.
Opt to review the Automate data access control decisions and Compliantly open more sensitive data for ML and analytics use cases.
Define domains. This step is critical in order to delegate policy controls over certain segments of your data.
Manage data products to allow your data product creators to compliantly release data products for consumption.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful to ensure federated governance.
Work with your data product creators and global compliance users to Apply federated governance. Enforce global standards across all data products while empowering data product owners to expand on those policies.
Allow your data consumers to self-serve access data through the Immuta data portal by Discovering and subscribing to data products.

Defining Domains

Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Instead of centralizing your data governance and giving users too much governance over all your data, you delegate and control their power over data sources they own by granting them permission within domains in Immuta.

Domains are typically defined by your organization’s data governance board and closely aligned with business departments or units, such as Sales, Customer Service, or Finance. Domains can also be defined by value stream, such as Patient Health Records, Fraud Detection, or Quality Control.

Who will do this?

The task of setting up domains in Immuta can be done by any member of the team with the GOVERNANCE permission and assigning domain permissions can be done by any member of the team with the USER_ADMIN permission.

Consider users' jobs in a domain

When delegating control to your data product owners, it's possible to further segment that control within domains. If you want to give the user control to build policies on the data sources (the data products) in the domain, that is managed by giving them the Manage Policies permission in the domain.

Managing Data Products

A data product is a valuable output derived from data. It is designed to provide insights, support decision-making, and drive business value. Data products are created to meet business needs or solve problems within a domain or organization. These products can take various forms. In most cases, it is a collection of tables and views, which is what is supported by Immuta.

Data product metadata registration

The use case covers how to register your data metadata, but it's worth discussing in more depth here because you have multiple data product owners.

As a first step, all the tables/views on the data platform(s) must be onboarded in Immuta. With the (which is the Immuta default), onboarding objects in Immuta will not make any changes to existing accesses already in place.

Immuta only starts taking control when the first policy has been applied. We suggest that one team registers all objects from your data platform(s). Alternatively, different domain owners can register their own domain data. It is recommended to use a system account to do the data registration.

Regardless of which user registered the data, it's critical that is enabled to ensure future tables and views are discovered by Immuta and automatically registered to the proper domain.

Be aware that registering your data product metadata with Immuta does not make it available to users. It simply makes Immuta aware of its existence. To allow users to see it and subscribe, you would manage policies on it, discussed in the guide next.

Who will do this?

Any team member that has the necessary user credentials in the underlying platform to read the data.

Managing Data Metadata

You may have read the Automate data access control decisions use case already. If so you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.

This is because you want your data product owners to tag data with facts - what they have intimate knowledge of because they built the data product - and not have to be knowledgeable about all policies across your organization. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts, what is in the column, which is why ABAC makes much more sense.

Understanding that, read the automate data access control decisions use case's Managing data metadata guide.

Tags in a federated governance world

It is important to distinguish between tag definition and tag application. While tag definition (e.g., a tag called “Business Unit” with the values “Finance”, “Marketing”, “Sales”) should be strongly governed to guarantee consistency and coherence, tag application can be fully decentralized, meaning every domain or data owner can apply tag values (from the centrally governed list) to their data. There needs to be a process in place for data owners to request the definition of a new tag in case they identify any gaps.

Monitoring data products

It is important to leverage Immuta Discover's sensitive data discovery (SDD) to monitor your data products. This allows you to uncover if and when sensitive data may be leaked unknowingly by a data product and mitigate that leak quickly.

The Monitor and secure sensitive data platform query activity use case covers this in great detail and is highly recommended for a data mesh environment.

Apply Federated Governance

In the distributed domains of a data mesh architecture, data governance and access control must be applied vertically (locally) within specific domains or data products and horizontally (globally). Global policies should be authored and applied in line with the ecosystem’s most generic and all-encompassing principles, regardless of the data’s domain (e.g., mask all PII data). Localized domain- or product-level policies should be fine-grained and applicable to only context-specific purposes or use cases (e.g., only show rows in the Sales table where the country value matches the user's office location). In Immuta we distinguish between subscription policies and data policies.

Subscription policies

It is possible to build Immuta subscription policies across all data products (horizontal), as shown in the diagram above, and then have those policies merged with additional subscription policies authored at the domain level (vertical). When this occurs, those subscription policies are merged as prescribed by the two or more policies being merged.

Whether the requirements for access are merged with an AND or an OR is prescribed by this setting in the policy builder for each of the individual policies:

Always Required = AND
Share Responsibility = OR

Subscription policies for a data product "shopping" experience

When building subscription policies, it can impact what a user can discover and, if desired, "put in their shopping cart" to use.

Allow Data Source Discovery: Normally, if a user does not meet the subscription policy, that data source is hidden from them in the Immuta UI. Should you check this option when building your subscription policy, the inverse is true: anyone can see this data source. This is important if you want users to understand if the data product exists, even if they don't have access.
Require Manual Subscription: Even if the user does meet the policy, instead of automatically subscribing them, they would have to discover and subscribe themselves. If they meet the policy, they will automatically be subscribed with no intervention. This is important if you want the users to maintain the list of data products they see in the data platform rather than all data products they have access to.
Request Approval to Access: This allows the user to request access, even if they don't meet the policy. Rules can determine what user manually overrides the policy to let them in.

Data policies

Once a user is subscribed to a data source via Immuta or has a pre-existing direct access on the underlying data platform, the data policies that are applied to that data source determine what data the user sees. Data policy types include masking, row-level, and other privacy-enhancing techniques.

An exemplary three-step approach to managing data policies would be

Update the global data policies as new sensitive data is potentially released by data products and discovered using Immuta Detect.

Strategy

Data mesh is a higher level use case that pulls from concepts learned across the other use cases:

Discover and Subscribe to Data Products

Data sources to users in the Immuta UI. This allows users to read the metadata associated with data sources (e.g., tags, descriptions, data dictionary, and contacts) and permits users to request access to the data itself.

This allows customers to use Immuta as a data catalog and marketplace. In case an organization uses an external marketplace, Immuta easily integrates with it via the API to bring in metadata or manage access controls for users that requested access via external marketplace.

While we have been talking about data mesh, using Immuta as a marketplace for a data shopping experience is not limited to data mesh use cases - it can be used this way for any use case.

User authentication

Immuta integrates with your identity manager in order to enforce controls in your data platform(s) - you likely configured this when completing the use case. This setup has another point, though: it also allows all your users to authenticate into Immuta with their normal credential to leverage it for this "shopping" experience.

Any action taken in Immuta will automatically be reflected in the data platform via Immuta's integration.

Author Policy

The policy authoring approaches are broken into the two paths you should choose between discussed previously.

Path 1: Orchestrated RBAC policy authoring

Since orchestrated RBAC is all about one-to-one matching of user metadata to data metadata, Immuta has special functions in our subscription policies for managing this:

@hasTagAsAttribute('Attribute Name', 'dataSource' | 'column')
@hasTagAsGroup('dataSource' | 'column')

Then we could tag our users with their access under the Access attribute key:

Bob: Access: Strictly Confidential
Steve: Access: Public

Then we could also tag our tables with the same metadata:

Table 1: Strictly Confidential
Table 2: Public

Now we build a policy like so:

And ensure we target all tables with the policy in the builder:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  advanced: '@hasTagAsAttribute(''Access'', ''dataSource'')'
  allowDiscovery: false
  automaticSubscription: true
  shareResponsibility: false
  type: entitlements
name: Access
policyKey: Access
staged: false
template: false
type: subscription

We could also use column tags instead, by tagging the columns appropriately:

Table 1 --> Column 2: Strictly Confidential
This tag change would require the following alteration to the function: @hasTagAsAttribute('Access', 'column') instead of @hasTagAsAttribute('Access', 'dataSource').

Finally, we could use groups instead of attributes:

Bob: Strictly Confidential
Steve: Public
This change would require the following alteration to the function: @hasTagAsGroup('Access', 'column') instead of @hasTagAsAttribute('Access', 'column').

Groups approach:

Bob: Strictly Confidential
Steve: Strictly Confidential.Confidential.Internal.Public

Path 2: ABAC policy authoring

an employee (not contractor)
in the US
part of the Legal team

Why should you bother getting to the root?

be an employee (not contractor)
be in the US or France
be part of the Legal team

However, if you are on the ABAC path, it’s a single simple policy logic change because of our powerful triangle that decouples policy logic from user and metadata attributes.

Let’s walk through an example:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Employees
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Employee Access
policyKey: Employee Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
      attributes:
        - name: Country
          value: US
      operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access
policyKey: Country Access
staged: false
template: false
type: subscription

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    groups:
      - Legal Team
    operator: all
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Legal Team Accesss
policyKey: Legal Team Accesss
staged: false
template: false
type: subscription

Each policy targets tables tagged Strictly Confidential (they could alternatively target columns tagged that):

When each of those policies converge on a single table tagged Strictly Confidential, they will be merged together appropriately by Immuta, demonstrating Immuta's conflict resolution:

And whether merged with an AND or an OR is prescribed by this setting in the policy builder for each of the 3 individual policies:

With Immuta, all you’ve got to do is change that single Country Access subscription policy to include France, and voila, you’re done:

Here is the policy definition payload YAML for the above policy using the v2 api:

actions:
  allowDiscovery: false
  automaticSubscription: true
  entitlements:
    attributes:
      - name: Country
        value: US
      - name: Country
        value: France
    operator: any
  shareResponsibility: false
  type: entitlements
circumstanceOperator: any
circumstances:
  - columnTag: Strictly Confidential
    type: columnTags
name: Country Access Two
policyKey: Country Access
staged: false
template: false
type: subscription

Which results in this merged policy on all the Strictly Confidential tagged tables:

Also be aware, this auto-merging logic Immuta provides enables you to also combine orchestrated RBAC and ABAC policies.

Manual overrides

This is possible through this setting in each policy:

Be aware that all policies that land on the table must have this same setting turned on in order to allow the manual override.

Getting Started with Secure

Prerequisites

Use cases

Automate Data Access Control Decisions

Who is this for?

Prerequisites

Goals

Table vs column access

Business value

Configuration steps

The Two Paths: Orchestrated RBAC and ABAC

Orchestrated RBAC: one-to-one

ABAC: many-to-many

Managing User Metadata

Considerations

Orchestrated RBAC: user attributes and groups

ABAC: user attributes and groups

Applying user metadata

Managing Data Metadata

Prerequisites

Data tags for Detect

Considerations

Applying data tags

Data tag hierarchy

Author Policy

Path 1: Orchestrated RBAC policy authoring

Path 2: ABAC policy authoring

Manual overrides

Test and Deploy Policy

Use a single Immuta for testing policies

Best practice for policy testing

Logically separating your data platform

Logically separating Immuta

Ensuring development data is tagged correctly

Testing a new policy

Testing an edit to an existing policy

Policy approvals

Compliantly Open More Sensitive Data for ML and Analytics

Who is this for?

Prerequisites

Goals

When is this appropriate?

When is this not appropriate?

Data sensitivity

Data navigation

Configuration steps

Managing User Metadata

Managing Data Metadata

Prerequisites

Enriching your data metadata with tags

Automated data tagging

Schema monitoring enabled

Data tags are facts

Tag lineage

Author Policy

Global data policy: column masking

Using tags

Masking conflicts

Principle of least privilege

Masking techniques

Dealing with data types

Global data policy: row-level

Using tags

Advanced functions

Data policy temporary overrides

Subscription policies

Federated Governance for Data Mesh and Self-Serve Data Access

Who is this for?

Prerequisites

Goals

Configuration steps

Defining Domains

Who will do this?

Consider users' jobs in a domain

Related guides

Managing Data Products

Data product metadata registration

Who will do this?

Related guide

Managing Data Metadata