Orchestrated RBAC and ABAC
When following this use case, you need to decide which of these paths fits your organization. This decision drives how you will manage user and data metadata as well as policies, so it's a critical one.
If you aren't sure, you should strive for ABAC. While it may seem more complicated to get started, in the long run it will provide powerful flexibility and scalability in policy management. In this method, you tag your users and data with facts and prescribe policies that leverage those facts to make real-time decisions.
Orchestrated RBAC puts more strain on managing access decisions outside of your access logic (Immuta), because every access decision must be rolled up into a single attribute on the user. Because of this, it more closely resembles the role explosion problem, and if you select this path incorrectly you will end up there over time. Orchestrated RBAC is tag-orchestrated RBAC and is supported by Immuta (in fact, many customers stick with it because of the benefits of the tag orchestration).
Do you have many permutations of static access with no overlap?
This method is for organizations whose access decision typically depends on a single variable:
If you have x, you have access to everything tagged y.
Furthermore, the access decision to objects tagged y never strays beyond x. In other words, there's only ever one way to get access to objects tagged y - a 1:1 relationship. A good real-world example of orchestrated RBAC is:
You must have signed data use agreement x to have access to data y.
Do you have a lot of variables at play to determine access?
This method is for organizations that may have many differing x’s for access to objects y. Furthermore, x and y may actually be many different variables, such as
If you have a, b, and c you get access to objects tagged x, y, and z.
or
If you have d, you get access to objects tagged x.
Notice in this example that access to objects tagged x can happen through different decisions.
ABAC is also important when you have federated governance in play, where many different people are making policy logic decisions and that logic may stack on the same objects. A good real-world example of ABAC is:
You must reside in the US and be a full-time employee to see data tagged US and Highly Sensitive.
This guide is intended for users who want to open more data for access by creating more granular and powerful policies at the data layer.
Users and data sources have been registered in Immuta
Firstly, it's crucial to remember that just because a subscription policy, as described in the Automate data access control decisions use case, grants a user access to data, it doesn’t mean that they should have access to all of that data. Often, organizations stop at just granting access without considering the nuances of what specific columns or rows should be accessible to different users. It's important to see the process all the way through by masking sensitive values that are not necessary for a user's role. This ensures that while users have the data access they need, sensitive information is appropriately protected.
Secondly, when considering subscription policies in the context of global data policies, an interesting perspective emerges. A subscription policy could essentially be seen as mirroring the functionality of a global masking policy. This is because, like a global masking policy, a subscription policy can be used to mask or redact the entirety of a table. This interpretation underscores the potential of global data policies for comprehensive data protection.
One of the primary advantages is an easy and maintainable way to manage data leak risks without impeding data access, which means more data for ML and analytics. By focusing on global data policies, organizations can ensure that sensitive data, down to the row and column level, is appropriately protected, regardless of who has access to it. This means that while data remains broadly accessible for business operations and decision-making, the risk of data leaks is significantly reduced. This is because you can
be more specific with your policies as described above and
mask using advanced privacy enhancing technologies (PETs) that allow you to get utility from data in a column while still preserving privacy in that same column.
However, it's important to note that this approach does not mean that you should never create subscription policies. Subscription policies still have their place in data governance. The key point here is that the primary focus shifts away from subscription policies and towards global data policies, which offer a more comprehensive and effective approach to data protection. This shift in focus allows for more nuanced control over data access, enhancing both data security and compliance.
This use case is particularly suitable in scenarios where you already have a process for granting access to tables. If your organization has established procedures for table access that are working effectively, introducing global data policies can enhance your data governance without disrupting existing workflows.
It's also fitting when you grant access to tables to everyone. In such cases, the focus is less on who has access and more on what they can access. Global data policies can help ensure that while data is broadly accessible, sensitive information is appropriately masked or redacted, maintaining compliance and security.
Lastly, this approach is appropriate when you have very generic subscription policies. Your native tool may refer to subscription policies as table GRANTs. If your subscription policies are not tailored to specific user attributes, roles, or data sensitivity levels, they may not provide adequate data protection. Shifting the focus to global data policies, such as data masking, allows for more nuanced and effective control over data access, enhancing both security and compliance.
In essence, this use case is appropriate when you want to maintain or improve data accessibility while ensuring robust data protection, regardless of your current table grants.
If the existence of certain tables, schemas, or columns is considered sensitive information within your organization, this solution pattern may not be appropriate. Revealing the existence of certain data, even without granting access to the actual data, can pose a security risk in some contexts. In such cases, a more restrictive strategy may be required.
With this use case, users might have to navigate through a large number of tables to find the data they need. This could potentially hinder user experience, especially in large organizations with extensive data environments.
Follow these steps to learn more about and start using Immuta to compliantly open more sensitive data for ML and analytics:
Complete the Monitor and secure sensitive data platform query activity use case to configure Immuta.
Opt to review the Automate data access control decisions use case.
Manage user metadata. This step is critical to building scalable policy and understanding the considerations around how and what to capture. Tag your users with attributes and groups that are meaningful for Immuta global data policies.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful for Immuta global data policies.
Author policy. In this step, you will define your global data policy logic for granularly masking and redacting rows and columns. Optionally test and deploy policy.
Immuta Secure allows you to secure your data through various access control policies you configure.
This section provides use cases to guide you through implementing Immuta Secure.
This section provides a conceptual overview of Immuta Secure and its benefits.
This section provides how-to and references guides for authoring policies to enforce data access controls.
This section provides how-to and reference guides for projects, which combine users and data sources under a common purpose that can then be used to restrict access to data and streamline collaboration.
The guides in this section illustrate how to subscribe to data sources in Immuta, run health jobs, and complete other actions as a data source subscriber.
This guide is intended for users who want to build table access control policies in a scalable manner using Immuta.
Users and data sources have been registered in Immuta
This use case is the most common across customers. With it, you solve the problem of entitlement of users to data based on metadata about the user (attributes and groups) and metadata about the data (tags).
This use case is unique to Immuta, because rather than coupling an access decision with a role, you are able to instead decouple access decisions from user and data metadata. This is powerful because it means when metadata about the users or metadata about the data changes, the access for a user or set of users may also change - it is a dynamic decision.
Decoupling access decisions from metadata also eliminates the classic problem of role explosion. In a world where policy decisions are coupled to a role, you must manage a new role for every permutation of access, causing an explosion of roles to manage. Instead, with this use case you decouple that logic from the user metadata and data metadata, so real-time, dynamic decisions are possible. Immuta’s method of building policies allows for these many permutations in a clear, concise manner through the decoupling of policy logic from the user and data metadata.
This use case also eliminates the need to have a human in the loop to approve data access. If you can clearly describe why an approver would approve an access request, that logic can instead be expressed as an Immuta subscription policy, as you'll see below. Removing humans from this process increases the speed of data access, makes access decisions consistent, and removes error and favoritism.
Want to learn more? Check out this ABAC 101 blog on the methodology.
Lastly, this use case is primarily focused on automating table grants, termed subscription policies in Immuta. We recommend you also read the Compliantly open more sensitive data for ML and analytics use case to learn about column masking, which will allow you to enforce more granular controls so that you can open more data. It's common to see customers mix the automate data access control decisions use case with the compliantly open more sensitive data for ML and analytics use case, so it's recommended to read both.
Following this use case will reap huge operational cost savings. You will have to manage far fewer data policies; those policies will be more granular, accurate, and easily authored; and you will be able to prove compliance more easily. Furthermore, instead of an explosion of incomprehensible roles, the users and data will have meaningful metadata matched with clear policies.
Quantified benefits:
75x fewer policy changes required to accomplish the same use cases
70% reduction in dedicated resources to data access management and policy administration
Onboarding new employees two weeks faster and provisioning new data one week faster
5% to 15% increase in client contract values to improve revenue
Unquantified benefits:
Improved compliance standard and security posture
Enhanced employee satisfaction
Better customer experiences
More details on the business value can be found in these reports:
GIGAOM: ABAC vs RBAC: The Advantage of Attribute-Based Access Control over Role-Based Access Control (Immuta is an OT-ABAC approach)
Forrester: The Total Economic Impact Of Immuta
Follow these steps to learn more about and start using Immuta to automate data access control decisions:
Complete the Monitor and secure sensitive data platform query activity use case.
Read the overview of the two access control paths: orchestrated RBAC and ABAC. These are the two different approaches (or a mix of both) you can take to managing policy in Secure, and their tradeoffs.
Manage user metadata. This step is critical to building scalable policy and understanding the considerations around how and what to capture. Tag your users with attributes and groups that are meaningful for Immuta global policies.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful.
Author policy. In this step, you will define your global data policy logic. Optionally test and deploy policy.
No matter if you choose orchestrated RBAC or ABAC as described in the prior guide, you must have metadata on your users. This can be done in Immuta, via your identity manager, or other custom sources.
Doing so allows you to build scalable policies that do not reference individual users and instead use metadata about them to target them with access decisions. In Immuta, user metadata is termed user attributes and groups.
Referring back to our triangle of access decisions, user metadata is the first point of that triangle. User metadata is made up of the following elements:
Attributes: key:value pairs tied to your users
Groups: a single value tied to your users - you can think of groups as containing users. Groups can also have their own attributes, a shortcut for assigning attributes to all members of a group. In either case, multiple users can be attached to the same attribute or group.
Referring back to the two paths, your choice between orchestrated RBAC and ABAC will help drive how you manage user metadata.
Fact-based user metadata (ABAC) allows you to decouple policy logic from user metadata, allowing highly scalable ABAC policy authoring:
Steve has attribute: country:USA
Sara has attribute: role:administrator
Stephanie has attribute: sales_region: Ohio, Michigan, Indiana
Sam is in group: developers
Group developers has attribute: organization:engineering
Logic-based user metadata (orchestrated-RBAC) couples user metadata with access logic:
Steve has attribute: access_to:USA
Sara has attribute: role:admin_functions
Stephanie has groups: Ohio_sales, Michigan_sales, Indiana_sales
Sam is in group: all_developer_data
Group developers has attribute: access_to:AWS_all
Typically these groups (most common) and attributes come from your identity management system. If they do, they potentially contain access logic as part of them (see the logic-based user metadata list above). It's not the end of the world if they do, though it means you are potentially set up for orchestrated RBAC. If each group from your identity management system represents a single static fact that provides access to a set of objects, with no overlap with other facts that provide access, then you can use orchestrated RBAC. A simple example of groups ready for orchestrated RBAC is "you must have signed license agreement x to see data tagged x": it's a single fact (you've signed license agreement x) that provides access to data tagged x. There are no other paths to data tagged x.
But the key to orchestrated RBAC is that it's a 1:1 relationship between the group from your identity manager and the set of objects that group gives users access to. If this is not true, and there are multiple permutations of groups that have implied policy logic and that logic provides access to the same sets of objects, then you are in a situation with role explosion. This is a problem that Immuta can help you solve, and you should solve that problem with ABAC.
The one exception where you have the above scenario but may want to use orchestrated RBAC is if these multiple paths for access to the same objects form a true hierarchy. For example, data is tagged with Strictly Confidential, Confidential, Internal, and Public:
users assigned group Strictly Confidential can see data tagged Strictly Confidential, Confidential, Internal, and Public
users assigned group Confidential can see data tagged Confidential, Internal, and Public
In this case, you could use orchestrated RBAC even though there are multiple paths of access to the same objects because it is a one-to-one relationship represented in a hierarchy that contains all paths. How you actually write this policy will be described later.
Since orchestrated RBAC is a one-to-one match (or a true hierarchy), you can use your groups as-is from your identity manager or other custom sources.
Without getting into the details of how the policy is authored (yet), it is possible to leverage a user attribute hierarchy to contribute to access decisions through hierarchical matching. For example, users with attributes that are either a hierarchical parent or an exact match to a given data tag can be subscribed to the data source via a hierarchical match. To better illustrate this behavior, see the matrix below:
User attributes | Data source tags | Subscribed? | Reason
'Exercise': ['Gym.Treadmill', 'Sports.Football.Quarterback'] | ['Athletes.Performance', 'Sports.Football.Quarterback'] | Yes | Exact match on 'Sports.Football.Quarterback'
'Exercise': ['Gym.Weightlifting', 'Sports.Football'] | ['Athletes.Performance', 'Sports.Football.Quarterback'] | Yes | User attribute 'Sports.Football' is a hierarchical parent of data source tag 'Sports.Football.Quarterback'
'News Articles': ['Gym.Weightlifting', 'Sports.Football'] | ['Athletes.Performance', 'Sports.Football.Quarterback'] | No | The policy is written to match values only under the 'Exercise' attribute key, not 'News Articles'
'Exercise': ['Sports'] | ['Sports.Baseball.Pitcher'] | Yes | User attribute 'Sports' is a hierarchical parent of data source tag 'Sports.Baseball.Pitcher'
'Exercise': ['Gym.Gymnastics.Trapeze', 'Sports.Football.Quarterback'] | ['Gym.Gymnastics', 'Sports.Football'] | No | Hierarchical matching only happens in one direction (user attribute contains data source tag); here the user attributes are hierarchical children of the data source tags
So take care to organize your user attributes in a hierarchy that supports the hierarchical matching.
Going back to our hierarchical example of Strictly Confidential, Confidential, Internal, and Public above, you would want to ensure that user attributes follow the same hierarchy. For example:
A user with access to all data: Classification: Strictly Confidential
A user with access to only Internal and Public: Classification: Strictly Confidential.Confidential.Internal
Remembering the last row in the matrix above, hierarchical matching only happens in a single direction: the user attribute contains the data source tag. Since the user attribute is more specific (Strictly Confidential.Confidential.Internal) than data tagged Strictly Confidential and Strictly Confidential.Confidential, the user would only gain access to data tagged Strictly Confidential.Confidential.Internal and Strictly Confidential.Confidential.Internal.Public.
To use ABAC, you need to divorce your mind from thinking that every permutation of access requires a new role. (We use the word role interchangeably with groups in this document, because many identity and access systems call groups roles, especially when they contain policy logic.)
Instead you need to focus on the facts about your users they must possess in order to grant them access to data. Ask yourself, why should someone have access to this data? It’s easiest to come at this from the compliance perspective and literally ask yourself that question for all the compliance rules that exist in your organization.
Why should someone have access to my PHI data? Example answers might be
If they are an analyst
If they have taken privacy training
So, at a minimum, you are going to need to attach attributes or groups to users representing whether they are an analyst and whether they have taken privacy training. Note you do not have to (nor should you) negate these attributes, meaning nobody should have groups like not_analyst or no_privacy_training.
Repeat this process until you have a good starting point for the user attributes you need. This is a lot of upfront work and may lead to the perception that ABAC also suffers from attribute explosion. However, when attributes are well enumerated, there’s a natural limit to how many different relevant attributes will be present in an organization. This differs substantially from the world where every permutation of access is represented with a role, where role growth starts off slow, but increases linearly over time.
As an illustration, well-defined subject attributes tend to plateau over time, while roles and poorly defined subject attributes tend to keep growing and quickly become unwieldy.
Where do those attributes and groups come from? Really, wherever you want - Immuta has several interfaces to make this work, and you can combine them:
Option 1: Store them in your identity manager. If you have the power to do this, you should. It becomes the primary source of facts about your users instead of the source of policy decisions, which should not live in your identity manager. We recommend synchronizing this user metadata from your identity manager to Immuta over SCIM where possible. (This concept was covered in the Detect getting started.)
Option 2: Store them directly in Immuta. It is possible to manually add attributes and groups to users directly in the Immuta UI or through the API.
Option 3: Refer to them from some other external source of truth. For example, perhaps training courses are already stored in another system. If so, you are able to build an external user info endpoint to pull those groups (or attributes) into Immuta. However, there are some considerations with this approach tied to which IAM protocol you use. If your identity manager restricts you from using the approach in the "Consume attributes/groups from arbitrary sources" row in the IAM protocol table, you can instead periodically push the attributes from the source of truth into Immuta using our APIs.
All of these options assume you have already configured your identity manager in Immuta. This has to be done in order to create the user identities that these attributes and groups will be attached to.
Now we are ready to focus on the third point of the triangle for data access decisions: access control policies, and, more specifically, how you can author them with Immuta now that you have the foundation with your data and user metadata in place.
It’s important to understand that you only need a starting point to begin onboarding data access control use cases - you do not have to have every user perfectly associated with metadata for all use cases, nor do you need all data tagged perfectly for all use cases. Start small on a focused access control use case, and grow from there.
The policy authoring approaches are broken into the two paths you should choose between discussed previously.
With orchestrated RBAC you have established one-to-one relationships with how your users are tagged (attribute or group) and what that single tag explicitly gives them access to. Remember, the user and data metadata should be facts and not reflect policy decisions veiled in a single fact; otherwise, you are setting yourself up for role explosion and should be using ABAC instead.
Since orchestrated RBAC is all about one-to-one matching of user metadata to data metadata, Immuta has special functions in our subscription policies for managing this:
@hasTagAsAttribute('Attribute Name', 'dataSource' | 'column')
@hasTagAsGroup('dataSource' | 'column')
The first argument for @hasTagAsAttribute is the attribute key whose values will be matched against tags, and the second argument specifies whether those values are matched against data source tags or column tags. For @hasTagAsGroup, the user's group names themselves are matched, so its only argument specifies whether to match against data source tags or column tags.
Understanding this, let's use a simple example before we get into a hierarchical example. For this simple example, let's say there are Strictly Confidential tables and Public tables. Everyone has access to Public, and only special people have access to Strictly Confidential. Let's also pretend there aren't other decisions involved with someone getting access to Strictly Confidential (there probably are, but we'll talk more about this in the ABAC path). It is just a single fact: you do or you don't have that fact associated with you.
Then we could tag our users with their access under the Access attribute key:
Bob: Access: Strictly Confidential
Steve: Access: Public
Then we could also tag our tables with the same metadata:
Table 1: Strictly Confidential
Table 2: Public
Now we build a policy like so, ensuring we target all tables with the policy in the builder.
Here is the policy definition payload YAML for the above policy using the v2 api:
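The payload itself isn't reproduced here, so the following is an illustrative sketch; the condition and targeting fields approximate the v2 schema rather than quoting it, so consult the v2 API reference for the exact format:

```yaml
# Illustrative sketch only: field names approximate the v2 policy
# payload; consult the v2 API reference for the exact schema.
name: Match Access to Data
type: subscription
actions:
  - type: subscription
    description: User attribute values under 'Access' must match data source tags
    allowedWhen: "@hasTagAsAttribute('Access', 'dataSource')"   # one-to-one matching
circumstances: []   # empty here: the policy targets all data sources
```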
Since this policy applies to all data sources, you only need a single policy in Immuta. When you change user metadata or data metadata, the access to those tables will automatically be updated by Immuta through the matching algorithm (user metadata matches the data metadata).
We could also use column tags instead, by tagging the columns appropriately:
Table 1 -> Column 2: Strictly Confidential
This tag change would require the following alteration to the function: @hasTagAsAttribute('Access', 'column') instead of @hasTagAsAttribute('Access', 'dataSource').
Finally, we could use groups instead of attributes:
Bob: Strictly Confidential
Steve: Public
This change would require the following alteration to the function: @hasTagAsGroup('column') instead of @hasTagAsAttribute('Access', 'column').
In the above case, we are using column tags again, but they could (and probably should) be data source tags. The reason is that orchestrated RBAC is not using facts about what is in your data source (the table) to determine access. Instead, it's using a single variable as a tag, and because of that, the single variable is better represented as a table tag. This also reduces the amount of processing Immuta has to do behind the scenes.
We can get more complicated with a hierarchy as well, but the approach remains the same: you need a single policy and you use the hierarchy to solve who should gain access to what by moving from most restrictive to least restrictive in your hierarchy tree on both the users and data metadata.
Groups approach:
Bob: Strictly Confidential
Steve: Strictly Confidential.Confidential.Internal.Public
In this case, Steve would only gain access to tables (or tables with columns) tagged Strictly Confidential.Confidential.Internal.Public, but Bob would have access to tables (or tables with columns) tagged Strictly Confidential, Strictly Confidential.Confidential, Strictly Confidential.Confidential.Internal, and Strictly Confidential.Confidential.Internal.Public.
Orchestrated RBAC one-to-one matching can be powerful, but it can also be a slippery slope to role explosion. You can quickly fall into rolling up many decisions into a single group or attribute tied to the user (like Strictly Confidential) instead of prescriptively describing the real root reason why someone should have access to Strictly Confidential in your policy engine. That root-cause description is the power behind ABAC, and using the ABAC method of access gives you ultimate scalability and evolvability of policy with ease.
Let's say that having access to Strictly Confidential data isn't a single simple fact attached to a user like we described in the orchestrated RBAC section (it likely isn't). Instead, let's try to boil down why you would give someone access to Strictly Confidential. Let's say that to have access to Strictly Confidential, someone should be
an employee (not a contractor)
in the US
part of the Legal team
Those are clearly facts about users, more so than Strictly Confidential. If you can get to the root of the question like this, you should, because Immuta allows you to pull in that information about users to drive policy.
Why should you bother getting to the root?
Well, consider what we just learned about how to author policy with orchestrated RBAC above. Now let's say your organization changes its mind and wants the policy for access to Strictly Confidential to be this:
be an employee (not contractor)
be in the US or France
be part of the Legal team
With orchestrated RBAC this would be a nightmare, unless you already have a process to handle it. You would have to manually figure out all the users that meet the new criteria (in France) and update them in and out of Strictly Confidential. This is why orchestrated RBAC should always use a single fact that gains a user access.
However, if you are on the ABAC path, it's a single simple policy logic change because of our powerful triangle that decouples policy logic from user and data metadata.
Let’s walk through an example:
The best approach is to build individual policies for each segment of logic associated with why someone should have access to Strictly Confidential information. Coming back to the original logic above, that would mean 3 (much more understandable) separate policies:
Here is the policy definition payload YAML for the first policy (the employee check) using the v2 api:
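Since the original payload isn't reproduced here, the following sketch approximates it; the field names and the Employees group condition are illustrative assumptions, not the exact v2 schema:

```yaml
# Sketch of policy 1 of 3: employment check (illustrative field names;
# the 'Employees' group is a stand-in for however you record that fact)
name: Employee Check
type: subscription
actions:
  - type: subscription
    description: Only employees (not contractors)
    allowedWhen:
      groups: [Employees]        # a fact about the user
    onMerge: alwaysRequired      # AND with other policies on the same table
circumstances:
  - operator: or
    type: tags
    tag: Strictly Confidential   # target tables tagged Strictly Confidential
```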
Here is the policy definition payload YAML for the second policy (the country check) using the v2 api:
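Again, an illustrative sketch rather than the exact schema:

```yaml
# Sketch of policy 2 of 3: country check (illustrative field names)
name: Country Access
type: subscription
actions:
  - type: subscription
    description: Must be in the US
    allowedWhen:
      attributes:
        Country: [US]            # a fact about the user
    onMerge: alwaysRequired
circumstances:
  - operator: or
    type: tags
    tag: Strictly Confidential
```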
Here is the policy definition payload YAML for the third policy (the Legal team check) using the v2 api:
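And a third illustrative sketch under the same assumptions:

```yaml
# Sketch of policy 3 of 3: team check (illustrative field names)
name: Legal Team Check
type: subscription
actions:
  - type: subscription
    description: Must be part of the Legal team
    allowedWhen:
      groups: [Legal]            # a fact about the user
    onMerge: alwaysRequired
circumstances:
  - operator: or
    type: tags
    tag: Strictly Confidential
```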
Each policy targets tables tagged Strictly Confidential (they could alternatively target columns with that tag). When each of those policies converges on a single table tagged Strictly Confidential, they will be merged together appropriately by Immuta, demonstrating Immuta's conflict resolution.
Whether they are merged with an AND or an OR is prescribed by a setting in the policy builder for each of the 3 individual policies: if either side of the policies is Always Required, the merge results in an AND; otherwise, if both sides agree to Share Responsibility, it results in an OR. In this case, each of the 3 policies chose Always Required, which is why the three policies were AND'ed together on merge. Note this merging behavior is relevant to how you can have different users manage policies without risk of conflicts.
Now let's move into our nightmare (without Immuta) scenario where the business has changed the definition of who should have access to Strictly Confidential data by adding France alongside the existing requirements.
With Immuta, all you’ve got to do is change that single Country Access subscription policy to include France, and voila, you’re done:
Here is the policy definition payload YAML for the above policy using the v2 api:
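Again as an illustrative sketch (field names approximate the v2 schema); note the only edit is the country list:

```yaml
# Sketch of the single change: add France to the country condition
name: Country Access
type: subscription
actions:
  - type: subscription
    description: Must be in the US or France
    allowedWhen:
      attributes:
        Country: [US, France]    # the only edit needed
    onMerge: alwaysRequired
circumstances:
  - operator: or
    type: tags
    tag: Strictly Confidential
```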
This results in a merged policy on all the Strictly Confidential tagged tables: the user must be an employee, be in the US or France, and be part of the Legal team.
As you can see, ABAC is much more scalable and evolvable. Policy management becomes a breeze if you are able to capture facts about your users and facts about your data, and decouple policy decisions from it.
Also be aware that this auto-merging logic enables you to combine orchestrated RBAC and ABAC policies.
Last but not least are manual overrides. While this use case is all about automating access decisions, there may be cases where you want to give a user access even if they don’t meet the automated criteria.
This is possible through a setting in each policy. In this scenario, users marked as Owner of the registered data source in Immuta have the power to approve the user's override request. There are also ways to specify different approvers, or multiple approvers.
Also notice the grayed-out Allow Data Source Discovery setting: normally, if you do not meet the criteria of a policy, the data source is not even visible to you in the Immuta UI or the data platform. However, if you have this checked, the data source is visible in the Immuta UI even if you don't meet the policy. This must be checked in order to request approval for access; otherwise, users would never find the data source to ask for approval in the Immuta UI.
Be aware that all policies that land on the table must have this same setting turned on in order to allow the manual override.
In the automate data access control decisions use case, we covered subscription policies at length; subscription policies control table-level access.
In this use case, we are focused one level deeper: columns and rows within a table that can be protected in a more granular manner with data policies.
With Immuta, you are able to mask columns and redact rows, or even mask cells within a given column per row based on a condition (also termed conditional masking).
This is important for this use case to granularly mask or grant unmasked access to specific columns to specific users. This granularity of policy allows more access to data because you no longer have to roll up coarse access decisions to the table level and can instead make them at the individual column, row, or cell level.
The concept of masking columns has been mentioned a few times already, but as you already know per the manage data metadata guide, your masking policies will actually target tags instead of physical columns. This abstraction is powerful, because it allows you to build a single policy that may apply to many different columns across many different tables.
If you build policy using tags, isn't there a good chance that multiple masking policies could land on the same column? Yes.
This can be avoided through tag depth. Immuta supports tag hierarchy, so if the depth of a tag targeted by a policy is deeper than that of the conflicting policy, the deeper (more specific) one wins. As an example, the policy mask by making null columns tagged PII is less specific (depth of 1) than the policy mask using hashing columns tagged Discovered.name (depth of 2), so the hashing policy would apply to a column tagged both Discovered.name and PII rather than the null one.
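To make the depth rule concrete, here is an illustrative sketch of the two policies side by side; the payload field names approximate the v2 schema and are not exact:

```yaml
# Illustrative sketch of two global masking policies. The Discovered.name
# policy is deeper in the tag hierarchy (depth 2 vs. depth 1), so it wins
# on columns carrying both tags.
- name: Mask PII
  type: data
  actions:
    - type: masking
      rules:
        - type: masking
          config:
            maskingType: "null"          # depth-1 target
            fields:
              tags: [PII]
- name: Mask names
  type: data
  actions:
    - type: masking
      rules:
        - type: masking
          config:
            maskingType: hash            # depth-2 target, more specific
            fields:
              tags: [Discovered.name]
```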
Immuta meets the principle of least privilege by following an "everyone except" model. What this means is that if you mask a column, that mask will apply to everyone except [some list of exceptions]. This uni-directional approach avoids policy conflicts, makes change management easier and policy authoring less complex, and (most importantly) avoids data leaks.
There are many different approaches you can take to masking a column. Some masks render the column completely useless to the querying user, such as nulling it, while other masking techniques can provide some level of utility from the column while still maintaining a level of privacy and security. These advanced masking techniques are sometimes termed privacy enhancing technologies (PETs), and Immuta provides a range of masking types that allow for privacy-vs-utility trade-off decisions when authoring masking policies.
If you were to build masking policies natively in your data platform, you would have to build a masking policy per data type it could mask. This makes sense, because a varchar column type can't display numerics, or vice versa, for example. Furthermore, when building masking policies that target tags instead of physical columns, it is possible that a policy may target many differing data types, or even new, unforeseen data types in the future when new columns appear.
You may hear this type of policy called a row filter or row access policy. The idea is to redact rows at query time based on the user running the query.
Without this capability, you would need a transform process that segments your data across different tables and then manage access to those tables. This introduces extra compute costs and at some point, when dealing with large tables and many differing permutations of access, it may be impossible to maintain as those tables grow.
@groupsContains('SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any groups with a matching name (case sensitive).
@attributeValuesContains('Attribute Name', 'SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any attribute values under a specific key with a matching name (case sensitive).
@purposesContains('SQL Column or Expression') allows you to compare the value of a column to see if the user is acting under a matching purpose (case sensitive).
@columnTagged('Tag Name') allows you to target tags instead of physical columns when using one of the above functions.
Here's a simple example that targets a physical column COUNTRY, comparing it to the values under the querying user's country attribute key:
@attributeValuesContains('country', 'COUNTRY')
This could also be written to use the @columnTagged function instead of the physical column name:
@attributeValuesContains('country', @columnTagged('Discovered.Country'))
This allows the policy to be reused across many different tables that may have the COUNTRY column name spelled differently.
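As an illustrative sketch (not the exact v2 schema), such a row-level policy might be expressed as:

```yaml
# Illustrative sketch of a global row-level policy using the custom
# WHERE above; field names approximate the v2 payload.
name: Row Filter by Country
type: data
actions:
  - type: rowRestriction
    rules:
      - type: rowRestriction
        config:
          where: "@attributeValuesContains('country', @columnTagged('Discovered.Country'))"
circumstances:
  - operator: or
    type: columnTags
    tag: Discovered.Country   # apply wherever a country column was discovered
```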
...for everyone except users acting under purpose [some legitimate purpose(s)]
Users can then create a project that contains both of the following:
Data sources with the policies they want to be excluded from and
Purposes
This can be made temporary by deleting the project once access is no longer needed or revoking approval for the project after the need for access is gone.
After you've created global data policies as described above, how do you actually give access to the tables?
using either of the Detect use cases:
use case (Snowflake only)
use case (if not using Snowflake)
You should have already done some data tagging while configuring Immuta in the Monitor and secure sensitive data platform query activity use case. That guide focuses on understanding where compliance issues may exist in your data platform and may not have fully covered all tags required for policy. Read on to see if there's more work to be done with data tagging.
Orchestrated RBAC method: tag data sources at the table level
ABAC method: tag data at the table and column level
While it is possible to target policies using both table- and column-level tags, for ABAC it’s more common to target column tags because they represent more granularly what is in the table. Just like user metadata needs to be facts about your users, the data metadata must be facts about the data. The tags on your tables should not contain any policy logic.
Fact-based column tags are descriptive (recommended):
Column ssn has column tag social security number
Column f_name has column tag name
Column dob has column tags date and date of birth
Logic-based column tags require subjective decisions (not recommended):
Column ssn has column tag PII
Column f_name has column tag sensitive
Column dob has column tag indirect identifier
Entity tags are facts about the contents of individual columns in isolation. Entity tags are what we listed above: social security number, name, date, and date of birth. Entity tags do not attempt to contextualize column contents with neighboring columns' contents. Instead, categorization and classification tags describe the sensitive contents of a table with the context of all its columns, which is what is listed in the logic-based tags above: things like PII, sensitive, and indirect identifier.
For example, under the HIPAA framework a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a single column-by-column basis, it can’t reason whether or not a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. But categorization tagging will tag it PHI if it detects patient identity information in the other columns of the table.
Additionally, entity tagging does not indicate how sensitive the data is, but categorization tags carry a sensitivity level, the classification tag. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly-listed phone number of a company might not be considered sensitive.
Contextual tags are really what you should target with policy where possible. This provides a way to create higher-level objects for more scalable and generic policy. Rather than building a policy like "allow access to tables with columns tagged person name and phone number," it would be much easier to build it like "allow access to tables with columns tagged PII."
In short, you must tag your entities, and then rely on a classification framework (provided by Immuta or customized by you) to provide the higher level context, also as tags. Remember, the owners of the tables (those who created them) can tag the data with facts about what is in the columns without having to understand the higher level implications of those tags (categorization and classification). This allows better separation of duty.
For orchestrated-RBAC, the data tags are no longer facts about your data, they are instead a single variable that determines access. As such, they should be table-level tags (which also improves the amount of processing Immuta must do).
As a quick example, it is possible to tag your data with Cars and then also tag that same data with more specific tags (in the hierarchy) such as Cars.Nissan.Xterra. Then, when you build policies, you could allow access to tables tagged Cars to administrators, but only those tagged Cars.Nissan.Xterra to suv_inspectors. This will result in two separate policies landing on the same table, and the beauty of Immuta is that it will handle the conflict of those two separate policies. This provides a large amount of scalability because you have to manage far fewer policies.
Imagine if you didn't have this capability: you would have to include administrators access in every policy you created for the different vehicle makes, and if that policy needed to evolve, such as granting more than administrators access to all cars, it would be an enormous effort to make that change. With Immuta, it's one policy change.
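As an illustrative sketch of those two converging policies (field names approximate the v2 schema; the group names come from the example above):

```yaml
# Illustrative sketch of the two converging subscription policies.
# Immuta merges them when they land on the same table.
- name: All Cars for Administrators
  type: subscription
  actions:
    - type: subscription
      allowedWhen:
        groups: [administrators]
  circumstances:
    - operator: or
      type: tags
      tag: Cars                    # matches Cars and everything beneath it
- name: Xterra Inspectors
  type: subscription
  actions:
    - type: subscription
      allowedWhen:
        groups: [suv_inspectors]
  circumstances:
    - operator: or
      type: tags
      tag: Cars.Nissan.Xterra      # deeper, more specific tag
```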
Select your use case
Immuta Secure allows you to secure your data through various access control policies you configure.
Users and data sources have been registered in Immuta
Immuta Secure allows you to build complex access control policies in a simple, scalable manner. However, there are many different ways customers think about access control, so Secure is extremely flexible. Because of this, Immuta has broken up Secure into common use cases that customers follow to speed up their onboarding process. Choose the use case below that best fits your goals. If none of them fit, contact your customer support representative for a more personalized onboarding experience:
Automate data access control decisions: This is the most common use case. It walks you through how to build table access control policies in a scalable manner and clarifies how to think about table access and adjust existing paradigms.
Compliantly open more sensitive data for ML and analytics: This is an approach where every user has access to every table, yet you mask the sensitive columns appropriately using data policies. You can apply some of the concepts from the automate data access control decisions use case to how you think about masking policy rules.
Data mesh: Data mesh is a concept where you delegate operational control of data products to data producers across your organization. This requires delegation of policy management as well, which is the goal of this use case. This use case also covers a generic data shopping experience where users can discover and subscribe to data.
The use cases are a starting point for your Secure journey - you don’t necessarily need to follow them exactly, but they should give you the conceptual framework for thinking through your own use cases, so it is encouraged to read them all.
Now that you have a sense of what policies you want to enforce, it is sometimes necessary to first test those policies before deploying them. This is important if you have existing users accessing the tables you are protecting and you want to understand the impact to those users before moving a policy to production. However, consider this step optional.
It's also important to remember that Immuta subscription policies are additive, meaning no existing SELECT grants on tables will be revoked by Immuta when a subscription policy is created. It is your responsibility to revoke all pre-Immuta SELECT grants once you are happy with Immuta's controls.
While it may seem wise to have a separate Immuta tenant for development and production mapped to separate development and production data platforms, that is neither necessary nor recommended, because there are too many variables to account for and keep up-to-date in your data platform and identity management system:
1. Many Immuta data policies are enforced with a heavy reliance on the actual data values. Take, for example, the following row-level policy: only show rows where the user possesses an attribute in Work Location that matches the value in the column tagged Discovered.Entity.Location. This policy compares the user's office location to the data in the tagged column, so if you were to test this policy against a development table with incorrect values, it is an invalid test that could lead to false positives or false negatives.
2. Similar to #1, if you are not using your real production users, then just as invalid data can get you an invalid test result, so can invalid or mock user attributes or groups.
3. Policies can (and should) target tables using tags discovered by sensitive data discovery (SDD) or external catalogs (such as Alation or Collibra). For SDD, that means if using development data, the data needs to match the production data so it is discovered and tagged correctly. For external catalogs, that would mean you need your external catalog to have everything in development tagged exactly like it is in production.
4. Users can have attributes and groups from many different sources (your identity manager, Immuta, or other external sources), so, similar to #3, you would need all of that synchronized correctly against your development user set just as it is synchronized for your production user set.
5. Your development user set may also lack all relevant permutations of access that need to be tested (sets of attributes/groups relevant to the policy logic). These permutations are not knowable a priori because they are dependent on the policy logic, so you would have to create all permutations (or validate existing ones) every time you create a new policy.
6. Lastly, you have a development data environment to test data changes before moving to production. That means your development data environment needs the production policies in place. In other words, policies are part of what needs to be replicated consistently across development environments.
Be aware, this is not to suggest that you don’t need a development data platform environment; that could certainly be necessary for your transformation jobs/testing (which should include policy, per #6). However, for policy testing it is a bad approach to use non-prod data and non-prod users because of the complexity of replicating everything perfectly in development - by the time you’ve done all that, it matches what’s in production exactly.
Immuta recommends testing against clones of production data, in a separate database, using your production data platform and production user identities with a single Immuta tenant. If you believe you do have a perfectly matching development environment that covers 1-5 in the section above, you can use it (and should for #6), but we still recommend a single Immuta tenant because Immuta allows logical separation of development and production policy work without requiring physically separated Immuta tenants.
So how do you test policies without impacting production workloads if we are testing against production? You create a logical separation of development and production in your data platform. What this means is that rather than physically separating your data into completely separate accounts or workspaces (or whatever term your data warehouse may use), logically separate development from production using the mechanisms provided by the data platform - such as databases. This reduces the amount of replication management burden you need to undertake significantly, but does put more pressure on ensuring your policy controls are accurate - which is why you have Immuta!
Many of our customers already take this approach for development and testing in their data platform. Instead of having physically separate accounts or workspaces of their data platform, they create logical separation using databases and table/view clones within the same data platform instance. If you are already doing this, you can skip the rest of this section.
Follow the below recommendations on how to create logical separation of development data in your data platform using the following approaches:
Redshift: Create new views that are backed by the tables in question to a different development database and register them with Immuta. Since Immuta enforces policies in Redshift using views, creating new views will not be impacted by any existing policy on the tables that back the views.
Starburst (Trino): You should virtualize the tables as new tables to a different development catalog in Starburst (Trino) for testing the policy.
Azure Synapse Analytics: You should virtualize the tables as new tables to a different development database in Synapse for testing the policy.
If you are managing creation of tables in an automated way, it may be prudent to automatically create the clones/views as described above as part of that process. That way you don’t have to pick and choose tables to logically separate every time you need to do policy work - they are always logically separated already (and presumably discovered and registered by Immuta because they are registered for monitoring as well).
Append the database name to the names of cloned data sources: When you register the clones you've created with Immuta, ensure you append the database name to your naming convention so you don't hit any naming conflicts in Immuta.
If using a physically separate development data platform, ensure you have the Immuta data integration configured there as well.
If you are unable to leverage domains, we recommend adding an additional tag when the data is registered so it can be included when you describe where to target the policy. For example, tag the tables with a policy 1 tag.
If using SDD to tag data:
If using an external catalog:
If manually tagging:
Once the policy is activated (using either technique), it will only apply to the cloned development tables/views, because the policy was scoped to that domain/tag, or because the user creating the policy is scoped to managing policies in that domain. At this point, you can invite the users who will be impacted by the policy, or your standard set of testers, to test the change.
Once you have your testers set, you can give them all an attribute such as policy test: [policy in question] and use it to create a "subscription-testers" subscription policy. It should target only the tables/views in that domain, per the steps described in the above paragraph, in order to give the testers access to the development tables/views.
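An illustrative sketch of that "subscription-testers" policy follows; the attribute condition and scoping fields are hypothetical stand-ins for what you would configure in the builder, not exact v2 schema:

```yaml
# Illustrative sketch of the "subscription-testers" policy
# (field names and condition block are approximations).
name: subscription-testers
type: subscription
actions:
  - type: subscription
    description: Give testers access to the cloned dev tables
    allowedWhen:
      attributes:
        policy test: [policy in question]   # the testers' attribute
    onMerge: alwaysRequired   # so the policy under test is still ANDed in
circumstances:
  - operator: or
    type: tags
    tag: policy 1             # the tag (or domain) scoping the dev clones
```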
If you are testing a masking or row-level security policy, you don't need to do anything else; it's ready for your policy testers. However, if your ultimate goal is to test a subscription policy, you need to build it separately from the above "subscription-testers" subscription policy, but ensure that you select the Always Required option for the "subscription-testers" policy. That means when the "subscription-testers" subscription policy is merged with the real subscription policy you are testing, both policies will be considered, and some of your test users may accurately be blocked from the table (which is part of the testing). In that situation, the two policies (the "subscription-testers" policy and the real policy being tested) are merged with an AND.
There's nothing stopping you from testing a set of changes (many subscription policies, many data policies) all together as well.
Once you've received feedback from your test users and are happy with the policy change, you can move on to the deployment of the policy. For example, add prod-domain, remove dev-domain, and then apply the policy. Or build the policy using a user that has manage policy permission in the production domain.
The approach here is the same as above; however, before applying your new policy to the development domain, you should first locally disable the pre-existing global policy in the data source. This can be done by going into the development data source in the Immuta UI as the data owner and disabling the policy you plan to change in the policy tab of the data source. This must be done because the clone, having all the matching tags, will have the pre-existing policy applied.
Now you can create the new version of the policy in its place without any conflict.
Sometimes you may have a requirement that multiple users approve a policy before it can be enabled for everyone. If you are in this situation, you may want to first consider doing this in git (or any source control tool) using our policy-as-code approach, so all approvals can happen in git.
You may have read the automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated RBAC vs ABAC. To manage user metadata with this particular use case, you should use ABAC.
This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use sensitive data discovery (SDD), described in the next topic), especially in a data ecosystem with constant change. This means that your users will need to have facts about them that drive policy decisions (ABAC) rather than single variables that drive access (as in orchestrated RBAC).
Understanding that, read the ABAC section in the automate data access control decisions use case's manage user metadata guide.
using either of the Detect use cases:
use case (Snowflake or Databricks)
use case (if not using Snowflake nor Databricks).
You may have read the automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.
This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use sensitive data discovery (SDD), described below), especially in a data ecosystem with constant change.
Understanding that, read the manage data metadata guide in the automate data access control decisions use case.
However, there are some considerations specific to this use case with regard to data metadata.
Automated data tagging with Immuta is recommended with this use case because it significantly reduces the manual effort involved in classifying and managing data. It ensures that data is consistently and accurately tagged, which is crucial for implementing effective data policies. Moreover, it allows for real-time tagging of data, ensuring that new or updated data is immediately protected by the appropriate policies. This is critical when all columns need to be considered for policy immediately, which is the case with this use case.
While not directly related to data tagging, schema monitoring is a feature in Immuta that allows organizations to keep track of changes in their data environments. When enabled, Immuta actively monitors the data platform to find when new tables or columns are created or deleted. It then automatically registers or disables these tables in Immuta and updates the tags. This feature ensures that any global policies set in Immuta are applied to these newly created or updated data sources. It is assumed you are using schema monitoring when also using SDD. Without this feature, data will remain in a locked state while waiting for the policies to update.
As discussed above, with ABAC your data tags need to be facts. Otherwise, a human must be involved to bake access logic into your tags. As an example, it's easy for SDD to find an address, but it's harder for a human to decide if that address should be sensitive and who should have access to it - all defined with a single tag - for every column!
This adds a great deal of effort to policy authoring. With Immuta, rather than explicitly requiring you to cover all possible data types when building a masking policy, Immuta has a fallback mechanism. This allows Immuta to apply the masking policy by changing, as minimally as possible, the masking type to something that is possible against that data type, while still maintaining the required privacy level provided by the original masking type.
Immuta allows you to redact rows using a global scope by leveraging tags just like with masking policies.
Unlike masking policies, the use of tags for redacting rows not only impacts which tables are targeted with the policy, but also the logic of the policy. When building a row redaction policy, you must choose the column to "compare against" when deciding at query time whether the row should be visible to the user. The tag can drive this decision. This allows you to author a single row-level policy that targets many different tables.
Sometimes customization is required for scenarios that need more complex logic. In the policy builder, these are called custom WHERE policies. Within your custom WHERE clause, Immuta also provides several special functions you can leverage for powerful and scalable policies.
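As a hedged illustration (not a definitive syntax reference), a custom WHERE that filters rows by matching a user's Country attribute values against a country column might look like the following. @attributeValuesContains is an Immuta special function and transaction_country is a hypothetical column name; verify the exact function names and signatures against your Immuta version.

  @attributeValuesContains('Country', 'transaction_country')

Functions like this keep the policy dynamic: a row is shown only when one of the querying user's Country attribute values matches the value in the column, so no per-user or per-role logic is ever baked into the policy itself.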
You may want to override and grant access to unmasked data to an individual for a very specific reason. Our recommendation is to use purposes to create exceptions to global data policies.
There is some up-front work that needs to occur to make this possible. A user with GOVERNANCE permission would need to create purposes for access to different data types unmasked. As part of creating the purposes, they may want to alter the acknowledgement statement the user must agree to when acting under that purpose.
Then, the masking or row-level policies would need to be updated to add a purpose-based exception to the policy.
Once that is done, users can create a project and add to the project both the relevant data sources and the purpose.
However, that project does nothing until approved by a user with PROJECT_MANAGEMENT permission. Once that approval is complete, the user wanting the exception must acknowledge that they will only use the data for that purpose and then, using the Immuta UI, switch to that purpose. Once switched to that purpose, the exception(s) will occur for the user.
This is a good question, and it really depends on whether you already have a table-access process you are happy with. If you do, keep using it. If you don't, we recommend you create a subscription policy that opens all tables up to anyone, as the point of this use case is not to focus on coarse-grained table-level access but instead fine-grained access using global data policies.
When creating new tables, you should follow the best practices to ensure there's no data downtime while you wait for policy uptime through Immuta schema monitoring.
Now that we’ve covered user metadata, let’s focus on the second point on the policy triangle diagram: the data tags. Just like you need user metadata, you need metadata on your data (tags) in order to decouple policy logic from referencing physical tables or columns. You must choose between the orchestrated RBAC or ABAC method of data access.
But can't I get policy authoring scalability by tagging things with higher level classifications, like PII, so I can build broader policies? This is what Immuta’s classification frameworks are for.
There are several options for doing this, and if you are following along with the Detect use cases, you may have already accomplished the recommended option 1.
Immuta Discover's sensitive data discovery (SDD): This is the most powerful option. Immuta is able to automatically discover and tag sensitive data, and you are able to extend what types of entities are discovered to those specific to your business. SDD can run completely within your data platform, with no data leaving at all for Immuta to analyze. SDD is more relevant for the ABAC approach because the tags are facts about the data.
Tags from an external source: You may have already done all the work tagging your data in some external catalog, such as Collibra, Alation, or your own homegrown tool. If so, Immuta can pull those tags in and use them. Collibra and Alation are supported out of the box, and for anything else you can build a custom catalog integration. But remember, just like user metadata, these should represent facts about your data and not policy decisions.
Manually tag: Just like with user metadata, you are able to manually tag tables and columns in Immuta from the UI, using the API, or when registering the data, either during initial registration or for tables discovered in the future through schema monitoring.
Just like hierarchy has an impact with user metadata, so can data tag hierarchy. We discussed the matching of user metadata to data metadata earlier. However, there are even simpler approaches that can leverage data tag hierarchy beyond matching. This will be covered in more detail later, but it is important to understand as you think through data tagging.
You can skip this section if you believe you have a perfectly matching separate physical development data platform that covers steps 1-5 above, or are already logically separating your dev and prod data in a single physical data platform.
Snowflake: Clone your tables that need to be tested to a different development database. While Snowflake will copy the policies with the clone, as soon as you build an Immuta policy on top of it, those policies will take precedence and eliminate the cloned policies, which is the desired behavior.
Databricks Unity Catalog: Unity Catalog does not yet support shallow clones of tables with row/column policies, so if there is an existing row/column policy on the table that you intend to edit, rather than shallow cloning, you should use create table as (select … limit 100) to create a real copy of the tables, with a limit on the rows copied, in a different development database (see the sketch after this list). Make sure you create the table with a user that has full access to the data. If you are building a row security policy, it's recommended to ensure you have a strong distribution of the values in the column that drives the policy.
Databricks Spark: Clone your tables that need to be tested to a different development database.
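As a minimal sketch of that copy (the database, schema, and table names are hypothetical), you might run something like:

  -- Copy 100 rows of the production table into a development database.
  -- Run as a user with full access so no policy-filtered view of the data is baked in.
  CREATE TABLE dev_db.sales.transactions AS
  SELECT *
  FROM prod_db.sales.transactions
  LIMIT 100;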
Since your policies reference tags, not tables, you may be in a situation where it’s unclear which tables you need to logically separate for testing any given policy. To figure this out, you can run a report in Immuta to find which tables have the tags you plan to use in your policies, using the What data sources has this tag been assigned to? report. But again, if you create logical separation by default as part of your data engineering process, this is not necessary.
Logical separation in Immuta is managed through domains. You would want to create a new domain for your cloned tables/views in Immuta and register them in that domain, for example, a policy 1 test domain. Note you must register your development database in Immuta with schema monitoring turned on (Create sources for all tables in this database and monitor for changes) so any new development tables/views will appear automatically and so you can assign the domain at creation time.
SDD does not discriminate between development and production data; the data will be tagged consistently as soon as it's registered with Immuta, which is another reason we recommend you use SDD for tagging.
In the case where you are using the out-of-the-box tag integration, the clone will carry the tags, which in turn will be reflected on the development data in Immuta.
If using a custom catalog integration, we recommend enhancing your custom lookup logic to use a naming convention for the development data to ensure the tag sync for the development data is done against the production data. For example, you could strip off a dev_ prefix from the database.schema.table to sync tags from the production version in your catalog.
If using the out-of-the-box integration with Collibra or Alation, we recommend also creating development tables in those catalogs and tagging them consistently.
You must ensure that you tag (or sync tags over the catalog integration for) the newly registered development data in Immuta consistently with how it’s tagged in production.
Once registered, you can build the new policy you want to test in a way that it only targets the tables in that domain. This can be done by either adding that domain as a target for the policy (or the policy tag you added as part of the registration process) if building the policy with a user with the GOVERNANCE permission, or ensuring the user building the policy has the Manage Policies permission only on the development domain (so their policies will be scoped to that domain).
To know which users are impacted, ensure you have Detect enabled, and go to People → Overview → Filter: Tag and enter the tags in question to see which users are most active and have a good spread of permutations of access (e.g., some users that gain access and others that lose access). We recommend inviting the users that query the tables/views in question the most to test the policy. If you are unable to use Detect, you can visit the audit screen and manually inspect which users have queries against the tables, or use the internal audit logs of your data platform.
If you focus on tagging with facts, and use Immuta's classification frameworks to build higher level logic on those tags, then you are set up to build data policies in a scalable and error-proof manner with limited data downtime.
There is one way you can accomplish this use case using orchestrated RBAC: lineage. Immuta's lineage feature, currently in private preview and for Snowflake only, is able to propagate tags based on transform lineage. What this means is that when a downstream table is derived from upstream tables, the tags on those upstream tables' columns will get carried through to the new downstream table's columns. This is powerful because you can tag the root table(s) once - with your policy logic tags, as described earlier - and they will carry through to all downstream tables' columns without any work. Be aware that tag lineage only works for tables; for views, the tags will not be propagated. However, policies on backing tables will be enforced on the views (which is why they are not propagated). Also be aware that if a table exists after some series of views, the tags will propagate to that table.
A data product is a valuable output derived from data. It is designed to provide insights, support decision-making, and drive business value. Data products are created to meet business needs or solve problems within a domain or organization. These products can take various forms. In most cases, it is a collection of tables and views, which is what is supported by Immuta.
The Monitor and secure sensitive data platform query activity use case covers how to register your data metadata, but it's worth discussing in more depth here because you have multiple data product owners.
As a first step, all the tables/views on the data platform(s) must be onboarded in Immuta. With the no default subscription policy setting (which is the Immuta default), onboarding objects in Immuta will not make any changes to existing accesses already in place.
Immuta only starts taking control when the first policy has been applied. We suggest that one team registers all objects from your data platform(s). Alternatively, different domain owners can register their own domain data. It is recommended to use a system account to do the data registration.
Regardless of which user registered the data, it's critical that schema monitoring is enabled to ensure future tables and views are discovered by Immuta and automatically registered to the proper domain.
Be aware that registering your data product metadata with Immuta does not make it available to users. It simply makes Immuta aware of its existence. To allow users to see it and subscribe, you would manage policies on it, discussed in the Applying federated governance guide next.
Data registration can be done by any team member that has the necessary user credentials in the underlying platform to read the data.
This guide is intended for users who wish to delegate control of data products within a data mesh without breaking their organization's security and compliance standards.
Users and data sources have been registered in Immuta
This guide is designed for those interested in understanding how Immuta can effectively assist organizations in adopting core data mesh principles. By leveraging this document, readers will be empowered to implement the essential procedures for integrating Immuta within their organization's data mesh framework. It is recommended to read the ebook Data Security for Data Mesh Architectures before starting with this use case.
Even if you are not interested in data mesh, this use case will explain how you can create a self-serve "data shopping experience" in Immuta.
Follow these steps to learn more about and start using Immuta to apply federated governance for data mesh and self-serve data access:
Complete the Monitor and secure sensitive data platform query activity use case.
Optionally, review the Automate data access control decisions and Compliantly open more sensitive data for ML and analytics use cases.
Define domains. This step is critical in order to delegate policy controls over certain segments of your data.
Manage data products to allow your data product creators to compliantly release data products for consumption.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful to ensure federated governance.
Work with your data product creators and global compliance users to Apply federated governance. Enforce global standards across all data products while empowering data product owners to expand on those policies.
Allow your data consumers to self-serve access data through the Immuta data portal by Discovering and subscribing to data products.
You may have read the Automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.
This is because you want your data product owners to tag data with facts - what they have intimate knowledge of because they built the data product - and not have to be knowledgeable about all policies across your organization. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts - what is in the column - which is why ABAC makes much more sense.
Understanding that, read the automate data access control decisions use case's Managing data metadata guide.
It is important to distinguish between tag definition and tag application. While tag definition (e.g., a tag called “Business Unit” with the values “Finance”, “Marketing”, “Sales”) should be strongly governed to guarantee consistency and coherence, tag application can be fully decentralized, meaning every domain or data owner can apply tag values (from the centrally governed list) to their data. There needs to be a process in place for data owners to request the definition of a new tag in case they identify any gaps.
It is important to leverage Immuta Discover's sensitive data discovery (SDD) to monitor your data products. This allows you to uncover if and when sensitive data may be leaked unknowingly by a data product and mitigate that leak quickly.
The Monitor and secure sensitive data platform query activity use case covers this in great detail and is highly recommended for a data mesh environment.
Immuta Secure is the final piece of the puzzle: now that you understand where sensitive data lives (via Discover) and can monitor activity against that data (via Detect), you can mitigate risk using Immuta Secure.
In short, Immuta Secure enables the management and delivery of trusted data at scale.
Managing access control in your data platform typically starts off easy, but over time becomes a house of cards. This concept is termed role explosion and is a result of having to keep up with every permutation of access across your organization. Once this occurs, it becomes difficult to evolve policies for fear of breaking existing access or because of a lack of understanding across your extensive role list.
Secure allows you to apply engineering principles to how you manage data access, giving your team the agility to lower time-to-data across your organization while meeting your stringent and granular compliance requirements. Immuta allows massively scalable, evolvable, and understandable automation around data policies; creates stability and repeatability around how those policies are maintained; allows distributed stewardship across your organization, but provides consistency of enforcement across your data ecosystem no matter your compute or data warehouse; and fosters more availability of data through the use of highly granular data controls.
Each of the guides below explains Secure principles in detail:
Scalability and Evolvability: A scalable and evolvable data management system allows you to make changes that impact thousands of tables at once, accurately. It also allows you to evolve your policies over time with minor changes (or no changes at all) through policy logic.
Understandability: Immuta can present policies in a natural language form that is easily understood and provide an audit history of changes to create a trust-but-verify environment. This allows you to prove policy is being implemented correctly to business leaders concerned with compliance and risk, and your business can meet audit obligations to external parties or customers.
Distributed Stewardship: Immuta enables fine-grained data ownership and controls over organizational domains, allowing a data mesh environment for sharing data - embracing the ubiquity of your organization. You can enable different parts of your organization to manage their data policies in a self-serve manner without involving you in every step, and you can make data available across the organization without the need to centralize both the data and authority over the data. This frees your organization to share more data more quickly.
Consistency: With inconsistency comes complexity, both for your team and the downstream analysts trying to read data. That complexity from inconsistency removes all value of separating policy from compute. Immuta provides complete consistency so that you can build a policy once, in a single location, and have it enforced scalably and consistently across all your data platforms.
Availability (of data): Because of these highly granular decisions at the access control level, you can increase data access by over 50% in some cases when using Immuta because friction between compliance and data access is reduced.
Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Instead of centralizing your data governance and giving users too much governance over all your data, you delegate and control their power over data sources they own by granting them permission within domains in Immuta.
Domains are typically defined by your organization’s data governance board and closely aligned with business departments or units, such as Sales, Customer Service, or Finance. Domains can also be defined by value stream, such as Patient Health Records, Fraud Detection, or Quality Control.
The task of setting up domains in Immuta can be done by any member of the team with the GOVERNANCE permission, and assigning domain permissions can be done by any member of the team with the USER_ADMIN permission.
When delegating control to your data product owners, it's possible to further segment that control within domains. If you want to give the user control to build policies on the data sources (the data products) in the domain, that is managed by giving them the Manage Policies permission in the domain.
Data sources can be made visible (discoverable) to users in the Immuta UI. This allows users to read the metadata associated with data sources (e.g., tags, descriptions, data dictionary, and contacts) and permits users to request access to the data itself.
This allows customers to use Immuta as a data catalog and marketplace. In case an organization uses an external marketplace, Immuta easily integrates with it via the API to bring in metadata or manage access controls for users that requested access via the external marketplace.
While we have been talking about data mesh, using Immuta as a marketplace for a data shopping experience is not limited to data mesh use cases - it can be used this way for any use case.
Immuta integrates with your identity manager in order to enforce controls in your data platform(s) - you likely configured this when completing the Monitor and secure sensitive data platform query activity use case. This setup has another benefit, though: it also allows all your users to authenticate into Immuta with their normal credentials to leverage it for this "shopping" experience.
Any action taken in Immuta will automatically be reflected in the data platform via Immuta's integration.
This is one of the largest challenges for organizations. Having multiple data platforms/compute, which is quite common, means that you must configure policies uniquely in each of them. For example, the way you build policies in Databricks is completely different from how you build policies in Snowflake. This becomes highly complex to manage, understand, and evolve (really hard to make changes).
Just like the big data era created the need to separate compute from storage, the privacy era requires you to separate policy from platform. Immuta does just that; it abstracts the policy definition from your many platforms, allowing you to define policy once and apply anywhere - consistently!
You should not build policies uniquely in each data platform/compute you use; this will create chaos, errors, and make the data platform team a bottleneck for getting data in the hands of analysts.
In the distributed domains of a data mesh architecture, data governance and access control must be applied vertically (locally) within specific domains or data products and horizontally (globally). Global policies should be authored and applied in line with the ecosystem’s most generic and all-encompassing principles, regardless of the data’s domain (e.g., mask all PII data). Localized domain- or product-level policies should be fine-grained and applicable to only context-specific purposes or use cases (e.g., only show rows in the Sales table where the country value matches the user's office location). In Immuta we distinguish between subscription policies and data policies.
To access a data source, Immuta users must first be subscribed to that data source. A subscription policy determines who can request access to a data source or group of data sources. Alternatively, a subscription policy can also be configured to automatically provide access based on a user’s profile, such as being part of a certain group or having a particular attribute.
It is possible to build Immuta subscription policies across all data products (horizontal), as shown in the diagram above, and then have those policies merged with additional subscription policies authored at the domain level (vertical). When this occurs, those subscription policies are merged as prescribed by the two or more policies being merged.
Whether the requirements for access are merged with an AND or an OR is prescribed by this setting in the policy builder for each of the individual policies:
Always Required = AND
Share Responsibility = OR
You can find a specific example of subscription policy merging in the Automate data access control decisions use case, but in the case of data mesh, the policies are authored by completely separate users - one user at the global (horizontal) level with the GOVERNANCE permission and the second at the domain (vertical) level with the Manage Policies permission (in the domain).
How you build subscription policies can impact what a user can discover and, if desired, "put in their shopping cart" to use.
We'll discuss the "shopping" experience in the next guide, but how you are able to manage this in subscription policies is found here:
Allow Data Source Discovery: Normally, if a user does not meet the subscription policy, that data source is hidden from them in the Immuta UI. Should you check this option when building your subscription policy, the inverse is true: anyone can see this data source. This is important if you want users to understand if the data product exists, even if they don't have access.
Require Manual Subscription: Normally, if a user meets the policy, they are automatically subscribed with no intervention. With this option checked, even users who meet the policy must discover the data source and subscribe themselves. This is important if you want users to maintain the list of data products they see in the data platform rather than seeing all data products they have access to.
Request Approval to Access: This allows the user to request access, even if they don't meet the policy. Rules determine which user can manually override the policy to let them in.
Once a user is subscribed to a data source via Immuta or has a pre-existing direct access on the underlying data platform, the data policies that are applied to that data source determine what data the user sees. Data policy types include masking, row-level, and other privacy-enhancing techniques.
An example three-step approach to managing data policies would be:
Create global data policies on the tags resulting from the out-of-the-box sensitive data discovery.
Develop business-specific frameworks and protection rules and control them via data policies.
Update the global data policies as new sensitive data is potentially released by data products and discovered using Immuta Detect.
Data mesh is a higher level use case that pulls from concepts learned across the other use cases:
Monitor and secure sensitive data platform query activity: How can you proactively monitor for data products that are leaking sensitive data?
Automate data access control decisions: How do you manage table access across your data products?
Compliantly open more sensitive data for ML and analytics: How do you manage granular access and mitigate concerns about sensitive data leaks in data products?
Our recommended strategy is that you decide if you want to automatically subscribe users to data products or if you want a workflow where they discover and subscribe to data products. In either case, you can delegate some policy ownership to the data product creators both by allowing them to tag their data with facts that drive global data policies and allowing domain-specific subscription policy authoring which merges with global subscription policies.
Do you find yourself spending too much time managing roles and defining permissions in your system? When there are new requests for data, or a policy change, does this cause you to spend an inordinate amount of time to make those changes? Scalability and evolvability will completely remove this burden. When you have a scalable and evolvable data policy management system, it allows you to make changes that impact hundreds if not thousands of tables at once, accurately. It also allows you to evolve your policies over time with minor changes or no changes at all, through future-proof policy logic.
Lack of scalability and evolvability is rooted in the fact that you are attempting to apply a coarse role-based access control (RBAC) model to your modern data architecture. Using Apache Ranger, a well-known legacy RBAC system built for Hadoop, as an example, independent research has shown the explosion of management required to do the most basic of tasks with an RBAC system: Apache Ranger Evaluation for Cloud Migration and Adoption Readiness.
In a scalable solution such as Immuta, the count of policy changes required will remain extremely low, providing the scalability and evolvability. GigaOm researched this exactly, comparing Immuta’s ABAC model to what they called Ranger’s RBAC with Object Tagging (OT-RBAC) model, and showed a 75x increase in policy management burden with Ranger.
https://gigaom.com/report/cloud-data-security/
Value to you: You have more time to spend on the complex tasks you should be spending time on and you don’t fear making a policy change.
Value to the business: Policies can be easily enforced and evolved, allowing the business to be more agile and decrease time-to-data across your organization and avoid errors.
When building access control into our database platforms, the concept of role-based access control (RBAC) is familiar. Roles both define who is in them and determine what those roles get access to. A good way to think about this is that roles conflate the who and what: who is in them and what they have access to (but lack the why).
In contrast, attribute-based access control (ABAC) allows you to decouple your roles from what they have access to, essentially separating the what and why from the who, which also allows you to explicitly explain the “why” in the policy. This gives you an incredible amount of scalability and understandability in policy building. Note this does not necessarily mean you have to throw away your roles; you can make them more powerful and likely scale them back significantly.
If you remember the picture and article from the start of this introduction, most of the Ranger, Snowflake, Databricks, etc. access control scalability issues are rooted in the fact that they use an RBAC model rather than an ABAC model.
Consider that you have a table which contains a transaction_country column, and you have data localization needs that require you to limit specific countries to specific users.
With a classic RBAC approach, you would need to create a role for every permutation of country access. Remember that it's not necessarily just a role per country, because some users may need access to more than one country. Every time a new combination of countries is required, a new role must be managed to represent that access.
With Immuta's ABAC approach, since Immuta is able to decouple policy logic from users, you can simply assign users countries and Immuta will filter appropriately on the fly. This can be done with a single policy in Immuta which references the user country metadata. If you add a new user with a never-before-seen combination of countries, in the RBAC model you would have to remember to create a new role and policy for them to see data. In the ABAC model it will “just work” since everything is dynamic - future-proofing your policies.
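To make the contrast concrete, here is an illustrative SQL sketch; the table, views, and user_attributes lookup are hypothetical stand-ins, not how Immuta implements enforcement.

  -- RBAC anti-pattern: a view (and matching role) per permutation of country access.
  CREATE VIEW sales_us AS
    SELECT * FROM sales WHERE transaction_country = 'US';
  CREATE VIEW sales_us_jp AS
    SELECT * FROM sales WHERE transaction_country IN ('US', 'JP');
  -- ...and another view and role for every new combination that appears.

  -- ABAC sketch: one dynamic filter driven by user metadata. No new objects are
  -- needed when a user with a new combination of countries shows up.
  CREATE VIEW sales_secure AS
  SELECT s.*
  FROM sales s
  WHERE s.transaction_country IN (
    SELECT ua.attribute_value
    FROM user_attributes ua            -- hypothetical user-metadata lookup
    WHERE ua.username = current_user()
      AND ua.attribute_key = 'Country'
  );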
For more discussion about this model, see the Role-Based Access Control vs. Attribute-Based Access Control — Explained blog or the NIST article on ABAC, Guide to Attribute Based Access Control (ABAC) Definition and Considerations.
The only way to support AND boolean logic with a role-based model (RBAC) is by creating a new role that conflates the two or more roles you want to AND together.
For example, a governor wants users to only see certain data if they have security awareness training and consumer privacy training. It would be natural to assume you need both, separately, as metadata attached to users to drive the policy. However, when you build policies in a role-based model, it assumes roles are either OR’ed together in the policy logic or that you can only act under one role at a time. Because of this, you will have to create a single role to represent this combination of requirements: “users with security awareness training AND consumer privacy training.” This is completely silly and unmanageable - you need to account for every possible combination relevant to a policy, and you have no way of knowing that ahead of time.
With Immuta and its ABAC model, you are able to keep user attributes as meaningful, separate facts about the users and then use boolean logic to combine those facts in policy logic. As an example, consider the country filtering policy described in the prior section: you could build the filtering, as described, but additionally add an exception such as "do this filtering for everyone except members of group security awareness training and members of group consumer privacy training" without the need to create a new role that represents those combined.
This next section draws on an analogy: Imagine you are planning your wedding reception. It’s a rather posh affair, so you have a bouncer checking people at the door.
Do you tell your bouncer who’s allowed in? (exception-based) Or, do you tell the bouncer who to keep out? (rejection-based)
The answer to that question should be obvious, but many policy engines allow both exception- and rejection-based policy authoring, which causes a conflict nightmare. Exception-based policy authoring in our wedding analogy means the bouncer has a list of who should be let into the reception. This will always be a shorter list of users/roles if following the principle of least privilege, which is the idea that any user, program, or process should have only the bare minimum privileges necessary to perform its function - you can’t go to the wedding unless invited. This aligns with the concept of privacy by design, the foundation of the CPRA and GDPR, which states “Privacy as the default setting.”
What this means in practice is that you should define what should be hidden from everyone, and then slowly peel back exceptions as needed.
How could your data leak if it wasn’t exception based?
What if you did two policies:
Mask Person Name using hashing for everyone who possesses attribute Department HR.
Mask Person Name using constant REDACTED for everyone who possesses attribute Department Analytics.
Now, some user comes along who is in Department Finance - guess what, they will see the Person Name columns in the clear because they were not accounted for, just like the bouncer would let them into your wedding because you didn’t think ahead of time to add them to your deny list.
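A hedged SQL sketch of why this leaks; the employees table and the user_department stand-in for the querying user's attribute are hypothetical:

  SELECT
    CASE
      WHEN user_department = 'HR'        THEN sha2(person_name)  -- policy 1: hashed
      WHEN user_department = 'Analytics' THEN 'REDACTED'         -- policy 2: constant
      ELSE person_name  -- Department Finance was never accounted for: raw value leaks
    END AS person_name
  FROM employees;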
There are two main issues with allowing bi-directional policies, which is why Immuta only allows exception-based policies, aligning to the industry standard of least privileged access:
Ripe for data leaks: Rejection-based policies are extremely dangerous, which is why Immuta does not allow them except with a catch-all OTHERWISE statement at the end. Again, this is because if a new role/attribute comes along that you haven’t accounted for, that data will be leaked. It is impossible to anticipate every possible user/attribute/group that could exist ahead of time, just like it’s impossible to anticipate every person off the street who could try to enter your posh wedding and account for them on your deny list.
Ripe for conflicts and confusion: Tools that allow both rejection-based and exception-based policy building create a conflict disaster. Let’s walk through a simple example - and note that while this one is simple, imagine if you had hundreds of these policies:
Policy 1: mask name for everyone who is member of group A
Policy 2: mask name for everyone except members of group B
What happens if someone is in both groups A and B? The policy engine will have to fall back on policy ordering to avoid this conflict, which requires users to understand all other policies before building their own, and it is nearly impossible to understand what a single policy does without looking at all policies.
While many platforms support the concept of object tagging / sensitive data tagging, very few truly support hierarchical tag structures.
First, a quick overview of what hierarchical tag structure means:
This would be a flat tag structure:
SUV
Subaru
Truck
Jeep
Gladiator
Outback
Each tag stands on its own and is not associated with one another in any way; there’s no correlation between Jeep and Gladiator nor Subaru and Outback.
A hierarchical tagging structure establishes these relationships:
SUV.Subaru.Outback
Truck.Jeep.Gladiator
Support for a tagging hierarchy is more than just supporting the tag structure itself. More importantly, policy enforcement should respect the hierarchy as well. Let’s run through a quick contrived example; you want the following policies:
Mask by making null any SUV data
Mask using hashing any Outback data
With a flat structure, if you build those policies they will be in conflict with one another. To avoid that problem you would have to order which policies take precedence, which can get extremely complex when you have many policies.
Instead, if your policy engine truly supports a tagging hierarchy, like Immuta does, it will recognize that Outback is more specific than SUV, and have that policy take precedence.
Mask by making null any SUV data
Mask using hashing any SUV.Subaru.Outback data
Policies are applied correctly without any need for complex ordering of policies.
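One way to picture the precedence rule is "the deepest matching tag wins." The following is illustrative pseudo-SQL only; the policy_rules relation and the simplified prefix match are hypothetical, not how Immuta stores or resolves policies:

  WITH policy_rules (tag, masking_type) AS (
    VALUES ('SUV', 'null'),
           ('SUV.Subaru.Outback', 'hash')
  )
  SELECT masking_type
  FROM policy_rules
  WHERE 'SUV.Subaru.Outback' LIKE tag || '%'                -- rules whose tag is a prefix of the column's tag
  ORDER BY length(tag) - length(replace(tag, '.', '')) DESC -- deepest (most specific) tag wins
  LIMIT 1;                                                  -- returns 'hash'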
Yes, this does put some work on the business to correctly build specificity, or depth, into their tagging hierarchy. This is not necessarily easy; however, the logic will have to live somewhere, and having it in the tagging hierarchy rather than policy order again allows you to separate policy definition from data definition. This provides you scalability, evolvability, understandability, and, most importantly, correctness, because policy conflicts can be caught at policy-authoring time.
There are a myriad of techniques and processes companies use to determine what users should have access to which tables. Some customers have had 7 people responding to an email chain for approval before a DBA runs a table GRANT statement, for example. Manual approvals are sometimes necessary, of course, but there’s a lot of power and consistency in establishing objective criteria for gaining access to a table rather than subjective human approvals.
Let’s take the “7 people approve with an email chain” example. Ask the question, “Why do any of you 7 say yes to the user gaining access?” If it’s objective criteria, you can completely automate this process. For example, if the approver says, “I approve them because they are in group x and work in the US,” that is user metadata that could allow the user to automatically gain access to the tables, either ahead of time or when requested. This removes a huge burden from your organization and avoids mistakes.
Being objective is always better than subjective: it increases accuracy, removes bias, eliminates errors, and proves compliance. If you can be objective and prescriptive about who should gain access to what tables - you should.
The anti-pattern is manual approvals. Although there are some regulatory requirements for this, if there’s any possible way to switch to objective approvals, you should do it. With subjective human-driven approvals, there is bias, larger chance for errors, and no consistency - this makes it very difficult to prove compliance and is simply passing the buck (and risk) to the approvers and wasting their valuable time.
One could argue that it’s subjective or biased to assign a user the Country.JP attribute. This is not true because, remember, data policy is separated from user metadata. The act of giving a user the Country.JP attribute is simply defining that user - it is a fact about that user. There is no implied access given to the user by this act, and the attribute will be objective - e.g., you know whether or not they are in Japan.
The approach where an access decision is conflated with a role or group is common practice. So not only do you end up with manual approval flows, but you also end up with role explosion from creating roles to meet every combination of access.
By having highly granular controls coupled with anonymization techniques, more data than ever can be at the fingertips of your analysts and data scientists (in some cases, up to 50% more).
Why is that?
Let’s start with a simple example and get more complex. Obviously, if you can’t do row- and column-level controls and are limited to only GRANTing access to tables, you are either over-sharing or under-sharing. In most cases, it’s under-sharing: there are rows and columns in that table the users could see - just not all of them - and so they are blocked completely from the table.
That example was obvious, but it can get a little more complex. If you have column-level controls, now you can give them access to the table, but you can completely hide a column from a user by making all the values in it null, for example. Thus, they’ve lost all data/utility from that column, but at least they can get to the other columns.
That masked column can be more useful, though. If you hash the values in that column instead, utility is gained because the hash is consistent - you can track and group by the values, but can’t know exactly what they are.
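For example (table and column names here are illustrative), a consistently hashed column still supports joins and aggregation even though the raw values stay hidden:

  SELECT sha2(customer_email) AS customer_key,  -- masked, but stable per value
         count(*)             AS orders,
         sum(order_total)     AS total_spend
  FROM orders
  GROUP BY sha2(customer_email);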
But you can make that masked column even more useful! If you use something like k-anonymization instead of hashing, they can know many of the values, but not all of them, gaining almost complete utility from that column. As your anonymization techniques become more advanced, you gain utility from the data while preserving privacy. These techniques are termed privacy enhancing technologies (PETs), and Immuta places them at your fingertips.
This is why advanced anonymization techniques can get significantly more data into your analysts' hands.
While columns like first_name, last_name, email, and social security number can certainly be directly identifying, columns like gender and race, on the surface, seem like they may not be identifying - but they can be. Imagine there are very few Tongan men in a data set... in fact, for the sake of this example, let's say there’s only one. If I know of a Tongan man in that company, I can easily run a query like this and figure out that person’s salary without using their name, email, or social security number:
select salary from [table] where race = 'Tongan' and gender = 'Male';
This is the challenge with indirect identifiers. It comes down to how much your adversary, the person trying to break privacy, knows externally, which is unknowable to you. In this case, all they had to know was the person was Tongan and a man (and there happens to be only one of them in the data) to figure out their salary, sensitive information. Let's also pretend the result of that query was a salary of 106072. This is called a linkage attack and is specifically called out in privacy regulations as something you must contend with, for example, from GDPR:
Article 4(1): "Personal data" means any information relating to an identified or identifiable natural person ("data subject"); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person.
Almost any useful column with many unique values will be a candidate for indirectly identifying an individual, but also be an important column for your analysis. So if you completely hide every possible indirectly identifying column, your data is left useless.
You can solve this problem with PETs. Take note of two things by querying the data:
If you only search for “Tongan” alone (no Male), there are several Tongan women, so this linkage attack no longer works: select salary, gender from [table] where race = 'Tongan';
There are no null values in the gender or race columns.
Now let's say you apply the k-anonymization masking policy using Immuta.
Then you run this query again to find the Tongan man's salary: select salary from immuta_fake_hr_data where race = 'Tongan' and gender = 'Male';
You get no results.
Now you run this query ignoring the gender: select salary, gender from immuta_fake_hr_data where race = 'Tongan';
Only the women are returned.
The linkage attack was successfully averted. Remember, from our queries prior to the policy, the salary was 106072, so let’s run a query with that: select race, gender from immuta_fake_hr_data where salary = 106072;
There he is! But race will be suppressed (NULL), so this linkage attack will not work. It was also smart enough not to suppress gender, because that did not contribute to the attack; suppressing race alone averts it. This is the magic of k-anonymization: it provides as much utility as possible while preserving privacy by suppressing values that appear so infrequently (along with other values in that row) that they could lead to a linkage attack.
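For intuition only, here is a much-simplified sketch of the suppression idea with k = 2; Immuta's actual k-anonymization is considerably more sophisticated than this:

  -- Suppress race values whose (race, gender) combination appears fewer than 2 times.
  SELECT
    CASE WHEN c.combo_count >= 2 THEN t.race END AS race,  -- NULL for rare combinations
    t.gender,
    t.salary
  FROM immuta_fake_hr_data t
  JOIN (
    SELECT race, gender, count(*) AS combo_count
    FROM immuta_fake_hr_data
    GROUP BY race, gender
  ) c ON c.race = t.race AND c.gender = t.gender;

Under this sketch, the lone Tongan man's race is suppressed (his combination appears only once), while the Tongan women's rows are untouched, matching the behavior described above.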
Cell-level security is not exactly an advanced privacy enhancing technology (PET) as in the example above, but it does provide impressive granular controls within a column for common use cases.
What is cell-level security?
If you have values in a column that should sometimes be masked, but not always, that is masking at the cell-level, meaning the intersection of a row with a column. What drives whether that cell should be masked or not is some other value (or set of values) in the rest of the row shared with that column (or a joined row from another table).
For example, a user wants to mask the credit card numbers but only when the transaction amount is greater than $500. This allows you to drive masking in a highly granular manner based on other data in your tables.
This technique is also possible using Immuta, and you can leverage tags on columns to drive which column in the row should be looked at to mask the cell in question, providing further scalability.
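Conceptually (with hypothetical table and column names), the enforced logic behaves like a CASE expression keyed off another value in the same row:

  SELECT
    CASE
      WHEN transaction_amount > 500 THEN 'REDACTED'  -- mask this cell
      ELSE credit_card_number                        -- otherwise show it in the clear
    END AS credit_card_number,
    transaction_amount
  FROM transactions;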
Immuta allows you to define policies at different levels of your data stack.
First are subscription policies, which are commonly termed table access grants or table-level access. Subscription policies control access to your tables. Immuta calls them subscription policies because they are not always an access grant but could also be the result of a data consumer finding the data, requesting access, and then being subscribed to it via an Immuta policy you have in place.
Second are data policies, which control access more granularly inside a table. For example, Immuta can help you build policies to mask columns, redact rows, or even apply advanced anonymization techniques.
While it is possible to build policies one table at a time using Immuta, there isn't much value in doing so. These are termed local policies in Immuta.
Global policies instead seek out a tag - for example, a name tag - wherever it is found, and apply the policy no matter the physical location of the tables that contain names. It's important to understand that Immuta supports tag-based global policies for more than just masking: both subscription and row-level policies can be authored as global policies targeting tags instead of physical tables and columns.
There are many guides found in this section, but an efficient approach to learning how to author secure policy would be to first read the two Immuta use cases specific to Secure: Automate data access control decisions and Compliantly open more sensitive data for ML and analytics.
Then focus on the complex topics around how applying policy at scale is managed in Immuta.
Immuta policies restrict access to data and apply to data sources at the local or global level:
Local policies are authored against specific tables, one at a time.
Global policies target tags instead of specific tables, allowing you to build a single policy that impacts a large percentage of your data rather than building separate local policies for each table.
Consider the following local and global policy examples:
Local policy example: Mask using hashing the values in the columns ssn, last_name, and home_address.
Global policy example: Mask using hashing the values in columns tagged PII on all data sources.
In this scenario, the local policy would mask the sensitive columns specified in the data policy on the single data source it was created for. If only using local policies, a data owner or governor would have to write that policy for every data source on which they wanted to mask sensitive data. The global policy, by contrast, would mask any column tagged PII on all data sources that had the PII tag applied to a column. Because this global policy automatically applies to those qualifying data sources, that policy only needs to be written once.
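To make that concrete, here is a hedged sketch of what the global policy might amount to for one qualifying table; the schema and view names are hypothetical, and the real enforcement mechanics vary by data platform:

  CREATE OR REPLACE VIEW secure_schema.customers AS
  SELECT
    sha2(ssn)          AS ssn,           -- column tagged PII -> hashed
    sha2(last_name)    AS last_name,     -- column tagged PII -> hashed
    sha2(home_address) AS home_address,  -- column tagged PII -> hashed
    signup_date                          -- untagged column passes through
  FROM raw_schema.customers;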
Consequently, global policies are the best practice for using Immuta: they provide the most scalability and manageability of access control.
For details about subscription and data policy types, see the subscription policies or data policies reference guides.
Only one Immuta tenant should apply policies to a single data source
Avoid having more than one Immuta tenant apply a policy to the same table. Multiple Immuta tenants targeting the same data source will cause issues with your deployment and policy enforcement.
Global policies can be authored by users with the GOVERNANCE permission or by data owners. Data owners' global policies will only attach to data sources they own (these are called restricted global policies), even if the tags their policies target go beyond their data sources.
When building global policies at scale, never reference individual users. The goal is to instead reference metadata about users. This way, as users gain or lose that metadata, access logic is automatically updated. This future proofs policies.
User metadata are values connected to specific Immuta users and are used in policies to gain access to tables (with subscription policies) or data (with data policies). These attributes fall into three categories:
attributes: Attributes are key/value pairs that can be assigned to users.
groups: Groups are similar to roles in a data platform; users can be assigned to one or many groups.
group attributes: It is possible to assign attributes to groups. This is a shortcut for assigning a specific attribute to a large set of users: all the users in the group will be assigned the attribute.
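One way to picture how the three categories combine (the relations here are hypothetical; Immuta resolves this internally) is a single flattened lookup that policies consult:

  -- Direct attributes, group memberships, and attributes inherited via groups,
  -- flattened into one (username, attribute_key, attribute_value) lookup.
  SELECT username, attribute_key, attribute_value
  FROM user_attributes_direct
  UNION ALL
  SELECT username, 'group' AS attribute_key, group_name AS attribute_value
  FROM group_memberships
  UNION ALL
  SELECT gm.username, ga.attribute_key, ga.attribute_value
  FROM group_memberships gm
  JOIN group_attributes ga ON ga.group_name = gm.group_name;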
The table below outlines the various states of global policies in Immuta.
Note: Policies that contain the circumstance When selected by data owners cannot be staged.
Data owners who are not governors can write restricted global policies for data sources that they own. With this feature, data owners have higher-level policy controls and can write and enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies on data sources.
Data ownership can be assigned, but it is also automatically granted to the user that registered the data source with Immuta.
All data source subscribers can access a data source's policies on the policies tab of the data source, but only data owners and governors can edit them. Although not recommended, from there users can view existing policies or apply new local policies to the data source with the following features:
Apply Existing Policies Button: By clicking this button in the top right corner of the policies tab, users can search for and apply existing policies to the data source from other data sources or global policies.
Subscription Policy Builder: In this section, users can determine who may access the data source. If a subscription policy has already been set by a global policy, a notification and a Disable button appear at the bottom of this section. Users can click the Disable button to make changes to the subscription policy.
Data Policy Builder: In this section, users can create policies to enforce privacy controls on the data source and see a list of data policies currently applied to the data source.
All changes made to policies by data owners or governors appear in a collapsible Activity panel on the right side of the screen.
The information recorded in the activity panel includes when the data source was created, the name and type of the policy, when the policy was applied or changed, and if the policy is in conflict on the data source. Additionally, global policy changes are identified by the governance icon; all other updates are labeled by the data sources icon.
If your goal is data mesh, read the content below, but also refer to the data mesh use case. It will help you understand how distributed stewardship aligns with additional data mesh strategies in Immuta.
Separation of duties is a critical component of policy enforcement. An additional component to consider is separation of understanding: some people in your organization are much more knowledgeable about what policies must be enforced, while others understand deeply what data is contained in certain tables - experts on the data, so to speak.
Wouldn’t it be nice if you could rely on data experts to ensure that data is being tagged correctly, and rely on the compliance experts to ensure that policy is being authored appropriately based on requirements - separation of understanding? This is possible with Immuta.
You can have a set of users manage the tags on the data - those who know the data best - and a separate set of users to author the policies. When they author those policies, they reference tags, a semantic layer, rather than the physical tables and columns, which they don't understand.
The tags bridge the gap between the physical world and the logical world, allowing the compliance experts to build meaningful policy leveraging the knowledge of the physical world transferred into the tags.
Remember also, it is possible to automatically tag data through sensitive data discovery (SDD), which further automates this process.
The GOVERNANCE permission in Immuta is quite powerful, as described earlier. It is intended for situations where a select few users control all policies.
It is possible to instead delegate policy control to data owners without giving them the GOVERNANCE permission. This allows them to write global policies just like governors, but those policies are restricted to the data sources they own.
Note that this capability is further enhanced with the Immuta domains feature which is currently private preview.
This is a pretty simple one: if you can’t show your work, you are in a situation of trust with no way to verify. Writing code to enforce policy (Snowflake, Databricks, etc.) or building complex policies in Ranger does show your work to a certain extent - but not enough for outsiders to easily understand the policy goals and verify their accuracy, and certainly not to the non-engineering teams that care that policy enforcement is done correctly.
With Immuta, policy is represented in natural language that is easily understood by all. This allows non-engineering users to verify that policy has been written correctly. Remember that when using global policies they leverage tags rather than physical table/column names, which further enhances understandability.
Lastly, and as covered in the Scalability and Evolvability section, with Immuta you are able to build far fewer policies - upwards of 75x fewer - which in itself provides an enormous amount of understandability.
Certainly this does not mean you have to build every policy through our UI - data engineers can build automation through the Immuta API, if desired, and those policies are presented in a human readable form to the non-engineering teams that need to understand how policy is being enforced.
Understandability of policy is critically important. This should be further augmented by change history around policy, and being able to monitor and attribute change.
Immuta provides this capability through extensive audit logs and takes it a step further by providing history views and diffs in the user interface.
This is different from query activity in your data platform, as discovered and surfaced in Immuta Detect. In addition to that, actions taken in Immuta that alter policy decisions are audited, allowing the creation of compliance reports around that information.
Without Immuta, if you build policy based on tasking an engineer in an ad-hoc manner there is no history of the change, nor is it possible to see the difference between the old and new policies. That makes it impossible to take a historical look at change and understand where an issue may have arisen. If you have a standardized platform for making policy changes, like Immuta, then you are able to understand and inspect those changes over time.
Subscription policies manage table-level access to data. The how-to and reference guides in this section detail how to implement subscription policies to meet your compliance requirements and describe the types of subscription policies available.
Author a subscription policy to manage access to data sources.
Author a subscription policy that uses attribute-based access controls to automate data access decisions.
Author a subscription policy using the advanced special functions.
As a data owner, author a single subscription policy that can apply to multiple data sources you own.
Clone, activate, or stage a global policy.
This reference guide describes the design and behavior of Immuta subscription policies.
This reference guide describes read and write access policies and how they translate to privileges in the remote data platform.
This reference guide defines the advanced functions available when building subscription policies.
Best practice: write global policies
Build global subscription policies using attributes and Discovered tags instead of writing local policies to manage data access. This practice prevents you from having to write or rewrite single policies for every data source added to Immuta and from manually approving data access.
Private preview
are only available to select accounts. Contact your Immuta representative to enable this feature.
At least one of the following permissions is required to manage write policies:
CREATE_DATA_SOURCE Immuta permission (to create local write policies)
GOVERNANCE Immuta permission (to create local or global write policies)
MANAGE_POLICIES domain permission (to create global write policies)
, , or integration
(for Snowflake integrations)
Once support for this feature has been enabled in your Immuta tenant, complete the following steps:
Navigate to the App Settings page.
Scroll to the Preview Features section.
Click the Enable Write Policies checkbox and Save your changes.
Determine your policy scope:
Select the access type:
Read Access: Control who can view the data source.
Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply:
Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
Allow anyone who asks (and is approved):
Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
Note: You can add more than one approving party by selecting + Add Another Approver.
Allow individually selected users
For global policies: From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
This guide demonstrates how to build subscription policies using the policy builder in the Immuta UI. To build more complex policies than the builder allows, follow the policy guide.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Subscription Policies tab. Click Add Subscription Policy and complete the Enter Name field.
Local policy: Navigate to a specific data source and click the Policies tab. Click Add Subscription Policy and select New Local Subscription Policy.
Select Allow users with specific groups/attributes.
Choose the condition that will drive the policy: when user is a member of a group or possesses attribute.
Use the subsequent dropdown to choose the group or attribute for your condition. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the subscription policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
For global policies: Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Note: To make this option selected by default, see .
For global policies: Click the dropdown menu beneath Where should this policy be applied and select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Subscription policies manage access to tables; they can be defined in one of four ways:
Anyone: Users will automatically be granted access (least restricted).
Anyone who asks (and is approved): Users will need to request access and be granted permission by the configured approvers (moderately restricted).
Users with specific groups or attributes: Only users with the specified groups/attributes will be able to see the data source and subscribe (moderately restricted). This restriction type is referred to as an attribute-based access control (ABAC) global subscription policy throughout this page.
Individual users you select: The data source will not appear in search results; data owners must manually add/remove users (most restricted).
Immuta almost always recommends ABAC global subscription policies, as described in our two Secure policy use cases:
See for a tutorial.
Immuta offers two types of subscription policies to manage read and write access in a single system:
Read access policies manage who can view data from the specified data sources.
Write access policies manage who can view and modify data in the specified data sources.
See the for details.
When building an ABAC global subscription policy, users who match the condition of the policy are, by default, automatically subscribed to the data source once the policy is activated, without taking any action.
Should user attributes or groups change, their subscription status is immediately impacted if the change results in them gaining or losing subscription access to the table in question. That is the power of these types of policies: you future-proof the need to alter policies and instead simply need to represent the correct metadata on your users, and the policies will react.
Also, by default, user access is reflected appropriately in the Immuta user interface. If a user does not have access to a particular table, that data source will not be visible to them.
This default behavior can be altered using the following settings in the subscription policy builder:
Allow Data Source Discovery: Configures visibility so that data sources the user cannot subscribe to are still discoverable in the Immuta UI.
Require Manual Subscription: Rather than automatically subscribing users to the table, they must click the Subscribe button in Immuta. This can be valuable if data consumers don't want thousands of tables they could access listed in their data platform, only the ones they care about.
In some cases, multiple ABAC global policies may apply to a single data source and can allow delegation of policy to many different policy builders. Rather than allowing the two policies to conflict, Immuta combines the conditions of the subscription policies.
Combining of global subscription policies only occurs with ABAC subscription policies. See the next section for how to deal with non-ABAC subscription policy conflicts.
In the ABAC subscription policy builder, the below option determines how to merge subscription policies:
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Consider the following global subscription policies potentially created by completely different users:
Policy 1: (Always Required)
Allow users to subscribe to the data source when user is a member of group HR; otherwise, allow users to subscribe when approved by an Owner of the data source.
Policy 2: (Shared Responsibility)
Allow users to subscribe to the data source when user is a member of group Analytics; otherwise, allow users to subscribe when approved by anyone with permission Governance.
Policy 3: (Shared Responsibility)
Allow users to subscribe to the data source when user has attribute Office Location Ohio; otherwise, allow users to subscribe when approved by anyone with permission Audit.
If a data source exists where all of these policies apply, the subscription policies are combined, which will result in a policy like this:
Combined policy:
Allow users to subscribe to the data source when user (@isInGroups('HR')) AND ((@isInGroups('Analytics')) OR (@hasAttribute('Office Location', 'Ohio')))
Otherwise
Allow users to subscribe when approved by (anyone with permission Owner (of this data source)) AND ((anyone with permission GOVERNANCE) OR (anyone with permission AUDIT))
Note that all the policies which have Always Required must have a manual override (approved by) selected for there to be any approved by in the final merged policy.
Once enabled on a data source, combined global subscription policies can be edited and disabled by data owners.
If you stray outside ABAC subscription policies, multiple global subscription policies may apply to a single data source without merging. The following policy types conflict rather than merge:
Anyone
Anyone who asks (and is approved)
Individual users you select
When such conflicts occur, data owners can manually choose which policy applies. To do this, the data owner must
Disable the applied global subscription policy in the policies tab on a data source.
Provide a reason the global policy should be disabled.
Select which conflicting global subscription policy they want to apply.
Users can create more complex policies using functions and variables in the advanced DSL policy builder than the subscription policy builder allows.
By default, Immuta does not apply a subscription policy on registered data (unless an existing global policy applies to it).
Deprecation notice
There are two settings available as the default subscription policy: none or allow individually selected users.
None: The default. If this option is selected as the default subscription policy, a data source will have no subscription policy applied to it if
it is a new data source and no global policy matches it.
a data owner or governor removes an existing global subscription policy from the data source.
Once a global subscription policy matches or a data owner applies a local subscription policy to a data source, that policy will restrict users’ access to the table.
Allow individually selected users: If this option is selected, data owners have to manually add users as subscribers to the data source in Immuta for those users to query the underlying table.
Changing the default subscription policy setting only affects new data sources; existing data sources (and those in the process of being registered when the setting is changed) are unaffected. For example, if an Immuta data source’s subscription policy restricts access to members of the Marketing group before the feature is enabled, that existing subscription policy will still apply to that table in the underlying data platform; only users who are members of the Marketing group will be able to access that data.
Even if there is no subscription policy on a table, data owners and governors can manage data policies on data sources without affecting users' access to the registered data sources. This is powerful if you want to manage table access outside of Immuta but manage data policies in Immuta.
If no subscription policy is applied to a data source, users can only subscribe as data source owners; they cannot be added as regular subscribers. To add regular subscribers, a data owner or governor must apply a subscription policy to the data source.
This page details how users can create more complex policies using functions and variables in the Advanced DSL policy builder than the Subscription Policy builder allows.
For instructions on writing Global Subscription Policies, see the following .
Navigate to the App Settings page.
Click Advanced Settings in the left panel, and scroll to the Preview Features section.
Check the Enable Enhanced Subscription Policy Variables checkbox.
Click Save.
Navigate to the Policies page.
Select Subscription Policies and click + Add Subscription Policy.
Choose a name for your policy and select how the policy should grant access.
Select Create using Advanced DSL.
Select the rules for your policy from the advanced DSL options. For example, a policy with @hasTagAsAttribute('Department', 'dataSource') would subscribe all users who have an attribute that matches a tag on a data source to that data source; users with the attribute Department.Marketing would be subscribed to data sources with the tag Marketing. (A fuller condition sketch follows these steps.)
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Select where this policy should be applied: On data sources, When selected by data owners, or On all data sources.
If you select On data sources, the options include tagged, with columns tagged, with column names spelled like, in server, and created between.
Click Create Policy.
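For example, a condition of this shape would subscribe users whose attribute keys match a data source tag, as well as members of an admin group. This is a sketch: the group name data-admins is hypothetical, and combining functions with AND/OR is assumed here because it matches how Immuta renders merged policies elsewhere in this guide.

```
@hasTagAsAttribute('Department', 'dataSource') OR @isInGroups('data-admins')
```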
Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.
Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.
Private preview
are only available to select accounts. Contact your Immuta representative to enable this feature.
At least one of the following permissions is required to manage write policies:
CREATE_DATA_SOURCE Immuta permission (to create local write policies)
GOVERNANCE Immuta permission (to create local or global write policies)
MANAGE_POLICIES domain permission (to create global write policies)
, , or integration
(for Snowflake integrations)
Once support for this feature has been enabled in your Immuta tenant, complete the following steps:
Navigate to the App Settings page.
Scroll to the Preview Features section.
Click the Enable Write Policies checkbox and Save your changes.
Click Policies in the left sidebar and select Subscription Policies.
Click Add Policy and complete the Enter Name field.
Select the access type:
Read Access: Control who can view the data source.
Write Access: Control who can view and modify data in the data source.
Select the level of access restriction you would like to apply to your data sources:
Allow anyone: Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
Allow anyone who asks (and is approved):
Click Anyone or An individual selected by user from the first dropdown menu in the subscription policy builder.
Note: If you choose An individual selected by user, when users request access to a data source they will be prompted to identify an approver with the permission specified in the policy and how they plan to use the data.
Select the Owner (of the data source), USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu.
Note: You can add more than one approving party by selecting + Add Another Approver.
Allow users with specific groups/attributes:
Use the subsequent dropdown to choose the group or attribute for your condition. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the subscription policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Check the Require Manual Subscription checkbox to turn off automatic subscription. Enabling this feature will require users to manually subscribe to the data source if they meet the policy.
If you would like to make your data source visible in the list of all data sources in the UI to all users, click the Allow Data Source Discovery checkbox. Otherwise, this data source will not be discoverable by users who do not meet the criteria established in the policy.
If you would like users to have the ability to request approval to the data source, even if they do not have the required attributes or traits, check the Request Approval to Access checkbox. This will require an approver with permissions to be set.
Select how you want Immuta to merge multiple global subscription policies that apply to a single data source.
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR).
Allow individually selected users
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Opt to complete the Enter Rationale for Policy (Optional) field.
Click Create Policy, and then click Activate Policy or Stage Policy.
There are several different functions available for building subscription policies. Some of these functions, listed below, are narrowly focused on orchestrated RBAC use cases. Orchestrated RBAC is when an organization has many roles that represent access and, rather than switching to the ABAC model provided by Immuta, uses these special functions to orchestrate existing roles through Immuta.
Specifically, the functions that enable orchestrated RBAC are:
@hostname
@database
@schema
@table
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')
Policy: @hasAttribute('SpecialAccess', '@hostname.@database.*')
User: has the attribute SpecialAccess with the value us-east-1-snowflake.default.*
The user would be subscribed to all the data sources in the default database. Note this has nothing to do with tags; it is based purely on the physical name of the host, database, schema, and table in the native data platform. Also note that the user attribute contains an asterisk (*) to denote everything under the default database hierarchy. Asterisks are supported only for the infrastructure special functions:
@hostname
@database
@schema
@table
This is because, since it's an infrastructure view, Immuta can assume a 4-level hierarchy (hostname.database.schema.table), and an asterisk can be placed between any two objects in that hierarchy to represent any object. For example, us-east-1-snowflake.*.hr would give the user access to any schema named hr in host us-east-1-snowflake, no matter the database.
However, that is not possible when using the tag-based special functions:
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column')
@hasTagAsGroup('dataSource' or 'column')
Lastly, the asterisk represents any single object; it cannot be used as a concatenated wildcard such as snowfl*.tpc.*.*, as summarized below.
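To summarize the wildcard rules with example attribute values (hostnames taken from the examples above):

```
us-east-1-snowflake.default.*    valid: everything in the default database
us-east-1-snowflake.*.hr         valid: any schema named hr, in any database
snowfl*.tpc.*.*                  invalid: concatenated wildcards are not supported
```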
Policy: @hasTagAsAttribute('PersonalData', 'dataSource')
User: has the attribute key PersonalData with the values Discovered.Person Name and Discovered.Entity
Data source 1: tagged Discovered.Country, Discovered.Passport, and Discovered.Person Name
Data source 2: tagged Discovered.State, Discovered.Postal Code, and Discovered.Entity.Social Security Number
Data source 3: tagged Discovered.State and Discovered.Passport
The user would be subscribed to data sources 1 and 2, but not to data source 3. This is because access moves from left to right in the tag hierarchy based on what the user possesses (the wildcard asterisk is implied).
So if a user had the more specific attribute key PersonalData with the value Discovered.Entity.Social Security Number, they would only get access to hypothetical data source 2, because their attribute is further left in, or matches (in this case, matches), the data source tag Discovered.Entity.Social Security Number.
The below table provides more examples:
This could be helpful for use cases with a policy like the following:
If a user has the attribute Allowed_Domain.Domain A, they get access to generic data that is part of domain A.
If a user has the attribute Badge_Allowed.Badge X, they gain access to both generic data and any additional data that has been tagged Badge X (only in domain A, because they only have "Data Domain A General Access").
In this case it can be two separate subscription policies, such as:
Policy 1: @hasTagAsAttribute('Allowed_Domain', 'dataSource'), which limits users to the domains where they are allowed to see generic data.
Policy 2: @hasTagAsAttribute('Badge_Allowed', 'dataSource'), which limits users to the badges they are allowed to see.
Then, when the data sources are tagged with table tags that represent access, if the table only has the domain tag, only policy 1 will apply; however, if it has a domain tag and a badge tag, both policies will be applied and merged successfully by Immuta.
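Concretely, using the hypothetical domain and badge tags above, the per-table outcome would be:

```
Table tagged [Domain A]            only Policy 1 applies
Table tagged [Domain A, Badge X]   Policies 1 and 2 both apply and are merged by Immuta
Table tagged [Badge X]             only Policy 2 applies
```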
Click the Policies icon in the left sidebar and navigate to the Data Policies or Subscription Policies tab.
Click the dropdown menu in the Actions column of a policy and select Clone.
Open the dropdown menu and click Edit in the Global Policy Builder. Then make your changes using the dropdown menus.
Click Create Policy, select Activate Policy, and then click Confirm.
The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.
Note: If a cloned policy contains custom certifications, the certifications will also be cloned.
Click the Policies icon in the left sidebar and navigate to the Data Policies tab.
Click the dropdown menu in the Actions column of one of the templated policies and select Activate. Note: If data governors decide to stage an active policy, they select Stage from this dropdown menu.
The policy will display as Pending until it is enforced on all data sources it applies to. See the for details.
Click the dropdown menu in the Actions column of a policy and select Stage. Note: If data governors decide to make a staged policy active, they select Activate from this dropdown menu.
Click Confirm in the dialog that appears.
Private preview
Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.
Immuta offers two types of subscription policies to manage read and write access in a single system:
Read access policies manage who can read data.
Write access policies manage who can modify data.
Both of these access types can be enforced at any of the restriction levels outlined in the .
The table below illustrates the access types supported by each integration.
To create a read or write access policy, see the .
Once a read or write access policy is enforced on an Immuta data source, it translates to the relevant privileges on the table, view, or object in the remote platform. The sections below detail how these access types are enforced for each integration.
The Snowflake integration supports read and write access subscription policies. However, when applying read and write access policies to Snowflake data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register Snowflake views as Immuta data sources and apply read and write policies to them, but when a write policy is applied to a view, only the SELECT privilege will take effect in Snowflake, as views are read-only objects.
Users can register any object stored in Snowflake's information_schema.tables view as an Immuta data source. The table below outlines the Snowflake privileges Immuta issues when read and write policies are applied to various object types in Snowflake. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and database for any object a user is granted access to.
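As a rough illustration, the grants implied by these policies on a Snowflake table have the following shape. All object and role names here are hypothetical, and this is not Immuta's exact generated SQL:

```sql
-- Hypothetical names; illustrates the privileges described above.
GRANT USAGE ON DATABASE analytics TO ROLE immuta_user_role;
GRANT USAGE ON SCHEMA analytics.sales TO ROLE immuta_user_role;

-- Read policy: view the data
GRANT SELECT ON TABLE analytics.sales.orders TO ROLE immuta_user_role;

-- A write policy on a table would additionally imply grants such as:
GRANT INSERT, UPDATE, DELETE ON TABLE analytics.sales.orders TO ROLE immuta_user_role;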
The Databricks Unity Catalog integration supports read and write access subscription policies. When users create a subscription policy in Immuta, Immuta uses the Unity Catalog API to issue GRANTs or REVOKEs against the catalog, schema, or table in Databricks for every user affected by that subscription policy.
Users can register any object stored in Databricks Unity Catalog's information_schema.tables view as an Immuta data source. However, when applying read and write access policies to these data sources, the privileges granted by Immuta vary depending on the object type. For example, users can register foreign tables as Immuta data sources and apply read and write policies to them, but only a read policy will take effect in Databricks and allow users to SELECT those tables. If a write policy is applied, Immuta will not issue SELECT or MODIFY privileges in Databricks.
The table below outlines the Databricks privileges Immuta issues when read and write policies are applied to various object types in Databricks Unity Catalog. Beyond the privileges listed, Immuta always grants the USAGE privilege on the parent schema and catalog for any object a user is granted access to.
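Schematically, the SQL equivalents of these API grants look like the following. Names are hypothetical; Immuta issues these through the Unity Catalog API rather than SQL, and Unity Catalog's USE CATALOG / USE SCHEMA privileges correspond to the USAGE grants described above:

```sql
-- Hypothetical names; the SQL equivalent of the API grants described above.
GRANT USE CATALOG ON CATALOG main TO `analyst@example.com`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `analyst@example.com`;

-- Read policy
GRANT SELECT ON TABLE main.sales.orders TO `analyst@example.com`;

-- A write policy additionally implies
GRANT MODIFY ON TABLE main.sales.orders TO `analyst@example.com`;
```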
The Databricks Spark integration supports read access subscription policies. When a read access policy is applied to a data source, Immuta modifies the logical plan that Spark builds when a user queries data to enforce the policies that apply to that user. If the user is subscribed to the data source, the user is granted SELECT on the object in Databricks. If the user does not have read access to the object, they are denied access.
The Starburst (Trino) integration supports read and write access subscription policies. In the Starburst (Trino) integration's default configuration, the following access values grant read and write access to Starburst (Trino) data when a user is granted access through a subscription policy:
READ: When a user is granted read access to a data source, they can SELECT on tables or views and SHOW on tables, views, or columns in Starburst (Trino). This setting is enabled by default when you configure the Starburst (Trino) integration.
WRITE: In its default setting, the Starburst (Trino) integration's write access value controls the authorization of SQL operations that perform data modification (such as INSERT, UPDATE, DELETE, MERGE, and TRUNCATE). When users are granted write access to a data source through a subscription policy, they can INSERT, UPDATE, DELETE, MERGE, and TRUNCATE on tables and REFRESH on materialized views. This setting is enabled by default when you configure the Starburst (Trino) integration.
Because Starburst (Trino) can govern certain table modification operations (like ALTER) separately from data modification operations (like INSERT), Immuta allows users to specify what modification operations are permitted on data in Starburst (Trino). Administrators can allow table modification operations (such as ALTER and DROP tables) to be authorized as write operations through advanced configuration in the Immuta web service or on the Starburst (Trino) cluster with the following access values:
OWN: When mapped via advanced configuration to Immuta write policies, users who are granted write access to Starburst (Trino) data can ALTER and DROP tables and SET comments and properties on a data source.
CREATE: When this privilege is granted on Starburst (Trino) data, an Immuta user can create catalogs, schemas, tables, or views on a Starburst (Trino) cluster. CREATE is a Starburst (Trino) privilege that is not controlled by Immuta policies, and this property can only be set in the access-control.properties file on the Starburst (Trino) cluster.
Administrators can customize table and data modification settings in one or both of the following places; however, access-control.properties overrides the settings configured in the Immuta web service:
Immuta web service: Configuring write policies in the Immuta web service allows all Starburst (Trino) clusters targeting that Immuta tenant to receive the same write policy configuration for Immuta data sources. This configuration only affects tables or views registered as Immuta data sources. Use the option below to control how unregistered data is affected.
Immuta web service access grants mapping
Customizing read and write access in the Immuta web service affects operations on all Starburst (Trino) data registered as Immuta data sources in that Immuta tenant. This configuration method should be used when all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to the Immuta web service. Example configurations are provided below. Contact your Immuta representative to customize the mapping of read or write access policies for your Immuta tenant.
Default configuration
The default setting shown below maps WRITE to READ and WRITE permissions and maps READ to READ. Both the READ and WRITE permissions should always include READ.
In this example, if a user is granted write access to a data source through a subscription policy, that user can perform data modification operations (INSERT, UPDATE, MERGE, etc.) on the data.
Custom configuration
The following configuration example maps WRITE to READ, WRITE, and OWN permissions and maps READ to READ. Both READ and WRITE permissions should always include READ.
In this example, if a user gets write access to a data source through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.
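Schematically, the two mappings described above look like this. The arrow notation is illustrative only and is not Immuta's actual configuration syntax:

```
# Default mapping
READ  -> [READ]
WRITE -> [READ, WRITE]

# Custom mapping that also authorizes table modification
READ  -> [READ]
WRITE -> [READ, WRITE, OWN]
```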
Starburst (Trino) cluster access grants mapping
The Starburst (Trino) integration can also be configured to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a specific Starburst (Trino) cluster.
The default setting shown below maps WRITE to READ and WRITE permissions and maps READ to READ. Both the READ and WRITE permissions should always include READ.
In this example, if a user is granted write access to a data source through a subscription policy, that user can perform data modification operations (INSERT, UPDATE, MERGE, etc.) on the data.
The following configuration example maps WRITE to READ, WRITE, and OWN permissions and maps READ to READ. Both READ and WRITE permissions should always include READ.
In this example, if a user gets write access to a data source through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.
Two properties customize the behavior of read or write access for all Immuta users on that Starburst cluster:
immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. For these permissions to apply, the user must be subscribed in Immuta and not be an administrator (who gets all permissions).
immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. This is the only property that allows the CREATE permission, since CREATE is enforced on new objects that do not yet exist in Starburst or Immuta (such as a new table being created with CREATE TABLE).
Default configuration
By default, Immuta allows READ and WRITE operations to be authorized on data registered in Immuta, while all operations are permitted for data sources that are not registered in Immuta.
Custom configuration
In the example below, the configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta, and all operations are permitted on data that is not registered in Immuta. If a user gets write access to data registered in Immuta through a subscription policy, that user can perform both data (INSERT, UPDATE, MERGE, etc.) and table (ALTER, DROP, etc.) modification operations on the data.
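A sketch of what these settings could look like in access-control.properties. The property names come from the descriptions above, but the exact value syntax shown here is an assumption; confirm against your Starburst (Trino) plugin documentation:

```properties
# Default behavior: READ and WRITE authorized on Immuta-registered data;
# all operations permitted on data not registered in Immuta.
immuta.allowed.immuta.datasource.operations=READ,WRITE
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,OWN,CREATE

# Custom behavior: additionally authorize OWN (table modification) on registered data.
# immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
```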
The Redshift integration supports read access subscription policies. Immuta grants the SELECT Redshift privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a data source is created, Immuta creates a corresponding dynamic view of the table with a join to a secure view that contains all Immuta users, their entitlements, their projects, and a list of the tables they have access to. When a read policy is created or updated (or when a user's entitlements change, they switch projects, or their data source access is approved or revoked), Immuta updates the secure view to grant or revoke users' access to the data source. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, zero rows are returned when they attempt to query the view.
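To make the mechanism concrete, here is a minimal sketch of what such a dynamic view could look like. All names are hypothetical, and Immuta's actual view and entitlement tables differ:

```sql
-- Hypothetical sketch of an Immuta-style dynamic view in Redshift.
-- Zero rows are returned for users without read access.
CREATE VIEW immuta.orders AS
SELECT t.*
FROM analytics.orders t
WHERE EXISTS (
    SELECT 1
    FROM immuta_system.user_entitlements e   -- secure view of users and their accesses
    WHERE e.username = current_user
      AND e.data_source = 'orders'
);
```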
The Azure Synapse Analytics integration supports read access subscription policies. Immuta grants the SELECT privilege to the PUBLIC role when the integration is configured, which allows all users who meet the conditions of a subscription policy to access the Immuta-managed view. When a read policy is created or removed (or when a user's entitlements change, they switch projects, or their data source access is approved or revoked), Immuta updates the view that contains the users' entitlements, projects, and a list of tables they have access to, granting or revoking their access to the dynamic view. Users' read access is enforced through an access check function in each individual view. If a user is granted access to the data source, they can access the view. If a user does not have read access to the view, they receive an "Access denied: you are not subscribed to the data source" error when they attempt to query the view.
The Google BigQuery integration supports read access subscription policies. In this integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table, and read access controls are applied in the view. After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push the Immuta data sources as views into a mirrored dataset of the original table. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.
READ and READWRITE access levels: With the exception of the Starburst integration, users can only modify existing data when they are granted write access to data; they cannot create new tables or delete tables.
Write actions are not currently captured in audit logs.
To build policy at scale, you must use global policies. Global policies allow you to build policies that reference tags rather than physical tables or columns. So instead of building a policy like mask column name in table customers, you can build a policy such as mask columns tagged name anywhere you see the name tag.
How you get the tags on the tables and columns is outlined in the use case.
Overview on how to
Overview of and
Full for all data policies
Details on how to if there's a large amount of change due to data engineering in your data platform(s)
Details on how and are managed
No special configuration is necessary: once a user is a data owner, the global policy builder is available to them, and they can . The only difference is that the global policy will additionally be restricted to only the data sources the user owns.
Global policy: Click the Policies page icon in the left sidebar and select the Subscription Policies tab. Click Add Subscription Policy and complete the Enter Name field.
Local policy: Navigate to a specific data source and click the Policies tab. Click Add Subscription Policy and select New Local Subscription Policy.
Allow users with specific groups/attributes: See the for instructions.
Read and write access can also be granted manually by a data owner. See the for details.
When you have multiple global ABAC subscription policies to enforce, create separate global ABAC subscription policies, and then Immuta will .
Request Approval to Access: Selecting this option allows users to request access to a data source and be manually approved by a specified user, even if the requesting user does not meet the group or attribute conditions in the policy. This setting requires Allow Data Source Discovery to be selected as well; otherwise, users could never discover the data source to request the override. See for a tutorial.
For instructions on creating ABAC subscription policies, see the .
Find a real example of the power behind merging subscription policies in the use case.
More specifically, if two or more global subscription policies of the types listed below apply to the same data source (even if an ABAC subscription policy is also present), they will conflict:
After an application admin has , data governors and owners can create global subscription policies using all the functions and variables outlined below.
The above table provides some basic guidance; for more information, refer to the .
See the tutorial for details on how to build a policy that includes advanced DSL.
You can , but there are very few scenarios in which disabling it is recommended. It is best to instead have subscription policies active on standby that will be applied as you add the tags to your data sources that drive those subscription policies. This is because global policies that match registered data sources will apply to those data sources no matter which default subscription policy is enabled.
The ability to configure the behavior of the default subscription policy has been deprecated. Once this configuration setting is removed from the app settings page, Immuta will not apply a subscription policy to registered data sources unless an existing global policy applies to them. To set an "Allow individually selected users" subscription policy on all data sources, create a global subscription policy with that condition that applies to all data sources, or apply a local subscription policy to individual data sources.
For instructions on changing the default subscription policy setting, see the .
Choose the condition that will drive the policy: when user is a member of a group or possesses attribute. Note: To build more complex policies than the builder allows, follow the policy guide.
Note: To make this option selected by default, see .
This is because Immuta cannot rely on a 4-level hierarchy always being the case. For example, *.Age could mean many things in a tag hierarchy. However, Immuta does support using parent attributes to apply to child attributes, as described in .
It is also possible to build subscription policies separately that use these special functions and have them merged on data sources.
While this approach is extremely powerful, in many cases it will continue to leave you dealing with the policy complexity associated with RBAC. Read the use case for more details, specifically the guide.
The policy will display as Pending until it is removed from all data sources it applies to. See the for details.
Administrators can customize write access configuration to grant additional Starburst (Trino) table modification privileges. See the below for an overview and example configurations.
Starburst (Trino) cluster: Configuring write policies using the access-control.properties file on a Starburst (Trino) cluster allows access to be broadly customized for Immuta users on that cluster. This configuration file takes precedence over write policies passed from the Immuta web service. Use this option if all Immuta users should have the same level of access to data regardless of the configuration in the Immuta web service.
The default configuration and an example of a custom configuration are provided below. See the for guidance on configuring these properties in your Starburst cluster.
The Amazon S3 integration supports read and write access subscription policies. Users can apply read and write access policies to data in S3 to restrict what prefixes, buckets, or objects users can access or modify. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3. To query a data source they are subscribed to, users request temporary credentials from their Access Grants instance. These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant, scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant, and vends these temporary credentials to the requester. If no matching grant exists for the user, they receive an Access denied error.
Immuta read policies translate to the READ access level in S3 Access Grants, and Immuta write policies translate to the READWRITE access level. The table below outlines the Amazon S3 actions granted on an S3 data source when users meet the restrictions specified in an Immuta read or write access subscription policy applied to the data source. See the for more details about grants, access levels, and actions.
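For example, a subscribed user or application might request temporary credentials with the AWS CLI as follows. The account ID, bucket, and prefix below are placeholders:

```bash
# Request just-in-time credentials from the S3 Access Grants instance.
# Succeeds only if Immuta has created a matching grant for this caller.
aws s3control get-data-access \
  --account-id 123456789012 \
  --target "s3://example-bucket/sales/*" \
  --permission READ   # or READWRITE for write-policy access
```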
Active (enforced): If policies are edited in this state, the changes will be immediately enforced on data sources when the changes are saved.
Deleted (not enforced): Once a policy has been deleted, it cannot be recovered or reactivated.
Disabled (not enforced): Data owners or governors can place a policy in this state at the local level for a specific data source. Although this is similar to the staged policy state, the policy will still be enforced on other data sources after it is disabled for a specific data source.
Pending (not enforced): When Immuta pushes a new policy or a policy change to the remote platform, the policy state displays as Pending until the change is applied to all relevant data sources in the remote platform. For example, if an active global policy is staged, the policy state is Pending until that policy is removed from all data sources it applied to when it was active. If a policy's conditions are updated to apply to data sources tagged PII, it will display as Pending until it is enforced on all data sources tagged PII and removed from data sources without that tag. Use this state to understand the amount of time your policies take to be applied to or removed from data in your remote platform. See the Data engineering with limited policy downtime guide for strategies to address policy latencies.
Staged (not enforced): This state is useful when regularly editing and reviewing policies. It also allows you to lift a policy's enforcement without deleting the policy so that it can easily be re-enforced. See Clone, Activate, or Stage a Global Policy for a tutorial.
@database: Users who have an attribute key that matches a database will be subscribed to the data source(s) within that database. Example: with @hasAttribute('SpecialAccess', '@hostname.@database.*'), a user with the attribute SpecialAccess: us-east-1-snowflake.default.* would be subscribed to all the data sources in the default database.
@hasAttribute('Attribute Name', 'Attribute Value'): Users who have the specified attribute are subscribed to the data source. Example: with @hasAttribute('Occupation', 'Manager'), any user who has the attribute key Occupation with the value Manager will be subscribed to the data source(s).
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column'): Users who have an attribute key that matches a tag on a data source or column will be subscribed to that data source. Example: with @hasTagAsAttribute('PersonalData', 'dataSource'), users who have the attribute key PersonalData with the values Discovered.Passport and Discovered.Entity would be subscribed to Data Source 1, tagged [Discovered.Passport], and Data Source 2, tagged [Discovered.Entity]. However, they would not be subscribed to Data Source 3, tagged [Discovered.Country].
@hasTagAsGroup('dataSource' or 'column'): Users who are members of a group that matches a tag on a data source or column (respectively) will be subscribed to that data source. Example: with @hasTagAsGroup('dataSource'), if Data Source 1 has the tags NewHire and Interns applied, users who are members of the groups New Hire or Interns would be subscribed to Data Source 1.
@hostname: Users who have an attribute key that matches a hostname will be subscribed to the data source(s) with that hostname. Example: with @hasAttribute('SpecialAccess', '@hostname.*'), a user with the attribute SpecialAccess: us-east-1-snowflake.* would be subscribed to all the data sources with the us-east-1-snowflake hostname.
@iam: Users who sign in with the IAM with the specified ID (the ID that displays on the App Settings page) will be subscribed to the data source. Example: with @iam == 'oktaSamlIAM', any user whose IAM ID is oktaSamlIAM can be subscribed to the data source.
@isInGroups('List', 'of', 'Groups'): Users who are members of the specified group(s) can be subscribed to the data source. Example: with @isInGroups('finance','marketing','newhire'), users who are members of the groups finance, marketing, or newhire can be subscribed to the data source.
@schema: Users who have an attribute key that matches this schema will be subscribed to the data source(s) under that schema. Example: with @hasAttribute('SpecialAccess', '@hostname.@database.@schema'), a user with the attribute SpecialAccess: us-east-1-snowflake.default.public.* would be subscribed to all the data sources under the public schema.
@table: Users who have an attribute key that matches this table will be subscribed to the data source(s). Example: with @hasAttribute('SpecialAccess', '@hostname.@database.@schema.@table'), a user with the attribute SpecialAccess: us-east-1-snowflake.default.public.credit_transactions would be subscribed to the credit_transactions data source.
User attributes 'PersonalData': ['Discovered.Person Name', 'Discovered.Entity']; data source tags ['Discovered.Identifier Indirect', 'Discovered.Person Name']. Subscribed: Yes. Reason: exact match on Discovered.Person Name.
User attributes 'PersonalData': ['Discovered.Entity']; data source tags ['Discovered.PHI', 'Discovered.Entity.Age']. Subscribed: Yes. Reason: the user attribute 'Discovered.Entity' is a hierarchical parent of the data source tag 'Discovered.Entity.Age'.
User attributes 'Access': ['Discovered.Person Name', 'Discovered.Entity']; data source tags ['Discovered.Identifier Indirect', 'Discovered.Person Name']. Subscribed: No. Reason: the policy is written to only match values under the 'PersonalData' attribute key, not 'Access'.
User attributes 'PersonalData': ['Discovered']; data source tags ['Discovered.Entity.Age']. Subscribed: Yes. Reason: the user attribute 'Discovered' is a hierarchical parent of the data source tag 'Discovered.Entity.Age'.
User attributes 'PersonalData': ['Discovered.Entity.Social Security Number']; data source tags ['Discovered.Entity']. Subscribed: No. Reason: hierarchical matching only happens in one direction (user attribute contains data source tag); in this case, the user attribute is a hierarchical child of the data source tag.
Read (READ access level): GetObject, GetObjectVersion, GetObjectAcl, GetObjectVersionAcl, ListMultipartUploadParts, ListObjects, ListObjectsVersions, ListBucketMultipartUploads, KmsDecrypt
Write (READWRITE access level): all of the Read actions above, plus PutObject, PutObjectAcl, PutObjectVersionAcl, DeleteObject, DeleteObjectVersion, AbortMultipartUpload, KmsGenerateDataKey
Data policies determine what users see when they query data in a table they have access to.
This guide provides a general overview of data policies and their behavior.
Certification, exemptions, and diffs: Certify policies, exempt users from policies, and view policy diffs on a data source.
All data policy types: This guide describes all the data policies available in Immuta.
Masking policies: This guide describes the types of masking policies available and when to use each.
Row-level policies: Row-level policies compare data values with user metadata at query-time to determine whether or not the querying user should have access to the individual rows of data.
Custom WHERE clause functions: This guide describes the custom functions you can use to extend the PostgreSQL WHERE syntax.
Data policy conflicts and fallback: In some cases, two conflicting global masking policies apply to a single data source. This guide describes how Immuta handles those conflicts.
Custom data policy certifications: When building a global data policy, governors can create custom certifications that must be acknowledged by data owners when the policy is applied to data sources.
Orchestrated masking policies: These policies reduce conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization.
Best practice: write global policies
Build global policies with tags instead of writing local policies to manage data access. This practice will prevent you from having to write or rewrite single policies for every data source added to Immuta.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Mask from the first dropdown menu.
Select columns tagged, columns with any tag, columns with no tags, all columns, or columns with names spelled like.
Select a masking type:
using a constant: Enter a constant in the field that appears next to the masking type dropdown.
using a regular expression: Enter a regular expression and replacement value in the fields that appear next to the masking type dropdown. From the next dropdown, choose to make the regex Case Insensitive and/or Global.
by rounding: Select using fingerprint or specifying the bucket from the subsequent dropdown menu. If specifying the bucket, select the Bucket Type and then enter the bucket size. Note: If you choose by rounding as your masking type, the statistics of the data fingerprint will autogenerate the bucket size when the policy is applied to a data source.
with K-Anonymization: Select either using fingerprint or requiring group size of at least and enter a group size in the subsequent dropdown menu.
using the custom function: Enter the custom function native to the underlying database. Note: The function must be valid for the data type of the column. If it is not, the default masking type will be applied to the column.
Select everyone except, everyone, or everyone who to continue the condition.
everyone except: In the subsequent dropdown menus, choose is a member of group, possesses attribute, or is acting under purpose. Complete the condition with the subsequent dropdown menus.
everyone who: Complete the Otherwise clause. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied and select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
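Put together, a global masking policy built through these steps reads in natural language roughly like the following. This is a sketch using the policy phrasing from this guide; the tag, constant, and group names are hypothetical:

```
Mask columns tagged Discovered.Entity using the constant REDACTED
  for everyone except members of group HR
on data sources with columns tagged Discovered.Entity
```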
This step is optional, but data governors can add certifications that outline acknowledgements or require approvals from data owners. For example, data governors could add a custom certification that states that data owners must verify that tags have been added correctly to their data sources before certifying the policy.
Click Add Certification in the data policy builder.
Enter a Certification Label and Certification Text in the corresponding fields of the dialog that appears.
Click Save.
Data owners who are not governors can write restricted subscription and data policies, which allow them to enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies.
Unlike global policies, the application of these policies is restricted to the data sources owned by the users or groups specified in the policy and will change as users' ownerships change.
Click Policies in the left sidebar and select Data Policies.
Click Add Policy and complete the Enter Name field.
Select how the policy should protect the data. Click a link below for instructions on building that specific data policy:
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
From the Where should this policy be applied dropdown menu, select When selected by data owners, On all data sources, or On data sources. If you selected On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Beneath Whose Data Sources should this policy be restricted to, add users or groups to the policy restriction by typing in the text fields and selecting from the dropdown menus that appear.
Click Create Policy, and then click Activate Policy or Stage Policy.
Requirement: CREATE_DATA_SOURCE or GOVERNANCE Immuta permission
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Limit usage to purpose(s) in the first dropdown menu.
In the next field, select a specific purpose that you would like to restrict usage of this data source to or ANY PURPOSE. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Select for everyone or for everyone except. If you select for everyone except, you must select conditions that will drive the policy such as group, purpose, or attribute.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Create a project: To restrict access to data and associate your data source with a purpose, create a project and add the purpose and relevant data sources to the project.
To manage and apply existing policies to data sources, a user must have either the CREATE_DATA_SOURCE Immuta permission or be manually assigned the owner role on a data source.
After a policy with a certification requirement is applied to a data source, data owners will receive a notification indicating that they need to certify the policy.
Navigate to the Policies tab of the affected data source, and review the policy in the Data Policies section.
Click Certify Policy.
In the Policy Certification modal, click Sign and Certify.
Once you have a data policy in effect, you can view the changes in your policies by clicking the Policy Diff button in the data policies section on a data source's policies tab.
The Policy Diff button displays previous policies and the current policy applied to the data source.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Only show data by time from the first dropdown.
Select where data is more recent than or older than from the next dropdown, and then enter the number of minutes, hours, days, or years that you would like to restrict the data source to. Note that unlike many other policies, there is no field to select a column to drive the policy. This type of policy will be driven by the data source's event-time column, which is selected at data source creation.
Choose for everyone, everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
In some cases, two conflicting global masking policies apply to a single data source. When this happens, the policy containing a tag deeper in the hierarchy will apply to the data source to resolve the conflict.
Consider the following global data policies created by a data governor:
Data policy 1: Mask columns tagged PII by making null for everyone on data sources with columns tagged PII
Data policy 2: Mask columns tagged PII.SSN using hashing for everyone on data sources with columns tagged PII.SSN
If a data owner creates a data source and applies the PII.SSN tag to a column, both of these global masking policies will apply to the column with that tag. Instead of having a conflict, the policy containing the deeper tag in the hierarchy will apply.
In this example, data policy 2 will be applied to the data source because PII.SSN is deeper and thus considered more specific than PII. If data owners wanted to use data policy 1 on the data source instead, they would need to disable data policy 2.
Should two or more masking policies target the same column and have the same hierarchy depth, the policy that was authored first will win out. This is a conservative approach that avoids the original policy being changed unexpectedly.
Similar to masking policies, it is possible for two or more row-level policies to target the same table. When this occurs, all row-level policies are applied and AND'ed together, meaning a user must satisfy every one of them to see any rows in the table at all.
To OR row-level policies together instead, combine them into a single Immuta policy with an OR.
When masking columns, the type of the column matters. For example, it is not possible to hash a numeric column, because the hash would render the number as a string.
Many data platforms make the user account for this by building a separate data policy for every column type that could exist now or in the future, which is quite onerous.
Instead, Immuta has intelligent fallbacks. An intelligent fallback occurs when a masking type targets a column type that is incompatible with it. In this case, Immuta falls back to the most appropriate masking type that retains at least the level of privacy required by the original type.
For example, if a hashing masking type hits a numeric type, Immuta intelligently falls back to nulling the column instead, since nulls are allowed in numeric types.
Sometimes a global data policy will target a table and the policy cannot be applied as written. This can happen for several reasons, but the most common is that the row-level policy logic is not relevant to the table in question.
For example, consider a policy with the following logic:
@attributeValuesContains('Attribute Name', 'SOME_COLUMN')
If SOME_COLUMN does not exist in the table, the row-level policy will not work. This is why it is always recommended to use the @columnTagged('tag name') function instead of hard-coding column names.
In the case where an error such as this occurs with a global data policy, the lockout policy will kick in. The lockout policy is a row-level policy that blocks any rows from returning for any users. This may seem extreme, but since Immuta does not know how to apply the policy, the lockout policy avoids data leaks until the policy is edited to work correctly.
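To make the lockout behavior concrete, the sketch below shows how such a policy behaves at query time. This is a conceptual illustration only, not Immuta's literal generated SQL, and example_table is a hypothetical name.

```sql
-- Conceptual behavior of a lockout policy: the row filter is
-- unsatisfiable, so no rows return for any user until the broken
-- global policy is corrected.
SELECT *
FROM example_table
WHERE 1 = 0;
```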
Private preview
This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
Orchestrated masking policies (OMP) reduce conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization. Furthermore, OMP fosters distributed data stewardship, empowering policy authors who share responsibility of a data set to protect it while allowing data consumers acting under various roles or purposes to access the data.
When multiple masking policies apply to a column, Immuta combines the exception conditions of the masking policies so that data subscribers can access the data when they satisfy one of those exception conditions. Multiple masking policies will be enforced on a column if the following conditions are true:
Policies use the same masking type.
Policies use the for everyone except condition.
The data sources are in a Databricks Spark or Starburst (Trino) integration.
OMP supports the following masking types:
Constant
Hashing
Format preserving masking
Null
Regex
Rounding
Governors can apply policies to all columns in a data source or target specific columns with tags or a regular expression. Without orchestrated masking policies enabled, when multiple global policies apply to the same columns, Immuta could only apply one of those policies.
Consider the following example of how policies behaved when one tag was used in two different policies:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under purpose Marketing.
For columns tagged email, only one of these policies is enforced. The Mask PII Global Policy 2 is not applied to the data source, so Immuta is not enforcing the masking policy properly for users who should be able to see emails because they are acting under the Marketing purpose.
Consider the following example where multiple masking policies apply to columns that have multiple tags, resulting in one policy applying:
Global Policy 3: Mask using hashing the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 4: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.
If a column is tagged Employee Data and HR Data, Immuta will only apply one of the policies.
With orchestrated masking policies, Immuta applies multiple global masking policies that apply to a single column by combining the policy exceptions with OR. For these policies to combine, the masking type must be identical and the policy must use the for everyone except condition.
Consider the following example, where both of these policies will apply to the data source:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email except when user is acting under the purpose Email Campaign.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email except when user is acting under purpose Marketing.
Users acting under the purpose Marketing or Email Campaign will be able to see emails in the clear.
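Conceptually, the combined policy behaves like a single expression whose exceptions are OR'ed, as in the sketch below. This is illustrative only: customers is a hypothetical table, and acting_under_purpose is a hypothetical helper standing in for Immuta's purpose check.

```sql
SELECT
  CASE
    WHEN acting_under_purpose('Email Campaign')  -- exception from policy 1
      OR acting_under_purpose('Marketing')       -- exception from policy 2
    THEN email                                   -- exception met: clear value
    ELSE sha2(email, 256)                        -- otherwise: hashed
  END AS email
FROM customers;
```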
However, in the following example, only one of these policies will apply to the data source because one masks using a constant and the other masks using hashing:
Global Policy 5: Mask using the constant REDACTED the value in columns tagged Employee Data unless users are acting under the purpose Retention Analysis.
Global Policy 6: Mask using hashing the value in columns tagged HR Data unless users are acting under the purpose Employee Satisfaction Survey.
No UI enhancements were made in this release. Multiple masking policies applied to the same column are visible on a data source, but there is no indication that the exceptions are combined with OR.
Masking types must match exactly for the policies to be combined. For example, both policies must mask using rounding.
Existing policies will not automatically migrate to the new policy logic when you enable the feature. To re-compute existing policies with the new logic, you must manually trigger global policy changes by staging and re-enabling each policy.
Snowflake object types and the privileges granted on them:

| Object type | Read privileges | Write privileges |
|---|---|---|
| Table (`BASE TABLE`) | SELECT | SELECT, INSERT, UPDATE, DELETE, TRUNCATE |
| View (`VIEW`) | SELECT | |
| Materialized view (`MATERIALIZED VIEW`) | SELECT | |
| External table (`EXTERNAL TABLE`) | SELECT | |
| Event table (`EVENT TABLE`) | SELECT | |
| Iceberg table (`IS_ICEBERG=YES`) | SELECT | |
| Dynamic table (`IS_DYNAMIC=YES`) | SELECT | |
| Data object from an incoming Data Share | | |

Databricks Unity Catalog object types and the privileges granted on them:

| Object type | Read privileges | Write privileges |
|---|---|---|
| Table (`MANAGED`) | SELECT | SELECT, MODIFY |
| View (`VIEW`) | SELECT | |
| Materialized view (`MATERIALIZED_VIEW`) | SELECT | |
| Streaming table (`STREAMING_TABLE`) | SELECT | |
| External table (`EXTERNAL`) | SELECT | |
| Foreign table (`FOREIGN`) | SELECT | |
| Data object from incoming Delta Share | | |
When executing transforms in your data platform, new tables and views are constantly being created, columns added, and data changed - this is transform DDL. This constant transformation can cause latency between the DDL and Immuta policies discovering, adapting, and attaching to those changes, which can result in data leaks. This policy latency is referred to as policy downtime.
The goal is to have as little policy downtime as possible. However, because Immuta is separate from the data platforms, and those platforms do not currently offer webhooks or an eventing service, Immuta does not receive alerts of DDL events. This causes policy downtime.
This page describes the appropriate steps to minimize policy downtime as you execute transforms using dbt or any other transform tool and links to tutorials that will help you complete these steps.
Required:
Schema monitoring: This feature detects destructively recreated tables (from CREATE OR REPLACE statements) even if the table schema wasn't changed. Enable schema monitoring when you register your data sources.
Recommended (if you are using Snowflake):
Snowflake table grants enabled: This feature implements Immuta subscription policies as table GRANTS in Snowflake rather than Snowflake row access policies. Note this feature may not be automatically enabled if you were an Immuta customer before January 2023; see Enable Snowflake table grants to enable it.
Low row access mode enabled: This feature removes unnecessary Snowflake row access policies when Immuta project workspaces or impersonation are disabled, which improves the query performance for data consumers.
To benefit from the scalability and manageability provided by Immuta, you should author all Immuta policies as global policies. Global policies are built at the semantic layer using tags rather than having to reference individual tables with policy. When using global policies, as soon as a new tag is discovered by Immuta, any applicable policy will automatically be applied. This is the most efficient approach for reducing policy downtime.
There are three different approaches for tagging in Immuta:
Auto-tagging (recommended): This approach uses sensitive data discovery (SDD) to automatically tag data.
Manually tagging with an external catalog: This approach pulls in the tags from an external catalog. Immuta supports Snowflake, Databricks Unity Catalog, Alation, and Collibra to pull in external tags.
Manually tagging in Immuta: This approach requires a user to create and manually apply tags to all data sources using the Immuta API or UI.
Note that there is added complexity with manually tagging new columns with Alation, Collibra, or Immuta: these catalogs can only tag columns that are already registered in Immuta. If you have a new column, you must wait until schema detection runs and detects that new column; then the column must be manually tagged. This issue is not present when manually tagging with Snowflake or Databricks Unity Catalog (because they are already aware of the column) or when using SDD (because it runs after schema monitoring).
Using this approach, Immuta can automatically tag your data after it is registered by schema monitoring using sensitive data discovery (SDD). SDD is made of algorithms you can customize to discover and tag the data most important to you and your organization's policies. Once customized and deployed, any time Immuta discovers a new table or column through schema monitoring, SDD will run and automatically tag the new columns without the need for any manual intervention. This is the recommended option because once SDD is customized for your organization, it will eliminate the human error associated with manually tagging and is more proactive than manual tagging, further reducing policy downtime.
Using this approach, you will rely on humans to tag. Those tags will be stored in the data platform (Snowflake, Databricks Unity Catalog) or catalog (Alation, Collibra) and then synchronized back to Immuta. If using this option, Immuta recommends storing the tags in the data platform, because writing the tags to Snowflake or Databricks Unity Catalog can be managed and reused with SQL from tools like dbt, removing the burden of manually tagging on every run. API calls to Alation and Collibra are also possible but are not accessible over dbt. Manually tagging through the Alation or Collibra UI will negatively impact data downtime.
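For example, with Snowflake object tagging, tag writes can be expressed as SQL and replayed on every run (such as in a dbt post-hook). The tag, database, schema, and column names below are hypothetical.

```sql
-- Create the tag once, then set it as part of the transform job so it
-- is reapplied after CREATE OR REPLACE; Immuta pulls these tags back
-- in during tag synchronization.
CREATE TAG IF NOT EXISTS governance.tags.pii;

ALTER TABLE analytics.public.orders
  MODIFY COLUMN email_address
  SET TAG governance.tags.pii = 'email';
```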
Using this approach, you will rely on humans to tag, but the tags will be stored directly in Immuta. This can be done using the Immuta API or through the Immuta UI. However, manually tagging through the Immuta UI will negatively impact data downtime.
When registering tables with Immuta, you must register each database or catalog with schema monitoring enabled. Schema monitoring means that you do not need to individually register tables but rather make Immuta aware of databases, and then Immuta will periodically scan that database for changes and register any new changes for you. You can also manually run schema monitoring using the Immuta API.
Access to and registration of views created from Immuta-protected tables only need to be taken into consideration if you are using both data and subscription policies.
Views will have existing data policies (row-level security, masking) enforced on them that exist on the backing tables by nature of how views work (the query is passed down to the backing tables). So when you tag and register a view with Immuta, you are re-applying the same data policies on the view that already exist on the backing tables, assuming the tags that drive the data policies are the same on the view’s columns.
If you do not want this behavior or its possible negative performance consequences, then Immuta recommends the following based on how you are tagging data:
For auto-tagging, place your incremental views in a separate database that is not being monitored by Immuta. Do not register them with Immuta, and schema monitoring will not detect them from the separate database.
For either manually tagging option, do not tag view columns.
Using either option, the views will only be accessible to the person who created them. The views will not have any subscription policies applied to give other users access because the views are either not registered in Immuta or there are no tags. To give other users access to the data in the view, they should subscribe to the table at the end of the transform pipeline.
However, if you do want to share the views using subscription policies, you should ensure that the tags that drive the subscription policies exist on the view and that those tags are not shared with tags that drive your data policies. It is possible to target subscription policies on all tables or tables from a specific database rather than using tags.
Policy is enforced on READ. Therefore, if you run a transform that creates a new table, the data in that new table will represent the policy-enforced data.
For example, if the credit_card_number column is masked for Steve, the real credit card numbers are dynamically masked on read. If Steve then copies them into a new table via a transform, he is physically loading masked credit card numbers into that table. Now if another user, Jane, who is allowed to see credit card numbers, queries the table, her query will not show the credit card numbers, because they are already masked in that table. This problem only exists for tables, not views, since tables have the data physically copied into them.
To address this situation, you can do one of the following:
Use views for all transforms.
Ensure the users who are executing the transforms always have a higher level of access than the users who will consume the results of the transforms. Or, if this is not possible,
Set up a dev environment for creating the transformation code; then, when ready for production, have a promotion process to execute those production transformations using a system account free of all policies. Once the jobs execute as that system account, Immuta will discover, tag, and apply the appropriate policy.
Data downtime refers to the techniques you can use to hide data after transformations until Immuta policies have had a chance to synchronize. It makes data inaccessible; however, it is preferred to the data leaks that could occur while waiting for policies to sync.
Whenever DDL occurs, it can result in policy downtime, such as in the following examples:
An existing table or view is recreated with the CREATE OR REPLACE statement. This drops all policies, resulting in users losing all access. There is one exception: with Databricks Unity Catalog, the grants on the table are not dropped, which means masking and row filtering policies are dropped but table access is not, making policy downtime even more critical to manage.
A new column is added to a table that needs to be masked from users that have access to that table (potential data leak).
A new table is created in a space where other users have read access on future tables (potential data leak).
A tag that drives a policy is updated, deleted, or newly added with no other changes to the schema or table. This is a limitation of how Immuta can discover changes - it is too inefficient to search for tag changes, so schema changes are what drives Immuta to take action.
Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:
Do not COPY GRANTS when executing a CREATE OR REPLACE statement.
Do not use GRANT SELECT ON FUTURE TABLES.
Use CREATE OR REPLACE for all DDL, including altering tags, so that access is always revoked.
Without these best practices, you are making unintentional policy decisions in Snowflake that may be in conflict with your organization's actual policies enforced by Immuta.
For example, if the CREATE OR REPLACE statement added a new column that contains sensitive data and the user ran it with COPY GRANTS, it opens that column to users, causing a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.
Immuta recommends all of the following best practices to ensure data downtime occurs during policy downtime:
Do not grant to catalogs or schemas, because those grants apply to current and future objects (future objects are the problematic piece).
There is no way to stop Databricks Unity Catalog from carrying over table grants on CREATE OR REPLACE statements, so ensure you initiate policy uptime as quickly as possible if you have row filters or masking policies on that table.
Without these best practices, you are making unintentional policy decisions in Unity Catalog that may be in conflict with your organization's actual policies enforced by Immuta.
For example, if you GRANT SELECT on a schema and someone later writes a new table with sensitive data into that schema, it could cause a data leak. Instead, access must be blocked using the above data downtime techniques until Immuta has synchronized.
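In Unity Catalog terms, keeping grants at the table level (hypothetical names below) avoids exposing future objects through inheritance:

```sql
-- Table-level grant: applies only to this existing object.
GRANT SELECT ON TABLE main.analytics.orders TO `analysts`;

-- Avoid schema- or catalog-level grants, which also cover future tables:
-- GRANT SELECT ON SCHEMA main.analytics TO `analysts`;
```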
As discussed above, data platforms do not currently have webhooks or an eventing service, so Immuta does not receive alerts of DDL events. Schema monitoring runs every 24 hours by default to detect changes, but it should also be run across your databases after you make changes to them. You can manually run schema monitoring using the Immuta API, and the payload can be scoped down to run schema monitoring on a specific database or schema, or column detection on a specific table.
When schema monitoring is run globally, it will detect the following:
Any new table
Any new view
Changes to the object type backing an Immuta data source (for example, when a TABLE changes to a VIEW); when an object type changes, Immuta reapplies existing policies to the data source
Any existing table destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any existing view destructively recreated through CREATE OR REPLACE (even if there are no schema changes)
Any dropped table
Any new column
Any dropped column
Any column type change (which can impact policy application)
Any tag created, updated, or deleted (but only if the schema changed; otherwise tag changes alone are detected with Immuta’s health check)
Then, if any of the above is detected, for those tables or views, Immuta will complete the following:
Synchronize the existing policy back to the table or view to reduce data downtime
If SDD is enabled, execute SDD on any new columns or tables
If an external catalog is configured, execute a tag synchronization
Synchronize the final updated policy based on the SDD results and tag synchronization
Apply "New" tags to all tables and columns not previously registered in Immuta and lock them down with the "New Column Added" templated global policy
See the image below for an illustration of this process with Snowflake.
The two options for running schema monitoring are described in the sections below. You can implement them together or separately.
If the data platform supports custom UDFs and external functions, you can wrap the /dataSource/detectRemoteChanges endpoint with one. Then, as your transform jobs complete, you can use SQL to call this UDF or external function to tell Immuta to execute schema monitoring. The reason for wrapping in a UDF or external function is that dbt and transform jobs always compile to SQL, and the best way to trigger schema monitoring immediately after the table is created (after the transform job completes) is to execute more SQL in the same job.
Consult your Immuta professional for a custom UDF compatible with Snowflake or Databricks Unity Catalog.
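As a sketch of what this looks like in practice: assuming such a function has been provisioned (here hypothetically named immuta_detect_changes) to call the /dataSource/detectRemoteChanges endpoint, the final statement of a transform job can trigger schema monitoring with plain SQL.

```sql
-- Hypothetical external function wrapping the Immuta endpoint; run it
-- as the last statement of the transform job (e.g., a dbt post-hook)
-- so schema monitoring starts as soon as the new objects exist.
SELECT immuta_detect_changes('ANALYTICS');  -- scope to the database just written
```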
The default schedule for Immuta to run schema monitoring is every night at 12:30 a.m. UTC. However, this schedule can be updated by changing some advanced configuration. The processing time for schema monitoring is dependent on the number of tables and columns changed in your data environment. If you want to change the schedule to run more frequently than daily, Immuta recommends you test the runtime (with a representative set of DDL changes) before making the configuration change.
Consult your Immuta professional to update the schema monitoring schedule, if desired.
There are some use cases where you want all users to have access to all tables, but want to mask sensitive data within those tables. While you could do this using just data policies, Immuta recommends you still utilize subscription policies to ensure users are granted access.
Subscription policies give Immuta a state to move table access into after data downtime, realizing policy uptime. Without subscription policies, when Immuta synchronizes policy, users will continue to lack access to tables because there is no subscription policy granting them access. If you want all users to have access to all tables, use a global "Anyone" subscription policy in Immuta for all your tables. This will ensure users are granted access back to the tables after data downtime.
Data policies manage what users see when they query data in a table they have access to.
There are three different ways to restrict data with data policies:
Row-level: Filter rows from certain users at query time.
Column masking: Mask values in a column at query time.
Cell masking: Mask specific cells in a column based on separate values in the same row at query time.
When applying a data policy, it will always be enforced for all users, following the principle of least privilege, unless optional exceptions are added to policies. Data policy exceptions are built using any of the following conditions, which can be mixed with boolean logic:
If the user is a member of a group (or several groups)
If the user possesses a particular attribute (or several attributes)
If the user is acting under a purpose (or several purposes) for which the data is allowed to be used
Data policy exceptions are very similar to subscription policies from this perspective. With subscription policies, nobody has access to a newly created table until someone says otherwise with a subscription policy (as long as you follow best practices for newly created tables and views). Similarly, when a masking policy is set on a column or a row-level policy on a table, it applies to everyone until someone says otherwise with an exception to the data policy.
If user metadata is stored in a table in the same data platform where a policy is enforced, it is not necessary to move that user metadata into Immuta. Instead, it can be referenced directly using custom WHERE functions in data policies.
Below is an example row-level policy that leverages a lookup table to dynamically drive access to rows in a table:
| Phone number | Location | Time | ACCESS_LEVEL |
|---|---|---|---|
| 0123456789 | Lewes, DE | 00:07:34 | 4 |
| 9876543210 | College Park, MD | 09:16:08 | 8 |
The final column in the table, ACCESS_LEVEL, defines who can see that row of data.
Now consider the following hierarchy:
In this diagram, there are 11 different access levels (AL) to data, and the tree defines access. For example, if a user has Vegetables, they get access levels 2, 3, 4, 9, 10, and 11. If a user has Pear, they only get access level 8. In other words, a user with Vegetables would see the first row of the above table, a user with Pear would see the second row of the above table, and a user with Food would see both rows of the table.
Taking the example further, that hierarchy tree is represented as a table in the data platform that we wish to use to drive the row-level policy:
| ACCESS_LEVEL | ROOT |
|---|---|
| 1 | Food |
| 2 | Food |
| 3 | Food |
| 4 | Food |
| 5 | Food |
| 6 | Food |
| 7 | Food |
| 8 | Food |
| 9 | Food |
| 10 | Food |
| 11 | Food |
| 2 | Vegetables |
| 3 | Vegetables |
| 4 | Vegetables |
| 9 | Vegetables |
| 10 | Vegetables |
| 11 | Vegetables |
| 5 | Fruits |
| 6 | Fruits |
| 7 | Fruits |
| 8 | Fruits |
| 4 | Carrots |
| 9 | Leafy |
| 10 | Leafy |
| 11 | Leafy |
| 5 | Orange |
| 6 | Orange |
| 7 | Orange |
| 8 | Pear |
| 10 | Lettuce |
| 11 | Lettuce |
That hierarchy lookup table can be referenced in the row-level policy as user metadata like this:
@columnTagged('access_level') IN (SELECT ACCESS_LEVEL from [lookup table] where @attributeValuesContains('user_level', 'ROOT'))
Walking through the policy step-by-step:
@columnTagged('access_level'): This allows us to target multiple tables with an ACCESS_LEVEL column that needs protecting with a single policy. Simply tag all the ACCESS_LEVEL columns with the access_level tag and this policy will apply to all of them.
IN (SELECT ACCESS_LEVEL from [lookup table]: This selects the matching ACCESS_LEVEL from the lookup table to use as the IN clause for filtering the actual business table.
where @attributeValuesContains('user_level', 'ROOT'): This compares the user's user_level attribute to the value in the ROOT column; if there's a match, that ACCESS_LEVEL is used for filtering in the previous step. See the custom WHERE documentation for more details on these functions.
So, you can then add metadata to your users in Immuta, such as Vegetables or Pear, and that will result in them seeing the appropriate rows in the business table in question.
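Putting it together, the effective filter for a user whose user_level attribute contains Vegetables behaves like the query below. This is conceptual only; access_hierarchy and business_table are hypothetical names for the lookup table and a protected table.

```sql
SELECT *
FROM business_table
WHERE access_level IN (
    SELECT access_level
    FROM access_hierarchy          -- the lookup table shown above
    WHERE root = 'Vegetables'      -- matched from the user's attribute
);
-- The subquery returns 2, 3, 4, 9, 10, and 11, so this user sees the
-- Lewes, DE row (ACCESS_LEVEL 4) but not the College Park, MD row (8).
```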
The above example used a row-level policy, but the same technique can be used for cell masking instead:
Mask columns tagged Credit Card Number using hashing where @columnTagged('access_level') NOT IN (SELECT ACCESS_LEVEL from [lookup table] where @attributeValuesContains('user_level', 'ROOT'))
In this case, the credit card number will be masked if the access_level is not found for the user for that row.
Even if you are not using a lookup table, the power of the @columnTagged('tag name') function is apparent for applying your masking or row-level policies at scale.
There are several ways to mask data, and choosing the correct masking type has various tradeoffs. It's important to understand those tradeoffs when choosing masking types.
How data policy conflicts and intelligent fallbacks are managed
The Immuta Policy Builder allows you to use custom functions that reference important Immuta metadata from within your WHERE clause. These custom functions are utilities that make policies easier to create. Using the Immuta Policy Builder, you can include these functions in your policy queries by choosing where in the sub-action dropdown menu.
@attributeValuesContains() function: This function returns true for a given row if the provided column evaluates to an attribute value for which the querying user has a corresponding attribute value. This function requires two arguments and accepts no more than three arguments.
@columnTagged() function: This function returns the column name with the specified tag. If this function is used in a global policy and the tag doesn't exist on a data source, the policy will not be applied.
@groupsContains() function: This function returns true for a given row if the provided column evaluates to a group to which the querying user belongs. This function requires at least one argument.
@hasAttribute() function: This function returns a boolean indicating whether the current user has the specified attribute name and value combination. If the specified attribute name or attribute value has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@iam function: This function returns the IAM ID for the current user. It takes no arguments.
@isInGroups() function: This function returns a boolean indicating whether the current user is a member of all of the specified groups. If any of the specified groups has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@isUsingPurpose() function: This function returns a boolean indicating whether the current user is using the specified purpose. If the specified purpose has a single quote, you will need to escape it using a \'\' expression within a custom WHERE policy.
@purposesContains() function: This function returns true for a given row if the provided column evaluates to a purpose under which the querying user is currently acting. This function requires at least one argument and accepts no more than two arguments.
@username function: This function returns the current user's user name. It takes no arguments.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select the Only show rows action from the first dropdown.
Choose one of the following policy conditions:
Where user
Choose the condition that will drive the policy from the next dropdown: is a member of a group or possesses an attribute.
Use the next field to choose the attribute, group, or purpose that you will match values against.
Use the next dropdown menu to choose the tag that will drive this policy. You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the far right of the policy builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Where the value in the column tagged
Select the tag from the next dropdown menu.
From the subsequent dropdown, choose is or is not in the list, and then enter a list of comma-separated values.
Where
Enter a valid SQL WHERE clause in the subsequent field. When you place your cursor in this field, a tooltip details valid input and the column names of your data source. See the custom WHERE clause functions described above for more information about specific functions.
Never
The never condition blocks all access to the data source.
Choose the condition that will drive the policy from the next dropdown: for everyone, for everyone except, or for everyone who.
Select the condition that will further define the policy: is a member of group, is acting under a purpose, or possesses attribute.
Use the next field to choose the group, purpose, or attribute that you will match values against.
Choose for everyone, everyone except, or for everyone who to drive the policy. If you choose for everyone except, use the subsequent dropdown to choose the group, purpose, or attribute for your condition. If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
Opt to complete the Enter Rationale for Policy (Optional) field, and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Masking policies hide values in data, providing various levels of utility while still preserving privacy. Immuta offers column masking and cell masking policies.
Column masking policies allow you to hide the data in a column. However, there are several different approaches for masking data that allow you to make tradeoffs between privacy (how far you go with masking) vs utility (how much you want the masked data to be useful to the data consumer).
As with all Immuta policy types, it is recommended that you use global policies to manage policies at scale. When using global policies, tagging your data with metadata becomes critical and is described in detail in the use case.
Categorical Randomized Response: Categorical values are randomized by replacing a value with some non-zero probability. Not all values are randomized, and the consumer of the data is not told which values are randomized and which ones remain unchanged. Values are replaced by selecting a different value uniformly at random from among all other values. If a randomized response policy were applied to a “state” column, a person’s residency could flip from Maryland to Virginia, which would provide ambiguity to the actual state of residency. This policy is appropriate when obscuring sensitive values such as medical diagnosis or survey responses.
Custom Function: This function uses SQL functions native to the underlying database to transform the values in a column. This can be used in numerous use cases, but notional examples include top-coding to some upper limit, a custom hash function, and string manipulation.
K-Anonymization: Masking through k-anonymization is a distinct policy that can operate over multiple attributes. A k-anonymization policy applies rounding and NULL masking policies over multiple columns so that the columns contain at least “K” records, where K is a positive integer. As a result, attributes will only be disclosed when there is a sufficient number of observations. This policy is appropriate to apply over indirect identifiers, such as zip code, gender, or age. Generally, each of these identifiers is not uniquely linked to an individual, but when combined with other identifiers can be associated with a single person. Applying k-anonymization to these attributes provides the anonymity of crowds so that individual rows are made indistinct from each other, reducing the re-identification risk by making it unclear which record corresponds to a specific person. Immuta requires that you . Immuta supports k-anonymization of text, numeric, and time-based data types.
Mask with Format Preserving Masking: This function masks using a reversible function but does so in a way that the underlying structure of a value is preserved. This means the length and type of a value are maintained. This is appropriate when the masked value should appear in the same format as the underlying value. Examples of this would include social security numbers and credit card numbers where Mask with Format Preserving Masking would return masked values in a format consistent with credit cards or social security numbers, respectively. There is larger overhead with this masking type, and it should really only be used when format is critically valuable, such as situations when an engineer is building an application where downstream systems validate content. In almost all analytical use cases, format should not matter.
Mask with Reversibility: This function masks in a way that an authorized user can “unmask” a value and reveal the value to an authorized user. Masking with Reversibility is appropriate when there is a need to obscure a value while allowing an authorized user to recover the underlying value. All of the same use cases and caveats that apply to Replace with Hashing apply to this function. Reversibly masked fields can leak the length of their contents, so it is important to consider whether or not this may be an attack vector for applications involving its use.
Randomized Response: This function randomizes the displayed value to make the true value uncertain, but maintains some analytic utility. The randomization is applied differently to both categorical and quantitative values. In both cases, the noise can be increased to enhance privacy or reduced to preserve more analytic value. Immuta requires that you .
Datetime and Numeric Randomized Response: Numeric and datetime randomized response apply a tunable, unbiased noise to the nominal value. This noise can obscure the underlying value, but the impact of the noise is reduced in aggregate. This masking type can be applied to sensitive numerical attributes, such as salary, age, or treatment dates. Immuta requires that you .
Replace with Constant: This function replaces any value in a column with a specified value. The underlying data will appear to be a constant. This masking carries the same privacy and utility guarantees as Replace with NULL. Apply this policy to strings that require a specific repeated value.
Replace with Hashing: This function masks the values with an irreversible hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value. This is appropriate for cases where the underlying value is sensitive, but there is a need to segment the population. Such attributes could be addresses, time segments, or countries. It is important to note that hashing is susceptible to inference attacks based on prior knowledge of the population distribution. For example, if “state” is hashed, and the dataset is a sample across the United States, then an adversary could assume that the most frequently occurring hash value is California. As such, it's most secure to use the hashing mask on attributes that are evenly distributed across a population.
Replace with Null: This function replaces any value in a column with NULL. This removes any identifiability from the column and removes all utility of the data. Apply this policy to numeric or text attributes that have a high re-identification risk, but little analytic value (names and personal identifiers).
Replace with REGEX: This function uses a regular expression to replace all or a portion of an attribute. REGEX replacement allows some groupings to be maintained while providing greater ambiguity to the disclosed value. This masking technique is useful when the underlying data has some consistent structure, the unmasked underlying data represents some re-identification risk, and a regular expression can be used to mask the data to be less identifiable.
Rounding: Immuta’s rounding policy reduces, rounds, or truncates numeric or datetime values to a fixed precision. This policy is appropriate when it is important to maintain analytic value of a quantity, but not at its native precision.
Date/Time Rounding: This policy truncates the precision of a datetime value to a user-defined precision. The supported precisions are `minute`, `hour`, `day`, `month`, and `year`.
Numeric Rounding: This policy maps the nominal value to the ceiling of some specified bandwidth. Immuta has a recommended bandwidth based on the Freedman-Diaconis rule.
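As a sketch of both rounding behaviors (hypothetical table and columns; Immuta applies the equivalent dynamically at query time):

```sql
SELECT
  DATE_TRUNC('day', admitted_at)  AS admitted_day,    -- date/time rounding
  CEIL(salary / 1000.0) * 1000    AS salary_bucketed  -- numeric rounding to
FROM patients;                                        -- a bandwidth of 1,000
```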
For example, a regular masking policy looks like the following:
Mask columns tagged Discovered.Entity.Social Security Number using hashing for everyone except members of group admins
The cells can be conditionally masked by changing the for to a where:
Mask columns tagged Discovered.Entity.Social Security Number using hashing where country_of_residence = 'US' for everyone except members of group admins
That policy will check the country_of_residence column in the table; if the value is US, the cell will be masked, otherwise the data will be presented in the clear as usual.
Mask columns tagged Discovered.Entity.Social Security Number using hashing where @columnTagged('country') = 'US' for everyone except members of group admins
This example policy dynamically targets the column with the tag country in the policy logic.
The masking functions described above can be implemented in a variety of use cases. Use the table below to determine the circumstance under which a function should be used.
Applicable to Numeric Data: The masking function can be applied to numeric values.
Column-Value Determinism: Repeated values in the same column are masked with the same output.
Introduces NULLs: The masking function may, under normal or irregular circumstances, return NULL values.
Performance: How performant the masking function will be (10/10 being the best).
Preserves Appearance: The output masked value resembles the valid column values. For example, a masking function would output phone numbers when given phone numbers. Here, NULL values are not counted against this property.
Preserves Averages: The average of the masked values (avg(mask(v))) will be near the average of the values in the clear (avg(v)).
Suitable for De-Identification: The masking function can be used to obscure record identifiers, hiding data subject identities and preventing future linking against other identified data.
Provides Deniability of Record Content: A (possibly identified) person can plausibly attribute the appearance of the value to the masking function. This is a desirable property of masking functions that retain analytic utility, as such functions must necessarily leak information about the original value. Fields masked with these functions provide strong protections against value inference attacks.
Preserves Equality and Grouping: Each value will be masked to the same value consistently without colliding with others. Therefore, equal values remain equal under masking while unequal values remain unequal, preserving equality. This implies that counting statistics are also preserved.
Preserves Message Length: The length of the masked value is equal to the length of the original value.
Preserves Range Statistics: The number of data values falling in a particular range is preserved. For strings, this can be interpreted as the number of strings falling between any two values by alphabetical order.
Preserves Value Locality: The output will remain near the input, which may be important for analytic purposes.
Reversible: Qualified individuals can reveal the original input value.
Masking policy support by integration
Since global policies can apply masking policies across multiple different databases at once, if an unsupported masking policy is applied to a column, Immuta will revert to NULLing that column.
Immuta row-level policies compare data values with user metadata at query-time to determine whether or not the querying user should have access to the individual rows of data.
The values contained in one or many columns in the table in question (or a lookup table) need to be referenced by the policy for its logic to take effect.
For example, consider the policy below:
Only show rows where user is a member of a group that matches the value in the column tagged Department.
The data values (the values in the column tagged Department) are matched against the user attribute (their groups) to determine whether or not rows will be visible to the user accessing the data.
The policy targets columns tagged Department; this means that this policy can be applied globally across all tables and data platforms that have that tag with this single policy, rather than having to build a separate policy for individual tables and columns.
It is also possible to use custom functions in row-level policies for more complex use cases.
These wrap Immuta context into free-form SQL logic for the row-level policy. That context can be things like the attributes (@attributeValuesContains()) or groups (@groupsContains()) possessed by the user, or the username (@username) - injected into the SQL at runtime.
Avoid referencing explicit column names in custom functions and instead use the @columnTagged('tag name') function in SQL. In doing so, you avoid having to reference the physical database world with the custom SQL policies and instead continue to target the metadata/tag world.
Once a user is subscribed to a data source, the data policies that are applied to that data source determine what data the user sees.
For all data policies, you must establish the conditions under which they will be enforced. Immuta allows you to append multiple conditions to the data. Those conditions are based on user attributes and groups (which can come from multiple identity management systems and be applied as conditions in the same policy) or purposes they are acting under through Immuta projects.
Conditions can be directed as exclusionary or inclusionary, depending on the policy that's being enforced:
exclusionary condition example: Mask using hashing values in columns tagged PII on all data sources for everyone except users in the group AUDIT.
inclusionary condition example: Only show rows where user is a member of a group that matches the value in the column tagged Department.
The table below outlines the policy types supported by each integration. Details about each of these policies are included in the section.
*Supported with Caveats:
On Databricks data sources, joins will not be allowed on data protected with replace with NULL/constant policies.
On Trino data sources, the Immuta function @iam for WHERE clause policies can block the creation of views.
For example, governors could mask values using hashing for users acting under a specified purpose while masking those same values by making null for everyone else who accesses the data.
This variation can be created by selecting for everyone who when available from the condition dropdown menus and then completing the Otherwise clause.
For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a governor could write the following global policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
This hierarchy allows you to create this as a single purpose instead of creating separate purposes, which must then each be added to policies as they evolve.
Now, any user acting under the purpose or sub-purpose of Research - whether Research.Marketing or Research.Onboarding - will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. Furthermore, if new projects with new Research purposes are added, for example, the relevant global policy will automatically be enforced.
Masking policies hide values in data, providing various levels of utility while still preserving privacy.
This policy masks the values with an irreversible sha256 hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value.
This policy makes values null, removing any utility of the data the policy applies to.
With this policy, you can replace the values with the same constant value you choose, such as 'Redacted', removing any utility of that data.
This policy is similar to replacing with a constant, but it provides more utility because you can retain portions of the true value. When authoring the policy in Immuta, the regex and the replacement value do not need to be in single or double quotes.
The following regex rule would mask the final digits of an IP address:
Mask using a regex \d+$ the value in the column ip_address for everyone.
In this case, the regular expression \d+$ works as follows:
\d matches a digit (equal to [0-9]).
+ is a quantifier that matches between one and unlimited times, as many times as possible, giving back as needed (greedy).
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any).
This ensures we capture the last digit(s) after the last . in the IP address. We can then enter the replacement for what we captured, which in this case is XXX. So the outcome of the policy would look like this: 164.16.13.XXX
This regex rule applies masking to telephone numbers variably depending on the presence of a dash (implying a prefix), space, or only digits:
Mask using a regex (\+?\d{0,3}[-\s]?)?\d{4} the value in the column tagged Discovered...Telephone Number for everyone.
[Image: authoring a regex global policy that will apply to Databricks Unity Catalog data sources, using the Databricks Unity Catalog integration regex_replace function]
This is a technique to hide precision from numeric values while providing more utility than simply hashing. For example, you could remove precision from a geospatial coordinate. You can also use this type of policy to remove precision from dates and times by rounding to the nearest hour, day, month, or year.
Deprecation notice
Support for unmask requests has been deprecated.
Note: The user receiving the unmasking request must send the unmasked value to the requester.
With Reversible Masking, the raw values are switched out with consistent values to allow analysis without revealing the underlying sensitive data. The direct identifier is replaced with a token that can still be tracked or counted.
This option masks the value, but preserves the length and type of the value.
Preserving the data format is important if the format has some relevance to the analysis at hand. For example, if you need to retain the integer column type or if the first 6 digits of a 12-digit number have an important meaning.
This option uses functions native to the underlying database to transform the column. Single quotes enclosing the regex and escaping special characters are required. The following example masks telephone numbers variably depending on the presence of a dash (implying a prefix), space, or only digits:
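For instance, a custom function entry built on the telephone-number regex above might look like the following sketch (the column reference phone_number is illustrative; note the single quotes and the escaped special characters):

-- Replaces the final four digits (and any optional prefix) with XXXX.
REGEXP_REPLACE(phone_number, '(\\+?\\d{0,3}[-\\s]?)?\\d{4}', 'XXXX')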
The image below illustrates authoring a global policy using this custom function:
Limitations
The masking functions are executed against the remote database directly. A poorly written function could lead to poor quality results, data leaks, and performance hits.
Using custom functions can result in changes to the original data type. In order to prevent query errors you must ensure that you cast this result back to the original type.
The function must be valid for the data type of the selected column. If it is not:
Local policies will error and show a message that the function is not valid.
Global policies will error and change to the default masking type (hashing for text and NULL for all others).
For all of the policies above, both at the local and global policy levels, you can conditionally mask the value based on a value in another column. This allows you to build a policy such as "Mask bank account number where country = 'USA'" instead of masking the bank account number for everyone unconditionally.
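Conceptually, the rendered logic resembles a SQL CASE expression (a sketch with illustrative names):

-- Rows where country = 'USA' see a hashed bank account;
-- all other rows see the raw value.
SELECT CASE
         WHEN country = 'USA' THEN SHA2(bank_account, 256)
         ELSE bank_account
       END AS bank_account
FROM accounts;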
Sample data is processed during computation of k-anonymization policies
When a k-anonymization policy is applied to a data source, the columns targeted by the policy are queried under a fingerprinting process that generates rules enforcing k-anonymity. The results of this query, which may contain data that is subject to regulatory constraints such as GDPR or HIPAA, are stored in Immuta's metadata database.
The location of the metadata database depends on your deployment:
Self-managed Immuta deployment: The metadata database is located in the server where you have your external metadata database deployed.
SaaS Immuta deployment: The metadata database is located in the AWS global segment you have chosen to deploy Immuta.
K-anonymity is measured by grouping records in a data source that contain the same values for a common set of quasi identifiers (QIs) - publicly known attributes (such as postal codes, dates of birth, or gender) that are consistently, but ambiguously, associated with an individual.
The k-anonymity of a data source is defined as the number of records within the least populated cohort, which means that the QIs of any single record cannot be distinguished from at least k other records. In this way, a record with QIs cannot be uniquely associated with any one individual in a data source, provided k is greater than 1.
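Under this definition, the k of a table can be measured with a grouped count over its quasi identifiers (a sketch with illustrative names):

-- k is the size of the smallest cohort sharing identical QI values.
SELECT MIN(cohort_size) AS k
FROM (
  SELECT COUNT(*) AS cohort_size
  FROM survey
  GROUP BY postal_code, date_of_birth, gender
) AS cohorts;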
In Immuta, masking with k-anonymization examines pairs of values across columns and hides groups that do not appear at least the specified number of times (k). For example, if one column contains street numbers and another contains street names, the group 123, "Main Street" probably would appear frequently while the group 123, "Diamondback Drive" probably would show up much less. Since the second group appears infrequently, the values could potentially identify someone, so this group would be masked.
After the fingerprint service identifies columns with a low number of distinct values, users will only be able to select those columns when building the policy. Users can either use a minimum group size (k) given by the fingerprint or manually select the value of k.
Note: The default cardinality cutoff for columns to qualify for k-anonymization is 500.
Masking multiple columns with k-anonymization
Governors can write global data policies using k-anonymization in the global data policy builder.
When this global policy is applied to data sources, it will mask all columns matching the specified tag.
Applying k-anonymization over disjoint sets of columns in separate policies does not guarantee k-anonymization over their union.
If you select multiple columns to mask with k-anonymization in the same policy, the policy is driven by how many times these values appear together. If the groups appear fewer than k times, they will be masked.
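A sketch of how the infrequent groups could be identified for a required group size of 2 (illustrative table name):

-- Any (gender, state) pair appearing fewer than 2 times is a candidate
-- for masking.
SELECT gender, state
FROM people
GROUP BY gender, state
HAVING COUNT(*) < 2;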
For example, if Policy A
Policy A: Mask with k-anonymization the values in the columns gender and state requiring a group size of at least 2 for everyone
was applied to this data source

| gender | state |
| --- | --- |
| Female | Ohio |
| Female | Florida |
| Female | Florida |
| Female | Arkansas |
| Male | Florida |

the values would be masked like this:

| gender | state |
| --- | --- |
| Null | Null |
| Female | Florida |
| Female | Florida |
| Null | Null |
| Null | Null |
Note: Selecting many columns to mask with k-anonymization increases the processing that must occur to calculate the policy, so saving the policy may take time.
However, if you select to mask the same columns with k-anonymization in separate policies, Policy C and Policy D,
Policy C: Mask with k-anonymization the values in the column gender requiring a group size of at least 2 for everyone
Policy D: Mask with k-anonymization the values in the column state requiring a group size of at least 2 for everyone
the values in the columns will be masked separately instead of as groups. Therefore, the values in that same data source would be masked like this:

| gender | state |
| --- | --- |
| Female | Null |
| Female | Florida |
| Female | Florida |
| Female | Null |
| Null | Florida |
Sample data is processed during computation of randomized response policies
When a randomized response policy is applied to a data source, the columns targeted by the policy are queried under a fingerprinting process. To enforce the policy, Immuta generates and stores predicates and a list of allowed replacement values that may contain data that is subject to regulatory constraints (such as GDPR or HIPAA) in Immuta's metadata database.
The location of the metadata database depends on your deployment:
Self-managed Immuta deployment: The metadata database is located in the server where you have your external metadata database deployed.
SaaS Immuta deployment: The metadata database is located in the AWS global segment you have chosen to deploy Immuta.
This policy masks data by slightly randomizing the values in a column, preserving the utility of the data while preventing outsiders from inferring content of specific records.
All members of this cohort have indicated substance abuse, sensitive personal information that could have damaging consequences, and, even though direct identifiers have been removed and k-anonymization has been applied, outsiders could infer substance abuse for an individual if they knew a male welder in this zip code.
In this scenario, using randomized response would change some of the Y's in substance_abuse to N's and vice versa; consequently, outsiders couldn't be sure of the displayed value of substance_abuse given in any individual row, as they wouldn't know which rows had changed.
How the randomization works
Immuta applies a random number generator (RNG) that is seeded with some fixed attributes of the data source, column, backing technology, and the value of the high cardinality column, an approach that simulates cached randomness without having to actually cache anything.
For numeric data, Immuta uses the RNG to add a random shift from a 0-centered Laplace distribution with the standard deviation specified in the policy configuration. For most purposes, knowing the distribution is not important, but the net effect is that on average the reported values should be the true value plus or minus the specified deviation value.
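A Laplace draw can be sketched with inverse transform sampling (illustrative Snowflake-style SQL; Immuta's actual generator is seeded for consistency, unlike RANDOM() here). The scale b relates to the configured standard deviation by b = stddev / sqrt(2):

-- u is uniform on (-0.5, 0.5); the expression below adds a 0-centered
-- Laplace draw with scale b = 1000 to the true value.
SELECT salary - 1000 * SIGN(u) * LN(1 - 2 * ABS(u)) AS salary_randomized
FROM (
  SELECT salary, UNIFORM(-0.5::float, 0.5::float, RANDOM()) AS u
  FROM employees
) AS t;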
Preserving data utility
Using randomized response doesn't destroy the data because data is only randomized slightly; aggregate utility can be preserved because analysts know how and what proportion of the values will change. Through this technique, values can be interpreted as hints, signals, or suggestions of the truth, but it is much harder to reason about individual rows.
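For example, for a column with two possible values and replacement rate r, a value flips to the other value with probability r, so the observed frequency f_obs of a value relates to its true frequency f by:

f_obs = (1 - r) * f + r * (1 - f)

and an analyst who knows r can recover the unbiased estimate:

f = (f_obs - r) / (1 - 2r)

Individual rows remain deniable, but aggregates stay interpretable.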
Additionally, randomized response gives deniability of record content, not dataset participation, so individual rows can be displayed.
In some cases, you may want several different masking policies applied to the same column through Otherwise policies. To build these policies, select everyone who instead of everyone or everyone except. After you specify who the masking policy applies to, select how it applies to everyone else in the Otherwise condition.
You can add and remove tags in Otherwise conditions for global policies (unlike local policy Otherwise conditions), as illustrated above; however, all tags or regular expressions included in the initial everyone who rule must be included in an everyone or everyone except rule in the additional clauses.
Feature limitations
Masking struct and array columns is only available for Databricks data sources.
Immuta only supports Parquet and Delta table types.
Spark supports a class of data types called complex types, which can represent multiple data values in a single column. Immuta supports masking fields within array and struct columns:
array: an ordered collection of elements
struct: a collection of elements that are primitive or complex types
Without this feature enabled, the struct and array columns of a data source default to jsonb in the Data Dictionary, and the masking policies that users can apply to jsonb columns are limited. For example, if a user wanted to mask PII inside the column patient in the image below, they would have to apply null masking to the entire column or use a custom function instead of just masking name or address.
After a global or local policy masks the columns containing PII, users who do not meet the exception specified in the policy will see these values masked:
Note: Immuta uses the > delimiter to indicate that a field is nested instead of the . delimiter, since field and column names could themselves include a . character.
Caveats
Struct columns with many fields
If users have struct columns with many fields, they will need to either
create the data source against a cluster running Spark 3 or
add spark.debug.maxToStringFields 1000 to their Spark 2 cluster's configuration.
To get column information about a data source, Immuta executes a DESCRIBE call for the table. In this call, Spark returns a simple string representation of the schema for each column in the table. For the patient column above, the simple string would look like this:
struct<name:string,ssn:string,age:int,address:struct<city:string,state:string,zipCode:string,street:string>>
Immuta then parses this string into a nested format for the data source's dictionary.
However, if the struct contains more than 25 fields, Spark truncates the string, causing the parser to fail and fall back to jsonb. Immuta will attempt to avoid this failure by increasing the number of fields allowed in the server-side property setting, maxToStringFields; however, this only works with clusters on a Spark 3 runtime. The maxToStringFields configuration in Spark 2 cannot be set through the ODBC driver and can only be set through the Spark configuration on the cluster with spark.debug.maxToStringFields 1000 on cluster startup.
These policies hide entire rows or objects of data based on the policy being enforced; some of these policies require the data to be tagged as well.
These policies match a user attribute with a row/object/file attribute to determine if that row/object/file should be visible. This process uses a direct string match, so the user attribute would have to match exactly the data attribute in order to see that row of data.
For example, to restrict access to insurance claims data to the state for which the user's home office is located, you could build a policy such as this:
Only show rows where user possesses an attribute in Office Location that matches the value in the column State for everyone except when user is a member of group Legal.
In this case, the Office Location is retrieved by the identity management system as a user attribute or group. If the user's attribute (Office Location) was Missouri, rows containing the value Missouri in the State column in the data source would be the only rows visible to that user.
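For the Missouri example, the filter rendered for that user can be sketched as follows (illustrative; a user with multiple Office Location values would see all matching rows):

-- The user's attribute values are substituted into the predicate.
SELECT *
FROM insurance_claims
WHERE State IN ('Missouri');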
This policy can be thought of as a table "view" created automatically for the user based on the condition of the policy. For example, in the policy below, users who are not members of the Admins group will only see taxi rides where passenger_count < 2.
Only show rows where public.us.taxis.passenger_count < 2 for everyone except when user is a member of group Admins.
WHERE clause policy requirement
All columns referenced in the policy must have fully qualified names. Any column names that are unqualified (just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
These policies restrict access to rows/objects/files that fall within the time restrictions set in the policy. If a data source has time-based restriction policies, queries run against the data source by a user will only return rows/blobs with a date in its event-time column/attribute from within a certain range.
The time window is based on the event time you select when creating the data source. This value will come from a date/time column in relational sources.
These policies return a limited percentage of the data, randomly sampled at query time, and it is the same sample for all users. For example, you could limit certain users to only 10% of the data. Immuta uses a hashing policy to return approximately 10% of the data, and the data returned will always be the same; however, the exact number of rows exposed depends on the distribution of high cardinality columns in the database and the hashing type available. Additionally, Immuta will adjust the data exposed when new rows are added or removed.
Best practice: row count
Immuta recommends you use a table with over 1,000 rows for the best results when using a data minimization policy.
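A deterministic hash-based sample like this can be sketched in SQL (illustrative names; not Immuta's exact function). Because the hash of a given row is stable, the same rows come back on every query:

-- Roughly 10% of rows pass the filter, and membership is consistent
-- across queries.
SELECT *
FROM taxis
WHERE MOD(ABS(HASH(ride_id)), 100) < 10;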
Public preview: This feature is currently in public preview and available to all accounts.
If a global masking policy applies to a column, you can still use that masked column in a global row-level policy.
Consider the following policy examples:
Masking policy: Mask values in columns tagged Country for everyone except users in group Admin.
Row-level policy: Only show rows where user possesses an attribute in OfficeLocation that matches the value in column tagged Country for everyone.
Both of these policies use the Country tag to restrict access. Therefore, the masking policy and the row-level policy would apply to data source columns with the tag Country for users who are not in the Admin group.
Limitations
When this policy is activated by a governor, it will automatically be enforced on data sources that have the New tag applied to them.
Write access is controlled through workspaces and scratch paths
View-based integrations are read-only
Building a cell masking policy is done in the same manner as building a regular masking policy. The primary difference is that when selecting who the policy should apply to, a WHERE clause is injected.
It is recommended that when referencing columns in custom SQL you not use the physical column name as shown in the example above. Instead, use the function that references the column by its tag. This will allow you to target the policy on any table with a country_of_residence column no matter how that column is spelled on the physical table. For example, you would change the policy to the following example:
See the reference documentation for an outline of masking policies supported by each integration.
For all but a few policy types, inclusionary logic allows governors to vary policy actions with an Otherwise clause.
Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on data. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.
Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
Refer to the relevant how-to guide for a tutorial on purpose-based restrictions on data.
Hashed values are different across data sources, so you cannot join on hashed values unless you enable masked joins within a project. Immuta prevents joins on hashed values to protect against link attacks where two data owners may have exposed data with the same masked column (a quasi-identifier), but their data combined by that masked value could result in a sensitive data leak.
The Databricks Unity Catalog integration uses Spark's built-in regex_replace function. That Databricks function currently imposes configuration requirements; regex will not work on this platform unless these settings are appropriately configured.
This option masks the values using hashing, but allows users to submit an unmasking request to users who meet the exceptions of the policy.
This option also allows users to submit an unmasking request to users who meet the exceptions of the policy.
Note: When building conditional masking policies with custom SQL statements, avoid using a masked column in the SQL statement, as this can lead to different behavior depending on your data platform and may produce unexpected results.
To ensure this process does not violate your organization's data localization regulations, you need to first enable this policy type before you can use it in your Immuta tenant.
For example, if an analyst wanted to publish data from a health survey she conducted, she could remove direct identifiers and apply k-anonymization to indirect identifiers to make it difficult to single out individuals. However, consider these survey participants, a cohort of male welders who share the same zip code:

| id | zip code | gender | occupation | substance_abuse |
| --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... |
| 880d0096 | 75002 | Male | Welder | Y |
| f267334b | 75002 | Male | Welder | Y |
| bfdb43db | 75002 | Male | Welder | Y |
| 260930ce | 75002 | Male | Welder | Y |
| 046dc7fb | 75002 | Male | Welder | Y |
| ... | ... | ... | ... | ... |
For string data, the random number generator essentially flips a biased coin. If the coin comes up as tails, which it does with the frequency of the replacement rate, then the value is changed to any other possible value in the column, selected uniformly at random from among those values. If the coin comes up as heads, the true value is released.
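That coin flip can be sketched for a two-valued column with an illustrative replacement rate of 0.1 (Immuta's actual generator is seeded so results are consistent across queries, unlike RANDOM() here):

-- Tails (probability 0.1): flip to the other possible value.
-- Heads (probability 0.9): release the true value.
SELECT CASE
         WHEN UNIFORM(0::float, 1::float, RANDOM()) < 0.1
           THEN CASE substance_abuse WHEN 'Y' THEN 'N' ELSE 'Y' END
         ELSE substance_abuse
       END AS substance_abuse
FROM survey;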
After Complex Data Types is enabled on the app settings page, the column type for struct columns for new data sources will display as struct in the Data Dictionary. (For data sources that are already in Immuta, users can edit the data source and change the column types for the appropriate columns from jsonb to struct.) Once struct fields are available, they can be searched, tagged, and used in masking policies. For example, a user could tag name, ssn, and street as PII instead of the entire patient column.
Note: When building row-level policies with custom SQL statements, avoid using a masked column in the SQL statement, as this can lead to different behavior depending on whether you're using the Spark or Snowflake integration and may produce unexpected results.
You can put any valid SQL in the policy. See the reference guide for a list of custom functions.
This feature is only available for certain integrations.
This feature is only supported for global data policies, not local data policies.
This policy pairs with schema monitoring to mask newly added columns to data sources until data owners review and approve these changes from the requests tab of their profile page.
To learn how to activate this policy, refer to the related how-to guide.
The parameters for these functions are listed below (one table per function).

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Attribute Name | String | Required | The name of the attribute to retrieve values for |
| 2 | Column Name/SQL Expression | String | Required | The column that contains the value to match the attribute key against |
| 3 | Placeholder | String | Optional | A placeholder in case the list of values is empty |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Tag Name | String | Required | The name of the tag |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Column Name/SQL Expression | String | Required | The column that contains the value to match the group against |
| 2 | Placeholder | String | Optional | A placeholder in case the list of values is empty |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Attribute Name | String | Required | The name of the attribute |
| 2 | Attribute Value | String | Required | The value to correspond with the attribute name |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Group names | Array (String) | Required | A list of group names, e.g. groups('group_a', 'group_b', 'group_c') |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Purpose | String | Required | The name of the purpose to check the user against |

| # | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| 1 | Column Name/SQL Expression | String/Expression | Required | The column that contains the value to match the purpose against |
| 2 | Placeholder | String | Optional | A placeholder in case the list of values is empty |
Requirement: CREATE_PROJECT Immuta permission
Navigate to the Projects tab under Data in the sidebar, and click the New Projects button.
Fill out the Basic Information:
Enter a name for your project in the Project Name field.
Opt to complete the Project Description field to help identify your project.
Opt to enter project Documentation to provide context for members.
Select the purposes and any policy adjustments:
Choose to select a purpose from the list of purposes or create a new purpose for the project.
To create a new purpose, click Create Purpose and complete the prompts in the modal.
Note that all purposes added to a project will need to be created or approved by a user with the GOVERNANCE or PROJECT_MANAGEMENT permission. Once purposes have been applied to a project, only these users can add data sources to the project.
Add a native workspace configuration: Select your workspace configuration from the Workspace Configuration dropdown menu: Databricks or Snowflake.
Databricks: Opt to edit the sub-directory in the Workspace Directory field (this sub-directory auto-populates as the project name) and enter the Workspace Database Name.
Snowflake: Name the Workspace Schema. By default, the schema name is based off of the project name, but you can change it here. Your project workspace will exist within this schema in Snowflake, under the database configured by the application admin.
Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
Select one or more Warehouses to be available to project members when they are working in the native workspace.
Add data sources to the project using the dropdown menu. Data sources can also be added after the project is created.
Click Affirm and Create.
Project naming convention
Use a naming convention for projects that reflects the naming convention for databases (e.g., if the project in Dev is called "my_project," name the project "dev_my_project"). The data will end up in the project database prefix, so you can trace the source and make edits upstream in that project as necessary.
Projects are private by default but can be made public and shared with other users by changing the subscription policies setting. Governors are the only users who can manage subscription policies for projects with purposes.
In the project, click the Policies tab.
Click Edit Subscription Policy.
Select the group of users who will have access:
Allow anyone: Selecting this option makes the project visible to everyone. Opt to require manual subscription by selecting the checkbox. This will require the users to manually subscribe to the project to gain access.
Allow anyone who asks (and is approved): Selecting this option makes the project visible in search results, but users must request access and be granted permission. This restriction supports multiple approving parties, so project owners can allow more than one approver or users with specified permission types to approve other users who request access to the project.
Click Anyone or An individual selected by user from the first dropdown menu.
Note: If you choose An individual selected by user, when users request access to a project they will be prompted to identify an approver with the permission specified in the policy.
Select the USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu. You can add more than one approving party by selecting + Add Another Approver.
Allow users with specific groups/attributes: Selecting this option allows users with the specified groups and attributes to join the project.
Choose whether to build the policy off user groups or user attributes:
is a member of group: Type the group name and select the group.
possesses attribute: Type the attribute and select it. Then select the value from the dropdown menu.
Opt to + Add Another Condition. When adding another condition, choose how the conditions will be required. If you select or, only one of the conditions must apply to a user for them to subscribe to the project. If you select and, all of the conditions must apply.
Opt to allow users who do not meet the restrictions defined in the policy to still be able to discover the project by selecting the Allow Project Discovery checkbox.
Once saved, users with the proper authorizations will be automatically subscribed. Opt to require users to manually subscribe to the project by selecting the Require Manual Subscription checkbox.
Allow individually selected users: Selecting this option hides the project from the search results. Project owners must manually add and remove users, and the Private label will appear next to the project name.
Click Save to finish your policy.
In the project, click the Members tab.
Click the Add Members button.
Start typing a user's or group's name in the Add Members modal and select it from the dropdown that appears.
Opt to add an expiration to the subscription by entering the number of days until the access will expire.
Select the role.
Click Add.
Current project members will receive notifications that new users have been added to the project. A similar entry will be posted to the project's activity pane.
When building a global data policy, governors can create custom certifications, which must then be acknowledged by data owners when the policy is applied to data sources.
For example, data governors could add a custom certification that states that data owners must verify that tags have been added correctly to their data sources before certifying the policy.
When a global data policy with a custom certification is cloned, the certification is also cloned. If the user who clones the policy and custom certification is not a governor, the policy will only be applied to data sources that user owns.
Determine your policy scope:
Global policy: Click the Policies page icon in the left sidebar and select the Data Policies tab. Click Add Policy and enter a name for your policy.
Local policy: Navigate to a specific data source and click the Policies tab. Scroll to the Data Policies section and click Add Policy.
Select Minimize data source from the first dropdown.
Complete the enter percentage field to limit the amount of data returned at query-time.
Select for everyone except from the next dropdown menu to continue the condition. Additional options include for everyone and for everyone who.
Use the next field to choose the attribute, group, or purpose that you will match values against.
Notes:
If you choose for everyone who as a condition, complete the Otherwise clause before continuing to the next step.
You can add more than one condition by selecting + Add Another Condition. The dropdown menu in the far right of the Policy Builder contains conjunctions for your policy. If you select or, only one of your conditions must apply to a user for them to see the data. If you select and, all of the conditions must apply.
Opt to complete the Enter Rationale for Policy (Optional), and then click Add.
For global policies: Click the dropdown menu beneath Where should this policy be applied, and select On all data sources, On data sources, or When selected by data owners. If you select On data sources, finish the condition in one of the following ways:
tagged: Select this option and then search for tags in the subsequent dropdown menu.
with columns tagged: Select this option and then search for tags in the subsequent dropdown menu.
with column names spelled like: Select this option, and then enter a regex and choose a modifier in the subsequent fields.
in server: Select this option and then choose a server from the subsequent dropdown menu to apply the policy to data sources that share this connection string.
created between: Select this option and then choose a start date and an end date in the subsequent dropdown menus.
Click Create Policy. If creating a global policy, you then need to click Activate Policy or Stage Policy.
Projects combine users and data sources under a common purpose, which can then be used to restrict access to data and streamline collaboration.
There are three main use cases Immuta projects can help with:
For any of these use cases, project workspaces can be created to allow users to write data to the project.
This section includes conceptual, how-to, and reference guides for using projects to enforce purpose-based access controls on your data.
This section includes conceptual, how-to, and reference guides that explain how equalized entitlements work in Immuta and how you can use them to effectively collaborate across your department without risk of data leaks.
This section includes conceptual and how-to guides that explain how to effectively use masked joins for your business use case.
This section includes conceptual, how-to, and reference guides for writing data to projects and sharing that data with other Immuta users with proper access controls enforced.
Purpose-based access control
Purpose-based access control makes access decisions based on the purpose for which a given user or tool intends to use the data. This method of data access also provides flexibility for you to override policies and grant access to unmasked data to an individual for a very specific reason. Immuta recommends using purposes to create exceptions to global data policies.
There is some up-front work that needs to occur to make this possible.
A user with the GOVERNANCE Immuta permission creates legitimate purposes for access to different data types unmasked. As part of creating the purposes, they may want to alter the acknowledgement statement the user must agree to when acting under that purpose.
A data owner or governor updates the masking or row-level policies to include those purposes as exceptions to the policy.
Users create a project and connect the project to both the policy and the purpose by
adding data sources with the policies they want users to be excluded from and
adding the purposes to the project
However, that project does nothing until the purpose is approved by a user with the PROJECT_MANAGEMENT Immuta permission.
Once that approval is complete, the user wanting the exception must acknowledge they will only use the data for that purpose.
Using the Immuta UI, the user switches to that project context. Once switched to that project, the approved exceptions occur for the user.
These exceptions can be made temporary by deleting the project once access is no longer needed or un-approving the purpose for the project after the need for access is gone.
Requirement: GOVERNANCE or PROJECT_MANAGEMENT Immuta permission
Click Governance and select Purposes in the navigation menu.
Click + Add Purpose.
Complete the Purpose Name field, and opt to customize the acknowledgement statement or add a description.
Click Create.
Click Governance and select Purposes in the navigation menu.
Open the dropdown menu and click View or Edit in the actions column of the purpose you want to add sub-purposes to.
Click Add Sub-Purposes.
Enter a name in the Enter nested purpose field in the Sub-Purpose Builder.
Click the arrow to the right of the purpose or sub-purpose(s) to continue adding nested purpose fields.
Click Save.
A list of sub-purposes will populate at the bottom of the page. You can manage these sub-purposes by clicking Edit in the Actions column at any time.
Click Data in the navigation menu and select Projects.
Go to the project overview page and click Add Purposes.
Select the purpose from the dropdown menu or click Create Purpose.
Complete the prompts in the modal and then click Save.
For the purpose to go into effect, it must be approved by a user with the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission.
Click Governance and select Purposes in the navigation menu.
Open the dropdown menu and click Edit in the Actions column of the purpose you would like to customize.
Click Edit above the acknowledgement statement, customize the text, and then click Confirm.
The page displays the updated statement, which now will be used by all projects and purposes. The updated statement will also be used by any new members joining existing projects containing purposes with default statements.
By default, sub-purposes will inherit the acknowledgment statements of their parent purposes. However, you can customize the acknowledgement statement for an individual sub-purpose as well by following the process above.
Navigate to your user profile page and click the Requests tab.
Approve or deny the purpose request in the actions column.
Click Governance and select Purposes in the navigation menu.
Click the more actions icon in the Actions column of the purpose or sub-purpose you want to delete.
Select Delete, and then click Confirm to approve the deletion.
Any project that contained the deleted purpose will be treated as no longer compliant, so the project overview page will display a violation tag. Project owners can remediate the violation by removing the purpose from the project.
Purpose-based access control is a method of data access control that makes access decisions based on the reason a user or tool intends to use the data, which provides flexibility data governance teams need to build a high-powered, granular access control model. Furthermore, most regulations (like GDPR and HIPAA) include purpose clauses that require sensitive data only be collected and used for precise reasons.
For example, the GDPR’s Purpose Limitation Principle states that “Personal data should only be collected and processed for a legitimate specific purpose.” Furthermore, the regulation claims that the specific purpose “should be expressed in an unambiguous, transparent, and simple manner” in order to be compliant. The goal of this clause is to ensure that sensitive information is not being unnecessarily collected, stored, and exposed to risk by organizations that use it. With purpose-based access control, organizations can exhibit the granular, purpose-based control over data access that ensures compliance with these standards.
Immuta projects allow you to connect purposes to data sources and users to enforce purpose-based access controls on your data.
This getting started guide outlines how to quickly implement purpose-based access controls for your business use case using Immuta projects, purposes, and global data policies.
The how-to guides in this section illustrate how to create and manage projects and purposes.
Create a project: Create a project to group data sources and users.
Create a purpose: Create a purpose for your project to enforce access restrictions on the project data sources.
Adjust a policy: Redistribute the noise of k-anonymization across multiple columns of a data source within a project to make specific columns more useful for analysis.
Project management: This section guides project owners, governors, and project managers through managing project settings, data sources, and members.
Projects and purposes: Immuta projects allow you to connect purposes to data sources and users to enforce purpose-based access controls on your data.
Policy adjustments: Project owners can use policy adjustments to increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis.
Why use purposes?: This explanatory guide contains a conceptual overview of purposes in Immuta and the business value achieved by using purpose-based access controls.
Requirement: You must own the project
Navigate to the project.
Click the Policies tab.
In the Project Equalization section, click the toggle button on the far right to On.
Navigate to the project Overview tab.
Click the Add Purposes button in the center of the page.
Select the desired purpose(s) from the dropdown menu. The project must have one purpose with noise reduction for policy adjustments to function. Noise reduction is indicated by the policy adjustment amount highlighted in gray. If there is no policy adjustment shown, it is the default, none.
Click Save.
The purpose will be staged until a user with the GOVERNANCE or PROJECT_MANAGEMENT permission approves and activates it. If you have one of these permissions, the staging will be skipped.
After the purpose has been approved, click I Agree to agree to the terms of the purpose.
Note: All members of the project must agree to the terms of the purpose; if they decline, they will be removed from the project.
Navigate to the Policy Adjustment tab.
Select the data source from the dropdown menu. You will receive a No Adjustments Available message if there are no columns in the data sources that are associated with adjustable policies.
Select the columns from the dropdown menu.
In the priority ranking window, assign weights from the most to the least important columns. The higher the weight, the more usability that column retains, while de-identification is still provided through the other columns.
Opt to select Keep Fields in the Clear for specific columns.
After assigning the weight and ensuring that the remaining weight is zero, click the Adjust button.
Check that the percent NULL is appropriate for your usability. If you are content with the percent NULL, move on to the next step; if you are not satisfied with the percent NULL, repeat the previous steps until satisfied.
Click Apply after you have made the acceptable adjustments.
After you ensure that your data source has two columns with k-anonymization policies applied,
Navigate to your project and click the Overview tab.
Click Add Purposes and select a purpose with the Noise Reduction and Fields in Clear tags, or create a new purpose with Noise Reduction and Fields in Clear enabled.
Click Save, and then click I Agree.
Navigate to the Policy Adjustment tab.
Select the data source(s) and columns from the subsequent dropdown menus.
Select the Keep in the Clear checkbox next to columns to be kept in the clear.
Adjust the weight for the remaining columns. The Remaining Weight must equal 0.
Click Adjust and then Apply.
The values for the fields you selected will now be visible to users acting under the project.
Public preview
This feature is in public preview. It is available to all customers and can be enabled on the Immuta app settings page.
Project owners can use policy adjustments to increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis. Since these adjustments only occur within the project and do not change the individual data policies, data users must be acting under the project to see the adjustments in the data source.
For example, a policy might mask these data source columns with k-anonymization: Income, Education, EmploymentStatus, Gender, and Location Code. When the analyst examines the data, the percent NULL has been predetermined by Immuta with an equal weight across all of these columns. However, if the analyst's work hinges on the EmploymentStatus column, the project owner can adjust the weights on the policy adjustment tab in the project to make the necessary data (EmploymentStatus) less NULL.
For columns that are already well-disclosed (meaning they already have a low percent null), the same percent null will display even when you drastically change the weight distribution.
Increasing the weight of a column that is already well-disclosed will not change the outcome. Generally, the biggest impact will be seen when you increase the weights of the largest percent null column. (The only exception to this is if that column already has a lot of native nulls in the remote database.)
This feature provides an option to allow fields in the clear when creating a purpose, permitting specified analysts to bypass k-anonymization in specific circumstances.
When any purpose with the allow fields in the clear property enabled is approved for use within a project, a project member can proceed through the policy adjustment workflow and specify columns to be unmasked.
In today’s world of modern privacy regulations, deciding what a single user can see is not just about who they are, but what they are doing. For example, the same user may not be able to see credit card information normally, but if they are doing fraud detection work, they are allowed to.
This may sound silly because it’s the same person doing the analysis, so why should we make this distinction? This gets into a larger discussion about controls. When most think about controls, we think about data controls - how do we hide enough information (hide rows, mask columns) to lower our risk. There’s a second control called contextual controls; what this amounts to is having a user agree they will only use data for a certain purpose and not step beyond that purpose. Combining contextual controls with data controls is the most effective way to reduce your overall risk.
In addition to the data controls you’ve seen, Immuta is also able to enforce contextual controls through what we term purposes. You are able to assign exceptions to policies, and those exceptions can be the purpose of the analysis in addition to who the user is (and what attributes they have). This is done through Immuta projects; projects contain data sources, have members, and are also protected by policy, but most importantly, projects can also have a purpose which can act as an exception to data policy. Projects can also be a self-service mechanism for users to access data for predetermined purposes without having to involve humans for ad hoc approvals.
The goal of Immuta is to modernize the management of data policies in organizations. One key aspect of modernization is to remove day-to-day human involvement in policy decision making, which is fragile and subjective. One decision process is how and when to make exceptions to policies.
This decision process could be something like: “because Morgan is analyzing employee attrition and retention, they should be able to see employee satisfaction survey data.” Notice it is not “because Morgan is HR, they should be able to see employee satisfaction survey data.” It’s not who Morgan is, but what Morgan is doing that should allow them heightened access. The purpose for data access drives the policy decision and can be approved objectively because you are not approving who can see data, but what can be done with it. However, if Morgan has to ask permission every time there is a new survey, this becomes a subjective and time-consuming process for the organization.
This is where purposes and purpose-based access control can help. Purposes allow you to define exceptions to rules as the purpose for which the user is acting. The key point here is you can define these purposes ahead of time, before any user actually tries to get an exception. Immuta projects are valuable not just for purpose-based access control, but because of the documentation trail they provide, the collaboration they allow, and the data access process they automate.
Additionally, to ensure that purposes are being used correctly and applied accurately, project purposes must be approved before the purpose is active in a project. That approver will see the lists of tables, the project members, and project documentation and decide if they want to approve it or not. This approval is recorded by Immuta, creating a documentation trail. After a purpose is approved within a project, any changes to the members or data sources will require the purpose to be re-approved, ensuring that the project continues to be compliant to an organization's requirements.
Purpose-based access control makes access decisions based on the purpose for which a given user or tool intends to use the data, which reduces risk and aligns with many privacy regulations, such as GDPR and CCPA. This method of data access also provides flexibility for you to override policies and grant access to unmasked data to an individual for a very specific reason, without time-consuming and ad hoc manual approvals.
Immuta recommends using purposes to create exceptions to global data policies.
See this Getting started guide for instructions on how to implement purpose-based access control for your organization.
Requirement: You must own the project or have the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission
Navigate to the Project Overview tab.
Click + Add Purposes.
Create a new purpose in the modal or select purposes from the dropdown menu.
Click Save, and then click I Agree.
Navigate to the Project Overview tab.
Scroll to the purposes section and click Remove.
Click I Agree.
See the Manage project equalization guide.
See the Enable masked joins guide.
Requirement: CREATE_PROJECT Immuta permission (and you must own the project)
Click the Project Overview tab.
Click the Edit button in the Documentation section.
Document the details of your project in the text box that appears, and then click Save.
Requirement: CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission
Select a project and navigate to the Project Overview tab.
Scroll to the Tags section and click the Add Tags button.
Begin typing the tag name in the window that appears, and then select the tag from the dropdown menu. A list of chosen tags will populate at the bottom of this window.
After selecting all relevant tags, click the Add button.
Requirement: CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission
Navigate to the Project Overview tab.
Scroll to the Tags section and click on the tag that you want to remove to open its side sheet.
Click Remove.
Click Confirm to delete the tag.
Requirements:
CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission to disable or enable a project
CREATE_PROJECT Immuta permission to delete a project
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the more options icon next to the project and select Disable.
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the more options icon next to the project and select Enable.
Deleting a project permanently removes it from Immuta. Projects must first be disabled before they can be deleted.
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the more options icon next to the disabled project and select Delete.
Click Confirm.
Let’s say Sally and Bob are working together on a project, and Sally has more data access than Bob. She can see PII and Bob cannot, but they both can see credit card numbers. Without Immuta, an admin would have to know what tables they intend to use, scrub all those tables of PII, and then give Sally and Bob access to those new tables in a place where they can safely work on the data and save any output they create.
With Immuta, this is a lot easier. Sally or Bob could create the project, add the tables as data sources, invite the other person to be a project member, and equalize it. The equalization will compare the members of the project to the data policies on the tables in the project and find the intersection: Sally and Bob both can see credit card numbers. That intersection becomes the equalization setting. Once the project is equalized in this example, Sally will lose access to PII, but both Sally and Bob will retain access to credit card numbers.
Now Sally and Bob want to do some transformation on the data and write it somewhere. This is where project workspaces come into play. Once workspaces are configured in a Snowflake or Databricks Spark integration, Immuta will create a schema in the native database dedicated to this project where Sally and Bob can write their output. Within this workspace
Only members of the project will have access to that schema to write to or read from. (For example, in Snowflake, Immuta limits access to a particular role it creates.) This is important, because if someone who can't see credit card numbers somehow had access to where Sally and Bob were writing, they would gain access to data (the credit card numbers) they shouldn't see.
They will only be able to access the tables that are in the project. (This may be critical if someone approved the purpose for only those tables.)
When determining where you should give analysts WRITE access, you need to consider the entire universe of where they have READ access, and that universe is constantly changing. This is an impossible proposition for you to manage without Immuta projects.
Users at different levels of access can work together without help from an admin (to scrub the data).
Customers can avoid data leaks on analyst writes.
Project equalization
The same security restrictions regarding data sources are applied to projects: project members still need to be subscribed to data sources to access data, and only users with appropriate attributes and credentials can see the data if it contains any row-level or masking security.
However, project equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access. For a tutorial on enabling equalization, navigate to the Manage equalization guide. Note: Only project owners can add data sources to the project if this feature is enabled.
Once project equalization is enabled, the subscription policy for the project is locked and can only be adjusted by the project owner by changing the equalized entitlements. For users to access data sources within the project (and for the equalization to take effect), users must switch their context to the project.
This setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. When project equalization is enabled, equalized entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:
If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the members tab within the project.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
This setting determines how often user credentials are validated, which is critical if users share data with project members outside of Immuta, as they need a way to verify that those members' permissions are still valid.
Once project equalization is enabled, the project subscription policy builder locks and can only be adjusted by manually editing the equalized entitlements. Then, the subscription policy will combine with the entitlement settings, depending on the policy type.
The way entitlements and approvals combine differs depending on the policy type; for clarity, the table below illustrates various scenarios for each type. Every row demonstrates how a specific project subscription policy changes after project equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if project equalization is subsequently disabled.
| Policy before equalization | Equalization enabled (entitlement set) | Equalization enabled (no entitlement set) | After equalization disabled |
| --- | --- | --- | --- |
| Anyone | Allow user to subscribe when user is a member of group Accounting | Individual users you select | Individual users you select |
| Allow users to subscribe when approved by anyone with permission owner (of this project) | Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission owner (of this project) | Allow users to subscribe when approved by anyone with permission owner (of this project) | Allow users to subscribe when approved by anyone with permission owner (of this project) |
| Allow users to subscribe to the project when user is a member of group Legal | Allow users to subscribe to the project when user is a member of group Accounting | Individual users you select | Individual users you select |
| Individual users you select | Allow users to subscribe to the project when user is a member of group Accounting | Individual users you select | Individual users you select |
For example, consider the subscription policy of the following sample project, Fraud Prevention, before project equalization is enabled:
Fraud prevention
Subscription policy: Allow users to subscribe when approved by anyone with permission owner (of this project).
After enabling project equalization, the following equalized entitlement is recommended by Immuta: User is a member of group Claims and Billing Department.
In this particular example, the equalized subscription policy contains the equalized entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:
the user must be a member of the group Claims and Billing Department
the user must be approved by anyone with permission Owner (of this project).
Use project equalization so that all project members see the same data, and re-equalize projects if new members or data sources are added to the project.
Requirement: You must own the project
In the project, click the Policies tab.
In the Project Equalization section, click the toggle button to On.
Note: Only project owners can add data sources to the project if this feature is enabled.
Click Edit next to Equalized Entitlements.
In the Equalized Entitlements Builder, select either is a member of a group or possesses attribute from the user condition dropdown menu.
If you selected is a member of a group, select the appropriate group from the resulting dropdown.
If you selected possesses attribute, select the appropriate key and value from the subsequent dropdown menus.
Click Save.
Use the recommended equalized entitlements
Use Immuta's recommended equalized entitlements to protect your data in projects. Changing these entitlements creates two potential disadvantages:
If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
To view members' compliance status after changing the equalized entitlements,
Navigate to the Members tab from the Project Overview page.
Click the Not In Compliance text to view the details about the user's status.
Users who are not in compliance will be unable to view data sources within the project until the compliance issues are resolved.
To revert entitlements to those recommended by Immuta,
Click Edit next to Equalized Entitlements.
Click Use Recommended.
Click Confirm.
Update the validation frequency to specify how often users must log into Immuta to retain access to the project.
Click Edit in the Validation Frequency section.
Enter an integer in the first field of the Validation Frequency modal that appears.
Select Days or Hours in the next dropdown.
Click Save.
Navigate to the Policies tab.
In the Project Equalization section, click the toggle button to Off.
Click Yes, Turn Off in the confirmation window.
Requirements:
If project equalization or masked joins is enabled, you must own the project.
If a purpose has been added to and approved for the project, you must have the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission.
Otherwise, you must be a project member.
Navigate to the Project Overview tab.
Click the Add Data Sources button.
Start typing the name of a data source you'd like to include in the project.
Select the data source from the list of auto-completed options in the dropdown menu.
Repeat this process to add additional data sources to the list. You can remove them using the more options icon.
Opt to re-equalize the project by clicking the toggle on.
When complete, click the Save button at the bottom of the list.
You can automatically add all data sources to a project that contain a Limit usage to purpose policy that matches the purpose of that project.
Select a Project, and click the Add Data Sources button.
Click Add By Purpose.
All data sources matching the project's purpose(s) will populate at the bottom of the dialog. Review this list, and then click Save.
Set your current project to be the one you want new data sources in.
Navigate to the Data Sources page.
Select the checkboxes for the data sources you want in a project.
Select the bulk actions more options icon in the top right corner.
Click Add To Current Project.
Project equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access.
Manage equalization: Enable project equalization and manage equalized entitlements to create a common level of access for all project members.
Equalized access: This guide describes the design and behavior of project equalization.
Why equalize access?: This explanatory guide offers an example use case for project equalization to highlight its business value.
Sometimes, accessing two tables separately doesn't violate compliance regulations, but accessing these two tables when they are joined creates a serious privacy problem. Immuta can help avoid creating these toxic combinations of data.
Tables are joined on a key, a column in one table that matches a column on the other table and allows the join. To prevent joining those tables, you would mask the keys so they no longer match one another. This is an important distinction: you do not make data anonymous only because it’s directly sensitive (a direct identifier) or because it’s indirectly sensitive (an indirect identifier) but potentially because it’s a join key that you may not want to be used for joining.
Because of this risk, Immuta uses a unique salt for hashing per table when masking a column to break referential integrity by default, making sure the masked values aren’t able to join. This means you can’t join on two masked columns unless you tell Immuta you want to allow that. To do so, you have to add those tables with the masked keys to a project and enable masked joins. When you enable masked joins in a project, Immuta uses a consistent salt across all data sources in that project, which returns referential integrity and allows joining.
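As a conceptual sketch (the salts, table names, and hash expression below are illustrative, not Immuta's internal implementation), per-table salts make identical values hash differently:

```sql
-- Illustrative only: each table's masked key uses a different salt
SELECT SHA2(CONCAT('salt_for_claims', customer_id), 256) AS masked_id FROM claims;
SELECT SHA2(CONCAT('salt_for_billing', customer_id), 256) AS masked_id FROM billing;

-- The same customer_id produces different masked_id values, so a join
-- on masked_id matches nothing. Enabling masked joins in a project
-- applies one consistent salt across its data sources, so the join works again.
```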
Projects give you control over toxic data combinations.
While masked joins are enabled, only project owners can add data sources to the project. Additionally, masked columns can only be joined if they are masked using hashing.
In order to join masked columns across data sources, those data sources must be linked by a project.
Requirement: You must own the project
Create a project or select an existing project.
Navigate to the Overview tab.
Click the Allow Masked Joins toggle.
Click Confirm.
Masked joins allow masked columns to be joined within the context of a project.
Enable a masked joins guide: Enable masked joins for data sources within your project.
Why use masked joins?: This explanatory guide offers an example use case for implementing masked joins to highlight their business value.
After workspaces are configured, project owners can enable workspaces within their projects. This feature allows project members to write data to projects and share this data with other users as derived data sources.
Requirement: You must own the project
Prerequisites:
Databricks cluster configuration
Before creating a workspace, the cluster must send its configuration to Immuta; to do this, run a simple query on the cluster (e.g., show tables). Otherwise, an error message will occur when you attempt to create a workspace.
Navigate to the Policies tab and enable Project Equalization by clicking the Project Equalization slider to on.
Scroll to the Native Workspace section and click Create.
Select Databricks from the Workspace Configuration dropdown menu.
Opt to edit the sub-directory in the Workspace Directory field; this sub-directory auto-populates as the project name.
Enter the Workspace Database Name.
Click Create to enable the workspace.
To delete the workspace, scroll to the Native Workspace section on the Policies tab and click the toggle to disable the workspace.
Click Delete in the native workspace section.
Choose one of the following options in the modal:
Purge Generic Workspace Data: Permanently delete data, while the data used by derived data sources is preserved. Note: If you created a derived data source that references a view on top of a table in Snowflake that isn't a derived data source, that table will be deleted and break the derived data source.
Purge Everything & Delete Derived Data Sources: Permanently delete data and purge all derived data sources.
Click Delete.
Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is for a single user to organize their data sources or to control an entire schema of data sources through a single projects screen; most often, however, a project represents an Immuta purpose for which the data has been approved to be used, restricting access to data and streamlining team collaboration.
Project members: The users who will create, manage, and use projects.
Purposes: Purposes allow governors to define exceptions to policies based on how a user will use the data.
Data sources: Projects create an environment within Immuta where the user will only see the data sources within the project, despite their subscription to other data sources.
Subscription policy: Project users with the appropriate permissions can set a subscription policy on a project to control the users who can join.
Documentation: Project users can create documentation within the Immuta project page, allowing for an easy and consistent trail of communication.
Equalization: Project equalization ensures that the data in the project looks identical to all members, regardless of their level of access to data.
Workspaces: With equalization enabled, project users can create Snowflake or Databricks workspaces where users can view and write data.
Derived data sources: Derived data sources are the data sources that come from workspaces. Once they are created, they automatically inherit policies from their parent sources.
Masked joins: Within a project, users can join masked columns.
The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.
Project owner
Users with the CREATE_PROJECT permission can create projects; manage project subscription policies, purposes, and data sources; enable project equalization and masked joins; and create workspaces.
Project governor
Users with the GOVERNANCE permission can configure and approve project purposes and acknowledgement statements.
Project manager
Users with the PROJECT_MANAGEMENT permission can approve purpose requests for projects.
Project member
Once subscribed to a project, all users can
add and remove data sources (unless project equalization or masked joins is enabled)
remove data sources they've added to the project
Best practice: purposes
Consider purposes as attributes. Attributes identify a user, and purposes identify why that user should have access.
Data governors and users with the PROJECT_MANAGEMENT permission are responsible for configuring and approving project purposes and acknowledgement statements.
Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive data policies. Project owners can also create a purpose; however, it remains in a staged state until a governor or project manager approves the purpose request.
Purposes can be constructed as a hierarchy, containing nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a governor could write the following global policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
This hierarchy allows you to create this as a single purpose instead of creating separate purposes, which must then each be added to policies as they evolve.
Now, any user acting under the purpose or sub-purpose of Research, whether Research.Marketing or Research.Onboarding, will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. If new projects with new Research sub-purposes are added, the relevant global policy will automatically be enforced.
Projects with purposes require owners and subscribers to acknowledge that they will only use the data for those purposes by affirming or rejecting acknowledgement statements. Each purpose has its own acknowledgement statement, and a project with multiple purposes requires users to accept more than one acknowledgement statement. Immuta keeps a record of each project member's response to the acknowledgement statement(s) and records the purpose, the time of the acknowledgement, and the text of the acknowledgement. Purposes can use Immuta's default acknowledgement statement or one customized by a data governor.
If users accept the statement, they become a project member. If they reject the acknowledgement statement, they are denied access to the project.
When a user is working within the context of a project, they will only see the data in that project. This helps to prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.
When users change project contexts, queries reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
Schema projects are a collection of data sources that Immuta creates when multiple data sources belong to the same schema. For information about schema projects, see the Schema project guide.
Click on the Data Source icon in the panel on the left.
By default, all data sources will be displayed.
Opt to filter the data sources by your access, the backing technology, or the health of the data sources using filters.
To view the overview details of a data source, click the arrow icon for the data source, or open the data source overview page by clicking the name.
Click the Get Access button from either the data sources list page or the data source overview tab, which can be accessed by clicking on the Data Source.
If prompted, fill out the form set up by the system admin and click Request Access.
A notification will be sent to the data owners informing them of your request.
Once reviewed, you will receive a notification with a response indicating if your request was accepted or denied.
If accepted, the status displayed next to that data source will be updated to "Subscribed" and you will have access to the data source via your personal SQL connection. If not accepted, a reason will be provided in the notification details.
To request access to multiple data sources simultaneously,
Navigate to the data sources list page.
Select the checkboxes for the data sources you want to subscribe to.
Select More Actions.
Click Request Access.
If prompted, fill out the form set up by the system admin and click Request Access.
A notification will be sent to the data owners informing them of your request.
Once reviewed, you will receive a notification with a response indicating if your request was accepted or denied.
If accepted, the status displayed next to that data source will be updated to "Subscribed" and you will have access to the data source via your personal SQL connection. If not accepted, a reason will be provided in the notification details.
If you no longer need access to a data source, click Unsubscribe in the upper right corner of the Data Source Overview tab.
If a data source health check fails and needs to be re-generated,
Click the health indicator in the upper right-hand corner.
Select Re-run on the job you want to run.
Note: To generate a fingerprint, the row count must be up-to-date.
To view the data dictionary,
Select the data source from the data source list page.
Navigate to the Data Dictionary tab.
The data dictionary will display and include the column’s name, type, and tags. Masked columns will display a symbol next to their names.
Contact information for data owners is provided for each data source, which allows other users to ask them questions about accessibility and attributes required for viewing the data.
To view this contact information, click the Contacts tab.
Deprecation notice
Support for this feature has been deprecated.
To request to unmask a value in a data source,
Navigate to the Data Source Overview tab, click the dropdown menu in the top right corner, and select Make Unmasking Request.
In the modal that appears, select the column from the first dropdown menu, and then complete the Values to Unmask and the Reason for Unmasking fields.
Select the user to unmask the value from the final dropdown menu, and then click Submit.
A Tasks tab will then appear for your data source that details the task type and the status of your request. You may also view task information or delete the task from this page.
To view information about a task,
Navigate to the Tasks tab from the Data Source Overview page.
Click the Task Info icon in the Actions column of the relevant task.
To delete a task,
Navigate to the Tasks tab from the Data Source Overview page.
Click Delete Task.
Project workspaces
With project equalization enabled, project users can create Snowflake or Databricks project workspaces where users can view and write data. Then, those users can create derived data sources to share this data with other users.
Combining Immuta projects and Snowflake workspaces allows users to access and write data directly in Snowflake.
With Snowflake workspaces, Immuta enforces policy logic on registered tables and represents them as secure views in Snowflake. Since secure views are static, creating a secure view for every unique user in your organization for every table in your organization would result in secure view bloat; however, Immuta addresses this problem by virtually grouping users and tables and equalizing users to the same level of access, ensuring that all members of the project see the same view of the data. Consequently, all members share one secure view.
While interacting directly with Snowflake secure views in these workspaces, users can write within Snowflake and create derived data sources, all the while collaborating with other project members at a common access level. Because these derived data sources will inherit all of the appropriate policies, that data can then be shared outside the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.
Snowflake workspaces can be used on their own or with the Snowflake integration.
Immuta enforces policy logic on data and represents it as secure views in Snowflake. Because projects group users and tables and equalize users to the same level of access, all members will see the same view of the data and, consequently, will only need one secure view. Changes to policies immediately propagate to relevant secure views.
An Immuta user with the CREATE_PROJECT permission creates a project with Snowflake data sources.
The Immuta project owner enables project equalization, which balances every project member's access to the data to be the same.
The Immuta project owner creates the workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
Project members can access data sources within the project and use WRITE to create derived tables. To ensure equalization, users will only see data sources within their project as long as they are working in the Snowflake Context.
The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can create derived data sources; the derived tables will inherit the policies, and then the data can be shared outside the project.
If a project member leaves a project or a project is deleted, that Snowflake Context will be removed from the user's Snowflake account.
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Creating a workspace generates the following objects in Snowflake:
roles in Snowflake: IMMUTA_[project name]
schemas in the Snowflake IMMUTA database: [project name]
secure views in the project schema for any table in the project
To switch projects, users have to change their Snowflake Session Context to the appropriate Immuta project. If users are not entitled to a data source contained by the project, they will not be able to access the Context in Snowflake until they have access to all tables in the project. If changes are made to a user's attributes and access level, the changes will immediately propagate to the Snowflake Context.
Because users access data only through secure views in Snowflake, administrators have significantly less role management to do. Organizations should also consider having a user in Snowflake who is able to create databases and make GRANTs on those databases, and separate users who are able to read and write from those tables.
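For instance, a hedged Snowflake sketch of that separation of duties (the role, database, and schema names are illustrative):

```sql
-- Illustrative only: a provisioning role that creates databases and issues grants
CREATE ROLE immuta_provisioner;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE immuta_provisioner;

-- A read-only analyst role limited to a project's secure views
CREATE ROLE project_analyst;
GRANT USAGE ON DATABASE immuta TO ROLE project_analyst;
GRANT USAGE ON SCHEMA immuta.fraud_prevention TO ROLE project_analyst;
GRANT SELECT ON ALL VIEWS IN SCHEMA immuta.fraud_prevention TO ROLE project_analyst;
```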
Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
Self-service access to data based on data policies.
Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
All policies are enforced natively in Snowflake without performance impact.
Security is maintained through Snowflake primitives (roles and secure views).
Performance and scalability are maintained (no proxy).
Policies can be driven by metadata, allowing massive scale policy enforcement with only a small set of actual policies.
Derived tables can be shared back out through Immuta, improving collaboration.
User access and removal are immediately reflected in secure views.
Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data to the project, they should use the Spark SQL session to copy data into the workspace.
The Immuta project members query equalized data within the context of the project, collaborate, and write data, all within Databricks.
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
To write delta lake data to a workspace and then expose that delta table as a data source in Immuta, you must specify a table when creating the derived data source (rather than a directory) in the workspace for the data source.
Immuta currently supports the gs scheme for Google Cloud Platform. The primary difference between Databricks on Google Cloud Platform and Databricks on AWS or Azure is that it is deployed to Google Kubernetes Engine. Databricks automatically provisions and autoscales drivers and executors to pods on Google Kubernetes Engine, so Google Cloud Platform admin users can view and monitor these Kubernetes resources in Google Cloud Platform.
Stage Immuta installation artifacts in Google Storage, not DBFS: The DBFS FUSE mount is unavailable, and the IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED property cannot be set to true to expose the DBFS FUSE mount.
Stage the Immuta init script in Google Storage: Init scripts in DBFS are not supported.
Stage third party libraries in DBFS: Installing libraries from Google Storage is not supported.
Maven library installation is only supported in Databricks Runtime 8.1+.
/databricks/spark/conf/spark-env.sh is mounted as read-only:
Set sensitive Immuta configuration values directly in immuta_conf.xml: Do not use environment variables to set sensitive Immuta properties. Immuta is unable to edit the spark-env.sh file because it is read-only; therefore, remove environment variables and keep them from being visible to end users.
Use /immuta-scratch directly: The IMMUTA_LOCAL_SCRATCH_DIR property is unavailable.
Allow the Kubernetes resource to spin down before submitting another job: Job clusters with init scripts fail on subsequent runs.
The DBFS CLI is unavailable: Other non-DBFS Databricks CLI functions will still work as expected.
To write data to a table in Databricks through an Immuta workspace, use one of the following supported provider types for your table format (a short sketch follows this list):
avro
csv
delta
orc
parquet
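For example, a minimal Spark SQL sketch of writing a Delta table into a workspace database (the database, table, and column names are hypothetical); register the resulting table, not a directory, when creating the derived data source:

```sql
-- Illustrative only: write project-scoped results as a Delta table
-- in the workspace database created for the project
CREATE TABLE fraud_workspace.claims_summary
USING delta
AS
SELECT claim_state, SUM(claim_amount) AS total_amount
FROM claims
GROUP BY claim_state;
```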
Deprecation notice: Support for this feature has been deprecated.
A derived data source is a data source that is created within an equalized project and contains data from its parent sources. Consequently, when the derived data source is created, it will inherit the data policies from its parent data sources to keep the data secure.
Policy inheritance for derived data sources is a feature unique to the environment that an equalized project creates. Within the equalized project, every user sees the same data and work can be shared and collaborated on without any risk of a user viewing more than they should. When a derived data source is created, it inherits the data policies from its parent sources and a subscription policy is created from the equalized entitlements on the project, allowing project members to safely share secure data.
Consider these data sources, within an equalized Project 1, that each contain subscription and data policies:
Data source A
Subscription policy: Allow users to subscribe to the data source when user is a member of group Medical Claims
Data policies:
Mask by making null the value in the column(s) city except for members of group Legal
Mask by making null the value in the column(s) gender for everyone
Data source B
Subscription policy: Allow users to subscribe to the data source when user is approved by anyone with permission owner and anyone with permission governance
Data policy: Limit usage to purpose(s) Research for everyone
If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit these policies, which will be unchangeable:
Data source C
Subscription policy: Allow user to subscribe when they satisfy all of the following:
is a member of group Legal and is a member of group Medical Claims
is approved by anyone with permission owner (of data source B) and anyone with permission governance
Data policy: Limit usage to purpose(s) Research for everyone
Derived data sources inherit policies from parent sources
Sensitive data discovery applies Discovered tags to derived data sources; however, because they inherit policies from their parent sources, the global policies that contain these tags will not apply to derived data sources.
Notice that one of the data policies in Data Source A, mask by making null the value in the column(s) gender for everyone, is not included in data source C. This is because the creator could not have seen the values in the parent sources; therefore, there are no values in the derived data source to be masked.
Most local data policies will not need to be present in the derived data source, with the exception of limit usage to purpose(s) policies, and no global policies will be added to a derived data source.
Data source C's policies are reliant on which groups are in the project, and as the groups change so do the policies.
For example, if there were a data user in the project who was not in the Legal group, then that trait would not be needed in the subscription policy because, with equalization, those values would not be visible to the project members in the parent data source.
The subscription and data policies in the derived data source will always be the minimum required permissions and traits because of project equalization.
Derived data source policies will not adapt with the parent data sources. Any changes in the parent data source policies will be logged in the Relationships tab of the derived data source page, but will not be changed in the derived data source policies.
The data owner may choose to add new local data policies to the derived data source to keep up with any changes, but the inherited policies are not adjustable.
Any changes within the parent data source's data will not trickle down into the derived data source. After the creation of the derived data source, they stay connected for auditing and relationships, not for updating content.
If members use data outside the project to create their data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.
With project equalization enabled, project users can create Snowflake or Databricks Spark project workspaces where users can view and write data.
Snowflake workspaces: Create a project workspace to allow Snowflake users subscribed to the project to write data to the project.
Databricks workspaces: Create a project workspace to allow Databricks users subscribed to the project to write data to the project.
Write data: Write data to a project when working in the context of a Snowflake or Databricks project workspace.
Create a derived data source: Create a derived data source to share the data you've written with other Immuta users.
Project workspaces: This reference guide describes the components and design of project workspaces for Snowflake and Databricks and defines derived data sources.
Project UDFs: This reference guide lists the available functions for switching your project context in Databricks.
Once the workspace is created, project members will see relevant data sources when working under the project context.
Write data to the project workspace in Snowflake or Databricks:
Snowflake: Select the role created by the project workspace. The role created will be a combination of the database name (configured by the application admin) and the schema name. Then, write data to this location.
Databricks: Write data to the directory and database created in Databricks for the project workspace.
Now that data has been written to the workspace, users can share this data with others by making it a derived data source in Immuta.
Deprecation notice: Support for this feature has been deprecated.
Select a project.
Select the data source from which the new data was created.
Select Table for the virtual population option.
Click Edit and select the tables you created, and then click Apply.
Opt to edit the Basic Information fields, and then click Create.
After workspaces are configured, project owners can enable workspaces within their projects. This feature allows project members to write data to the project and share this data with other users as derived data sources.
Requirement: You must own the project
Prerequisites:
External IDs have been connected with an IAM or in Immuta for Snowflake.
Data sources registered by an excepted role: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.
Navigate to the Policies tab and enable project equalization by clicking the Project Equalization slider to on.
Scroll to the Native Workspace section and click Create.
Select Snowflake from the Workspace Configuration dropdown menu.
Name the Workspace Schema. By default, the schema name is based on the project name, but you can change it here. Your project workspace will exist within this schema in Snowflake, under the database configured by the application admin.
Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
Select one or more Warehouses to be available to project members when they are working in the Snowflake workspace.
Click Create to enable the workspace.
To delete the workspace, scroll to the Native Workspace section on the Policies tab and click the toggle to disable the workspace.
Click Delete in the native workspace section.
Choose one of the following options in the modal:
Purge Generic Workspace Data: Permanently delete data, while the data used by derived data sources is preserved. Note: If you created a derived data source that references a view on top of a table in Snowflake that isn't a derived data source, that table will be deleted and break the derived data source.
Purge Everything & Delete Derived Data Sources: Permanently delete data and purge all derived data sources.
Click Delete.
Data consumers are Immuta users that consume the data available through Immuta in their data platform as usual. Find how-to guides specific to data consumers listed below.
Data sources: Subscribe to data sources in Immuta, run health jobs, and complete other actions as a data source subscriber.
Query data: Query policy-protected data in your normal data platform.
Projects: Subscribe to projects in Immuta to collaborate with others, work under a purpose, or write data to a workspace.
Prerequisites:
REVOKE users' access to raw tables
Use your tool of choice to connect to Redshift.
Select the Immuta database name used when configuring the integration.
Query the Immuta-protected data, which takes the form of immuta_database.backing_schema.table_name:
Immuta Database: The Immuta database name used when configuring the integration.
Backing Schema: The schema that houses the backing tables of your Immuta data sources.
Table Name: The name of the table backing your Immuta data sources.
Run your query (it is recommended that you use the catalog in the query). It should look something like this:
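Here immuta is the Immuta database, public the backing schema, and claims the backing table, all placeholder names:

```sql
-- Placeholder names: immuta = Immuta database, public = backing schema,
-- claims = backing table
SELECT * FROM immuta.public.claims LIMIT 10;
```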
After sending an unmasking request, a Tasks tab will appear on the data source overview page listing the target and requesting users, the task type, and the state of the task.
Immuta projects are represented as schemas within Snowflake. As they are linked to Snowflake, projects automatically create the corresponding roles, schemas, and secure views.
Using project equalization and project workspaces, Databricks Spark project workspaces are a space where every project member has the same level of access to data. This equalized access allows collaboration without worries about data leaks. Not only can project members collaborate on data, but they can also write protected data to the project.
An Immuta user with the CREATE_PROJECT permission creates a project with Databricks data sources.
The Immuta project owner enables project equalization, which balances every project member's access to the data to be the same.
The Immuta project owner creates the workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
The Immuta project members use their newly written derived data to create derived data sources. These derived data sources inherit the necessary Immuta policies to be securely shared outside of the project.
Immuta currently supports the abfss scheme for Azure General Purpose V2 Storage Accounts. This includes support for Azure Data Lake Gen 2. When configuring Immuta workspaces for Databricks on Azure, the Azure Databricks workspace ID must be provided. More information about how to determine the workspace ID for your workspace can be found in the Databricks documentation. It is also important that the additional configuration file, with credentials for the container in Azure Storage that contains Immuta workspaces, is included on any clusters that will use Immuta workspaces.
Install third party libraries as cluster-scoped: Notebook-scoped libraries have limited support. See the Databricks documentation on libraries for more details.
For detailed instructions, see the create a derived data source steps later in this document.
immuta.set_current_project(id)
Sets the user's current project to the project ID denoted by the id parameter. This UDF must be called in its own notebook cell to ensure the changes take effect.
immuta.set_current_project() (no parameters)
Sets the user's current project to None.
immuta.clear_caches()
Clears all client caches for the current user's ImmutaClient instance. This can be used when a user would like to invalidate cached items, like data source subscription information, or if the state of Immuta has changed and the cache is outdated. For backward compatibility, this UDF is also available at default.immuta_clear_caches().
default.immuta_clear_metastore_cache()
Clears the cluster-wide Metastore cache. This UDF can only be run by a privileged user.
immuta.get_current_project
select * from immuta.get_current_project
This virtual table returns a single row with "name" and "id" columns that show your currently selected project.
immuta.list_projects
select * from immuta.list_projects
This virtual table returns rows with "name," "id," and "current_project" columns. Each row is a different project to which you are subscribed (and can use as your current project). The "current_project" column will be true for the row defining the project that you have set as your current project.
Click Data in the navigation menu and then select Projects.
Click Join Project from either the all projects view or the project overview tab, which can be accessed by clicking on the project.
Click Join to confirm you want to join the project.
After you have been granted access to the project, go to the project and click I Agree to acknowledge that you will only use the project for its specified purposes.
Click the dropdown menu in the top right corner of the console.
Select a project. Once selected, the current project will display at all times in the top right corner of the console.
If you unsubscribe from the project, this display will default to No Current Project.
View your available projects by running the following query in Spark: select * from immuta.list_projects. In the resulting table, note the values listed in the id column; this value will be used as the parameter in the following step.
Run select immuta.set_current_project(<id>). This UDF must be called in its own notebook cell to ensure the changes take effect.
Your project context will be switched, and that project's data sources and workspaces will now be visible. To set your project context to None, run select immuta.set_current_project() with no parameters.
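Putting these steps together, a minimal notebook sketch (the project id 2 is illustrative):

```sql
-- List the projects you are subscribed to and note the id column
SELECT * FROM immuta.list_projects;

-- Switch context; run this in its own notebook cell
SELECT immuta.set_current_project(2);

-- Confirm the current project
SELECT * FROM immuta.get_current_project;
```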
Note: Since the UDFs are not actually registered with the FunctionRegistry, calling DESCRIBE FUNCTION immuta.set_current_project won't return the documentation for the UDF. For a complete list of functions, see the Project UDFs reference guide.
Any project member can create project-based API keys, which are used for authenticating external tools with Immuta.
Navigate to the Project Overview tab.
Click the Get API Key button at the bottom of the left panel.
An API Key modal will display with your requested information. Store these credentials somewhere secure. If you misplace this information, you will have to generate a new key and re-authenticate all services connected to Immuta via this key.
Click the Close button.
Project SQL accounts are unique to each project and only provide access to the data sources in that project. Project SQL credentials cannot be retrieved from Immuta if they are lost. Credentials can only be re-generated using the instructions below. When a user generates new SQL credentials for a project, any existing SQL credentials for that project the user may have had are revoked.
Navigate to the Project Overview tab.
Click SQL Connection in the upper right corner.
A window will display with the connection information. Store these credentials somewhere secure. If you misplace them, you will have to generate a new account and re-authenticate all services connected to Immuta via this account.
Click Close.
Navigate to the Project Overview tab.
Click the Leave Project button in the upper right corner, and then click Confirm.
Any project member can add data sources to a project, unless the project equalization or allow masked joins features are enabled; in those cases only project owners can add data sources to the project.
Select the project, and then navigate to the Project Overview tab.
Click the Add Data Sources button in the Data Sources section in the center pane.
Start typing the name of a data source you'd like to include in the project.
Select the data source from the list of auto-completed options in the dropdown menu.
Repeat this process to add additional data sources to the list. Click Remove to remove them.
When complete, click the Add button at the bottom of the list.
You can automatically add all data sources to a project that contain a Limit usage to purpose policy that matches the purpose of that project.
Select a Project, and click the Add Data Sources button on the Data Sources tab.
Click Add By Purpose in the top right of the dialog.
All data sources matching the project's purpose(s) will populate at the bottom of the dialog. Review this list, and then click Save.
Select a project and navigate to the Overview or Data Sources tab.
Click Add Derived Data Source.
Begin typing in the Search by Name or Description text box, and then select the data source(s) from which your new data source will be derived.
Click Save.
Follow the instructions for creating a data source.
Deprecation notice: Support for this feature has been deprecated.
As a project member, you can only delete data sources you've added to the project.
Select a project, and then click the Data Sources tab.
Click the Remove Data Source icon in the Actions column of the data source you want to remove.
Click Confirm in the window that appears.
Immuta's UI provides a list of all projects, excluding those that have been set to private. Users can search for projects by keyword, tag, or data source.
To access a list of all projects, click the projects icon in the left sidebar, and then click the All Projects tab.
To filter projects by keyword, type one or more keywords into the search box at the top of the page, and select a keyword from the auto-completed results. If a list does not display, then no keywords matching that text currently exist.
To filter projects by tags, type one or more tag names into the search box at the top of the page and select a tag from the list of auto-completed results. If a list of tags does not display, then no tags matching that text currently exist.
To filter projects by data sources, type one or more data source names into the search box at the top of the page and select a data source from the list of auto-completed results. If a list of data sources does not display, then no data sources matching that text currently exist.
Prerequisites:
REVOKE users' access to raw tables
GRANT users' access to the Immuta schema
Click the Data menu in Synapse Studio.
Click the Workspace tab.
Expand the databases, and you should see the dedicated pool you specified when configuring the integration.
Expand the dedicated pool and you should see the Immuta schema you created when configuring the integration.
Select that schema.
Select New SQL script and then Empty script.
Run your query (note that Synapse does not support LIMIT and the SQL is case sensitive). It should look something like this:
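Here immuta is a placeholder schema name and claims a placeholder table; TOP stands in for the unsupported LIMIT:

```sql
-- Placeholder names; Synapse identifiers are case sensitive and LIMIT is unsupported
SELECT TOP 10 * FROM immuta.claims;
```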
Prerequisites:
Execute the USE SECONDARY ROLES ALL command or change your role to the table grants role.
Query the data as you normally would in Snowflake.
Prerequisite: Users have been granted SELECT privileges on all or relevant Snowflake tables
Query the data as you normally would in Snowflake:
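For example (the database, schema, and table names are placeholders):

```sql
-- Placeholder names: query the Immuta-protected table directly
SELECT * FROM analytics.claims_schema.claims LIMIT 10;
```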
Prerequisites:
Select SQL from the navigation menu in Databricks.
Click Create → Query.
Run your query as you normally would:
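For example (the schema and table names are placeholders):

```sql
-- Placeholder names: query the table as usual; Immuta policies apply automatically
SELECT * FROM claims_schema.claims LIMIT 10;
```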
Prerequisites:
Use your tool of choice to connect to Starburst.
Query the Immuta-protected data as you normally would:
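For example (the catalog, schema, and table names are placeholders):

```sql
-- Placeholder names: query through the Immuta catalog in Starburst
SELECT * FROM immuta.claims_schema.claims LIMIT 10;
```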
Requirement: You must own the project
On the Members tab, click the Role of the member whose role you want to change.
Select a different role: subscribed or owner.
On the Members tab, click the Deny button next to the user or group you want to remove.
Complete the Reasoning field in the window that appears, and then click Submit.