Prerequisite: Before using this walkthrough, please ensure that you’ve first completed Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
It’s recommended, but not required, that you also complete the Separating Policy Definition from Role Definition: Dynamic Attributes walkthrough.
The use case for this walkthrough is fairly simple, but unachievable in most role-based access control models. Long story short, the only way to support AND boolean logic with a role-based model is by creating a new role that conflates the two or more roles you want to AND together.
Let’s take an example: we want users to see certain data only if they have security awareness training AND consumer privacy training. It would be natural to assume you need both, separately, as metadata attached to users to drive the policy. But when you build policies in a role-based model, the model assumes roles are either OR’ed together in the policy logic, or that you can act under only one role at a time. Because of this, you have to create a single role representing the combination of requirements: “users with security awareness and consumer privacy training”. This is completely silly and unmanageable - you would need to account for every possible combination relevant to a policy, and you have no way of knowing those combinations ahead of time.
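To make the AND logic concrete, here is a minimal SQL sketch, assuming a hypothetical user_attributes table with one row per user/attribute/value; this is illustrative only, not how Immuta stores metadata:

```sql
-- Hypothetical attribute store: one row per (user, attribute, value).
CREATE TABLE user_attributes (
  username  VARCHAR(100),
  attribute VARCHAR(100),
  value     VARCHAR(100)
);

-- ABAC-style AND: a user qualifies only if BOTH trainings are present,
-- with no combination role required.
SELECT username
FROM user_attributes
WHERE attribute = 'Training Accomplished'
  AND value IN ('Security Awareness', 'Consumer Privacy')
GROUP BY username
HAVING COUNT(DISTINCT value) = 2;
```

In a role-based model there is no equivalent of that HAVING clause - the only option is minting a new conflated role for the combination.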
We go over the benefits of an attribute-based access control model over a role-based model ad nauseam in the Separating Policy Definition from Role Definition: Dynamic Attributes walkthrough if you have not done that yet and want more details. (We recommend you do if this walkthrough is of interest to you.)
If you only have to manage 7 understandable attributes/roles vs 700 - wouldn’t you want to? That’s the real value here.
Scalability: Far fewer roles to manage.
Understandability: Policies (and roles) are clearly understood. No one super user is required to explain what is going on.
Evolvability: No fear of making changes, changes are made easily, again, without the need for super user tribal knowledge.
Durability: Changes in data and users will not result in data leaks.
Because of this, the business reaps
Increased revenue: accelerate data access / time-to-data.
Decreased cost: operating efficiently at scale, agility at scale.
Decreased risk: prove policy easily, avoid policy errors.
Assumptions: Your user has the following permissions in Immuta (note you should have these by default if you were the initial user on the Immuta installation):
GOVERNANCE: in order to build policy against any table in Immuta OR are a “Data Owner” of the registered tables. (You likely are the Data Owner and have GOVERNANCE permission.)
USER_ADMIN: in order to manage groups/attributes on users.
We need to have attributes or groups assigned to you to drive policy. With Immuta these can come from anywhere (we mean literally anywhere), and Immuta will aggregate them for use in policy. Most commonly these come from your identity manager, such as LDAP, Active Directory, Okta, etc., but for simplicity’s sake, we are going to assign attributes to you in Immuta.
Click the People icon and select Users in the left sidebar.
Select your name and click + Add Attributes.
In the Add Attributes modal, type Training Accomplished in the Attribute field and click Create.
In the Attribute value field, create these two values: Security Awareness and Consumer Privacy.
Repeat these steps for the non-admin user you created in Part 3 of the POV Data Setup. However, give that user
Attribute: Training Accomplished
Attribute value: Security Awareness
Notice that the second user does not have the Consumer Privacy training.
Click the Policies icon in the left sidebar of the Immuta console.
On the Data Policies tab, click + Add Data Policy.
Name the policy: RLS and condition.
Select the action: Only show rows.
Select the sub-action: where.
Set the where clause as: salary < 200000.
Leave for everyone except, and change the exception to
possesses attribute
Training Accomplished
Security Awareness
Click + Add Another Condition.
Make sure the logic is and
Possesses attribute
Training Accomplished
Consumer Privacy
Click Add.
Under, Where should this policy be applied?, select
On Data Sources
with column names spelled like
salary
Don’t select any modifiers. We did it this way because the salary column has no column tags we can use. (We could have added a tag to it, though, if we wanted.)
Click Create Policy and then Activate Policy. (Ignore any warnings of policy overlap.)
Following the Query Your Data guide, test that your user sees all rows, including those with salary above 200000, because you have both trainings, and that your non-admin user only sees rows with a salary under 200000 because they have only one of the two required trainings.
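If it helps to see what the enforcement amounts to, here is a rough SQL sketch of the effective filter; the table name and the user_has_attribute() helper are hypothetical stand-ins for however the enforcement point resolves user metadata:

```sql
-- Rows are visible when salary < 200000, OR when the querying user
-- holds BOTH trainings (the policy exception).
-- user_has_attribute() is a hypothetical helper, not a real function.
SELECT *
FROM immuta_pov.fake_hr_data
WHERE salary < 200000
   OR (user_has_attribute('Training Accomplished', 'Security Awareness')
   AND user_has_attribute('Training Accomplished', 'Consumer Privacy'));
```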
This is just another example of why role-based policy management can get you in trouble. This problem specifically leads to an industry phenomenon termed “role explosion.” Roles must account for every possible combination of requirements for a policy since that logic cannot be prescribed as an AND condition.
Almost every database follows a role-based model, including legacy policy engines such as Apache Ranger. For example, with Snowflake you can only act under one role at a time, so all policy logic must consider that; in Databricks you may have multiple groups that are all assumed, but the methods for defining policy do not allow AND logic against those groups. The same problem holds true for Ranger policy logic.
The answer is an attribute-based model, where you can separate defining policy from defining user and data metadata, providing scalability and avoiding role explosion. We have seen this in real customer use cases: in one great example, a single Immuta policy provided the equivalent controls that required 96 rules in Ranger.
For more reading on the RBAC anti-pattern: Data Governance Anti-Patterns: Stop Conflating Who, Why, and What
More reading on RBAC vs ABAC in general: Role-Based Access Control vs. Attribute-Based Access Control — Explained
And if you really want to go deep, NIST on ABAC: Guide to Attribute Based Access Control (ABAC) Definition and Considerations
Feel free to return to the POV Guide to move on to your next topic.
Prerequisite: Before using this walkthrough, please ensure that you’ve done Parts 1-3 in the POV Data Setup walkthrough.
Immuta considers itself a “live” metadata aggregator - not only metadata about your data but also your users. Considering data specifically, to be “live” means Immuta will monitor for schema changes in your database and reflect those changes in your Immuta instance. This allows you to register your databases with Immuta and not have to worry about registering individual tables today or in the future.
Additionally, when the tables are discovered through the registration process, Immuta will inspect the table data for sensitive information and tag it as such. These tags are critical for scaling tag-based policies which you will learn about in subsequent walkthroughs. This sensitive data discovery is done by inspecting samples of your data and using algorithms to decide what we believe the data contains. Those tags are editable or new custom tags can be curated and added by you.
It is also possible to add tags curated or discovered in other systems or catalogs. While this is not specifically covered in this walkthrough, it’s important to understand.
Both the monitoring for new data and the discovery and tagging of sensitive data align with the Scalability and Evolvability theme, removing redundant and arduous work. As users create new tables or columns in your database, those tables/columns will be automatically registered in Immuta and automatically tagged. Why does this matter? Because once they are registered and tagged, policies can immediately be applied - this means humans can be completely removed from the process by creating tag-based policies that dynamically attach themselves to new tables. (We’ll walk through tag-based policies shortly.)
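For example (a sketch; the table and column names are hypothetical), creating a brand-new table requires no extra registration work:

```sql
-- A new table created after registration: schema monitoring picks it up,
-- sensitive data discovery inspects and tags it (the email column would
-- likely be tagged as an email entity), and existing tag-based policies
-- attach automatically - no human in the loop.
CREATE TABLE immuta_pov.new_customer_emails (
  customer_id BIGINT,
  email       VARCHAR(320)
);
```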
Because of this, the business reaps
Increased revenue: accelerate data access / time to data because where sensitive data lives is well understood.
Decreased cost: operating efficiently at scale, agility at scale.
Decreased risk: sensitive data is discovered immediately.
Assumptions: Your user has the following permissions in Immuta (note you should have these by default if you were the initial user on the Immuta installation):
CREATE_DATA_SOURCE: in order to register the data with Immuta
GOVERNANCE: to create a custom tag in Immuta
We are going to create a custom tag to tag the data with. This will:
Help differentiate your real data from this fake POV data.
Help build global policies across these tables from multiple compute/warehouses, if you have more than one.
To create a custom tag,
Click the Governance icon in the left sidebar of the Immuta console.
Click on the Tags tab.
Click + Add Tags.
Name the tag Immuta POV. You can delete the nested tag placeholder in order to save.
Click Save.
Let’s walk through registration of a schema to monitor (You do not need GOVERNANCE permission to do this step, only CREATE_DATA_SOURCE):
From the Data Source page, click the + New Data Source button.
Data Platform: Choose the data platform of interest. This should align to where you loaded the data in the POV Data Setup walkthrough, but of course could be your own data as well. Note that if you are using the same Databricks workspace for Databricks and SQL Analytics, you only need to load it once.
Connection Information: This is the account Immuta will use to monitor your database and query the data metadata. This account should have read access to the data that you need to register. For simplicity, you may want to use the same account you used to load the data in the POV Data Setup walkthrough, but it’s best if you can use an Admin account for registering the data and a separate user account for querying it (which we’ll do later). It should also point to the data you loaded in the POV Data Setup walkthrough, which should be the immuta_pov database unless you named the database something else or placed the data somewhere else.
Virtual Population: There are several options here for how you want Immuta to monitor your database to automatically populate metadata. In our case we want to choose the first option: Create sources for all tables in this database and monitor for changes.
Basic Information: This section allows you to apply a convention to how the tables are named. If you have multiple data warehouses/compute and you’ve already registered these tables once and are registering them now from a 2nd (or more) warehouse/compute, you will have to change the naming convention for the Immuta data source name and schema project so you can tell them apart in the Immuta UI. This will NOT impact what they are named in the native database.
Advanced (Optional):
Note that Sensitive Data Discovery is enabled.
We are going to add the Immuta POV tag we created above by going to the last section, “Data Source 1. Tags”:
Click Edit.
Enter Immuta POV and click Add.
This will add that tag to any table that is discovered now or in the future.
You can leave the defaults for the rest.
Click Create to kick off the job.
Repeat these steps for each warehouse/compute you have (not to be confused with a Snowflake warehouse; we mean other data warehouses, like Redshift, Databricks SQL analytics, etc.).
You will be dumped into a screen that depicts the progress of your monitoring job. You’ll also see a gear spinning in the upper right corner of the screen which depicts the jobs that are running, one of those being the “fingerprint,” which is what is used to gather statistics about your tables and run the Sensitive Data Discovery.
Once the tables are registered and the gear stops spinning, click into the Immuta POV Immuta Fake Hr Data table. Once there, click on the Data Dictionary tab. In there you will see your columns as well as the sensitive data that was discovered. Also note that because we found a specific entity (such as Discovered.Entity.Person Name), we also tag that column with other derivative tags (such as Discovered.Identifier Indirect). This hierarchy will become important in the Hierarchical Tag-Based Policy Definitions walkthrough.
Also visit the Data Dictionary in the Immuta POV Immuta Fake Credit Card Transactions table. If you scroll to the bottom column, transaction_country, you’ll notice we incorrectly tagged it as Discovered.Entity.State - you can go ahead and remove that tag. Notice it is simply disabled, so that when monitoring runs again it will not be re-tagged with the incorrect Discovered.Entity.State tag.
One thing worth mentioning is that the table is completely protected after being discovered based on our default policy. We’ll learn more about this in subsequent sections.
This anti-pattern is pretty obvious - instead of automatically detecting schema/database changes, you would have to manage that manually, and instead of automatically discovering sensitive data, you would have to find and tag it manually as well.
It’s not just the manual time sink; it also complicates the process, because not only must you know when a new table appears, you must then remember to tag it and potentially protect it appropriately. This leaves you ripe for data leaks as new data is created across your organization, almost daily.
If you came to this walkthrough from the POV Data Setup, please make sure to complete the final Part 5 there!
Otherwise, feel free to return to the POV Guide to move on to your next topic.
Prerequisite: Before using this walkthrough, please ensure that you’ve first completed Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
There are myriad techniques and processes companies use to determine which users should have access to which tables. We’ve seen 7 people having to respond to an email chain for approval before a DBA runs a table GRANT statement, for example. Manual approvals are sometimes necessary, of course, but there’s a lot of power and consistency in establishing objective criteria for gaining access to a table rather than relying on subjective human approvals.
Let’s take the “7 people approve with an email chain” example. We like to ask the question: “Why do any of you 7 say yes to the user gaining access?” If it’s objective criteria, you can completely automate this process. For example, if the approver says, “I approve them because they are in group x and work in the US,” that is user metadata that could allow the user to automatically gain access to the tables, either ahead of time or when requested. This removes a huge burden from your organization and avoids mistakes. Note that many times it can be the purpose of what the user is working on that drives whether they should have access or not; we cover that in our purpose exceptions walkthrough next.
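As a sketch of what “objective criteria” means in practice, the approver’s reasoning can be restated as a predicate over user metadata, again assuming a hypothetical user_attributes(username, attribute, value) table:

```sql
-- "I approve them because they are in group x and work in the US,"
-- restated as an automatable check instead of an email chain.
-- Assumes the hypothetical user_attributes(username, attribute, value) table.
SELECT username
FROM user_attributes
GROUP BY username
HAVING MAX(CASE WHEN attribute = 'group'   AND value = 'x'  THEN 1 ELSE 0 END) = 1
   AND MAX(CASE WHEN attribute = 'Country' AND value = 'US' THEN 1 ELSE 0 END) = 1;
```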
Being objective is always better than subjective: it increases accuracy, removes bias, eliminates errors, and proves compliance. If you can be objective and prescriptive about who should gain access to what tables - you should.
Because of this, the business reaps
Increased revenue: accelerate data access / time-to-data, no waiting for humans to make decisions.
Decreased cost: operating efficiently at scale, agility at scale because humans are removed from the daily approval flows.
Decreased risk: avoid policy errors caused by subjective bias and mistakes.
Assumptions: Your user has one of the following permissions in Immuta. (Note that you should have these by default if you were the initial user on the Immuta installation):
GOVERNANCE: in order to build policy against any table in Immuta
“Data Owner” of the registered tables (you likely are the Data Owner and have GOVERNANCE permission).
Up until now we’ve only talked about data policies, which limit what users see WITHIN a table. Subscription policies manage who can see what tables, similar to table GRANTs you may already be familiar with.
Immuta supports multiple modes of subscription policies:
Allow anyone: This is where anyone can access the table. (We did this in Part 5 of the POV Data Setup.)
Allow anyone who asks (and is approved): These are manual subjective approvals.
Allow users with specific groups/attributes: These are objective policies for access that we will walk through.
Allow individually selected users: This is just like database GRANTs; the table is hidden until a user or group is manually granted access.
As you can see, Immuta does support subjective approvals through “Allow anyone who asks (and is approved)” because there are some regulatory requirements for them, although we’d argue this is an anti-pattern. To show you the power of objective subscription policies, we’ll walk through “Allow users with specific groups/attributes” policy building.
Following the Query Your Data guide, confirm that both your user and the non-admin user you created in Part 3 of the POV Data Setup can query any POV table (“Fake HR Data” or “Fake Credit Card Transactions”). Both can query them due to the “Allow anyone” subscription policy you created in Part 5 of the POV Data Setup; without it, the non-admin user would not have been able to query them. We are going to edit that policy to change it to an “Allow users with specific groups/attributes” policy.
Click the Policies icon in the left sidebar of the Immuta console.
Click the Subscription Policies tab at the top. You should see the “Open Up POV Data” subscription policy you created in Part 5 of the POV Data Setup.
Edit the Open Up POV Data subscription policy by clicking the menu button on its right and selecting Edit.
Change How should this policy grant access? from Allow anyone to Allow users with specific groups/attributes.
For Allow users to subscribe when user, select the condition
possesses attribute
Key: Country
Value: JP (Remember, you created this attribute in the Separating Policy Definition from Role Definition: Dynamic Attributes walkthrough.)
At this point there are several other settings for this policy. For the purposes of this walkthrough, leave all of the below options unchecked.
Allow Discovery: If checked, users who don’t meet the policy (who don’t have Country JP in this case) will still be able to see/discover this table in the Immuta UI; otherwise, it will be hidden from them.
Require users to take action to subscribe: If left unchecked (the default) users will automatically be subscribed to the table when the policy is activated or when they reach a state (get Country JP) that allows them to be subscribed. If checked, the user would visit the data source page in the catalog and request a subscription (and be immediately added because they meet the policy).
On merge, allow shared policy responsibility: This will allow other data owners or governors to build policies against these tables as alternatives to your policy (essentially additional policies are OR'ed rather than AND'ed).
Leave the Where should this policy be applied? as is.
Click Save Policy.
Following the Query Your Data guide, confirm that your admin user with Country JP can still see any POV table (“Fake HR Data” or “Fake Credit Card Transactions”); however, the non-admin user you created in Part 3 of the POV Data Setup, with only Country US, cannot query either table. (In some cases, based on enforcement, the user can query the table, but they just won’t get any data back.) You can also look at the Members tab of any of the data sources in the Immuta UI and see that your non-admin user has been removed. Additionally, if you log in to Immuta as your non-admin user, you will notice all those tables are gone from the catalog.
Now let’s go back and make it so the non-admin user you created in Part 3 of the POV Data Setup can see the tables again:
Click the Policies icon in the left sidebar.
Click the Subscription Policies tab at the top.
Edit the Open Up POV Data subscription policy again by clicking the menu button on its right and selecting Edit.
Click + Add Another Condition.
Change the and to or (IMPORTANT).
Select the condition
possesses attribute
Key: Country
Value: US
Leave everything else as is and click Save Policy.
Following the Query Your Data guide, confirm both users can see the tables again.
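Under the hood, the subscription condition you just built is simple boolean logic over user attributes - a sketch, again assuming the hypothetical user_attributes store:

```sql
-- Subscribed when the user possesses Country JP OR Country US.
-- Assumes the hypothetical user_attributes(username, attribute, value) table.
SELECT DISTINCT username
FROM user_attributes
WHERE attribute = 'Country'
  AND value IN ('JP', 'US');
```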
The anti-pattern is manual approvals. We understand that there are some regulatory requirements for this, but if there’s any possible way to switch to objective approvals, you should do it. With subjective human-driven approvals, there is bias, larger chance for errors, and no consistency - this makes it very difficult to prove compliance and is simply passing the buck (and risk) to the approvers and wasting their valuable time.
One could argue that it’s subjective or biased to assign a user the Country JP attribute. This is not true because, remember, we separated the data policy from the user metadata. The act of giving a user the Country JP attribute simply defines that user; there is no implied access given to the user by this act, and the attribute is objective - i.e., you know whether they are in JP or not.
As we’ve seen in our other anti-patterns, the approach where an access decision is conflated with a role or group is common practice (Ranger, Snowflake, Databricks, etc.). So not only do you end up with manual approval flows, but you also end up with role explosion from so many roles to meet every combination of access, as we described in the Policy boolean logic walkthrough. If you want a real-life example of this in action, you can watch a financial institution’s YouTube video about a software product they built whose sole purpose was to help users understand which roles they needed to request access to (through manual approval) in order to get to the data they needed - they had so much role explosion they had to build an app to handle it.
Feel free to return to the POV Guide to move on to your next topic.
Prerequisite: Before using this walkthrough, please ensure that you’ve first completed Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
Let’s draw an analogy. Imagine you are planning your wedding reception. It’s a rather posh affair, so you have a bouncer checking people at the door.
Do you tell your bouncer who’s allowed in (exception-based)? Or do you tell the bouncer who to keep out (rejection-based)?
The answer to that question should be obvious, but many policy engines allow both exception- and rejection-based policy authoring, which causes a conflict nightmare. Ignoring that anti-pattern for a moment (we’ll cover it in the Anti-Pattern section), exception-based policy authoring in our wedding analogy means the bouncer has a list of who should be let into the reception. This will always be a shorter list of users/roles if you follow the principle of least privilege - the idea that any user, program, or process should have only the bare minimum privileges necessary to perform its function: you can’t go to the wedding unless invited. This aligns with the concept of privacy by design, the foundation of the CPRA and GDPR, which states “Privacy as the default setting.”
What this means in practice is that you should define what should be hidden from everyone, and then slowly peel back exceptions as needed.
Using an exception-based approach is a security standard across the board; this is because it’s a scalable approach that avoids costly data leaks and allows the business to move quickly. The “how” will be discussed in more detail in the Anti-Patterns section.
Because of this, the business reaps
Increased revenue: accelerate data access / time-to-data.
Decreased cost: operating efficiently at scale, agility at scale by building exceptions to agreed-upon foundational policies.
Decreased risk: avoid data leaks by not conflating conflicting exception- and rejection-based policies.
Assumptions: Your user has the following permissions in Immuta (note you should have these by default if you were the initial user on the Immuta installation):
GOVERNANCE: in order to build policy against any table in Immuta OR are a “Data Owner” of the registered tables (you likely are the Data Owner and have GOVERNANCE permission).
USER_ADMIN: in order to manage groups/attributes on users.
We need to have attributes or groups assigned to you to drive policy. With Immuta these can come from anywhere (we mean literally anywhere), and Immuta will aggregate them for use in policy. Most commonly these come from your identity manager, such as LDAP, Active Directory, Okta, etc., but for simplicity’s sake, we are going to assign attributes to you in Immuta.
Click the People icon and select Users in the left sidebar.
Select your name and click + Add Attributes.
In the Add Attributes modal, enter Department in the Attribute field.
Enter HR for Attribute value field.
Repeat these steps for the non-admin user you created in Part 3 of the POV Data Setup. However, give that user the Attribute Department with the Attribute Value Analytics (instead of HR).
In Immuta, visit the Fake HR Data data source (from any warehouse/compute).
Go to the Data Dictionary tab and find where you have the Discovered.Entity.Person Name tags. Let’s build a policy against that tag that includes an exception.
Click the Policies icon in the left sidebar.
Click + Add New Data Policy.
Name it Mask Person Name.
For action, select Mask.
Leave columns tagged.
Type in the tag Discovered.Entity.Person Name.
Change masking type to using a constant.
Type in the constant REDACTED.
Leave for everyone except and change the exception to
possesses attribute
Department
HR
Click Add.
Leave Where should this policy be applied? as is. (Immuta will guess correctly based on the previous steps.)
You can further refine where this policy is applied by adding another circumstance:
Click + Add Another Circumstance.
Change the or to an and.
Select tagged for the circumstance. (Make sure you pick “tagged” and not “with columns tagged.")
Type in Immuta POV for the tag name. (Remember, this was the tag you created in Schema Monitoring and Automatic Sensitive Data Discovery.) Note that if you are a Data Owner of the tables without GOVERNANCE permission the policy will be automatically limited to the tables you own.
Click Create Policy and then Activate Policy.
Following the Query Your Data guide, test that your user sees the Person Name tagged columns in the clear because you are part of Department HR, and that your non-admin user sees “REDACTED” for the same columns because they are not part of Department HR.
Let’s make this a little more complex. Let’s say that we want people in Department HR to see hashed names, but everyone else to see REDACTED. To do this, let’s update the policy:
From the Policies page, click the menu button on the Mask Person Name policy we just created and click Edit.
Click the three dot button on the actual policy definition and then select Edit. (Note you edit that separately because you can have multiple policy definitions in the single policy.)
Change everyone except to everyone who.
Change using a constant to using hashing in the first policy.
Click Update.
Click Save Policy.
Again, following the Query Your Data guide, test that your user now sees the Person Name tagged columns hashed because you are part of Department HR, and that your non-admin user sees “REDACTED” for the same columns because they are not part of Department HR.
A key point to realize here is that when you chose “everyone who” you were actually building a rejection-based policy, but to ensure there was no data leak, Immuta forced you to also have that catch-all OTHERWISE statement at the end, similar to a catch-all else in code. This retains the exception-based concept and avoids a data leak.
How could your data leak if it wasn’t exception based?
What if you did two policies:
Mask Person Name using hashing for everyone who possesses attribute Department HR.
Mask Person Name using constant REDACTED for everyone who possesses attribute Department Analytics.
Now some user comes along who is in Department Finance - guess what, they will see the Person Name columns in the clear because they were not accounted for, just like the bouncer would let them into your wedding because you didn’t think ahead of time to add them to your deny list.
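A sketch makes the leak obvious. Assume a hypothetical current_user_department() function that resolves the querying user’s Department attribute, and a hypothetical fake_hr_data table:

```sql
-- Rejection-based: every department must be enumerated, so the
-- unanticipated Finance user falls through to the clear.
SELECT CASE
         WHEN current_user_department() = 'HR'        THEN SHA2(person_name, 256)
         WHEN current_user_department() = 'Analytics' THEN 'REDACTED'
         ELSE person_name            -- Finance sees real names. Leak!
       END AS person_name
FROM fake_hr_data;

-- Exception-based with the catch-all OTHERWISE: masked is the default,
-- so departments you never anticipated are still safe.
SELECT CASE
         WHEN current_user_department() = 'HR' THEN SHA2(person_name, 256)
         ELSE 'REDACTED'             -- everyone else, known or not
       END AS person_name
FROM fake_hr_data;
```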
Again, fairly obvious: rejection-based policies are the anti-pattern and are completely contradictory to the industry standard of least-privilege access; yet, for some reason, tools like Ranger rely on them and send users tumbling into this trap.
There are two main issues:
Ripe for data leaks: Rejection-based policies are extremely dangerous, which is why Immuta does not allow them except with a catch-all OTHERWISE statement at the end, as you walked through. Again, this is because if a new role/attribute comes along that you haven’t accounted for, that data will be leaked. It is impossible to anticipate every possible user/attribute/group ahead of time, just like it’s impossible to anticipate every person off the street who could try to enter your posh wedding and account for them on your deny list.
Ripe for conflicts and confusion: Tools that allow both rejection-based and exception-based policy building create a conflict disaster. Let’s walk through a simple example (and note this is a very simple one; imagine if you had hundreds of these policies):
Policy 1: mask name for everyone who is member of group A
Policy 2: mask name for everyone except members of group B
What happens if someone is in both groups A and B? We have to fall back on policy ordering to avoid this conflict, which requires users to understand all other policies before building their own - it is nearly impossible to understand what a single policy does without looking at all policies.
Feel free to return to the POV Guide to move on to your next topic.
Prerequisite: Before using this walkthrough, please ensure that you’ve first completed Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
While many platforms support the concept of object tagging / sensitive data tagging, very few truly support hierarchical tag structures.
First, a quick overview of what we mean by hierarchical tag structure:
This would be a flat tag structure:
SUV
Subaru
Truck
Jeep
Gladiator
Outback
Each tag stands on its own and is not associated with the others in any way; there’s no correlation between Jeep and Gladiator, nor between Subaru and Outback.
A hierarchical tagging structure establishes these relationships, and we’ll explain why this is important momentarily.
SUV.Subaru.Outback
Truck.Jeep.Gladiator
“Support” for a tagging hierarchy is more than just supporting the tag structure itself. More importantly, policy enforcement should respect the hierarchy as well. Let’s run through a quick contrived example. Let's say that you wanted the following policies:
Mask by making null any SUV data
Mask using hashing any Outback data
With a flat structure, if you build those policies they will be in conflict with one another. To avoid that problem, you would have to order which policies take precedence, which can get extremely complex when you have many policies. This is in fact how many policy engines handle this problem. (We’ll discuss this more in the Anti-Pattern section.)

Instead, if your policy engine truly supports a tagging hierarchy like Immuta does, it will recognize that Outback is more specific than SUV, and have that policy take precedence:
Mask by making null any SUV data
Mask using hashing any SUV.Subaru.Outback data
Policies are applied correctly without any need for complex ordering of policies.
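One way to picture the precedence rule is “deepest matching tag wins.” A sketch, using a hypothetical column_policies table (not how Immuta resolves policies internally):

```sql
-- Hypothetical policy store keyed by tag.
CREATE TABLE column_policies (
  tag     VARCHAR(100),  -- e.g. 'SUV' or 'SUV.Subaru.Outback'
  masking VARCHAR(100)   -- e.g. 'make null' or 'hash'
);
INSERT INTO column_policies VALUES
  ('SUV', 'make null'),
  ('SUV.Subaru.Outback', 'hash');

-- For a column tagged SUV.Subaru.Outback, the most specific match wins:
SELECT masking
FROM column_policies
WHERE 'SUV.Subaru.Outback' LIKE tag || '%'
ORDER BY LENGTH(tag) DESC
LIMIT 1;  -- returns 'hash': the Outback policy outranks the SUV policy
```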
This allows the business to think about policy and the application of policy based on a logical model of their data. Because of this, you are provided:
Understandability: Policies are easily read and understood on their own without having to also comprehend precedence of policy (e.g., inspect each policy in combination with all other policies).
Evolvability: What if you need to change all Subaru data to hashing now? With Immuta, that’s an easy change - just update the policy. With solutions that don’t support a tagging hierarchy, you must understand both the policy and its precedence. With a tagging hierarchy, the precedence was taken care of when building the logical tagging model.
Correctness: If two policies hit each other at the same level of the hierarchy, the user is warned of the conflict when building the second policy. This is important because in this case there likely is a true conflict of opinion about what the policy should be doing, and the business can make a decision. With policy ordering, this conflict is not apparent.
Because of this, the business reaps
Increased revenue: accelerate data access / time-to-data.
Decreased cost: operating efficiently at scale, agility at scale by avoiding comprehension of all policies at once in order to create/edit more of them.
Decreased risk: avoid policy errors through missed conflicts and not understanding policy precedence.
Assumptions: Your user has the following permissions in Immuta (note you should have these by default if you were the initial user on the Immuta installation):
GOVERNANCE: in order to build policy against any table in Immuta OR
“Data Owner” of the registered tables. (You likely are the Data Owner and have GOVERNANCE permission.)
To build a policy using tags,
In Immuta, visit the Fake HR Data data source (from any warehouse/compute).
Go to the Data Dictionary tab and view where you have the Discovered.Identifier Direct and the Discovered.Entity.Social Security Number tags. Let’s build two separate policies using those.
Policy 1:
Click the Policies icon in the left sidebar of the Immuta console.
Click + Add New Data Policy.
Name it Mask Direct Identifiers.
For action, select Mask.
Leave columns tagged.
Type in the tag Discovered.Identifier Direct.
Change masking type to by making null.
Change everyone except to everyone. (This policy will have no exceptions.)
Click Add.
Leave Where should this policy be applied? as is. (Immuta will guess correctly based on previous steps.)
Click Create Policy and then Activate Policy.
Policy 2:
Click + Add New Data Policy.
Name it Mask SSN.
For action, select Mask.
Leave columns tagged.
Type in the tag Discovered.Entity.Social Security Number.
Change masking type to using hashing.
Change everyone except to everyone. (This policy will have no exceptions.)
Click Add.
Leave Where should this policy be applied? as is. (Immuta will guess correctly based on previous steps.)
You can further refine where this policy is applied by adding another circumstance:
Click + Add Another Circumstance.
Change the or to an and.
Select tagged for the circumstance. (Make sure you pick “tagged” and not “with columns tagged.”)
Type in Immuta POV for the tag name. (Remember, this was the tag you created in Schema Monitoring and Automatic Sensitive Data Discovery.) Note that if you are a Data Owner of the tables without GOVERNANCE permission, the policy will be automatically limited to the tables you own.
Click Create Policy and then Activate Policy.
Now visit the Fake HR Data data source again (from any warehouse/compute).
Click the Policies tab.
You will see both of those policies applied; however, on the Social Security Number column the “Mask Direct Identifiers” policy was not applied because it was not as specific as the “Mask SSN” policy. You can also test that everything was masked correctly by following the Query Your Data guide.
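In effective-query terms (a sketch; the ssn column name and fake_hr_data table are hypothetical), the result on a column carrying both tags looks like this:

```sql
-- ssn carries Discovered.Identifier Direct AND the deeper
-- Discovered.Entity.Social Security Number tag, so hashing
-- (the more specific policy) is applied instead of make-null.
SELECT SHA2(ssn, 256) AS ssn
FROM fake_hr_data;
```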
This has already been covered fairly well in the business value section, but policy precedence ordering is the anti-pattern, and it is unfortunately commonly found in tools such as Sentry and Ranger. The problem is that you put the onus on the policy builder to understand the precedence rather than baking it into your data metadata. The policy builder must understand all other policies and cannot build their policy in a vacuum. Similarly, anyone reading a policy must consider it in tandem with every other policy and its precedence to understand how policy will actually be enforced. Other tools, like Snowflake and Databricks, have no concept of policy precedence at all, which leaves you with no solution to this problem.
Yes, this does put some work on the business to correctly build “specificity” into their tagging hierarchy (depth == specificity). This is not necessarily easy; however, this logic has to live somewhere, and having it in the tagging hierarchy rather than in policy order again allows you to separate policy definition from data definition. This provides you scalability, evolvability, understandability, and, we believe most importantly, correctness, because policy conflicts can be caught at policy-authoring time as described in the business value section.

Feel free to return to the POV Guide to move on to your next topic.
Prerequisites: Before using this walkthrough, please ensure that you’ve first completed Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
When building access control into our database platforms, we are all used to a concept called Role-Based Access Control (RBAC). Roles define both who is in them and what those users get access to. A good way to think about it: roles conflate the who and the what - who is in them and what they have access to (but lack the why).
In contrast, Attribute-Based Access Control (ABAC) allows you to decouple your roles from what they have access to, essentially separating the what and why from the who, which also allows you to explicitly explain the “why” in the policy. This gives you an incredible amount of scalability and understandability in policy building. Note this does not necessarily mean you have to throw away your roles; you can make them more powerful and likely scale them back significantly.
If you remember the picture and article from the start of this POV, most of the Ranger, Snowflake, Databricks, etc. access control scalability issues are rooted in the fact that they use an RBAC model rather than an ABAC model.
This walkthrough will run you through a very simple scenario that shows why separating the who from the what is so critical to scalability and future proofing policies.
If you only have to manage 7 understandable policies vs 700 - wouldn’t you want to? That’s the real value here.
Scalability: Far fewer policies and roles to manage.
Understandability: Policies (and roles) are clearly understood. No one super user is required to explain what is going on.
Evolvability: No fear of making changes, changes are made easily, again, without the need for super user tribal knowledge.
Durability: Changes in data and users will not result in data leaks.
Because of this, the business reaps
Increased revenue: accelerate data access / time to data.
Decreased cost: operating efficiently at scale, agility at scale, data engineers aren’t spending time managing roles and complex policies.
Decreased risk: prove policy easily, avoid policy errors, understand what policy is doing.
Assumptions: Your user has the following permissions in Immuta (note you should have these by default if you were the initial user on the Immuta installation):
USER_ADMIN: in order to change attributes on users
GOVERNANCE: in order to build policy against any table in Immuta OR
“Data Owner” of the registered tables from Part 4 without the GOVERNANCE permission. (You likely are the Data Owner and have GOVERNANCE permission.)
This is a simple row-level policy that will restrict what countries you see in the “Immuta Fake Credit Card Transactions” table.
In order to do ABAC, we need to have attributes or groups assigned to you to drive policy. With Immuta these can come from anywhere (we mean literally anywhere), and Immuta will aggregate them for use in policy. Most commonly these come from your identity manager, such as LDAP, Active Directory, Okta, etc., but for simplicity’s sake, we are going to assign attributes to you in Immuta.
Click the People icon and select Users in the left sidebar.
Select your name and click + Add Attributes.
In the Add Attributes menu, type Country in the Attribute field and click Create.
In the Attribute value field, type US and click Create. Repeat the same process to add JP as an attribute value.
Repeat these steps for the non-admin user you created in Part 3 of the POV Data Setup. However, leave off JP and ONLY give US to that non-admin user.
In the Immuta UI, look at the Data Dictionary for the Immuta Fake Credit Card Transactions table (you can do this by visiting the data source in Immuta and clicking the Data Dictionary tab); notice that the column transaction_country is tagged with Discovered.Entity.Location. This will be important when policy building.

Follow our Query Your Data guide to run a query against the Immuta Fake Credit Card Transactions data in your compute/warehouse of choice to see the data before we create the policy. You can query with both your admin and non-admin user (if you were able to create a non-admin user).
Click the Policies icon in the left sidebar of the Immuta console. (Note: This is not the Policy tab in the “Immuta Fake Credit Card Transactions” data source; that tab is for local policies).
On the Data Policies tab, click + Add Data Policy.
Name the policy: RLS walkthrough.
Select the action Only show rows.
Leave the sub-action as where user.
Set the qualification as possesses attribute.
Set the attribute key as Country. (Remember, we added those US and JP attributes to you under Country.)
Set the field as Discovered.Entity.Location. (Remember, the transaction_country column was tagged with this.)
Change for everyone except to for everyone. This means there are no exceptions to the policy.
Click Add.
Leave the default circumstance Where should this policy be applied? with On data sources with columns tagged Discovered.Entity.Location. This was chosen because it was the tag you used when building the policy.
You can further refine where this policy is applied by adding another circumstance:
Click + Add Another Circumstance.
Change the or to an and.
Select tagged for the circumstance. (Make sure you pick “tagged” and not “with columns tagged.”)
Type in Immuta POV for the tag name. (Remember, this was the tag you created in Schema Monitoring and Automatic Sensitive Data Discovery.) Note that if you are a Data Owner of the tables without GOVERNANCE permission, the policy will be automatically limited to the tables you own.
Click Create Policy and Activate Policy.
Now the policy is active and easily understandable. We are saying that the user must have a Country attribute matching the value in the transaction_country column in order to see that row, and there are no exceptions to that policy. However, there’s a lot of hidden value in how you built this policy so easily:
Because you separated who the user is (their Country) from the policy definition above, the user’s country is injected dynamically at runtime - this is the heart of ABAC (see the sketch after this list). In an RBAC model this is not possible because the who and the what are conflated: you would have to create a role PER COUNTRY. Not only that, you would also have to create a role per combination of countries (remember, you had US and JP). RBAC is very similar to writing code without being able to use variables. Some vendors will claim you can fix this limitation by creating a lookup table that mimics ABAC; however, when that is done, you remove all your policy logic from your policy engine and instead place it in this lookup table.
We also didn’t care how the transaction_country column was named/spelled, because we based the policy on the logical tag, not the physical table(s). If you had another table with that same tag but the transaction_country column spelled differently, the policy would still have worked. This allows you to write the policy once and have it apply to all relevant tables based on your tags, remembering that Immuta can auto-discover many relevant tags.
If you add a new user with a never-before-seen combination of Countries, in the RBAC model you would have to remember to create a new policy for them to see data. In the ABAC model it will “just work” since everything is dynamic - future-proofing your policies.
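Here is that runtime injection sketched as SQL, again assuming the hypothetical user_attributes store; the enforcement point effectively appends a filter like this:

```sql
-- Only rows whose transaction_country matches one of the querying
-- user's Country attribute values are returned.
-- Assumes the hypothetical user_attributes(username, attribute, value) table.
SELECT t.*
FROM immuta_pov.fake_credit_card_transactions t
WHERE t.transaction_country IN (
  SELECT value
  FROM user_attributes
  WHERE username  = CURRENT_USER   -- US and JP for the admin user;
    AND attribute = 'Country'      --   US only for the non-admin user
);
```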
Look at the data again to prove the policy is working. Follow our Query Your Data guide to run a query against the Immuta Fake Credit Card Transactions data in your compute/warehouse of choice to see the data now that the policy is in place. You can query with both your admin and non-admin user (if you were able to create a non-admin user). Notice that the admin user will only see US and JP rows, and the non-admin user only sees US rows.
RBAC is an anti-pattern because you conflate the who with the what. Again, it’s like writing code without being able to use variables. If you were to write this policy with Ranger, you would end up with hundreds if not thousands of policies, because you need to account for every unique country combination. Doing this with groups or roles in Databricks and Snowflake would be the same situation.
Those tools also cannot specify row-level policies based on tags (only column masking policies), so not only do you need all those roles described above, but you also need to recreate those policies over and over again for every relevant table.
We have seen this in real customer use cases: in one great example, a single Immuta policy provided the equivalent controls that required 96 rules in Ranger. There’s also, of course, the independent study referenced at the start of this walkthrough.

For more reading on the RBAC anti-pattern: Data Governance Anti-Patterns: Stop Conflating Who, Why, and What

More reading on RBAC vs ABAC in general: Role-Based Access Control vs. Attribute-Based Access Control — Explained

And if you really want to go deep, NIST on ABAC: Guide to Attribute Based Access Control (ABAC) Definition and Considerations

Feel free to return to the POV Guide to move on to your next topic.
Prerequisite: Before using this walkthrough, please ensure that you’ve done Parts 1-5 of the POV Data Setup and the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.
In the prior walkthroughs in this theme, we’ve spent a lot of time talking about attribute-based access controls and their benefits. However, in today’s world of modern privacy regulations, deciding what a single user can see is not just about who they are, but what they are doing. For example, the same user may not be able to see credit card information normally, but if they are doing fraud detection work, they are allowed to.
This may sound silly - it’s the same person doing the analysis, so why should we make this distinction? This gets into a larger discussion about controls. When most people think about controls, they think about data controls: how do we hide enough information (hide rows, mask columns) to lower our risk? There’s a second kind of control called contextual controls; what this amounts to is having a user agree they will only use data for a certain purpose, and not step beyond that purpose. Combining contextual controls with data controls is the most effective way to reduce your overall risk.
In addition to the data controls you’ve seen in Immuta, Immuta is also able to enforce contextual controls through what we term “purposes.” You are able to assign exceptions to policies, and those exceptions can be based on the purpose of the analysis in addition to who the user is (and what attributes they have). This is done through Immuta projects: projects contain data sources, have members, and are themselves protected by policy, but most importantly, projects can have a purpose which can act as an exception to data policy. Projects can also be a self-service mechanism for users to access data for predetermined purposes without having to involve humans for ad hoc approvals.
Purpose-based exceptions reduce risk and align with many privacy regulations, such as GDPR and CCPA. They also allow policy to be created a priori for exceptions to rules based on anticipated use cases (purposes) in your business, thus removing time-consuming and ad hoc manual approvals.
Because of this, the business reaps
Increased revenue: accelerate data access / time-to-data, no waiting for humans to make decisions.
Decreased cost: operating efficiently at scale, agility at scale because humans are removed from the daily approval flows.
Decreased risk: align to privacy regulations and significantly reduce risk with the addition of contextual controls.
Assumptions: Your user has the following permission in Immuta (note you should have this by default if you were the initial user on the Immuta installation):
GOVERNANCE: in order to add a new purpose, build policy against any table in Immuta, and approve a purpose’s use in a user created project.
In this example we are going to hide credit card data unless acting under a certain purpose. Let’s start by creating the purpose as a user with GOVERNANCE permission.
Click the Governance icon in the left sidebar of the Immuta console.
On the Purposes tab, click + Add Purpose.
For Purpose Name put Fraud Analysis.
Leave the default Statement. However, this statement is important: it is what the user is agreeing to when they use this purpose. You can edit this to whatever internal legal language you want.
Leave the Description empty.
Click Create.

You should see the Fraud Analysis purpose you created listed now. However, let’s add a hierarchy to this purpose, similar to the tag hierarchies you worked with earlier:
Click Edit to the right of your Fraud Analysis purpose to edit it.
Click Add Sub Purpose.
For the nested purpose, enter Charges.
Click Save.
Now let’s build a policy:
Click the Policies icon in the left sidebar.
Click + Add New Data Policy.
Name it Mask Credit Card Numbers.
For action, select Mask.
Leave columns tagged.
Type in the tag Discovered.Entity.Credit Card Number.
Change the masking type to by making null.
Leave everyone except.
Set when user is to acting under purpose.
Set Fraud Analysis.Charges as the purpose.
Click Add.
Now let’s add a second action to this policy:
Click + Add Another Action.
Select Limit usage to purpose(s).
Select Fraud Analysis as the purpose. (Notice that we left off Charges, unlike above.)
Change for everyone except to for everyone.
Click Add.
Leave Where should this policy be applied? as is.
Click Create Policy and then Activate Policy.

Following the Query Your Data guide, confirm that neither your admin user nor the non-admin user you created in Part 3 of the POV Data Setup can see data in the “Fake Credit Card Transactions” table. This is because neither is acting under the purpose Fraud Analysis. If they could query the table, they wouldn’t see the credit card numbers either, because they also aren’t acting under the purpose Fraud Analysis.Charges.

Ok, so how do we work under a purpose? Let’s use the non-admin user you created in Part 3 of the POV Data Setup for this part, to prove that this is completely self-service for your users.

Log in to Immuta with that non-admin user in a private or incognito window.
Click the Data icon and select Projects in the left sidebar of the Immuta console.
Click + New Project.
Name the Project My Fraud Project.
Set the description as Immuta POV.
Leave the documentation as the default. (You could add markdown to describe your project here.)
Set your purpose to Fraud Analysis.
Ignore Native Workspace.
For Data Sources, select your Fake Credit Card Transactions table(s).
Click Affirm and Create -- note that you are affirming the acknowledgement statement that popped up in the previous section.
Click the Project Overview tab.
You will see the Fraud Analysis purpose there, but it is staged.
At this point you have the project, but until another user with GOVERNANCE or PROJECT_MANAGEMENT permission approves that purpose on that project, you cannot act under it. This is because a human must confirm that the data sources you added to the project align with the purpose you are acting under and the work you are attempting to accomplish. Yes, this is a manual approval step; however, it is fully documented in the project and audited, allowing the approver to make a decision with all the information required. This is not a policy decision - it is a decision on whether the project legitimately aligns with the purpose. Let’s go ahead and do that with your other user.
Go to your Immuta window that has the admin user with GOVERNANCE permission logged in.
You should see a little red dot above the Requests icon in the upper right corner of the Immuta console.
If you click on the Requests icon, you will see that you have 1 pending Purpose Approval request.
Click the Review button. This will drop you into your Requests window under your profile. You can review the request by visiting the project through the hyperlink, but since you already know about the project, just click the checkbox on the right to approve it.
Go back to the other non-admin user window and refresh the project screen.
You will be asked to acknowledge the purpose per the statement that was attached when the purpose was created. Click I Agree. (That will be audited in Immuta.)
Now that the Fraud Analysis purpose is active, click in the upper right corner of the console where it says No Current Project - that menu is how you switch your project contexts to act under a purpose.
Set your current project to the one you created: My Fraud Project.
You are now acting under the purpose Fraud Analysis.

To prove it, follow the Query Your Data guide and confirm that the non-admin user can see data in the “Fake Credit Card Transactions” table. Make sure you are querying as the non-admin user that just switched their project. Note that it may take 10-20 seconds before Immuta updates your current project in the enforcement point and you can see the data. (Immuta does some caching.)
Ok, that was cool, but look - the credit card numbers are null. This is because we used a more specific purpose as the exception to the credit card masking policy; remember, it was Fraud Analysis.Charges rather than just Fraud Analysis. So let’s make our purpose more specific in the project, re-approve it, and then show the credit card numbers in the clear.
Using the non-admin user that created the project, click Manage above the purposes on the My Fraud Project Overview tab.
In the Purposes drop down, uncheck Fraud Analysis and then select Fraud Analysis.Charges.
This will require you to affirm the new purpose.
Go back to your admin user and go through the flow of approving the purpose again; you will have another Requests notification. (You can just refresh the Requests screen if you are already there.)
Once approved, go back to the non-admin user and refresh their My Fraud Project window.
Click I Agree to the acknowledgement.

You are now acting under the purpose Fraud Analysis.Charges. Now query the data again with your non-admin user following the Query Your Data guide. The credit card numbers are in the clear because you are acting under the appropriate purpose!
But wait, did you notice something? Why are you able to see the table at all? You aren’t acting under the purpose Fraud Analysis anymore. This is because Fraud Analysis.Charges is a more specific subset of Fraud Analysis, so by acting under it you are also acting under every purpose further up the tree - the power of hierarchical purposes!
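One way to think about the hierarchy check (a sketch, not Immuta’s actual implementation): a child purpose satisfies any ancestor purpose via prefix matching:

```sql
-- Acting under 'Fraud Analysis.Charges' satisfies the broader
-- 'Limit usage to Fraud Analysis' policy...
SELECT 'Fraud Analysis.Charges' LIKE 'Fraud Analysis' || '%'         AS satisfies_parent,
-- ...and also the masking exception, which requires the full child purpose.
       'Fraud Analysis.Charges' LIKE 'Fraud Analysis.Charges' || '%' AS satisfies_child;
-- both return TRUE
```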
DO THIS: Ok, now we need to do some cleanup because we want to use that credit card data later in these walkthroughs and not have to act under a purpose to do so (this will let the other walkthroughs stand on their own without having to do this walkthrough).
With your admin user, click the Policies icon in the left sidebar.
Find the Mask Credit Card Numbers policy you created in this walkthrough.
Click the menu button to the right of it and select Delete.
Click Confirm.
With your non-admin user, switch your project toggle back: Switch Current Project: None. Note: If you do not do this step, you will only be able to see the tables in My Fraud Project (and no tables outside it) when querying data.
Some may claim they can do purpose exceptions using - you guessed it - roles! Sigh; as we’ve seen, this continues to exacerbate our role explosion problem.
Also, there are two kinds of RBAC models: flat and hierarchical. Flat means you can only act under one role at a time (Snowflake uses this model), which does align well if you wanted to follow the anti-pattern and use roles as purposes. However, most databases (everything other than Snowflake) have hierarchical roles, meaning you act under all your roles at once. With hierarchical roles, using a role for purpose doesn’t work, because at runtime you have no idea which purpose the user is actually acting under. Why does that matter? Remember, the user acknowledged they would only use the data for that certain purpose; if the user has no way to explicitly state which purpose they are acting under, how can we hold them accountable?
Lastly, if you are simply assigning roles, there is no workflow for the user to acknowledge they will only use the data for the purpose, nor is there a workflow to validate that the purpose aligns with the project the user is working on.
For these reasons, purpose needs to be its own object/concept in your access control model.

Feel free to return to the POV Guide to move on to your next topic.