Copyright © 2014-2024 Immuta Inc. All rights reserved.

Scalability and Evolvability

ABAC vs RBAC

Do you find yourself spending too much time managing roles and defining permissions in your system? When a new data request or a policy change comes in, do you spend an inordinate amount of time making those changes? Scalability and evolvability remove much of this burden. A scalable and evolvable data policy management system lets you make changes that accurately affect hundreds, if not thousands, of tables at once. It also lets you evolve your policies over time with minor changes or no changes at all, through future-proof policy logic.

A lack of scalability and evolvability is rooted in attempting to apply a coarse role-based access control (RBAC) model to a modern data architecture. Using Apache Ranger, a well-known legacy RBAC system built for Hadoop, as an example, independent research has shown the explosion of management required to do the most basic of tasks with an RBAC system: Apache Ranger Evaluation for Cloud Migration and Adoption Readiness.

In a scalable solution such as Immuta, the count of required policy changes remains extremely low, which provides the scalability and evolvability. GigaOm researched exactly this, comparing Immuta’s ABAC model to what they called Ranger’s RBAC with Object Tagging (OT-RBAC) model, and showed a 75 times increase in policy management with Ranger.

  • Value to you: You have more time for the complex tasks that deserve your attention, and you no longer fear making a policy change.

  • Value to the business: Policies can be easily enforced and evolved, allowing the business to be more agile, decrease time-to-data across the organization, and avoid errors.

Separating policy definition from role definition

When building access control into database platforms, the concept of role-based access control (RBAC) is familiar. Roles both define who is in them and determine what those members can access. A good way to think about this is that roles conflate the who and the what: who is in them and what they have access to (but they lack the why).

In contrast, attribute-based access control (ABAC) allows you to decouple your roles from what they have access to, essentially separating the what and why from the who, which also allows you to explicitly explain the “why” in the policy. This gives you a great deal of scalability and understandability in policy building. Note that this does not necessarily mean you have to throw away your roles; you can make them more powerful and likely scale them back significantly.

Recall the picture and article from the start of this introduction: most access control scalability issues in Ranger, Snowflake, Databricks, and similar platforms are rooted in their use of an RBAC model rather than an ABAC model.

Example: Building row-level security with an ABAC model

Consider a table that contains a transaction_country column, and data localization requirements that oblige you to limit specific countries to specific users.

With a classic RBAC approach, you would need to create a role for every combination of country access. Remember that it's not necessarily just a role per country, because some users may need access to more than one country. Every time a new combination of countries is required, a new role must be created and managed to represent that access.

With Immuta's ABAC approach, since Immuta is able to decouple policy logic from users, you can simply assign users countries and Immuta will filter appropriately on the fly. This can be done with a single policy in Immuta that references the user country metadata. If you add a new user with a never-before-seen combination of countries, in the RBAC model you would have to remember to create a new role and policy for them to see data. In the ABAC model it will “just work” since everything is dynamic, which future-proofs your policies.
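To make the role explosion concrete, here is a small sketch that enumerates one role per non-empty combination of countries. It assumes the worst case, where every combination is eventually needed; the function name and country codes are hypothetical and purely illustrative.

```python
from itertools import combinations


def roles_needed(countries):
    """Return one hypothetical role per non-empty combination of countries."""
    ordered = sorted(countries)
    return [
        combo
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


# Three countries already require 2**3 - 1 = 7 roles in the worst case.
three_country_roles = roles_needed({"US", "JP", "DE"})
print(len(three_country_roles))

# Ten countries require 2**10 - 1 = 1023 roles in the worst case.
ten_country_roles = roles_needed({f"C{i}" for i in range(10)})
print(len(ten_country_roles))
```

The count grows as 2^n - 1 for n countries, which is why each new country can roughly double the number of roles to manage.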
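The single-policy idea can be sketched in a few lines. This is not Immuta's API or policy syntax; the User class, the countries attribute, and the filter_rows function are hypothetical names used only to illustrate filtering on user metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    countries: frozenset  # hypothetical user attribute: countries the user may see


def filter_rows(rows, user):
    """One ABAC-style row filter: keep rows whose transaction_country
    is among the user's country attributes."""
    return [row for row in rows if row["transaction_country"] in user.countries]


transactions = [
    {"id": 1, "transaction_country": "US"},
    {"id": 2, "transaction_country": "JP"},
    {"id": 3, "transaction_country": "DE"},
]

alice = User("alice", frozenset({"US"}))
# A never-before-seen combination of countries needs no new role or policy.
bob = User("bob", frozenset({"JP", "DE"}))

alice_ids = [row["id"] for row in filter_rows(transactions, alice)]
bob_ids = [row["id"] for row in filter_rows(transactions, bob)]
print(alice_ids, bob_ids)
```

The filter never names a specific user or country, so adding users or countries changes only the user metadata, not the policy.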

Policy boolean logic

The only way to support AND boolean logic with a role-based model (RBAC) is to create a new role that conflates the two or more roles you want to AND together.

For example, a governor wants users to see certain data only if they have security awareness training and consumer privacy training. It would be natural to assume you need both as separate metadata attached to users to drive the policy. However, a role-based model assumes that roles are either OR’ed together in the policy logic or that you can only act under one role at a time. Because of this, you have to create a single role to represent the combined requirement “users with security awareness training AND consumer privacy training.” This is unmanageable: you need to account for every possible combination relevant to a policy, and you have no way of knowing those combinations ahead of time.

With Immuta and its ABAC model, you are able to keep user attributes as meaningful, separate facts about the users and then use boolean logic to combine those facts in policy logic. As an example, consider the country filtering policy described in the prior section: you could build the filtering as described, but additionally add an exception such as "do this filtering for everyone except members of group security awareness training and members of group consumer privacy training" without the need to create a new role that represents the two combined.
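A minimal sketch of that exception follows, assuming group membership is available as a set of strings. The function names and data shapes are hypothetical and are not Immuta's policy syntax.

```python
REQUIRED_GROUPS = {"security awareness training", "consumer privacy training"}


def is_exempt(user_groups):
    """AND logic: the user must belong to every required group."""
    return REQUIRED_GROUPS.issubset(user_groups)


def visible_rows(rows, user_countries, user_groups):
    """Filter by country for everyone except users who hold both trainings."""
    if is_exempt(user_groups):
        return list(rows)
    return [row for row in rows if row["transaction_country"] in user_countries]


rows = [
    {"id": 1, "transaction_country": "US"},
    {"id": 2, "transaction_country": "JP"},
]

# Holds only one of the two groups, so the country filter still applies.
partial = visible_rows(rows, {"US"}, {"security awareness training"})
# Holds both groups, so the exception applies and no rows are filtered out.
both = visible_rows(rows, {"US"}, set(REQUIRED_GROUPS))
print(len(partial), len(both))
```

The two trainings stay separate facts about the user; the AND lives in the policy logic rather than in a combined role.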

Exception-based policy authoring

This next section draws on an analogy: Imagine you are planning your wedding reception. It’s a rather posh affair, so you have a bouncer checking people at the door.

Do you tell your bouncer who’s allowed in? (exception-based) Or, do you tell the bouncer who to keep out? (rejection-based)

In practice, this means you should define what is hidden from everyone by default and then peel back exceptions as needed.

How could your data leak if policies weren’t exception-based?

Suppose you created two policies:

  • Mask Person Name using hashing for everyone who possesses attribute Department HR.

  • Mask Person Name using constant REDACTED for everyone who possesses attribute Department Analytics.

Now a user comes along who is in Department Finance. They will see the Person Name column in the clear because they were not accounted for, just as the bouncer would let them into your wedding because you didn’t think ahead of time to add them to your deny list.

There are two main issues with allowing bi-directional policies, which is why Immuta only allows exception-based policies, aligning with the industry standard of least-privilege access:

  1. Ripe for data leaks: Rejection-based policies are extremely dangerous, which is why Immuta does not allow them except with a catch-all OTHERWISE statement at the end. If a new role or attribute comes along that you haven’t accounted for, that data will be leaked. It is impossible to anticipate every user, attribute, or group that could exist ahead of time, just as it’s impossible to anticipate every person off the street who might try to enter your posh wedding and would need to be on your deny list.

  2. Ripe for conflicts and confusion: Tools that allow both rejection-based and exception-based policy building create a conflict disaster. Let’s walk through a simple example, and then imagine having hundreds of these policies:
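The leak can be reproduced in a short sketch. The two masking functions below are hypothetical and are not Immuta's implementation; the choice of None (null) for the catch-all branch is an assumption made for illustration.

```python
import hashlib


def rejection_based_mask(name, department):
    """Masks only the departments that were listed; everyone else falls through."""
    if department == "HR":
        return hashlib.sha256(name.encode("utf-8")).hexdigest()
    if department == "Analytics":
        return "REDACTED"
    return name  # unanticipated departments see the value in the clear


def mask_with_otherwise(name, department):
    """Same rules plus a catch-all, so unanticipated users are still masked."""
    if department == "HR":
        return hashlib.sha256(name.encode("utf-8")).hexdigest()
    if department == "Analytics":
        return "REDACTED"
    return None  # OTHERWISE: mask by making null (assumed default)


leaked = rejection_based_mask("Ada Lovelace", "Finance")
protected = mask_with_otherwise("Ada Lovelace", "Finance")
print(leaked, protected)
```

The only difference between the two functions is the final branch, which is exactly the case nobody thought about ahead of time.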

    • Policy 1: mask name for everyone who is member of group A

    • Policy 2: mask name for everyone except members of group B

What happens if someone is in both groups A and B? The engine has to fall back on policy ordering to resolve the conflict. That requires authors to understand all other policies before building their own, and it makes it nearly impossible to understand what a single policy does without looking at all policies.
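The ordering problem can be shown with a toy resolver. The "first policy with an opinion wins" rule and the function names below are assumptions made for illustration; real engines may resolve conflicts differently.

```python
def policy_1(groups):
    """Rejection-based: mask name for members of group A; silent otherwise."""
    return "masked" if "A" in groups else None


def policy_2(groups):
    """Exception-based: mask name for everyone except members of group B."""
    return "unmasked" if "B" in groups else "masked"


def resolve_by_order(policies, groups):
    """Hypothetical resolver: the first policy with an opinion wins."""
    for policy in policies:
        outcome = policy(groups)
        if outcome is not None:
            return outcome
    return "unmasked"  # no policy had an opinion


user_groups = {"A", "B"}
first_order = resolve_by_order([policy_1, policy_2], user_groups)
second_order = resolve_by_order([policy_2, policy_1], user_groups)
print(first_order, second_order)
```

The same user and the same two policies yield opposite outcomes depending only on the order in which the policies are evaluated.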

Hierarchical tag-based policy definitions

While many platforms support the concept of object tagging / sensitive data tagging, very few truly support hierarchical tag structures.

First, a quick overview of what hierarchical tag structure means:

This would be a flat tag structure:

  • SUV

  • Subaru

  • Truck

  • Jeep

  • Gladiator

  • Outback

Each tag stands on its own and is not associated with any other tag; there’s no relationship between Jeep and Gladiator or between Subaru and Outback.

A hierarchical tagging structure establishes these relationships:

  • SUV.Subaru.Outback

  • Truck.Jeep.Gladiator

Support for a tagging hierarchy is more than just supporting the tag structure itself. More importantly, policy enforcement should respect the hierarchy as well. Let’s run through a quick contrived example; you want the following policies:

  • Mask by making null any SUV data

  • Mask using hashing any Outback data

With a flat structure, those two policies conflict with one another. To avoid that problem, you would have to order which policies take precedence, which can get extremely complex when you have many policies.

Instead, if your policy engine truly supports a tagging hierarchy, like Immuta does, it will recognize that Outback is more specific than SUV, and have that policy take precedence.

  • Mask by making null any SUV data

  • Mask using hashing any SUV.Subaru.Outback data

Policies are applied correctly without any need for complex ordering of policies.
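One way to picture "most specific tag wins" is a lookup that walks a column's tag from its deepest level up to its root. This is a sketch under that assumption, not Immuta's resolution algorithm; the POLICIES mapping, the function name, and the SUV.Subaru.Forester tag are hypothetical.

```python
# Masking policies keyed by the tag they target.
POLICIES = {
    "SUV": "null",
    "SUV.Subaru.Outback": "hash",
}


def applicable_policy(column_tag, policies=POLICIES):
    """Return the policy attached to the most specific matching tag, if any."""
    parts = column_tag.split(".")
    # Try the full tag first, then each shorter ancestor in turn.
    for depth in range(len(parts), 0, -1):
        prefix = ".".join(parts[:depth])
        if prefix in policies:
            return policies[prefix]
    return None


outback = applicable_policy("SUV.Subaru.Outback")
forester = applicable_policy("SUV.Subaru.Forester")
gladiator = applicable_policy("Truck.Jeep.Gladiator")
print(outback, forester, gladiator)
```

Because precedence comes from tag depth, adding a new policy on a deeper or shallower tag does not require reordering the existing ones.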

Yes, this does put some work on the business to correctly build specificity, or depth, into their tagging hierarchy. This is not necessarily easy; however, the logic will have to live somewhere, and having it in the tagging hierarchy rather than policy order again allows you to separate policy definition from data definition. This provides you scalability, evolvability, understandability, and, most importantly, correctness because policy conflicts can be caught at policy-authoring-time.

Subscription policies: benefits of attribute-based table GRANTs

There are a myriad of techniques and processes companies use to determine which users should have access to which tables. Some organizations have had 7 people respond to an email chain for approval before a DBA runs a table GRANT statement, for example. Manual approvals are sometimes necessary, of course, but there’s a lot of power and consistency in establishing objective criteria for gaining access to a table rather than relying on subjective human approvals.

Let’s take the “7 people approve with an email chain” example. Ask the question, “Why do any of you 7 say yes to the user gaining access?” If the answer is objective criteria, you can completely automate this process. For example, if the approver says, “I approve them because they are in group x and work in the US,” that is user metadata that could allow the user to automatically gain access to the tables, either ahead of time or when requested. This removes a huge burden from your organization and avoids mistakes.

Objective criteria are preferable to subjective judgment: they increase accuracy, remove bias, reduce errors, and help prove compliance. If you can be objective and prescriptive about who should gain access to which tables, you should.

The anti-pattern is manual approvals. Although some regulatory requirements call for them, if there’s any possible way to switch to objective approvals, you should do it. Subjective, human-driven approvals bring bias, a larger chance of errors, and no consistency. This makes it very difficult to prove compliance, simply passes the buck (and the risk) to the approvers, and wastes their valuable time.

One could argue that it’s subjective or biased to assign a user the Country.JP attribute. This is not true because, remember, data policy is separated from user metadata. Giving a user the Country.JP attribute simply defines that user: it is a fact about that user, and no access is implied by it. The attribute is also objective: you know whether they are in Japan or not.

The approach where an access decision is conflated with a role or group is common practice. With it, you end up not only with manual approval flows but also with role explosion, because so many roles are needed to cover every combination of access.
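The approver's reasoning above can be written down as a rule. The sketch below assumes users carry a set of groups and a set of attributes in the Country.XX style used on this page; the data shapes and function name are hypothetical, not Immuta's subscription policy syntax.

```python
def qualifies(user):
    """Objective criteria from the approval example: in group x and works in the US."""
    return "x" in user["groups"] and "Country.US" in user["attributes"]


users = [
    {"name": "alice", "groups": {"x"}, "attributes": {"Country.US"}},
    {"name": "bob", "groups": {"x"}, "attributes": {"Country.JP"}},
    {"name": "carol", "groups": {"y"}, "attributes": {"Country.US"}},
]

# Users who meet the criteria gain access without a manual approval chain.
granted = [user["name"] for user in users if qualifies(user)]
print(granted)
```

The rule is evaluated the same way for every user, which is what makes the resulting access decisions consistent and easy to audit.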

For more discussion about this model, see the blog on role-based vs. attribute-based access control or the NIST article on ABAC, Guide to Attribute Based Access Control (ABAC) Definition and Considerations.

Returning to the bouncer question in the wedding analogy: the answer should be obvious, but many policy engines allow both exception- and rejection-based policy authoring, which causes a conflict nightmare. Exception-based policy authoring means the bouncer has a list of who should be let into the reception. This will always be a shorter list of users/roles if following the principle of least privilege, which is the idea that any user, program, or process should have only the bare minimum privileges necessary to perform its function: you can’t go to the wedding unless invited. This aligns with the concept of privacy by design, the foundation of the CPRA and GDPR, which states “Privacy as the default setting.”

https://gigaom.com/report/cloud-data-security/
Role-Based Access Control vs. Attribute-Based Access Control — Explained
Guide to Attribute Based Access Control (ABAC) Definition and Considerations
principle of least privilege
privacy by design
Apache Ranger Evaluation for Cloud Migration and Adoption Readiness
Figure: Cumulative Ranger Policy Changes