LogoLogo
2024.3
  • Immuta Documentation - 2024.3
  • What is Immuta?
  • Self-Managed Deployment
    • Requirements
    • Install
      • Managed Public Cloud
      • Red Hat OpenShift
    • Upgrade
      • Migrating to the New Helm Chart
      • Upgrading (IEHC)
      • Upgrading (IHC)
    • Guides
      • Ingress Configuration
      • TLS Configuration
      • Cosign Verification
      • Production Best Practices
      • Rotating Credentials
      • External Cache Configuration
      • Enabling Legacy Query Engine and Fingerprint
      • Private Container Registries
      • Air-Gapped Environments
    • Disaster Recovery
    • Troubleshooting
    • Conventions
  • Integrations
    • Immuta Integrations
    • Snowflake
      • Getting Started
      • How-to Guides
        • Configure a Snowflake Integration
        • Snowflake Table Grants Migration
        • Edit or Remove Your Snowflake Integration
        • Integration Settings
          • Enable Snowflake Table Grants
          • Use Snowflake Data Sharing with Immuta
          • Configure Snowflake Lineage Tag Propagation
          • Enable Snowflake Low Row Access Policy Mode
            • Upgrade Snowflake Low Row Access Policy Mode
      • Reference Guides
        • Snowflake Integration
        • Snowflake Data Sharing
        • Snowflake Lineage Tag Propagation
        • Snowflake Low Row Access Policy Mode
        • Snowflake Table Grants
        • Warehouse Sizing Recommendations
      • Phased Snowflake Onboarding Concept Guide
    • Databricks Unity Catalog
      • Getting Started
      • How-to Guides
        • Configure a Databricks Unity Catalog Integration
        • Migrate to Unity Catalog
      • Databricks Unity Catalog Integration Reference Guide
    • Databricks Spark
      • How-to Guides
        • Configuration
          • Simplified Databricks Spark Configuration
          • Manual Databricks Spark Configuration
          • Manually Update Your Databricks Cluster
          • Install a Trusted Library
        • DBFS Access
        • Limited Enforcement in Databricks Spark
        • Hide the Immuta Database in Databricks
        • Run spark-submit Jobs on Databricks
        • Configure Project UDFs Cache Settings
        • External Metastores
      • Reference Guides
        • Databricks Spark Integration
        • Databricks Spark Pre-Configuration Details
        • Configuration Settings
          • Databricks Spark Cluster Policies
            • Python & SQL
            • Python & SQL & R
            • Python & SQL & R with Library Support
            • Scala
            • Sparklyr
          • Environment Variables
          • Ephemeral Overrides
          • Py4j Security Error
          • Scala Cluster Security Details
          • Databricks Security Configuration for Performance
        • Databricks Change Data Feed
        • Databricks Libraries Introduction
        • Delta Lake API
        • Spark Direct File Reads
        • Databricks Metastore Magic
    • Starburst (Trino)
      • Getting Started
      • How-to Guides
        • Configure Starburst (Trino) Integration
        • Customize Read and Write Access Policies for Starburst (Trino)
      • Starburst (Trino) Integration Reference Guide
    • Redshift
      • Getting Started
      • How-to Guides
        • Configure Redshift Integration
        • Configure Redshift Spectrum
      • Reference Guides
        • Redshift Integration
        • Redshift Pre-Configuration Details
    • Azure Synapse Analytics
      • Getting Started
      • Configure Azure Synapse Analytics Integration
      • Reference Guides
        • Azure Synapse Analytics Integration
        • Azure Synapse Analytics Pre-Configuration Details
    • Amazon S3
    • Google BigQuery
    • Legacy Integrations
      • Securing Hive and Impala Without Sentry
      • Enabling ImmutaGroupsMapping
    • Catalogs
      • Getting Started with External Catalogs
      • Configure an External Catalog
      • Reference Guides
        • External Catalogs
        • Custom REST Catalogs
          • Custom REST Catalog Interface Endpoints
  • Data
    • Registering Metadata
      • Data Sources in Immuta
      • Register Data Sources
        • Create a Data Source
        • Create an Amazon S3 Data Source
        • Create a Google BigQuery Data Source
        • Bulk Create Snowflake Data Sources
      • Data Source Settings
        • How-to Guides
          • Manage Data Sources and Data Source Settings
          • Manage Data Source Members
          • Manage Access Requests and Tasks
          • Manage Data Dictionary Descriptions
          • Disable Immuta from Sampling Raw Data
        • Data Source Health Checks Reference Guide
      • Schema Monitoring
        • How-to Guides
          • Run Schema Monitoring and Column Detection Jobs
          • Manage Schema Monitoring
        • Reference Guides
          • Schema Monitoring
          • Schema Projects
        • Why Use Schema Monitoring?
    • Domains
      • Getting Started with Domains
      • Domains Reference Guide
    • Tags
      • How-to Guides
        • Create and Manage Tags
        • Add Tags to Data Sources and Projects
      • Tags Reference Guide
  • People
    • Getting Started
    • Identity Managers (IAMs)
      • How-to Guides
        • Okta LDAP Interface
        • OpenID Connect
          • OpenID Connect Protocol
          • Okta and OpenID Connect
          • OneLogin with OpenID
        • SAML
          • SAML Protocol
          • Microsoft Entra ID
          • Okta SAML SCIM
      • Reference Guides
        • Identity Managers
        • SAML Single Logout
        • SAML Protocol Configuration Options
    • Immuta Users
      • How-to Guides
        • Managing Personas and Permissions
        • Manage Attributes and Groups
        • User Impersonation
        • External User ID Mapping
        • External User Info Endpoint
      • Reference Guides
        • Attributes and Groups in Immuta
        • Permissions and Personas
  • Discover Your Data
    • Getting Started with Discover
    • Introduction
    • Data Discovery
      • How-to Guides
        • Enable Sensitive Data Discovery (SDD)
        • Manage Identification Frameworks
        • Manage Identifiers
        • Run and Manage SDD on Data Sources
        • Manage Sensitive Data Discovery Settings
        • Migrate From Legacy to Native SDD
      • Reference Guides
        • How Competitive Criteria Analysis Works
        • Built-in Identifier Reference
        • Built-in Discovered Tags Reference
    • Data Classification
      • How-to Guides
        • Activate Classification Frameworks
        • Adjust Identification and Classification Framework Tags
        • How to Use a Built-In Classification Framework with Your Own Tags
      • Built-in Classification Frameworks Reference Guide
  • Detect Your Activity
    • Getting Started with Detect
      • Monitor and Secure Sensitive Data Platform Query Activity
        • User Identity Best Practices
        • Integration Architecture
        • Snowflake Roles Best Practices
        • Register Data Sources
        • Automate Entity and Sensitivity Discovery
        • Detect with Discover: Onboarding Guide
        • Using Immuta Detect
      • General Immuta Configuration
        • User Identity Best Practices
        • Integration Architecture
        • Databricks Roles Best Practices
        • Register Data Sources
    • Introduction
    • Audit
      • How-to Guides
        • Export Audit Logs to S3
        • Export Audit Logs to ADLS
        • Run Governance Reports
      • Reference Guides
        • Universal Audit Model (UAM)
          • UAM Schema
        • Query Audit Logs
          • Snowflake Query Audit Logs
          • Databricks Unity Catalog Query Audit Logs
          • Databricks Spark Query Audit Logs
          • Starburst (Trino) Query Audit Logs
        • Audit Export GraphQL Reference Guide
        • Governance Report Types
        • Unknown Users in Audit Logs
      • Deprecated Audit Guides
        • Legacy to UAM Migration
        • Download Audit Logs
        • System Audit Logs
    • Dashboards
      • Use the Detect Dashboards How-To Guide
      • Detect Dashboards Reference Guide
    • Monitors
      • Manage Monitors and Observations
      • Detect Monitors Reference Guide
  • Secure Your Data
    • Getting Started with Secure
      • Automate Data Access Control Decisions
        • The Two Paths: Orchestrated RBAC and ABAC
        • Managing User Metadata
        • Managing Data Metadata
        • Author Policy
        • Test and Deploy Policy
      • Compliantly Open More Sensitive Data for ML and Analytics
        • Managing User Metadata
        • Managing Data Metadata
        • Author Policy
      • Federated Governance for Data Mesh and Self-Serve Data Access
        • Defining Domains
        • Managing Data Products
        • Managing Data Metadata
        • Apply Federated Governance
        • Discover and Subscribe to Data Products
    • Introduction
      • Scalability and Evolvability
      • Understandability
      • Distributed Stewardship
      • Consistency
      • Availability of Data
    • Authoring Policies in Secure
      • Authoring Policies at Scale
      • Data Engineering with Limited Policy Downtime
      • Subscription Policies
        • How-to Guides
          • Author a Subscription Policy
          • Author an ABAC Subscription Policy
          • Subscription Policies Advanced DSL Guide
          • Author a Restricted Subscription Policy
          • Clone, Activate, or Stage a Global Policy
        • Reference Guides
          • Subscription Policies
          • Subscription Policy Access Types
          • Advanced Use of Special Functions
      • Data Policies
        • Overview
        • How-to Guides
          • Author a Masking Data Policy
          • Author a Minimization Policy
          • Author a Purpose-Based Restriction Policy
          • Author a Restricted Data Policy
          • Author a Row-Level Policy
          • Author a Time-Based Restriction Policy
          • Certifications Exemptions and Diffs
          • External Masking Interface
        • Reference Guides
          • Data Policy Types
          • Masking Policies
          • Row-Level Policies
          • Custom WHERE Clause Functions
          • Data Policy Conflicts and Fallback
          • Custom Data Policy Certifications
          • Orchestrated Masking Policies
    • Projects and Purpose-Based Access Control
      • Projects and Purpose Controls
        • Getting Started
        • How-to Guides
          • Create a Project
          • Create and Manage Purposes
          • Adjust a Policy
          • Project Management
            • Manage Projects and Project Settings
            • Manage Project Data Sources
            • Manage Project Members
        • Reference Guides
          • Projects and Purposes
          • Policy Adjustments
        • Why Use Purposes?
      • Equalized Access
        • Manage Project Equalization
        • Project Equalization Reference Guide
        • Why Use Project Equalization?
      • Masked Joins
        • Enable Masked Joins
        • Why Use Masked Joins?
      • Writing to Projects
        • How-to Guides
          • Create and Manage Snowflake Project Workspaces
          • Create and Manage Databricks Spark Project Workspaces
          • Write Data to the Workspace
        • Reference Guides
          • Project Workspaces
          • Project UDFs (Databricks)
    • Data Consumers
      • Subscribe to a Data Source
      • Query Data
        • Querying Snowflake Data
        • Querying Databricks Data
        • Querying Databricks SQL Data
        • Querying Starburst (Trino) Data
        • Querying Redshift Data
        • Querying Azure Synapse Analytics Data
      • Subscribe to Projects
  • Application Settings
    • How-to Guides
      • App Settings
      • BI Tools
        • BI Tool Configuration Recommendations
        • Power BI Configuration Example
        • Tableau Configuration Example
      • Add a License Key
      • Add ODBC Drivers
      • Manage Encryption Keys
      • System Status Bundle
    • Reference Guides
      • Data Processing, Encryption, and Masking Practices
      • Metadata Ingestion
  • Releases
    • Immuta v2024.3 Release Notes
    • Immuta Release Lifecycle
    • Immuta LTS Changelog
    • Immuta Support Matrix Overview
    • Immuta CLI Release Notes
    • Immuta Image Digests
    • Preview Features
      • Features in Preview
    • Deprecations
  • Developer Guides
    • The Immuta CLI
      • Install and Configure the Immuta CLI
      • Manage Your Immuta Tenant
      • Manage Data Sources
      • Manage Sensitive Data Discovery
        • Manage Sensitive Data Discovery Rules
        • Manage Identification Frameworks
        • Run Sensitive Data Discovery on Data Sources
      • Manage Policies
      • Manage Projects
      • Manage Purposes
      • Manage Audit
    • The Immuta API
      • Integrations API
        • Getting Started
        • How-to Guides
          • Configure an Amazon S3 Integration
          • Configure an Azure Synapse Analytics Integration
          • Configure a Databricks Unity Catalog Integration
          • Configure a Google BigQuery Integration
          • Configure a Redshift Integration
          • Configure a Snowflake Integration
          • Configure a Starburst (Trino) Integration
        • Reference Guides
          • Integrations API Endpoints
          • Integration Configuration Payload
          • Response Schema
          • HTTP Status Codes and Error Messages
      • Immuta V2 API
        • Data Source Payload Attribute Details
        • Data Source Request Payload Examples
        • Create Policies API Examples
        • Create Projects API Examples
        • Create Purposes API Examples
      • Immuta V1 API
        • Authenticate with the API
        • Configure Your Instance of Immuta
          • Get Fingerprint Status
          • Get Job Status
          • Manage Frameworks
          • Manage IAMs
          • Manage Licenses
          • Manage Notifications
          • Manage Sensitive Data Discovery (SDD)
          • Manage Tags
          • Manage Webhooks
          • Search Filters
        • Connect Your Data
          • Create and Manage an Amazon S3 Data Source
          • Create an Azure Synapse Analytics Data Source
          • Create an Azure Blob Storage Data Source
          • Create a Databricks Data Source
          • Create a Presto Data Source
          • Create a Redshift Data Source
          • Create a Snowflake Data Source
          • Create a Starburst (Trino) Data Source
          • Manage the Data Dictionary
        • Manage Data Access
          • Manage Access Requests
          • Manage Data and Subscription Policies
          • Manage Domains
          • Manage Write Policies
            • Write Policies Payloads and Response Schema Reference Guide
          • Policy Handler Objects
          • Search Audit Logs
          • Search Connection Strings
          • Search for Organizations
          • Search Schemas
        • Subscribe to and Manage Data Sources
        • Manage Projects and Purposes
          • Manage Projects
          • Manage Purposes
        • Generate Governance Reports
Powered by GitBook

Other versions

  • SaaS
  • 2024.3
  • 2024.2

Copyright © 2014-2024 Immuta Inc. All rights reserved.

On this page
  • ABAC vs RBAC
  • Separating policy definition from role definition
  • Policy boolean logic
  • Exception-based policy authoring
  • Hierarchical tag-based policy definitions
  • Subscription policies: benefits of attribute-based table GRANTs

Was this helpful?

Export as PDF
  1. Secure Your Data
  2. Introduction

Scalability and Evolvability

Was this helpful?

ABAC vs RBAC

Do you find yourself spending too much time managing roles and defining permissions in your system? When there are new requests for data, or a policy change, does this cause you to spend an inordinate amount of time to make those changes? Scalability and evolvability will completely remove this burden. When you have a scalable and evolvable data policy management system, it allows you to make changes that impact hundreds if not thousands of tables at once, accurately. It also allows you to evolve your policies over time with minor changes or no changes at all, through future-proof policy logic.

Lack of scalability and evolvability are rooted in the fact that you are attempting to apply a coarse role-based access control (RBAC) model to your modern data architecture. Using Apache Ranger, a well known legacy RBAC system built for Hadoop, as an example, independent research has shown the explosion of management required to do the most basic of tasks with an RBAC system: .

In a scalable solution such as Immuta, that count of policy changes required will remain extremely low, providing the scalability and evolvability. GigaOm researched this exactly, comparing Immuta’s ABAC model to what they called Ranger’s RBAC with Object Tagging (OT-RBAC) model and showed a 75 times increase in policy management with Ranger.

  • Value to you: You have more time to spend on the complex tasks you should be spending time on and you don’t fear making a policy change.

  • Value to the business: Policies can be easily enforced and evolved, allowing the business to be more agile and decrease time-to-data across your organization and avoid errors.

Separating policy definition from role definition

When building access control into our database platforms, the concept of role-based access control (RBAC) is familiar. Roles both define who is in them, but also determine what those roles get access to. A good way to think about this is roles conflate the who and what: who is in them and what they have access to (but lack the why).

In contrast, attribute-based access control (ABAC) allows you to decouple your roles from what they have access to, essentially separating the what and why from the who, which also allows you to explicitly explain the “why” in the policy. This gives you an incredible amount of scalability and understandability in policy building. Note this does not mean you have to throw away your roles necessarily, you can make them more powerful and likely scale them back significantly.

If you remember this picture and article from the start of this introduction, most of the Ranger, Snowflake, Databricks, etc. access control scalability issues are rooted in the fact that it’s an RBAC model vs ABAC model.

Example: Building row-level security with an ABAC model

Consider that you have a table which contains a transaction_country column and you have data localization needs which requires you to limit specific countries to specific users.

With a classic RBAC approach, you would need to create a role for every permutation of country access. Remember that it's not necessarily just a role per country, because some users may need access to more than one country. Every time a new permutation of country combination is required, a new role must be managed to represent that access.

With Immuta's ABAC approach, since Immuta is able to decouple policy logic from users, you can simply assign users countries and Immuta will filter appropriately on the fly. This can be done with a single policy in Immuta which references the user country metadata. If you add a new user with a never before seen combination of countries, in the RBAC model, you would have to remember to create a new role and policy for them to see data. In the ABAC model it will “just work” since everything is dynamic - future proofing your policies.

Policy boolean logic

The only way to support AND boolean logic with a role-based model (RBAC) is by creating a new role that conflates the two or more roles you want to AND together.

For example, a governor wants users to only see certain data if they have security awareness training and have consumer privacy training. It would be natural to assume you need both separately as metadata attached to users to drive the policy. However, when you build policies in a role based model, it assumes roles are either OR’ed together in the policy logic or you can only act under one role at a time, and because of this, you will have to create a single role to represent this combination of requirements “users with security awareness training AND consumer privacy training.” This is completely silly and unmanageable - you need to account for every possible combination relevant to a policy, and you have no way of knowing that ahead of time.

With Immuta and its ABAC model, you are able to keep user attributes as meaningful separate facts about the users and then use boolean logic to combine those facts in policy logic. As an example, consider the country filtering policy described in the prior section: you could build the filtering, as described, but additionally add an exception such as "do this filtering for everyone except members of group security awareness training and members of group consumer privacy training" without the need to create a new role that represents those combined.

Exception-based policy authoring

This next section draws on an analogy: Imagine you are planning your wedding reception. It’s a rather posh affair, so you have a bouncer checking people at the door.

Do you tell your bouncer who’s allowed in? (exception-based) Or, do you tell the bouncer who to keep out? (rejection-based)

What this means in practice is that you should define what should be hidden from everyone, and then slowly peel back exceptions as needed.

How could your data leak if it wasn’t exception based?

What if you did two policies:

  • Mask Person Name using hashing for everyone who possesses attribute Department HR.

  • Mask Person Name using constant REDACTED for everyone who possesses attribute Department Analytics.

Now, some user comes along who is in Department Finance - guess what, they will see the Person Name columns in the clear because they were not accounted for, just like the bouncer would let them into your wedding because you didn’t think ahead of time to add them to your deny list.

There are two main issues with allowing bi-directional policies, which is why Immuta only allows exception-based policies, aligning to the industry standard of least privileged access:

  1. Ripe for data leaks: Rejection-based policies are extremely dangerous and why Immuta does not allow them except with a catch-all OTHERWISE statement at the end. Again this is because if a new role/attribute comes along that you haven’t accounted for, that data will be leaked. It is impossible for you to anticipate every possible user/attribute/group that could possibly exist ahead of time just like it’s impossible for you to anticipate any person off the street that could try to enter your posh wedding that you would have to account for on your deny list.

  2. Ripe for conflicts and confusion: Tools that specifically allow both rejection-based and exception-based policy building create a conflict disaster. Let’s walk through a simple example, noting this is very simple, imagine if you have hundreds of these policies:

    • Policy 1: mask name for everyone who is member of group A

    • Policy 2: mask name for everyone except members of group B

What happens if someone is in both groups A and B? The policy will have to fall back on policy ordering to avoid this conflict, which requires users to understand all other policies before building their policy and it is nearly impossible to understand what a single policy does without looking at all policies.

Hierarchical tag-based policy definitions

While many platforms support the concept of object tagging / sensitive data tagging, very few truly support hierarchical tag structures.

First, a quick overview of what hierarchical tag structure means:

This would be a flat tag structure:

  • SUV

  • Subaru

  • Truck

  • Jeep

  • Gladiator

  • Outback

Each tag stands on its own and is not associated with one another in any way; there’s no correlation between Jeep and Gladiator nor Subaru and Outback.

A hierarchical tagging structure establishes these relationships:

  • SUV.Subaru.Outback

  • Truck.Jeep.Gladiator

Support for a tagging hierarchy is more than just supporting the tag structure itself. More importantly, policy enforcement should respect the hierarchy as well. Let’s run through a quick contrived example; you want the following policies:

  • Mask by making null any SUV data

  • Mask using hashing any Outback data

With a flat structure, if you build those policies they will be in conflict with one another. To avoid that problem you would have to order which policies take precedence, which can get extremely complex when you have many policies.

Instead, if your policy engine truly supports a tagging hierarchy, like Immuta does, it will recognize that Outback is more specific than SUV, and have that policy take precedence.

  • Mask by making null any SUV data

  • Mask using hashing any SUV.Subaru.Outback data

Policies are applied correctly without any need for complex ordering of policies.

Yes, this does put some work on the business to correctly build specificity, or depth, into their tagging hierarchy. This is not necessarily easy; however, the logic will have to live somewhere, and having it in the tagging hierarchy rather than policy order again allows you to separate policy definition from data definition. This provides you scalability, evolvability, understandability, and, most importantly, correctness because policy conflicts can be caught at policy-authoring-time.

Subscription policies: benefits of attribute-based table GRANTs

There are a myriad of techniques and processes companies use to determine what users should have access to which tables. Some customers have had 7 people responding to an email chain for approval before a DBA runs a table GRANT statement, for example. Manual approvals are sometimes necessary, of course, but there’s a lot of power and consistency in establishing objective criteria for gaining access to a table rather than subjective human approvals.

Let’s take the “7 people approve with an email chain” example. Ask the question, “Why do any of you 7 say yes to the user gaining access?” If it’s objective criteria, you can completely automate this process. For example, if the approver says, “I approve them because they are in group x and work in the US,” that is user metadata that could allow the user to automatically gain access to the tables, either ahead of time or when requested. This removes a huge burden from your organization and avoids mistakes.

Being objective is always better than subjective: it increases accuracy, removes bias, eliminates errors, and proves compliance. If you can be objective and prescriptive about who should gain access to what tables - you should.

The anti-pattern is manual approvals. Although there are some regulatory requirements for this, if there’s any possible way to switch to objective approvals, you should do it. With subjective human-driven approvals, there is bias, larger chance for errors, and no consistency - this makes it very difficult to prove compliance and is simply passing the buck (and risk) to the approvers and wasting their valuable time.

One could argue that it’s subjective or biased to assign a user the Country.JP attribute. This is not true, because, remember, data policy is separated from user metadata. The act of giving a user the Country.JP attribute is simply defining that user - it is a fact about that user, there is no implied access given to the user from this act and that attribute will be objective - e.g., you know if they are in Japan or not.

The approach where an access decision is conflated with a role or group is common practice. So not only do you end up with manual approval flows, but you also end up with role explosion from so many roles to meet every combination of access.

For more discussion about this model, see the blog or the NIST article on ABAC, .

The answer to that question should be obvious, but many policy engines allow both exception- and rejection-based policy authoring, which causes a conflict nightmare. Exception-based policy authoring in our wedding analogy means the bouncer has a list of who should be let into the reception. This will always be a shorter list of users/roles if following the , which is the idea that any user, program, or process should have only the bare minimum privileges necessary to perform its function - you can’t go to the wedding unless invited. This aligns with the concept of , the foundation of the CPRA and GDPR, which states “Privacy as the default setting.”

https://gigaom.com/report/cloud-data-security/
Role-Based Access Control vs. Attribute-Based Access Control — Explained
Guide to Attribute Based Access Control (ABAC) Definition and Considerations
principle of least privilege
privacy by design
Apache Ranger Evaluation for Cloud Migration and Adoption Readiness
Cumulative Ranger Policy Changes
Cumulative Ranger Policy Changes