

Immuta Documentation - SaaS

One platform to optimize how you access and control data.

Immuta gives everyone fast, governed access to data with the built-in controls, collaboration workflows, automated provisioning, and continuous monitoring you need to keep risk low and compliance high.


Configuration

Connect Data Platforms

Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data.

This section includes concept, reference, and how-to guides for configuring your data platform integration and registering data sources so that you can discover, monitor, and protect sensitive data.

This reference guide outlines the features, policies, and audit capabilities of each data platform Immuta supports.

This page includes how-to and reference content for Amazon S3 and how it integrates with Immuta.

This section includes how-to and reference guides for AWS Lake Formation and how it connects with Immuta.

This section includes how-to and reference guides for Azure Synapse Analytics and how it integrates with Immuta.

This page includes how-to and reference content for Google BigQuery and how it integrates with Immuta.

This section includes how-to and reference guides for Redshift and how it integrates with Immuta.

This section includes how-to and reference guides for Snowflake and how it integrates with Immuta.

This section includes how-to and reference guides for Starburst (Trino) and how it integrates with Immuta.

This reference guide outlines the actions and features that trigger Immuta queries in your remote platform that may incur cost.

Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data. This section includes concept, reference, and how-to guides for registering and managing data sources.

Data Platforms Overview

Immuta does not require users to learn a new API or language to access protected data. Instead, Immuta integrates with existing tools and ongoing work while remaining invisible to downstream consumers.

The following data platforms integrate with Immuta:

Databricks:

Feature support

The table below outlines the features supported by each of Immuta's integrations.

Features compared: Project workspaces, Tag ingestion, User impersonation, Query audit, and Multiple integrations.

Integrations compared: Amazon S3, AWS Lake Formation, Azure Synapse Analytics, Databricks Spark, Databricks Unity Catalog, Google BigQuery, Redshift, Snowflake, and Starburst.

Data policy support

On Databricks data sources, joins are not allowed on data protected by masking policies that replace values with NULL or a constant.
Databricks Unity Catalog ARRAY, MAP, and STRUCT columns can only be masked with NULL.

Audit support for platform queries

The table below outlines what information is included in the query audit logs for each integration where query audit is supported.

Snowflake

Databricks Spark

Databricks Unity Catalog

Starburst (Trino)

Table and user coverage

Registered data sources and users

All tables and users

Registered data sources and users

Object queried

Columns returned

Query text

Unauthorized information

Policy details

User's entitlements

Column tags

Table tags


Amazon S3 Integration

Private preview: This integration is available to select accounts. Contact your Immuta representative for details.

Getting started

Requirements

No location is registered in your S3 Access Grants instance before you configure the integration in Immuta

Permissions

APPLICATION_ADMIN Immuta permission to configure the integration
CREATE_S3_DATASOURCE Immuta permission to register S3 prefixes
The AWS account credentials or optional AWS IAM role you provide Immuta to configure the integration must have the following permissions:
  - accessgrantslocation resource:
    s3:CreateAccessGrant
    s3:DeleteAccessGrantsLocation
    s3:GetAccessGrantsLocation
    s3:UpdateAccessGrantsLocation
  - accessgrantsinstance resource:
    s3:CreateAccessGrantsInstance
    s3:CreateAccessGrantsLocation
    s3:DeleteAccessGrantsInstance
    s3:GetAccessGrantsInstance
    s3:GetAccessGrantsInstanceForPrefix
    s3:GetAccessGrantsInstanceResourcePolicy
    s3:ListAccessGrants
    s3:ListAccessGrantsLocations
  - accessgrant resource:
    s3:DeleteAccessGrant
    s3:GetAccessGrant
  - bucket resource: s3:ListBucket
  - role resource:
    iam:GetRole
    iam:PassRole
  - all resources: s3:ListAccessGrantsInstances
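To audit a set of granted actions before you configure the integration, the list above can be encoded as data. This is a minimal Python sketch: `REQUIRED_ACTIONS` mirrors the list above, and `missing_actions` is a hypothetical helper, not part of Immuta or AWS. It compares action names only and ignores resource scoping, wildcards such as `s3:*`, and Deny statements, so an empty result is not proof that IAM will allow the calls.

```python
from typing import Dict, Set

# Required actions from the list above, grouped by the AWS resource type
# they apply to ("*" means all resources).
REQUIRED_ACTIONS: Dict[str, Set[str]] = {
    "accessgrantslocation": {
        "s3:CreateAccessGrant",
        "s3:DeleteAccessGrantsLocation",
        "s3:GetAccessGrantsLocation",
        "s3:UpdateAccessGrantsLocation",
    },
    "accessgrantsinstance": {
        "s3:CreateAccessGrantsInstance",
        "s3:CreateAccessGrantsLocation",
        "s3:DeleteAccessGrantsInstance",
        "s3:GetAccessGrantsInstance",
        "s3:GetAccessGrantsInstanceForPrefix",
        "s3:GetAccessGrantsInstanceResourcePolicy",
        "s3:ListAccessGrants",
        "s3:ListAccessGrantsLocations",
    },
    "accessgrant": {"s3:DeleteAccessGrant", "s3:GetAccessGrant"},
    "bucket": {"s3:ListBucket"},
    "role": {"iam:GetRole", "iam:PassRole"},
    "*": {"s3:ListAccessGrantsInstances"},
}


def missing_actions(granted: Set[str]) -> Dict[str, Set[str]]:
    """Return, per resource type, the required actions absent from `granted`.

    This is a plain set comparison on action names; it does not evaluate
    Resource scoping, wildcards, or Deny statements.
    """
    report = {}
    for resource_type, actions in REQUIRED_ACTIONS.items():
        missing = actions - granted
        if missing:
            report[resource_type] = missing
    return report


if __name__ == "__main__":
    granted = {"iam:GetRole", "iam:PassRole", "s3:ListBucket"}
    for resource_type, actions in missing_actions(granted).items():
        print(resource_type, sorted(actions))
```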

Set up S3 Access Grants instance

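Create an S3 Access Grants instance in your AWS account and region if one does not already exist, and create an IAM role for the Access Grants location. The role's trust policy must allow the S3 Access Grants service (access-grants.s3.amazonaws.com) to perform the following actions:

- sts:AssumeRole
- sts:SetSourceIdentity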

IAM role trust policy example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1234567891011",
      "Effect": "Allow",
      "Principal": {
        "Service": "access-grants.s3.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetSourceIdentity"
      ]
    }
  ]
}
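The trust policy above can also be built in code, for example to pass as the JSON string in the `AssumeRolePolicyDocument` parameter of boto3's `iam.create_role`. This is a sketch under that assumption: the function name is hypothetical, and the `Sid` is omitted because it is optional in IAM policies.

```python
import json

# Service principal for S3 Access Grants, as used in the trust policy above.
ACCESS_GRANTS_SERVICE = "access-grants.s3.amazonaws.com"


def access_grants_trust_policy() -> dict:
    """Build the trust policy that lets S3 Access Grants assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ACCESS_GRANTS_SERVICE},
                # Both actions listed in the trust policy example above.
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
            }
        ],
    }


if __name__ == "__main__":
    # Serialized form, suitable as a role's trust policy document.
    print(json.dumps(access_grants_trust_policy(), indent=2))
```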

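Attach a policy to the role that grants it the following permissions on the S3 data in the location:

s3:GetObject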
s3:GetObjectVersion
s3:GetObjectAcl
s3:GetObjectVersionAcl
s3:ListMultipartUploadParts
s3:PutObject
s3:PutObjectAcl
s3:PutObjectVersionAcl
s3:DeleteObject
s3:DeleteObjectVersion
s3:AbortMultipartUpload
s3:ListBucket
s3:ListAllMyBuckets

IAM policy example

Replace <bucket_arn> in the example below with the ARN of the bucket that contains the data you want to grant access to (for example, arn:aws:s3:::research-data). The object-level statements use <bucket_arn>/* because S3 evaluates object actions such as s3:GetObject against object ARNs, not the bucket ARN.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectVersionAcl",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "<bucket_arn>/*"
            ]
        },
        {
            "Sid": "ObjectLevelWritePermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "<bucket_arn>/*"
            ]
        },
        {
            "Sid": "BucketLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": [
                "<bucket_arn>"
            ]
        }
    ]
}
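A sketch that fills in the location role policy for a single bucket. `location_role_policy` is a hypothetical helper; it scopes the object-level statements to the bucket ARN plus `/*`, because S3 evaluates object actions such as `s3:GetObject` against object ARNs rather than the bucket ARN. Review the output against your own requirements before attaching it to the role.

```python
import json
from typing import List

OBJECT_READ = [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:GetObjectAcl",
    "s3:GetObjectVersionAcl",
    "s3:ListMultipartUploadParts",
]
OBJECT_WRITE = [
    "s3:PutObject",
    "s3:PutObjectAcl",
    "s3:PutObjectVersionAcl",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:AbortMultipartUpload",
]
BUCKET_READ = ["s3:ListAllMyBuckets", "s3:ListBucket"]


def _statement(sid: str, actions: List[str], resource: str) -> dict:
    # Copy the action list so callers cannot mutate the module constants.
    return {"Sid": sid, "Effect": "Allow", "Action": list(actions), "Resource": [resource]}


def location_role_policy(bucket_arn: str) -> dict:
    """Build the location role policy for one bucket ARN."""
    if not bucket_arn.startswith("arn:aws:s3:::"):
        raise ValueError(f"not an S3 bucket ARN: {bucket_arn!r}")
    # Object-level actions apply to object ARNs (bucket ARN + "/*").
    objects_arn = bucket_arn.rstrip("/") + "/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            _statement("ObjectLevelReadPermissions", OBJECT_READ, objects_arn),
            _statement("ObjectLevelWritePermissions", OBJECT_WRITE, objects_arn),
            _statement("BucketLevelReadPermissions", BUCKET_READ, bucket_arn),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(location_role_policy("arn:aws:s3:::research-data"), indent=4))
```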

If you use server-side encryption with AWS Key Management Service (AWS KMS) keys to encrypt your data, the following permissions are required for the IAM role in the policy. If you do not use this feature, do not include these permissions in your IAM policy:

kms:Decrypt
kms:GenerateDataKey
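If your data is encrypted with AWS KMS keys, the two KMS actions can be appended to the role's policy as an extra statement. In this sketch, `with_kms_permissions`, the `KmsPermissions` Sid, and the `kms_key_arn` argument are hypothetical; scope the resource to the keys that actually encrypt the data in the location.

```python
def with_kms_permissions(policy: dict, kms_key_arn: str) -> dict:
    """Return a copy of an IAM policy with an extra SSE-KMS statement.

    The input policy is not modified.
    """
    kms_statement = {
        "Sid": "KmsPermissions",
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": [kms_key_arn],
    }
    return {**policy, "Statement": [*policy["Statement"], kms_statement]}


if __name__ == "__main__":
    base = {"Version": "2012-10-17", "Statement": []}
    # Hypothetical key ARN, for illustration only.
    key_arn = "arn:aws:kms:us-east-2:111122223333:key/example-key-id"
    print(with_kms_permissions(base, key_arn))
```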

IAM policy example

Replace <role_arn> and <access_grants_instance_arn> in the example below with the ARNs of the role you created and your Access Grants instance, respectively. The Access Grants instance resource ARN should be scoped to apply to any future locations that will be created under this Access Grants instance. For example, "Resource": "arn:aws:s3:us-east-2:6********499:access-grants/default*" ensures that the role would have permissions for both of these locations:

arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation1
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation2
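To check that a wildcard `Resource` covers the locations you expect, you can try the example above with a small matcher. This sketch approximates IAM's `*` and `?` wildcards with Python's `fnmatch.fnmatchcase`; it is not AWS's policy evaluator, and it misreads patterns containing `[` or `]`, which `fnmatch` treats as character classes. The account ID 111122223333 is a placeholder.

```python
from fnmatch import fnmatchcase


def resource_matches(pattern: str, arn: str) -> bool:
    """Approximate IAM Resource wildcard matching (case-sensitive).

    "*" matches any run of characters, including none, and "?" matches a
    single character. Not valid for patterns containing "[" or "]".
    """
    return fnmatchcase(arn, pattern)


if __name__ == "__main__":
    pattern = "arn:aws:s3:us-east-2:111122223333:access-grants/default*"
    for location in ("newlocation1", "newlocation2"):
        arn = f"arn:aws:s3:us-east-2:111122223333:access-grants/default/{location}"
        print(arn, resource_matches(pattern, arn))
```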

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RolePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "<role_arn>"
        },
        {
            "Sid": "AccessGrants",
            "Effect": "Allow",
            "Action": [
                "s3:CreateAccessGrant",
                "s3:DeleteAccessGrantsLocation",
                "s3:GetAccessGrantsLocation",
                "s3:CreateAccessGrantsLocation",
                "s3:GetAccessGrantsInstance",
                "s3:GetAccessGrantsInstanceForPrefix",
                "s3:GetAccessGrantsInstanceResourcePolicy",
                "s3:ListAccessGrants",
                "s3:ListAccessGrantsLocations",
                "s3:ListAccessGrantsInstances",
                "s3:DeleteAccessGrant",
                "s3:GetAccessGrant"
            ],
            "Resource": [
                "<access_grants_instance_arn>"
            ]
        }
    ]
}

IAM policy example

<aws_account>: Your AWS account ID.
<identity_store_id>: The ID of your IAM Identity Center identity store.
<iam_identity_center_instance_arn>: The ARN of your IAM Identity Center instance.
<iam_identity_center_application_arn_for_s3_access_grants>: The ARN of the IAM Identity Center application used with S3 Access Grants.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "sso",
      "Effect": "Allow",
      "Action": [
        "sso:DescribeInstance",
        "sso:DescribeApplication",
        "sso-directory:DescribeUsers"
      ],
      "Resource": [
        "<iam_identity_center_instance_arn>",
        "<iam_identity_center_application_arn_for_s3_access_grants>",
        "arn:aws:identitystore:::user/*",
        "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
      ]
    },
    {
      "Sid": "idc",
      "Effect": "Allow",
      "Action": [
        "identitystore:DescribeUser",
        "identitystore:DescribeGroup"
      ],
      "Resource": [
        "<iam_identity_center_instance_arn>",
        "<iam_identity_center_application_arn_for_s3_access_grants>",
        "arn:aws:identitystore:::user/*",
        "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
      ]
    }
  ]
}
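The policy examples on this page use angle-bracket placeholders. A small helper can substitute them and refuse to return a policy with any left unfilled. `fill_placeholders` is hypothetical, the values in the usage line are made up, and the pattern only recognizes lowercase names with digits and underscores, such as `<aws_account>`.

```python
import re
from typing import Dict

# Matches placeholders such as <aws_account> or <identity_store_id>.
PLACEHOLDER = re.compile(r"<([a-z0-9_]+)>")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace every <name> in `template` with values[name].

    Raises KeyError listing any placeholder without a value, so a
    half-filled policy is never returned.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    arn = "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
    print(fill_placeholders(arn, {"aws_account": "111122223333", "identity_store_id": "d-1234567890"}))
```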

Configure the integration in Immuta

In Immuta, click the App Settings icon in the navigation menu and click the Integrations tab.
Click + Add Integration.
Select Amazon S3 from the dropdown menu and click Continue Configuration.
Complete the connection details fields, where
- Friendly Name is a name for the integration that is unique across all Amazon S3 integrations configured in Immuta.
- AWS Account ID is the ID of your AWS account.
- AWS Region is the AWS region to use.
- S3 Access Grants Location IAM Role ARN is the role the S3 Access Grants service assumes to vend credentials to the grantee. When a grantee accesses S3 data, the Access Grants service attaches session policies and assumes this role in order to vend credentials scoped to a prefix or bucket to the grantee. This role needs full access to all paths under the S3 location prefix.
- S3 Access Grants S3 Location Scope is the base S3 location that Immuta will use for this connection when registering S3 prefixes. This path must be unique across all S3 integrations configured in Immuta. During data source registration, this prefix is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location prefix of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Select your authentication method:
- Access using AWS IAM role: Provide an AWS IAM Role that Immuta will assume when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role for providing S3 access grants operations. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. Before proceeding, contact your Immuta representative for the AWS account to add to your trust policy. Then, complete the steps below.
  - Enter the role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
- Access using access key and secret access key: Provide your AWS Access Key ID and AWS Secret Access Key.
Click Verify Credentials.
Click Next to review and confirm your connection information, and then click Complete Setup.
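How the location scope and a data source prefix combine into the final grant path can be sketched as a small function. `grant_path` is hypothetical, and its slash handling (exactly one `/` between the scope and the prefix) is an assumption chosen so that both the `s3://research-data` example in the field descriptions above and a scope of `s3://` produce clean paths.

```python
def grant_path(location_scope: str, data_source_prefix: str) -> str:
    """Prepend the integration's location scope to a data source prefix."""
    if not location_scope.startswith("s3://"):
        raise ValueError(f"location scope must be an s3:// URI: {location_scope!r}")
    # Ensure exactly one "/" joins the two parts; "s3://" already ends in one.
    base = location_scope if location_scope.endswith("/") else location_scope + "/"
    return base + data_source_prefix.lstrip("/")


if __name__ == "__main__":
    print(grant_path("s3://research-data", "/demographics"))
```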

Editing an integration

You can edit the following settings for an existing Amazon S3 integration on the app settings page:

friendly name
authentication type and values (access key, secret, and role)

Register S3 data

Protect data

Requirements: USER_ADMIN Immuta permission and either the GOVERNANCE or CREATE_S3_DATASOURCE Immuta permission

Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies:
1. Click Identities in the navigation menu and select Users.
2. Navigate to the user's page and click the more actions icon next to their username.
3. Select Change S3 User or AWS IAM Role from the dropdown menu.
4. Select a mapping option. If you select Unset (fallback to Immuta username), the S3 username is assumed to be the same as the Immuta username.
5. Click Save.
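The mapping rules above can be summarized in a few lines. This sketch is an illustration, not Immuta's implementation: `resolve_s3_principal` and its "IAM_ROLE"/"USER" labels are hypothetical, and it tells IAM role ARNs apart from user names by the `:role/` segment of the ARN. No case folding is applied, because principal names are case-sensitive.

```python
from typing import Optional, Tuple


def resolve_s3_principal(immuta_username: str, mapped: Optional[str] = None) -> Tuple[str, str]:
    """Return (principal_kind, identifier) for an Immuta user.

    `mapped` is the value set with Change S3 User or AWS IAM Role; None or ""
    models the "Unset (fallback to Immuta username)" option.
    """
    if not mapped:
        return ("USER", immuta_username)
    if mapped.startswith("arn:aws:iam::") and ":role/" in mapped:
        return ("IAM_ROLE", mapped)
    return ("USER", mapped)
```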

Access data

Requirement: User must be subscribed to the data source in Immuta

S3 integration overview

With this integration, users can avoid

hand-writing AWS IAM policies
managing AWS IAM role limits
manually tracking what user or role has access to what files in AWS S3 and verifying those are consistent with intent

S3 Access Grants components

To enforce controls on S3 data, Immuta interacts with several S3 Access Grants components:

Access Grants instance: An Access Grants instance is a logical container for individual grants that specify who can access what level of data in S3 in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Location: A location specifies what data the Access Grants instance can grant access to. For example, registering a location with a scope of s3:// allows Access Grants to manage access to all S3 buckets in that AWS account and region, whereas setting the bucket s3://research-data as the scope limits Access Grants to managing access to that single bucket for that location. When you configure the S3 integration in Immuta, you specify a location's scope and IAM assumed role, and Immuta registers the location in your Access Grants instance and associates it with the provided IAM role for you. Each S3 integration you configure in Immuta is associated with one location, and Immuta manages all grants in that location. Therefore, grants cannot be manually created by users in an Access Grants instance location that Immuta has registered and manages. During data source registration, this location scope is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location scope of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Individual grants: Individual permission grants in S3 Access Grants specify the identity that can access the data, the access level, and the location of the S3 data. Immuta creates a grant for each user subscribed to a prefix, bucket, or object by interacting with the Access Grants API. Each grant has its own ID and gives the user or role principal access to the data.
IAM assumed role: This is an IAM role you create in your AWS account that has full access to all prefixes, buckets, and objects in the Access Grants location registered by Immuta. This IAM role is used to vend temporary credentials to users or applications. When a grantee requests temporary credentials, the S3 Access Grants service assumes this role to vend credentials scoped to the prefix, bucket, or object specified in the grant to the grantee. The grantee then uses these credentials to access S3 data. When configuring the integration in Immuta, you specify this role, and then Immuta associates this role with the registered location in the Access Grants instance.

The diagram below illustrates how these S3 Access Grants components interact.

How does the integration work?

After an administrator creates an Access Grants instance and an assumed IAM role in their AWS account, an application administrator configures the Amazon S3 integration in Immuta. During configuration, the administrator provides the following connection information so that Immuta can create and register a location in that Access Grants instance:

AWS account ID and region
ARN for the existing Access Grants instance
ARN for the assumed IAM role

In the example below, an application administrator registers the following location prefix and IAM role for their Access Grants instance in AWS account 123456:

Location path: s3://. This path allows a single Amazon S3 integration to manage all objects in S3 in that AWS account and region. Data owners can scope down access further when registering specific S3 prefixes and applying policies.
Location IAM role: The arn:aws:iam::123456:role/access-grants-role IAM role will be used to vend temporary credentials to users and applications.

Immuta registers this location and associated IAM role in the user's Access Grants instance:

After the S3 integration is configured, a data owner can register S3 prefixes and buckets that are in the configured Access Grants location path to enforce access controls on resources. Immuta stores the connection information for the prefix so that the metadata can be used to create and enforce subscription policies on S3 data.

A data owner or governor can apply a subscription policy to a registered prefix, bucket, or object to control who can access objects beginning with that prefix or in that bucket after it is registered in Immuta. Once a subscription policy is created and Immuta users are subscribed to the prefix, bucket, or object, Immuta calls the Access Grants API to create a grant for each subscribed user, specifying the following parameters in the payload so that Access Grants can create and store a grant for each user:

Access Grants location
Access level (READ or READWRITE)
User or role principal
Registered prefix, bucket, or object
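The parameters listed above correspond to the S3 Control `CreateAccessGrant` request. This sketch only builds keyword arguments in the shape that boto3's `s3control` client `create_access_grant` method accepts (`AccountId`, `AccessGrantsLocationId`, `AccessGrantsLocationConfiguration`, `Grantee`, `Permission`); it does not call AWS, Immuta's own request is not documented here, and the helper name and sample values are hypothetical. Verify field names against the AWS API reference.

```python
VALID_PERMISSIONS = {"READ", "WRITE", "READWRITE"}
VALID_GRANTEE_TYPES = {"IAM", "DIRECTORY_USER", "DIRECTORY_GROUP"}


def create_grant_request(
    account_id: str,
    location_id: str,
    sub_prefix: str,
    grantee_id: str,
    permission: str = "READ",
    grantee_type: str = "IAM",
) -> dict:
    """Build keyword arguments for an S3 Control CreateAccessGrant call.

    The dict is only constructed here; nothing is sent to AWS.
    """
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unsupported permission: {permission!r}")
    if grantee_type not in VALID_GRANTEE_TYPES:
        raise ValueError(f"unsupported grantee type: {grantee_type!r}")
    return {
        "AccountId": account_id,
        "AccessGrantsLocationId": location_id,  # the location Immuta registered
        "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
        "Grantee": {"GranteeType": grantee_type, "GranteeIdentifier": grantee_id},
        "Permission": permission,
    }
```

With boto3 installed, such a dict could be passed as `boto3.client("s3control").create_access_grant(**request)`. In the integration, Immuta makes this call itself and grants cannot be created manually in a location Immuta manages, so treat this only as a way to understand or troubleshoot the payload.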

In the example below, a data owner registers the s3://research-data/* bucket, and Immuta stores the connection information in the Immuta metadata database. Once the user, Taylor, is subscribed to s3://research-data/*, Immuta calls the Access Grants API to create a grant for that user to allow them to read and write S3 data in that bucket:

Integration health status

The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error.

Accessing S3 data

In the example below, Taylor requests temporary credentials from S3 Access Grants. Access Grants looks up the grant ID (1) for that user, assumes the arn:aws:iam::123456:role/access-grants-role IAM role for the location, and vends temporary credentials to Taylor, who then uses the credentials to access the research-data bucket in S3:

Note that when accessing data through S3 Access Grants, the user or application interacts directly with the Access Grants API to request temporary credentials; Immuta is not involved in this process. See the diagram below for an illustration of the process for accessing data through S3 Access Grants.

Policy enforcement

Immuta's S3 integration allows data owners and governors to apply object-level access controls on data in S3 through subscription policies. When a user is subscribed to a registered prefix, bucket, or object, Immuta calls the Access Grants API to create an individual grant that narrows the scope of access within the location to that registered prefix, bucket, or object. See the diagram below for a visualization of this process.

When a user's entitlements change or a subscription policy is added to, updated, or deleted from a prefix, Immuta performs one of the following processes for each user subscribed to the registered prefix:

User added to the prefix: Immuta specifies a permission (READ or READWRITE) for each user and uses the Access Grants API to create an individual grant for each user.
User updated: Immuta deletes the current grant ID and creates a new one using the Access Grants API.
User deleted: Immuta deletes the grant ID using the Access Grants API.
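The add, update, and delete rules above amount to reconciling the grants that exist with the grants the policies call for. This is a minimal sketch with hypothetical names, using plain dictionaries in place of Immuta's internal state; as described above, an update is expressed as deleting the old grant and creating a new one.

```python
from typing import Dict, List, Tuple

# principal -> (grant_id, permission) for grants that exist today
CurrentGrants = Dict[str, Tuple[str, str]]
# principal -> permission ("READ" or "READWRITE") the policies call for
DesiredGrants = Dict[str, str]


def plan_grant_changes(current: CurrentGrants, desired: DesiredGrants) -> List[Tuple[str, str, str]]:
    """Return ("delete", principal, grant_id) and ("create", principal, permission)
    steps that turn `current` into `desired`.

    A changed permission becomes a delete of the old grant plus a create of
    the new one. All deletes are listed before creates.
    """
    steps = []
    for principal in sorted(current):
        grant_id, permission = current[principal]
        if desired.get(principal) != permission:
            steps.append(("delete", principal, grant_id))
    for principal in sorted(desired):
        permission = desired[principal]
        existing = current.get(principal)
        if existing is None or existing[1] != permission:
            steps.append(("create", principal, permission))
    return steps
```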

Read access policies manage who can get objects from S3.
Write access policies manage who can modify data in S3.

Data policies, which provide more granular controls by redacting or masking values in a table, are not supported for S3.

Prefix registration

Each prefix added in the data registration workflow is created as a single Immuta data source, and a subscription policy added to a data source applies to any objects in that bucket or beginning with that prefix:

Therefore, data owners should register prefixes or buckets at the lowest level of access control they need for that data. Using the example above, if the data owner needed to allow different users to access s3://yellow-bucket/research-data/* than those who should access s3://yellow-bucket/analyst-data/*, the data owner must register the research-data/* and analyst-data/* prefixes separately and then apply a subscription policy to those prefixes:
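Because a subscription policy applies to every object whose path begins with a registered prefix, you can check which data sources govern a given object with a plain string-prefix test. This sketch is hypothetical and treats a trailing `*` in a registered prefix as "anything after this point". S3 prefixes are string prefixes rather than directory boundaries, so a registered `research` prefix would also cover `research-data`.

```python
from typing import List


def covering_data_sources(object_uri: str, registered: List[str]) -> List[str]:
    """Return the registered prefixes whose policies apply to `object_uri`."""
    matches = []
    for prefix in registered:
        # "s3://yellow-bucket/research-data/*" covers paths starting with
        # "s3://yellow-bucket/research-data/".
        base = prefix[:-1] if prefix.endswith("*") else prefix
        if object_uri.startswith(base):
            matches.append(prefix)
    return matches
```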

Deleting registered prefixes

When an S3 data source is deleted, Immuta deletes all the grants associated with that prefix, bucket, or object in that location.

User provisioning

If you manage access in AWS through IAM roles instead of users, user provisioning in Immuta must be done using IAM role principals. Because a grant to an IAM role applies to everyone who assumes that role, users who share an IAM role can end up with over-provisioned access.

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

Mapping IAM principals in Immuta

Names are case-sensitive

Immuta supports mapping an Immuta user to AWS in one of the following ways:

Existing S3 integrations

The Amazon S3 integration will not interfere with existing legacy S3 integrations, and multiple S3 integrations can exist in a single Immuta tenant.

Supported AWS services

Limitations

During private preview, Immuta supports up to 500 prefixes (data sources) and up to 20 Immuta users mapped to S3 identity principals. This preview limitation will be removed in a future phase of the integration.
The following Immuta features are not currently supported by the integration in private preview:
- Audit
- Data policies
- Schema monitoring
- Tag ingestion

AWS Lake Formation

Design partner: This connection is available to select accounts. Contact your Immuta representative for details.

In the Lake Formation connection, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Immuta users who have been granted access to a Glue Data Catalog table or view can then query it using one of these analytic engines:

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

This getting started guide outlines how to connect AWS Lake Formation with Immuta.

How-to guide

Reference guides

Getting Started with AWS Lake Formation

Design partner: This connection is available to select accounts. Contact your Immuta representative for details.

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Start using Marketplace

Private preview: The Marketplace app is available to select accounts. Contact your Immuta representative for details.

These guides provide instructions on using Marketplace for the first time.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Register an AWS Lake Formation Connection

Design partner: This integration is available to select accounts. Contact your Immuta representative for details.

Requirements

No AWS Lake Formation connections configured in the same Immuta instance for the same Glue Data Catalog.

Permissions

APPLICATION_ADMIN Immuta permission to register the connection
The AWS credentials or IAM role you provide Immuta to register the connection must have the following permissions:
  - AWS Glue:
    glue:GetDatabase
    glue:GetTables
    glue:GetDatabases
    glue:GetTable
  - AWS Lake Formation:
    lakeformation:ListPermissions
    lakeformation:BatchGrantPermissions
    lakeformation:BatchRevokePermissions
    lakeformation:CreateLFTag
    lakeformation:UpdateLFTag
    lakeformation:DeleteLFTag
    lakeformation:AddLFTagsToResource
    lakeformation:RemoveLFTagsFromResource
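The permissions above can be expressed as an identity-based IAM policy for the service principal. In this sketch the Sids and the `glue_resources` parameter are hypothetical; by default both statements use `"Resource": "*"`, and you may be able to narrow the Glue statement to specific catalog, database, and table ARNs. These IAM permissions are separate from the Lake Formation data permissions on tables, which are granted in Lake Formation itself.

```python
import json
from typing import List, Optional

GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetTables",
    "glue:GetDatabases",
    "glue:GetTable",
]
LAKE_FORMATION_ACTIONS = [
    "lakeformation:ListPermissions",
    "lakeformation:BatchGrantPermissions",
    "lakeformation:BatchRevokePermissions",
    "lakeformation:CreateLFTag",
    "lakeformation:UpdateLFTag",
    "lakeformation:DeleteLFTag",
    "lakeformation:AddLFTagsToResource",
    "lakeformation:RemoveLFTagsFromResource",
]


def service_principal_policy(glue_resources: Optional[List[str]] = None) -> dict:
    """Build an IAM policy granting the Glue and Lake Formation actions above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": list(GLUE_ACTIONS),
                "Resource": list(glue_resources) if glue_resources else ["*"],
            },
            {
                "Sid": "LakeFormationPermissionsAndTags",
                "Effect": "Allow",
                "Action": list(LAKE_FORMATION_ACTIONS),
                "Resource": ["*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(service_principal_policy(), indent=2))
```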

Set up the Immuta service principal

The Immuta service principal is the AWS IAM role that Immuta uses to perform operations in your AWS account. This role must have all the necessary permissions in AWS Glue and AWS Lake Formation to allow Immuta to register data sources and apply policies.

Add the following IAM permissions to the service principal from the admin account. These permissions will allow the service principal to register data sources and apply policies on Immuta's behalf.
Grant the service principal permissions on any tables that will be registered in Immuta. There are two ways to give the service principal these permissions: either make a new LF-Tag that gives the appropriate permissions and apply it to all databases or tables that Immuta will manage, or make the role a superuser in Lake Formation.

The LF-Tag method follows the principle of least privilege and is the most flexible way of granting permissions to the service principal. LF-Tags cascade down from databases to tables while allowing for exceptions: when you apply this tag to a database, it automatically applies to all tables within that database, and you can remove it from any tables that should be outside the scope of Immuta’s governance.

Create a new LF-Tag, giving yourself permissions to grant that tag to a user, which will ultimately be your service principal.
1. In the Lake Formation console, navigate to LF-Tags and permissions and click Add LF-Tag. You will need the Create LF-Tag permission to do this.
2. Create a tag key and value.
3. On the LF-Tag key-value pair, grant the ASSOCIATE LF-Tag permission to your own IAM principal.
Grant this tag to the Immuta service principal.
1. In the Lake Formation console, navigate to Data permissions and click Grant.
2. Enter the service principal’s IAM role.
3. Add the key-value pair of the tag you created in step 1.
4. Under Table Permissions, select the following grantable permissions: SELECT, DESCRIBE, INSERT, DELETE.
5. Click Grant.
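The LF-Tag steps map onto the Lake Formation `CreateLFTag` and `GrantPermissions` APIs. This sketch only builds request dictionaries in the shape that boto3's `lakeformation` client expects for `create_lf_tag` and `grant_permissions`; the tag key and value, the role ARN, and the helper name are hypothetical, and the ASSOCIATE grant to your own principal (step 3 of the first procedure) is not shown. Listing the selected grantable permissions in both `Permissions` and `PermissionsWithGrantOption` is an assumption about how the console choice translates to the API.

```python
from typing import List, Tuple

# Table permissions selected in the console steps above.
TABLE_PERMISSIONS: List[str] = ["SELECT", "DESCRIBE", "INSERT", "DELETE"]


def lf_tag_requests(tag_key: str, tag_value: str, service_principal_arn: str) -> Tuple[dict, dict]:
    """Return (create_lf_tag kwargs, grant_permissions kwargs)."""
    create_tag = {"TagKey": tag_key, "TagValues": [tag_value]}
    grant = {
        "Principal": {"DataLakePrincipalIdentifier": service_principal_arn},
        "Resource": {
            # Grant on every table that carries this key-value pair.
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": [tag_value]}],
            }
        },
        "Permissions": list(TABLE_PERMISSIONS),
        "PermissionsWithGrantOption": list(TABLE_PERMISSIONS),
    }
    return create_tag, grant
```

With boto3, these could be sent as `client.create_lf_tag(**create_tag)` and `client.grant_permissions(**grant)` on a `boto3.client("lakeformation")`.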

The superuser method enables all Lake Formation operations on all data in the Glue Data Catalog. This is highly privileged and risks managing permissions on data you did not intend to.

This method will grant all necessary permissions to the service principal, but grants more than the service principal needs without being as flexible, since it does not allow for exceptions like the LF-Tag method. You can make the service principal a superuser on the entire catalog or specify individual resources.

In the Lake Formation console, navigate to Data permissions and click Grant.
Enter your service principal’s IAM role.
Select Named Data Catalog resources, and input the Glue Data Catalog ID and any databases or tables you wish to specify.
Under Grantable permissions, select Super and click Grant.

Register an AWS Lake Formation connection

Click Data and select Connections in the navigation menu.
Click the + Add Connection button.
Select the AWS Lake Formation tile.
Enter the host connection information:
1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.
2. AWS Glue Catalog ARN: The Amazon resource name of the Glue Data Catalog that contains the data you want to register and protect.
3. AWS Account ID: The ID of the AWS account associated with the Glue Data Catalog.
4. AWS Region: The region of the AWS account associated with the Glue Data Catalog.
Click Next.
Select an authentication method from the dropdown menu:
  - Access using access key and secret access key: Provide your AWS access key ID and secret access key. Immuta uses these credentials temporarily to register the connection.
  - AWS IAM Role (recommended): Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. Before proceeding, contact your Immuta representative for the AWS account to add to your trust policy. Then, enter the role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
Ensure that you have the correct permissions and click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
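Before clicking Validate Connection, you can check that the AWS Glue Catalog ARN agrees with the AWS Account ID and AWS Region fields. A Glue Data Catalog ARN has the form `arn:<partition>:glue:<region>:<account-id>:catalog`. The helper below is hypothetical and only catches typos in these fields; it does not contact AWS or test credentials.

```python
from typing import List


def check_connection_fields(glue_catalog_arn: str, account_id: str, region: str) -> List[str]:
    """Return a list of problems with the connection fields (empty if none)."""
    parts = glue_catalog_arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "glue" or parts[5] != "catalog":
        return [f"not a Glue Data Catalog ARN: {glue_catalog_arn!r}"]
    problems = []
    if not (account_id.isdigit() and len(account_id) == 12):
        problems.append(f"AWS Account ID should be 12 digits: {account_id!r}")
    if parts[3] != region:
        problems.append(f"ARN region {parts[3]!r} does not match AWS Region {region!r}")
    if parts[4] != account_id:
        problems.append(f"ARN account {parts[4]!r} does not match AWS Account ID {account_id!r}")
    return problems
```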

Map users

Requirement: USER_ADMIN Immuta permission

Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies.

Click People and select Users in the navigation menu.
Click the user's name to navigate to their page and scroll to the External User Mapping section.
Click Edit in the AWS User row.
Select a mapping option:
  - AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address.
  - Unset (fallback to Immuta username): When selecting this option, the AWS username is assumed to be the same as the Immuta username.
Click Save.
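A simple pre-check for the mapping options above: reject Identity Center values that look like email addresses, and fall back to the Immuta username when the mapping is unset. The helper name and its `@` test are assumptions made for illustration; they do not verify that the ID exists in IAM Identity Center.

```python
from typing import Optional


def aws_user_mapping(immuta_username: str, identity_center_user_id: Optional[str] = None) -> str:
    """Return the AWS identity value to store for an Immuta user."""
    if identity_center_user_id is None:
        # Unset: the AWS username is assumed to equal the Immuta username.
        return immuta_username
    value = identity_center_user_id.strip()
    if not value:
        raise ValueError("the IAM Identity Center User ID is empty")
    if "@" in value:
        raise ValueError("use the IAM Identity Center User ID, not the user's email address")
    return value
```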

Reference Guides

Security and Compliance

Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

Security

Data processing and encryption

Authentication

Registering the connection

The Lake Formation connection supports the following authentication methods to register a connection:

Access using AWS IAM role (recommended): Immuta will assume this role when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.
Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection.

Identity providers for user authentication

The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

Auditing and compliance

Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

Protecting Data

In the AWS Lake Formation connection, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Immuta users who have been granted access to a Glue Data Catalog table or view can then query it using one of these analytic engines:

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

Registering a connection

Once the Lake Formation connection is registered, you can author policies in Immuta to orchestrate Lake Formation access controls.

Protecting data

After Glue Data Catalog views and tables are registered in Immuta, you can author subscription policies in Immuta to orchestrate Lake Formation access controls. Once a subscription policy is applied, users can be subscribed to data sources in the following ways:

Consider the following example, which illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access yellow-table. When this policy is authored and applied to the data source, Immuta generates a Lake Formation (LF) tag, applies it to the Glue Data Catalog table yellow-table, and grants permissions on that tag to all AWS users (registered in Immuta) who are members of the analysts group.

In the image above, the user in the analysts group can access yellow-table, while the user in the research group is denied access.
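The tag-based flow in this example can be modeled in a few lines. The sketch below is a toy, in-memory model under stated assumptions: the class, its methods, the users alice and bob, and the tag name immuta-yellow-table are hypothetical illustrations, not Immuta or AWS APIs. It only shows why the analysts user gets access and the research user does not: access follows from holding a grant on a tag that is attached to the table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLakeFormation:
    """In-memory stand-in for LF-tag-based access (hypothetical, not an AWS API)."""
    table_tags: dict = field(default_factory=dict)  # table name -> set of LF tags
    tag_grants: dict = field(default_factory=dict)  # LF tag -> set of users granted on it

    def apply_subscription_policy(self, table: str, tag: str, members: set) -> None:
        # Tag the table, then grant permissions on the tag to every group member.
        self.table_tags.setdefault(table, set()).add(tag)
        self.tag_grants.setdefault(tag, set()).update(members)

    def can_access(self, user: str, table: str) -> bool:
        # Access is allowed if the user holds a grant on any tag attached to the table.
        return any(user in self.tag_grants.get(tag, set())
                   for tag in self.table_tags.get(table, set()))


groups = {"analysts": {"alice"}, "research": {"bob"}}
lf = ToyLakeFormation()
lf.apply_subscription_policy("yellow-table", "immuta-yellow-table", groups["analysts"])
print(lf.can_access("alice", "yellow-table"))  # True
print(lf.can_access("bob", "yellow-table"))    # False
```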

Accessing Data

Once data is registered through the AWS Lake Formation connection, you will access your data in one of these AWS analytic engines as you normally would:

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

When you submit a query, the analytic engine requests metadata from the Glue Data Catalog, which queries Lake Formation to determine what data you are allowed to see. The analytic engine then requests temporary access from Lake Formation, retrieves the data from S3, and filters it so that only policy-enforced data is returned to you.

The diagram below illustrates how the analytic engine interacts with Glue Data Catalog and Lake Formation to access data.

Getting Started with Azure Synapse Analytics

Requirement: A running Dedicated SQL pool

Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

Start using Marketplace

Private preview: The Marketplace app is available to select accounts. Contact your Immuta representative for details.

These guides provide instructions on using Marketplace for the first time.

Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

Configure Azure Synapse Analytics Integration

Requirement

A running Dedicated SQL pool is required.

Add an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click the +Add Integration button and select Azure Synapse Analytics from the dropdown menu.
Complete the Host, Port, Immuta Database, and Immuta Schema fields.
Optionally, select the Enable Impersonation checkbox and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.
Optionally, update the User Profile Delimiters. This is necessary if any of the provided symbols are used in user profile information.

Select your configuration method

You have two options for configuring your Azure Synapse Analytics environment:

Automatic setup

Enter the username and password in the Privileged User Credentials section.

Manual setup

Select Manual.
Download, fill out the appropriate fields, and run the bootstrap master script and bootstrap script linked in the Setup section.
Enter the username and password in the Immuta System Account Credentials section. The username and password provided must be the credentials that were set in the bootstrap master script when you created the user.

Save the configuration

Click Save.

Register data

Edit an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Edit the field you want to change. Note that shaded fields are not editable; to change one of them, you must disable and reinstall the integration.
Enter Username and Password.
Click Save.

Immuta requires temporary, one-time use of credentials with specific permissions

When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the Manage GRANTS permission.

Alternatively, you can download the Edit Script from your Azure Synapse Analytics configuration on the Immuta app settings page and run it in Azure Synapse Analytics.

Remove an Azure Synapse Analytics integration

Click the App Settings icon in the navigation menu.
Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.
Click the checkbox to disable the integration.
Enter the username and password that were used to initially configure the integration.
Click Save.


Amazon S3 Integration

Private preview: This integration is available to select accounts. Contact your Immuta representative for details.

Getting started

Immuta's Amazon S3 integration allows users to apply policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.

Requirements

No location is registered in your S3 Access Grants instance before configuring the integration in Immuta
; contact your Immuta representative to get this feature enabled
: is the best approach for user provisioning because it treats users as users, not users as roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user may operate within the AWS ecosystem. As a result, a user's identity always remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. Immuta will manage the GRANTs for you using IDC if it is enabled and configured in Immuta. See the for instructions on mapping users from AWS IDC to user accounts in Immuta.

Permissions

APPLICATION_ADMIN Immuta permission to configure the integration
CREATE_S3_DATASOURCE Immuta permission to register S3 prefixes
The AWS account credentials or optional AWS IAM role you provide Immuta to configure the integration must
- have the following permissions to create locations and issue grants:
  - accessgrantslocation resource:
    s3:CreateAccessGrant
    s3:DeleteAccessGrantsLocation
    s3:GetAccessGrantsLocation
    s3:UpdateAccessGrantsLocation
  - accessgrantsinstance resource:
    s3:CreateAccessGrantsInstance
    s3:CreateAccessGrantsLocation
    s3:DeleteAccessGrantsInstance
    s3:GetAccessGrantsInstance
    s3:GetAccessGrantsInstanceForPrefix
    s3:GetAccessGrantsInstanceResourcePolicy
    s3:ListAccessGrants
    s3:ListAccessGrantsLocations
  - accessgrant resource:
    s3:DeleteAccessGrant
    s3:GetAccessGrant
  - bucket resource: s3:ListBucket
  - role resource:
    iam:GetRole
    iam:PassRole
  - all resources: s3:ListAccessGrantsInstances

Set up S3 Access Grants instance

Create an S3 Access Grants instance. AWS supports one Access Grants instance per region per AWS account.
Create an IAM role for the Access Grants location. You will add this role to your integration configuration in Immuta so that Immuta can register this role with your Access Grants location. The role's trust policy should include at least the following permissions, but might need additional permissions depending on other local setup factors. An example trust policy is provided below.
- sts:AssumeRole
- sts:SetSourceIdentity

IAM role trust policy example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1234567891011",
      "Effect": "Allow",
      "Principal": {
        "Service": "access-grants.s3.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetSourceIdentity"
      ]
    }
  ]
}

Create an IAM policy with the following permissions, and attach it to the IAM role you created to grant the permissions to the role. An example policy is provided below.

s3:GetObject
s3:GetObjectVersion
s3:GetObjectAcl
s3:GetObjectVersionAcl
s3:ListMultipartUploadParts
s3:PutObject
s3:PutObjectAcl
s3:PutObjectVersionAcl
s3:DeleteObject
s3:DeleteObjectVersion
s3:AbortMultipartUpload
s3:ListBucket
s3:ListAllMyBuckets

IAM policy example

Replace <bucket_arn> in the example below with the ARN of the bucket scope that contains the data you want to grant access to. Because S3 evaluates object-level actions against object ARNs, the object-level statements use <bucket_arn>/*.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectVersionAcl",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "<bucket_arn>/*"
            ]
        },
        {
            "Sid": "ObjectLevelWritePermissions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:PutObjectVersionAcl",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "<bucket_arn>/*"
            ]
        },
        {
            "Sid": "BucketLevelReadPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": [
                "<bucket_arn>"
            ]
        }
    ]
}
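If you manage IAM policies as code, the location role's permissions policy can be generated for a given bucket. The sketch below assumes a single bucket and uses a hypothetical helper name, location_role_policy. It scopes the object-level statements to the bucket ARN followed by /* because S3 object actions are evaluated against object ARNs (the bucket ARN plus /key), while s3:ListBucket is evaluated against the bucket ARN itself.

```python
import json

# Action lists copied from the permissions listed above.
OBJECT_READ_ACTIONS = [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:GetObjectAcl",
    "s3:GetObjectVersionAcl",
    "s3:ListMultipartUploadParts",
]
OBJECT_WRITE_ACTIONS = [
    "s3:PutObject",
    "s3:PutObjectAcl",
    "s3:PutObjectVersionAcl",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion",
    "s3:AbortMultipartUpload",
]
BUCKET_READ_ACTIONS = ["s3:ListAllMyBuckets", "s3:ListBucket"]


def location_role_policy(bucket_arn: str) -> dict:
    """Build the location role's permissions policy for one bucket (hypothetical helper)."""
    object_arn = f"{bucket_arn}/*"  # object-level actions match object ARNs
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ObjectLevelReadPermissions", "Effect": "Allow",
             "Action": OBJECT_READ_ACTIONS, "Resource": [object_arn]},
            {"Sid": "ObjectLevelWritePermissions", "Effect": "Allow",
             "Action": OBJECT_WRITE_ACTIONS, "Resource": [object_arn]},
            {"Sid": "BucketLevelReadPermissions", "Effect": "Allow",
             "Action": BUCKET_READ_ACTIONS, "Resource": [bucket_arn]},
        ],
    }


policy = location_role_policy("arn:aws:s3:::research-data")
print(json.dumps(policy, indent=2))
```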

kms:Decrypt
kms:GenerateDataKey

Create an IAM role that Immuta can use to create Access Grants locations and issue grants. This role must have the S3 permissions listed in the Permissions section above. An example policy is provided below.

IAM policy example

arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation1
arn:aws:s3:us-east-2:6********499:access-grants/default/newlocation2

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RolePermissions",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "<role_arn>"
        },
        {
            "Sid": "AccessGrants",
            "Effect": "Allow",
            "Action": [
                "s3:CreateAccessGrant",
                "s3:DeleteAccessGrantsLocation",
                "s3:GetAccessGrantsLocation",
                "s3:CreateAccessGrantsLocation",
                "s3:GetAccessGrantsInstance",
                "s3:GetAccessGrantsInstanceForPrefix",
                "s3:GetAccessGrantsInstanceResourcePolicy",
                "s3:ListAccessGrants",
                "s3:ListAccessGrantsLocations",
                "s3:ListAccessGrantsInstances",
                "s3:DeleteAccessGrant",
                "s3:GetAccessGrant"
            ],
            "Resource": [
                "<access_grants_instance_arn>"
            ]
        }
    ]
}
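Before you enter the role in Immuta, you can check a policy document against the actions listed in the Permissions section. The sketch below is a simplified, hypothetical check: it collects the actions from Allow statements (expanding * wildcards and comparing case-insensitively, as IAM does for action names) and reports required actions that no statement allows. It deliberately ignores Resource, Condition, and Deny statements, so a clean result is a sanity check, not proof that the role will work.

```python
from fnmatch import fnmatchcase

# Actions listed in the Permissions section above.
REQUIRED_ACTIONS = {
    "s3:CreateAccessGrant", "s3:DeleteAccessGrantsLocation",
    "s3:GetAccessGrantsLocation", "s3:UpdateAccessGrantsLocation",
    "s3:CreateAccessGrantsInstance", "s3:CreateAccessGrantsLocation",
    "s3:DeleteAccessGrantsInstance", "s3:GetAccessGrantsInstance",
    "s3:GetAccessGrantsInstanceForPrefix", "s3:GetAccessGrantsInstanceResourcePolicy",
    "s3:ListAccessGrants", "s3:ListAccessGrantsLocations",
    "s3:DeleteAccessGrant", "s3:GetAccessGrant", "s3:ListBucket",
    "iam:GetRole", "iam:PassRole", "s3:ListAccessGrantsInstances",
}


def allowed_action_patterns(policy: dict) -> list:
    """Lowercased action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> list:
    """Required actions that no Allow statement matches, sorted."""
    patterns = allowed_action_patterns(policy)
    return sorted(
        action for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    )


example = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": ["iam:GetRole", "iam:PassRole"], "Resource": "<role_arn>"},
    {"Effect": "Allow", "Action": "s3:*AccessGrant*", "Resource": "*"},
]}
print(missing_actions(example))  # ['s3:ListBucket']
```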

If you use AWS IAM Identity Center, associate . Then add the permissions listed in the sample policy below to your IAM policy, and attach the policy to the IAM role you created to grant the permissions to the role.

IAM policy example

Copy the JSON below and replace the following bracketed placeholder values with your own. For details about the actions and resource values, see the .

<iam_identity_center_instance_arn>: The ARN of the IAM Identity Center instance that is configured with the application.
<iam_identity_center_application_arn_for_s3_access_grants>: The ARN of the S3 Access Grants application configured with IAM Identity Center.
<aws_account>: Your AWS account ID.
<identity_store_id>: The globally unique identifier of the identity store that is connected to the Identity Center instance. This value is generated when a new identity store is created.

{
  "Sid": "sso",
  "Effect": "Allow",
  "Action": [
    "sso:DescribeInstance",
    "sso:DescribeApplication",
    "sso-directory:DescribeUsers"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}, {
  "Sid": "idc",
  "Effect": "Allow",
  "Action": [
    "identitystore:DescribeUser",
    "identitystore:DescribeGroup"
  ],
  "Resource": [
    "<iam_identity_center_instance_arn>",
    "<iam_identity_center_application_arn_for_s3_access_grants>",
    "arn:aws:identitystore:::user/*",
    "arn:aws:identitystore::<aws_account>:identitystore/<identity_store_id>"
  ]
}
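The JSON above is a fragment of two statements rather than a complete policy; its entries belong in the Statement array of the policy attached to the role. The sketch below shows one way to merge such a fragment programmatically. It uses a hypothetical helper, add_statements, that wraps the fragment in brackets so it parses as a JSON array and refuses to add a statement whose Sid already exists. The fragment here is abbreviated to one action per statement.

```python
import copy
import json

# Abbreviated version of the fragment above (one action per statement).
IDC_FRAGMENT = """
{"Sid": "sso", "Effect": "Allow", "Action": ["sso:DescribeInstance"],
 "Resource": ["<iam_identity_center_instance_arn>"]},
{"Sid": "idc", "Effect": "Allow", "Action": ["identitystore:DescribeUser"],
 "Resource": ["<iam_identity_center_instance_arn>"]}
"""


def add_statements(policy: dict, new_statements: list) -> dict:
    """Return a copy of policy with new_statements appended to its Statement array."""
    merged = copy.deepcopy(policy)
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):  # normalize a single statement object to a list
        statements = [statements]
    merged["Statement"] = statements
    seen_sids = {s["Sid"] for s in statements if s.get("Sid")}
    for statement in new_statements:
        sid = statement.get("Sid")
        if sid:
            if sid in seen_sids:
                raise ValueError(f"duplicate Sid: {sid}")
            seen_sids.add(sid)
        statements.append(copy.deepcopy(statement))
    return merged


base_policy = {"Version": "2012-10-17", "Statement": [
    {"Sid": "RolePermissions", "Effect": "Allow",
     "Action": ["iam:GetRole", "iam:PassRole"], "Resource": "<role_arn>"},
]}
fragment = json.loads("[" + IDC_FRAGMENT + "]")  # the fragment is not valid JSON on its own
merged_policy = add_statements(base_policy, fragment)
print(json.dumps(merged_policy, indent=2))
```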

Configure the integration in Immuta

In Immuta, click the App Settings icon in the navigation menu and click the Integrations tab.
Click + Add Integration.
Select Amazon S3 from the dropdown menu and click Continue Configuration.
Complete the connection details fields, where
- Friendly Name is a name for the integration that is unique across all Amazon S3 integrations configured in Immuta.
- AWS Account ID is the ID of your AWS account.
- AWS Region is the AWS region to use.
- S3 Access Grants Location IAM Role ARN is the role the S3 Access Grants service assumes to vend credentials to the grantee. When a grantee accesses S3 data, the Access Grants service attaches session policies and assumes this role in order to vend credentials scoped to a prefix or bucket to the grantee. This role needs full access to all paths under the S3 location prefix.
- S3 Access Grants S3 Location Scope is the base S3 location that Immuta will use for this connection when registering S3 prefixes. This path must be unique across all S3 integrations configured in Immuta. During data source registration, this prefix is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location prefix of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Select your authentication method:
- Access using AWS IAM role: Provide an AWS IAM Role that Immuta will assume when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role for providing S3 access grants operations. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. Before proceeding, contact your Immuta representative for the AWS account to add to your trust policy. Then, complete the steps below.
  - Enter the role ARN in the AWS IAM Role field. Immuta will assume this role when interacting with AWS.
  - Set the external ID provided in a condition on the trust relationship for the cross-account IAM role specified above. See the for guidance.
- Access using access key and secret access key: Provide your AWS Access Key ID and AWS Secret Access Key.
Click Verify Credentials.
Click Next to review and confirm your connection information, and then click Complete Setup.
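For the AWS IAM role authentication option, the external ID is enforced by a condition in the role's trust policy. The sketch below builds such a trust policy using the standard sts:ExternalId condition key. The principal ARN and external ID are hypothetical placeholders for the values your Immuta representative provides, and the helper name is illustrative.

```python
import json


def cross_account_trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Trust policy letting a principal in another account assume this role,
    but only when it passes the expected external ID (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholders: substitute the values your Immuta representative gives you.
trust_policy = cross_account_trust_policy("<immuta_aws_principal_arn>", "<external_id>")
print(json.dumps(trust_policy, indent=2))
```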

Editing an integration

You can edit the following settings for an existing Amazon S3 integration on the app settings page:

friendly name
authentication type and values (access key, secret, and role)

To edit settings for an existing integration via the API, see the .

Register S3 data

Follow the to register prefixes in Immuta.

To create an S3 data source using the API, see the .

Protect data

Requirements: USER_ADMIN Immuta permission and either the GOVERNANCE or CREATE_S3_DATASOURCE Immuta permission

in Immuta to enforce access controls.
Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies:
1. Click Identities in the navigation menu and select Users.
2. Navigate to the user's page and click the more actions icon next to their username.
3. Select Change S3 User or AWS IAM Role from the dropdown menu.
4. Use the dropdown menu to select the User Type. Then complete the S3 field. User and role names are case-sensitive. See the for details.
  - : Only a single Immuta user can be mapped to an IAM role. This restriction prohibits enforcing policies on AWS users who could assume that role. Therefore, if using role principals, create a new user in Immuta that represents the role so that the role then has the permissions applied specifically to it.
  - AWS Identity Center user IDs: You must use the numeric User ID value found in AWS IAM Identity Center, not the user's email address. Ensure that you have added the content to your IAM policy JSON as outlined in the above to allow Immuta to use AWS Identity Center.
  - Unset (fallback to Immuta username): When selecting this option, the S3 username is assumed to be the same as the Immuta username.
5. Click Save.
See the for details about supported principals.
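The mapping rules above can be summarized as a small resolution function. In this hypothetical sketch, the UserType values, helper names, and example identities are illustrative placeholders, not Immuta API names or required formats. It encodes three rules from the text: an unset mapping falls back to the Immuta username, mapped values are used verbatim because names are case-sensitive, and an IAM role may be mapped to only one Immuta user.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class UserType(Enum):
    IAM_ROLE = "iam_role"
    IDENTITY_CENTER_USER = "identity_center_user"
    UNSET = "unset"


@dataclass(frozen=True)
class S3Mapping:
    immuta_username: str
    user_type: UserType = UserType.UNSET
    s3_value: Optional[str] = None


def resolve_s3_identity(mapping: S3Mapping) -> str:
    """Return the S3 identity that policies would be enforced for."""
    if mapping.user_type is UserType.UNSET:
        return mapping.immuta_username  # fall back to the Immuta username
    if not mapping.s3_value:
        raise ValueError(f"{mapping.immuta_username}: {mapping.user_type.value} needs a value")
    return mapping.s3_value  # used verbatim: names are case-sensitive


def check_role_mappings(mappings: Iterable[S3Mapping]) -> None:
    """Raise if any IAM role is mapped to more than one Immuta user."""
    owners = {}
    for m in mappings:
        if m.user_type is not UserType.IAM_ROLE:
            continue
        role = resolve_s3_identity(m)
        if role in owners and owners[role] != m.immuta_username:
            raise ValueError(f"role {role} is already mapped to {owners[role]}")
        owners[role] = m.immuta_username


alice = S3Mapping("alice@example.com")
analysts = S3Mapping("analysts-role-user", UserType.IAM_ROLE, "arn:aws:iam::123456:role/analysts")
print(resolve_s3_identity(alice))  # alice@example.com
check_role_mappings([alice, analysts])  # passes: each role has one Immuta user
```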

Access data

Requirement: User must be subscribed to the data source in Immuta

. If you're accessing S3 data through one of the supported (such as Amazon EMR on EC2), that application will make this request on your behalf, so you can skip this step.
.

S3 integration overview

With this integration, users can avoid

hand-writing AWS IAM policies
managing AWS IAM role limits
manually tracking what user or role has access to what files in AWS S3 and verifying those are consistent with intent

S3 Access Grants components

To enforce controls on S3 data, Immuta interacts with several S3 Access Grants components:

Access Grants instance: An Access Grants instance is a logical container for individual grants that specify who can access what level of data in S3 in your AWS account and region. AWS supports one Access Grants instance per region per AWS account.
Location: A location specifies what data the Access Grants instance can grant access to. For example, registering a location with a scope of s3:// allows Access Grants to manage access to all S3 buckets in that AWS account and region, whereas setting the bucket s3://research-data as the scope limits Access Grants to managing access to that single bucket for that location. When you configure the S3 integration in Immuta, you specify a location's scope and IAM assumed role, and Immuta registers the location in your Access Grants instance and associates it with the provided IAM role for you. Each S3 integration you configure in Immuta is associated with one location, and Immuta manages all grants in that location. Therefore, grants cannot be manually created by users in an Access Grants instance location that Immuta has registered and manages. During data source registration, this location scope is prepended to the data source prefixes to build the final path used to grant or revoke access to that data in S3. For example, a location scope of s3://research-data would be prepended to the data source prefix /demographics to generate a final path of s3://research-data/demographics.
Individual grants: Individual permission grants in S3 Access Grants specify the identity that can access the data, the access level, and the location of the S3 data. Immuta creates a grant for each user subscribed to a prefix, bucket, or object by interacting with the Access Grants API. Each grant has its own ID and gives the user or role principal access to the data.
IAM assumed role: This is an IAM role you create in AWS IAM that has full access to all prefixes, buckets, and objects in the Access Grants location registered by Immuta. This IAM role is used to vend temporary credentials to users or applications. When a grantee requests temporary credentials, the S3 Access Grants service assumes this role to vend credentials scoped to the prefix, bucket, or object specified in the grant to the grantee. The grantee then uses these credentials to access S3 data. When configuring the integration in Immuta, you specify this role, and then Immuta associates this role with the registered location in the Access Grants instance.
Temporary credentials: These just-in-time access credentials provide access to a prefix, bucket, or object with a permission level of READ or READWRITE in S3. When a user or application requests temporary credentials to access S3 data, the S3 Access Grants instance evaluates the request against the grants Immuta has created for that user. If a matching grant exists, S3 Access Grants assumes the IAM role associated with the location of the matching grant, scopes the permissions of the IAM session to the S3 prefix, bucket, or object specified by the grant, and vends these temporary credentials to the requester. These credentials have a default timeout of 1 hour, but this duration can be configured.
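To make the location scope and grant evaluation concrete, the sketch below models two steps described above: building a grant path by prepending the location scope to a data source prefix, and checking whether a request for a path is covered by a grant for that grantee with a sufficient permission level. It is a conceptual model under simplifying assumptions (plain path-prefix matching at / boundaries, READWRITE implying READ, and the most specific grant winning). The names are hypothetical, and it does not reproduce the exact matching semantics of the S3 Access Grants service.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PERMISSION_LEVEL = {"READ": 1, "READWRITE": 2}  # assumption: READWRITE also satisfies READ


def grant_path(location_scope: str, data_source_prefix: str) -> str:
    """Prepend the location scope to a data source prefix."""
    if location_scope.endswith("://"):  # e.g. a scope of s3:// (whole account and region)
        return location_scope + data_source_prefix.lstrip("/")
    return location_scope.rstrip("/") + "/" + data_source_prefix.lstrip("/")


@dataclass(frozen=True)
class Grant:
    grant_id: str
    grantee: str
    path: str
    permission: str  # "READ" or "READWRITE"


def covers(granted_path: str, requested_path: str) -> bool:
    """True if requested_path is granted_path itself or lies beneath it."""
    base = granted_path.rstrip("/")
    return requested_path == base or requested_path.startswith(base + "/")


def matching_grant(grants: Iterable[Grant], grantee: str,
                   requested_path: str, permission: str) -> Optional[Grant]:
    """Most specific grant that covers the request, or None (no credentials vended)."""
    candidates = [
        g for g in grants
        if g.grantee == grantee
        and PERMISSION_LEVEL[g.permission] >= PERMISSION_LEVEL[permission]
        and covers(g.path, requested_path)
    ]
    return max(candidates, key=lambda g: len(g.path), default=None)


path = grant_path("s3://research-data", "/demographics")
print(path)  # s3://research-data/demographics
grants = [Grant("g-1", "alice", path, "READ")]
print(matching_grant(grants, "alice", path + "/2024.csv", "READ"))
print(matching_grant(grants, "alice", path + "/2024.csv", "READWRITE"))  # None
```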

The diagram below illustrates how these S3 Access Grants components interact.

For more details about these Access Grants concepts, see the .

How does the integration work?

AWS account ID and region
ARN for the existing Access Grants instance
ARN for the assumed IAM role

When Immuta registers this location, it associates the assumed IAM role with the location. This allows the IAM role to create temporary credentials with access scoped to a particular S3 prefix, bucket, or object in the location. The IAM role you create for this location must have all the object- and bucket-level permissions listed in the on all buckets and objects in the location; if it is missing permissions, the IAM role will not be able to grant those missing permissions to users or applications requesting temporary credentials.

In the example below, an application administrator registers the following location prefix and IAM role for their Access Grants instance in AWS account 123456:

Location path: s3://. This path allows a single Amazon S3 integration to manage all objects in S3 in that AWS account and region. Data owners can scope down access further when registering specific S3 prefixes and applying policies.
Location IAM role: The arn:aws:iam::123456:role/access-grants-role IAM role will be used to vend temporary credentials to users and applications.

Immuta registers this location and associated IAM role in the user's Access Grants instance:

Access Grants location
READ access
User or role principal
Registered prefix, bucket, or object

Integration health status

Accessing S3 data

Policy enforcement

User added to the prefix: Immuta specifies a permission (READ or READWRITE) for each user and uses the Access Grants API to create an individual grant for each user.
User updated: Immuta deletes the current grant ID and creates a new one using the Access Grants API.
User deleted: Immuta deletes the grant ID using the Access Grants API.
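The add, update, and delete behavior above amounts to reconciling existing grants with the desired subscriptions. The sketch below is a hypothetical planning function, not Immuta's implementation, that compares the two and returns grant IDs to delete and grants to create. An update, such as a permission change from READ to READWRITE, appears as a delete of the old grant ID plus a create, matching the text; deleting a data source corresponds to removing all of its entries from the desired set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    grantee: str
    path: str
    permission: str  # "READ" or "READWRITE"


def plan_grant_changes(current: dict, desired: set) -> tuple:
    """current maps grant ID -> Access; desired is the set of Access values that should exist.
    Returns (grant IDs to delete, Access values to create)."""
    existing = set(current.values())
    to_delete = sorted(gid for gid, access in current.items() if access not in desired)
    to_create = sorted(desired - existing, key=lambda a: (a.grantee, a.path, a.permission))
    return to_delete, to_create


prefix = "s3://research-data/demographics"
current = {
    "g-1": Access("alice", prefix, "READ"),
    "g-2": Access("bob", prefix, "READ"),
}
desired = {
    Access("alice", prefix, "READWRITE"),  # alice updated to READWRITE
    Access("carol", prefix, "READ"),       # carol newly subscribed; bob removed
}
to_delete, to_create = plan_grant_changes(current, desired)
print(to_delete)  # ['g-1', 'g-2']
print(to_create)
```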

Read access policies manage who can get objects from S3.
Write access policies manage who can modify data in S3.

Data policies, which provide more granular controls by redacting or masking values in a table, are not supported for S3.

Prefix registration

Deleting registered prefixes

When an S3 data source is deleted, Immuta deletes all the grants associated with that prefix, bucket, or object in that location.

User provisioning

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

Mapping IAM principals in Immuta

Names are case-sensitive

Immuta supports mapping an Immuta user to AWS in one of the following ways:

Existing S3 integrations

The Amazon S3 integration will not interfere with existing legacy S3 integrations, and multiple S3 integrations can exist in a single Immuta tenant.

Supported AWS services

Limitations

During private preview, Immuta supports up to 500 prefixes (data sources) and up to 20 Immuta users that are mapped to S3 identity principals. This is a preview limitation that will be removed in a future phase of the integration.
The following Immuta features are not currently supported by the integration in private preview:
- Audit
- Data policies
- Schema monitoring
- Tag ingestion

Installation and Compliance

In the Databricks Spark integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.

The sequence diagram below breaks down this process of events when an Immuta user queries data in Databricks.

System requirements

A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)
A cluster that uses one of these supported Databricks Runtimes:
- 9.1 LTS
- 10.4 LTS
- 11.3 LTS
- 14.3 (private preview) - Requires Immuta version 2025.1.x or newer
Supported languages
- Python
- R (not supported for Databricks Runtime 14.3)
- Scala (not supported for Databricks Runtime 14.3)
- SQL
A Databricks cluster that is one of these supported compute types:
Custom access mode
The Databricks Spark integration only works with Spark 3.

What does Immuta do in my Databricks environment?

Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for authorization and policy enforcement.

Immuta adds the following artifacts to your Databricks environment:

Immuta-maintained Spark plugin

The Databricks Spark integration injects this Immuta-maintained Spark plugin into the SparkSQL stack at cluster startup time. Policy determinations are obtained from the connected Immuta tenant and applied before returning results to the user. The plugin includes wrappers and Immuta analysis hook plan rewrites to enforce policies.

Immuta Security Manager

The Immuta Security Manager ensures users can't perform unauthorized actions when using Scala and R, since those languages have features that allow users to circumvent policies without the Security Manager enabled. The Immuta Security Manager blocks users from executing code that could give them access to sensitive data by allowing only select code paths to access sensitive files and methods. These code paths give Immuta's code access to sensitive resources while blocking end users from accessing those resources directly.

Performance

The Security Manager must inspect the call stack every time a permission check is triggered, which adds overhead to queries. To improve Immuta's query performance on Databricks, Immuta disables the Security Manager when Scala and R are not being used.

The cluster init script checks the cluster’s configuration and automatically removes the Security Manager configuration when

spark.databricks.repl.allowedLanguages is a subset of {python, sql}
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED is true

When the cluster is configured this way, Immuta can rely on Databricks' process isolation and Py4J security to prevent user code from performing unauthorized actions.

Note: Immuta still expects the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to be set and pointing at the Security Manager.
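The removal condition above can be expressed as a small predicate. This is a hypothetical sketch, not Immuta's init script: it assumes both conditions must hold, that the allowed-languages setting is a comma-separated list (for example python,sql), and that an unset or empty value means Scala and R may be in use. It uses the Spark configuration key as Databricks spells it, spark.databricks.repl.allowedLanguages.

```python
ALLOWED_LANGUAGES_KEY = "spark.databricks.repl.allowedLanguages"
PY4J_STRICT_VAR = "IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED"


def security_manager_removable(spark_conf: dict, env: dict) -> bool:
    """True if only Python and SQL are allowed and strict Py4J mode is enabled."""
    raw = spark_conf.get(ALLOWED_LANGUAGES_KEY)
    if raw is None:
        return False  # assumption: unset means Scala and R may be used
    languages = {lang.strip().lower() for lang in raw.split(",") if lang.strip()}
    if not languages:
        return False  # assumption: treat an empty setting like an unset one
    py4j_strict = env.get(PY4J_STRICT_VAR, "").strip().lower() == "true"
    return languages <= {"python", "sql"} and py4j_strict


print(security_manager_removable(
    {ALLOWED_LANGUAGES_KEY: "python,sql"}, {PY4J_STRICT_VAR: "true"}))  # True
print(security_manager_removable(
    {ALLOWED_LANGUAGES_KEY: "python,sql,scala"}, {PY4J_STRICT_VAR: "true"}))  # False
```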

Beyond disabling the Security Manager, Immuta will skip several startup tasks that are required to secure the cluster when Scala and R are configured, and fewer permission checks will occur on the Driver and Executors in the Databricks cluster, reducing overhead and improving performance.

Caveats

There are still cases that require the Security Manager; in those instances, Immuta creates a fallback Security Manager to check the code path, so the IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI environment variable must always point to a valid calling class file.
Databricks’ dbutils is blocked by their Py4J security; therefore, it can’t be used to access scratch paths.

immuta database

When a table is registered in Immuta as a data source, users can see that table in the native Databricks database and in the immuta database. This allows for an option to use a single database (immuta) for all tables.

The immuta database on Immuta-enabled clusters allows Immuta to track Immuta-managed data sources separately from remote Databricks tables so that policies and other security features can be applied. However, Immuta supports raw tables in Databricks, so table-backed queries do not need to reference this database.

When configuring a Databricks cluster, you can hide immuta from any calls to SHOW DATABASES so that users are not confused or misled by that database. Hiding the database does not disable access to it. Queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this database is hidden.

To hide the database, add the following to the cluster's Spark environment variables:

IMMUTA_SPARK_SHOW_IMMUTA_DATABASE=false

Then, Immuta will not show this database when a SHOW DATABASES query is performed.

Once the Immuta-enabled cluster is running, the following user actions spur various processes. The list below provides an overview of each process:

A policy is deleted: When a policy is deleted, the policy information is deleted from the Metadata Database. If users were granted access to the data source by that policy, their access is revoked.
Databricks user queries data: When a user queries the data in Databricks, Immuta intercepts the call from Spark down to the Metastore. Then, the Immuta-maintained Spark plugin retrieves the policy information, the user metadata, and the data source metadata from the Metadata Database and injects this information as policy logic into the Spark logical plan. Once the physical plan is applied, Databricks returns policy-enforced data to the user.

The image below illustrates these processes and how they interact.

Supported policies

The Databricks Spark integration allows users to author subscription and data policies to enforce access controls. See the corresponding pages for details about specific types of policies supported:

Databricks Runtime 14.3

Private preview: Support for this Databricks Runtime is in private preview and available to select accounts. Contact your Immuta representative for details.

Immuta supports clusters on Databricks Runtime 14.3. The integration for this Databricks Runtime differs from the integration for other supported Runtimes in the following ways:

Py4J security and process isolation automatically enabled: Immuta relies on Databricks process isolation and Py4J security to prevent user code from performing unauthorized actions. After selecting Runtime 14.3 during configuration, Immuta will automatically enable process isolation and Py4J security.
dbutils is unsupported: Immuta relies on Databricks process isolation and Py4J security to prevent user code from performing unauthorized actions. This means that dbutils is not supported for Databricks Spark integrations using Runtime 14.3.

Cluster security and compliance

Authentication methods

The Databricks Spark integration supports the following authentication methods to configure the integration:

Personal access token (PAT): This token gives Immuta temporary permission to push cluster policies to the configured Databricks workspace when configuring the integration (overwriting any cluster policy templates previously applied to the workspace) and to register securables as Immuta data sources.

Audit

Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing. To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.

Protecting the Immuta configuration

Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable. For example, if a user wanted to make the MY_SECRET_ENV_VAR=abcd_1234 value secret, they could instead create a Databricks secret and reference it as the value of that variable by following these steps:

Create the secret scope my_secrets and add a secret with the key my_secret_env_var containing the sensitive environment variable.
Reference the secret in the environment variables section as MY_SECRET_ENV_VAR={{secrets/my_secrets/my_secret_env_var}}.

At runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.
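The substitution Databricks performs at runtime can be pictured with the sketch below. It is a local simulation for illustration only: the secret store is a plain dictionary, the readable_scopes set stands in for the cluster owner's access to secret scopes, and the function name is hypothetical. Values that are not secret references pass through unchanged.

```python
import re

# Matches a whole value of the form {{secrets/<scope>/<key>}}.
SECRET_REF = re.compile(r"^\{\{secrets/([^/{}]+)/([^/{}]+)\}\}$")


def resolve_env_vars(env_vars: dict, secret_store: dict, readable_scopes: set) -> dict:
    """Replace secret references with their values (simulation, not Databricks code)."""
    resolved = {}
    for name, value in env_vars.items():
        match = SECRET_REF.match(value)
        if match is None:
            resolved[name] = value  # plain value, used as-is
            continue
        scope, key = match.groups()
        if scope not in readable_scopes:
            raise PermissionError(f"cluster owner cannot read secret scope {scope!r}")
        resolved[name] = secret_store[(scope, key)]
    return resolved


store = {("my_secrets", "my_secret_env_var"): "abcd_1234"}
env = {"MY_SECRET_ENV_VAR": "{{secrets/my_secrets/my_secret_env_var}}"}
print(resolve_env_vars(env, store, {"my_secrets"}))  # {'MY_SECRET_ENV_VAR': 'abcd_1234'}
```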

Scala clusters

There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta’s Security Manager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to tell which users' data make up each file or block. If you are concerned about this vulnerability, Immuta suggests that you

limit Scala clusters to Scala jobs only and
require equalized projects.

When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the leaf level of the physical Spark plan, so a policy such as "Mask using hashing the column social_security_number for everyone" would be implemented as an expression on a project node right above the FileSourceScanExec/LeafExec node at the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently, from ending up in SPARK_LOCAL_DIR.

This policy implementation coupled with an equalized project guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.
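The following toy model (plain Python, not Spark) illustrates the ordering argument: the masking projection sits immediately above the leaf scan, so the shuffle, the stage whose output can land in SPARK_LOCAL_DIR, only ever receives hashed values. The function names, sample rows, and the SHA-256 choice are illustrative assumptions; Immuta's actual hashing scheme is not described here.

```python
import hashlib
from collections import defaultdict

# Toy rows standing in for what a FileSourceScanExec leaf would read.
RAW_ROWS = [
    {"state": "CA", "social_security_number": "123-45-6789"},
    {"state": "NY", "social_security_number": "987-65-4321"},
    {"state": "CA", "social_security_number": "555-12-3456"},
]


def scan():
    """Leaf node: returns raw rows from storage."""
    return [dict(row) for row in RAW_ROWS]


def mask_project(rows, column):
    """Project node directly above the leaf: replace the column with a SHA-256 hash."""
    masked = []
    for row in rows:
        out = dict(row)
        out[column] = hashlib.sha256(row[column].encode("utf-8")).hexdigest()
        masked.append(out)
    return masked


def shuffle(rows, key):
    """Stand-in for an exchange: partitions rows by key (the data that may spill to disk)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


# Because masking happens before the shuffle, only hashed values are ever partitioned.
spilled = shuffle(mask_project(scan(), "social_security_number"), "state")
```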

Troubleshooting the installation

Spark Environment Variables

This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks administrators should place the desired configuration in the Spark environment variables.

IMMUTA_INIT_ADDITIONAL_CONF_URI

If you add additional Hadoop configuration during the integration setup, this variable sets the path to that file.

The additional Hadoop configuration is where sensitive configuration goes for remote filesystems (if you are using a secret key pair to access S3, for example).

IMMUTA_EPHEMERAL_HOST_OVERRIDE

Default value: true

Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.

IMMUTA_EPHEMERAL_HOST_OVERRIDE_HTTPPATH

This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.

IMMUTA_EPHEMERAL_TABLE_PATH_CHECK_ENABLED

Default value: true

When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
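The check described above can be sketched as follows. This is a hypothetical model, not Immuta's code: the Metastore is represented as a dictionary from (database, table) to storage location, and the `path_check_enabled` flag stands in for IMMUTA_EPHEMERAL_TABLE_PATH_CHECK_ENABLED.

```python
from typing import Dict, Optional, Tuple

TableKey = Tuple[str, str]  # (database, table)


def source_is_queryable(
    metastore: Dict[TableKey, str],
    immuta_location: str,
    key: TableKey,
    path_check_enabled: bool = True,
) -> bool:
    """Hypothetical sketch of the Metastore-versus-Immuta validation."""
    location: Optional[str] = metastore.get(key)
    if location is None:
        # The (database, table) pair must exist in the Metastore.
        return False
    if not path_check_enabled:
        # Setting the variable to false skips the location comparison.
        return True
    return location == immuta_location
```

Disabling the location comparison is what makes dynamic locations work, and it is also why identically named tables in different workspaces could be treated as the same source.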

IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI

IMMUTA_SPARK_ACL_ALLOWLIST

This is a comma-separated list of Databricks users who can access any table or view in the cluster metastore without restriction.

IMMUTA_SPARK_ACL_PRIVILEGED_TIMEOUT_SECONDS

Default value: 3600

The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is allowlisted in IMMUTA_SPARK_ACL_ALLOWLIST.

IMMUTA_SPARK_AUDIT_ALL_QUERIES

Default value: false

Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.

IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS

Default value: false

IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES

Default value: false

IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS

This is a comma-separated list of Databricks users who are allowed to impersonate Immuta users:

"spark_env_vars.IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS": {
  "type": "fixed",
  "value": "edixon@example.com,dakota@example.com"
}
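If the list of impersonation users is maintained elsewhere (for example, in a deployment script), a short sketch like the following can render the same cluster-policy fragment shown above. The helper name is hypothetical; `"fixed"` is the Databricks cluster-policy type that pins a value so users cannot change it.

```python
import json
from typing import Iterable


def impersonation_policy_fragment(users: Iterable[str]) -> dict:
    """Build a cluster-policy entry that pins the allowed impersonation users."""
    return {
        "spark_env_vars.IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS": {
            "type": "fixed",
            "value": ",".join(users),
        }
    }


fragment = impersonation_policy_fragment(["edixon@example.com", "dakota@example.com"])
print(json.dumps(fragment, indent=2))
```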

IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED

Default value: false

Exposes the DBFS FUSE mount located at /dbfs. Granular permissions are not possible, so all users will have read/write access to all objects therein. Note: Raw, unfiltered source data should never be stored in DBFS.

IMMUTA_SPARK_DATABRICKS_DISABLED_UDFS

IMMUTA_SPARK_DATABRICKS_JAR_URI

Default value: file:///databricks/jars/immuta-spark-hive.jar

The location of immuta-spark-hive.jar on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.

IMMUTA_SPARK_DATABRICKS_LOCAL_SCRATCH_DIR_ENABLED

Default value: true

Creates a world-readable and world-writable scratch directory on local disk to facilitate the use of dbutils and third-party libraries that may write to local disk. Its location is not configurable and is stored in the environment variable IMMUTA_LOCAL_SCRATCH_DIR. Note: Sensitive data should not be stored at this location.
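A library or notebook that needs local disk could resolve the location from the environment variable, as in this minimal sketch. The fallback to the system temporary directory and the helper names are assumptions for running it outside an Immuta-enabled cluster; per the note above, only non-sensitive data belongs there.

```python
import os
import tempfile


def local_scratch_dir() -> str:
    """Return Immuta's local scratch directory, falling back to the system temp dir."""
    # The fallback is an assumption so the sketch also runs off-cluster.
    return os.environ.get("IMMUTA_LOCAL_SCRATCH_DIR", tempfile.gettempdir())


def write_intermediate(name: str, text: str) -> str:
    """Write a non-sensitive intermediate file to the scratch dir and return its path."""
    path = os.path.join(local_scratch_dir(), name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```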

IMMUTA_SPARK_DATABRICKS_LOG_LEVEL

Default value: INFO

The SLF4J log level to apply to Immuta's Spark plugins.

IMMUTA_SPARK_DATABRICKS_LOG_STDOUT_ENABLED

Default value: false

If true, writes logging output to stdout/the console as well as the log4j-active.txt file (default in Databricks).

IMMUTA_SPARK_DATABRICKS_SCRATCH_DATABASE

This configuration only displays the scratch databases that are configured and does not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to set this value correctly and keep it current.

IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS

Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).

To create a scratch path to a location or a database stored at that location, configure

IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir

To create a scratch path to a database created using the default location, configure

IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir,dbfs:/user/hive/warehouse/any_db_name.db
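The sketch below parses a value like the one above and checks whether a path falls under a configured scratch path. Immuta's exact matching rules are not documented here, so the segment-wise prefix comparison (which keeps `s3://path/to/the/dir2` from matching `s3://path/to/the/dir`) and the helper names are assumptions.

```python
from typing import List
from urllib.parse import urlparse


def parse_scratch_paths(value: str) -> List[str]:
    """Split the comma-separated IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS value."""
    return [p.strip().rstrip("/") for p in value.split(",") if p.strip()]


def is_under_scratch_path(path: str, scratch_paths: List[str]) -> bool:
    """Hypothetical check: is path equal to, or nested beneath, a scratch path?"""
    target = urlparse(path.rstrip("/"))
    for scratch in scratch_paths:
        root = urlparse(scratch)
        if target.scheme != root.scheme or target.netloc != root.netloc:
            continue
        # Compare whole path segments, not raw string prefixes.
        if target.path == root.path or target.path.startswith(root.path + "/"):
            return True
    return False
```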

IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS_CREATE_DB_ENABLED

Default value: false

Enables non-privileged users to create or drop scratch databases.

IMMUTA_SPARK_DATABRICKS_SINGLE_IMPERSONATION_USER

Default value: false

When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.
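A hypothetical model of this behavior: once a session's impersonation user is set, later attempts to switch to a different user are rejected when the setting is true. The class name is illustrative, and allowing a repeat of the same user is an assumption.

```python
from typing import Optional


class ImpersonationSession:
    """Hypothetical model of a Spark session's impersonation setting."""

    def __init__(self, single_impersonation_user: bool) -> None:
        self.single_impersonation_user = single_impersonation_user
        self.user: Optional[str] = None

    def set_impersonation_user(self, user: str) -> None:
        # With the flag on, the first user set for the session is final.
        if self.single_impersonation_user and self.user is not None and user != self.user:
            raise PermissionError(
                f"impersonation user already set to {self.user!r} for this session"
            )
        self.user = user
```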

IMMUTA_SPARK_DATABRICKS_SUBMIT_TAG_JOB

Default value: true

Denotes whether Immuta runs the Spark job that "tags" a Databricks cluster as being associated with Immuta.

IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS

IMMUTA_SPARK_NON_IMMUTA_TABLE_CACHE_SECONDS

Default value: 3600

The number of seconds Immuta caches whether a table has been exposed as a data source in Immuta. This setting only applies when IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES or IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS is enabled.

IMMUTA_SPARK_REQUIRE_EQUALIZATION

Default value: false

Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.

IMMUTA_SPARK_RESOLVE_RAW_TABLES_ENABLED

Default value: true

Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or allowlisted users can set IMMUTA_SPARK_RESOLVE_RAW_TABLES_ENABLED to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.

IMMUTA_SPARK_SESSION_RESOLVE_RAW_TABLES_ENABLED

Default value: true

IMMUTA_SPARK_SHOW_IMMUTA_DATABASE

Default value: true

This shows the immuta database in the configured Databricks cluster. When set to false Immuta will no longer show this database when a SHOW DATABASES query is performed. However, queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.

IMMUTA_SPARK_VERSION_VALIDATE_ENABLED

Default value: true

Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to true, if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to false so that the versions don't get logged as incompatible and make the cluster unusable.

IMMUTA_USER_MAPPING_IAMID

Default value: bim

Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (bim) but should be updated to reflect an actual production IAM.

Customizing the Integration

You can customize the Databricks Spark integration settings using these components Immuta provides:

Cluster policies

Scala

Scala clusters: This configuration is for Scala-only clusters.

Sparklyr

Single-user clusters recommended: Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.

Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).

Single-user cluster configuration

1 - Enable sparklyr

In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:

This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.

2 - Set up a sparklyr connection in Databricks

Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.
Set up a sparklyr connection:
Pass the connection object to execute queries:

3 - Configure a single-user cluster

Add the following items to the Spark Config section of the cluster:

The trustedFileSystems setting is required to allow Immuta’s wrapper FileSystem (used in conjunction with the Security Manager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster’s instance profile for data access, rather than a role associated with the attached user.

Multi-user cluster configuration

Avoid deploying multi-user clusters with sparklyr configuration

It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.

The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta native workspaces.

Add the following environment variables to the Environment Variables section of your cluster configuration:
Add the following items to the Spark Config section:

Limitations

Immuta’s integration with sparklyr does not currently support

spark-submit jobs
UDFs

Spark environment variables

Additional Hadoop configuration file (optional)

In some cases it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration to allow Spark to read data.

For example, when accessing external tables stored in Azure Data Lake Gen2, Spark must have credentials to access the target containers or filesystems in Azure Data Lake Gen2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access Azure Data Lake Gen2.
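Hadoop configuration files use the standard `<configuration>`/`<property>`/`<name>`/`<value>` XML layout. The sketch below renders such a file for the Azure Data Lake Gen2 example, using the `fs.azure.account.key.<account>.dfs.core.windows.net` property from the Hadoop ABFS connector. The storage account name and key placeholder are hypothetical; the resulting file is what IMMUTA_INIT_ADDITIONAL_CONF_URI would point to.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_configuration_xml(properties: Dict[str, str]) -> str:
    """Render properties in the Hadoop *-site.xml <configuration> format."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical storage account and key placeholder; never commit a real key.
xml_text = hadoop_configuration_xml(
    {"fs.azure.account.key.mystorageaccount.dfs.core.windows.net": "<storage-account-key>"}
)
```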

Configurable settings

Data source settings

Protected and unprotected tables

Generally, Immuta prevents users from seeing data unless they are explicitly given access, which blocks access to raw sources in the underlying databases.

Databricks non-privileged users will only see sources to which they are subscribed in Immuta, and this can present problems if organizations have a data lake full of non-sensitive data and Immuta removes access to all of it. The limited enforcement scope feature addresses this challenge by allowing Immuta users to access any tables that are not protected by Immuta (i.e., not registered as a data source or a table in a native workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.

Protected until made available by policy: This setting means all tables are hidden until a user is granted access through an Immuta policy. This is how most databases work: it assumes least-privilege access, and it means you will have to register all tables with Immuta.
Available until protected by policy: This setting means all tables are open until explicitly registered and protected by Immuta. This makes sense if most of your tables are non-sensitive and you can pick and choose which to protect. This setting allows both non-Immuta reads and non-Immuta writes:

IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS=true
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES=true

Ephemeral overrides

In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.

Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.

When a user runs a Spark job in Databricks, the Immuta plugin automatically submits ephemeral overrides for that user to Immuta for all applicable data sources to use the current cluster as compute for all subsequent metadata operations for that user against the applicable data sources.

Restricting users' access with Immuta projects

Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is for a single user to organize their data sources or to control an entire schema of data sources through a single projects screen; however, most often it is a purpose for which the data has been approved for use, which restricts access to data and streamlines team collaboration. Consequently, data owners can restrict access to data for a specified purpose through projects.

When users work within the context of a project, they will only see the data in that project. This helps prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. Consider adjusting the following project settings to suit your organization's needs:

Databricks features

This section describes how Immuta interacts with common Databricks features.

Change data feed

The change data feed (CDF) can be read if the querying user is allowed to read the raw data and ONE of the following statements is true:

the table is in the current workspace
the table is in a scratch path
non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting
non-Immuta reads are enabled AND the table is not part of an Immuta data source
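The rule above combines one required condition with four alternatives. This sketch encodes it as a boolean function; the `CdfContext` fields are hypothetical names for the facts the rule depends on, not an Immuta API.

```python
from dataclasses import dataclass


@dataclass
class CdfContext:
    """Hypothetical inputs to the change data feed (CDF) read check."""

    can_read_raw_data: bool
    in_current_workspace: bool
    in_scratch_path: bool
    non_immuta_reads_enabled: bool
    intersects_other_workspace: bool  # a workspace the user is NOT acting under
    is_immuta_data_source: bool


def can_read_cdf(ctx: CdfContext) -> bool:
    # Reading the raw data is required in every case.
    if not ctx.can_read_raw_data:
        return False
    # Then at least ONE of the listed statements must hold.
    return (
        ctx.in_current_workspace
        or ctx.in_scratch_path
        or (ctx.non_immuta_reads_enabled and not ctx.intersects_other_workspace)
        or (ctx.non_immuta_reads_enabled and not ctx.is_immuta_data_source)
    )
```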

Databricks trusted libraries

Security vulnerability

Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether the risk applies to your environment or use case.

The trusted libraries feature allows Databricks cluster administrators to avoid Immuta Security Manager errors when using third-party libraries. An administrator can specify an installed library as trusted, which enables that library's code to bypass the Immuta Security Manager. This feature does not impact Immuta's ability to apply policies; trusting a library only allows through code that would otherwise have been blocked by the Security Manager.

The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:

Library source is Upload, DBFS or DBFS/S3 and the Library Type is Jar.
Library source is Maven.

Limitations

Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.
Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either
- waiting for library installation to complete before running any third-party library commands or
- executing a Spark query. This will force Immuta to wait for any trusted Immuta libraries to complete installation before proceeding.
When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or using the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code paths that include a class from a transitive dependency but do not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency of a file-util library, the user will not be able to directly use the file-util library to read a sensitive file that is normally protected by the Immuta security manager.
In many cases, it is not a problem if dependent libraries aren't trusted, because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if a dependent library needs to be trusted, there is a workaround: add the dependency's jar path to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster. For example, if slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar.
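The transitive-dependency limitation can be modeled as a rule over call paths: a path is trusted only if at least one frame comes from a library explicitly listed as trusted. This is a hypothetical simplification of the Security Manager's behavior, and the jar paths for `trusted-lib` and `file-util` are illustrative.

```python
from typing import Sequence, Set


def call_path_is_trusted(frame_jars: Sequence[str], trusted_uris: Set[str]) -> bool:
    """Hypothetical model of the trust decision described above.

    frame_jars lists the jar each frame of a call path was loaded from; the
    path is trusted only if at least one frame comes from a listed library.
    """
    return any(jar in trusted_uris for jar in frame_jars)


TRUSTED = {"dbfs:/FileStore/jars/trusted-lib.jar"}  # hypothetical trusted library
FILE_UTIL = "dbfs:/FileStore/jars/maven/file-util.jar"  # hypothetical transitive dependency

# Trusted library calling down into its dependency: allowed.
via_trusted = call_path_is_trusted(
    ["notebook", "dbfs:/FileStore/jars/trusted-lib.jar", FILE_UTIL], TRUSTED
)
# User code calling the dependency directly: still blocked.
direct = call_path_is_trusted(["notebook", FILE_UTIL], TRUSTED)
```

Adding the dependency's own jar to the trusted set, as in the workaround, is what makes the direct path pass.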

External catalogs

External metastores

Local mode: The metastore client running inside a cluster connects to the underlying metastore database directly via JDBC.
Remote mode: Instead of connecting to the underlying database directly, the metastore client connects to a separate metastore service via the Thrift protocol. The metastore service connects to the underlying database. When running a metastore in remote mode, DBFS is not supported.

Configure external Hive metastore

If using DBR 7.x with Hive 2.3.x, either

Set spark.sql.hive.metastore.version to 2.3.7 and spark.sql.hive.metastore.jars to builtin or
Download the metastore jars and set spark.sql.hive.metastore.jars to /databricks/hive_metastore_jars/* as before.

Configure AWS Glue Data Catalog

Notebook-scoped libraries on machine learning clusters

Scratch paths

Scratch paths are cluster-specific remote file paths that Databricks users are allowed to directly read from and write to without restriction. The creator of a Databricks cluster specifies which remote file paths are designated as scratch paths on that cluster when configuring it. Scratch paths are useful for scenarios where non-sensitive data needs to be written out to a specific location using a Databricks cluster protected by Immuta.

Configure a Databricks Unity Catalog Integration

Permissions

APPLICATION_ADMIN Immuta permission
The Databricks user must have the following privileges:
- Account admin
- CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables
- (only required if enabling query audit)

Requirements

Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.
If you select single user access mode for your cluster, you must
- enable serverless compute for your workspace.

Unity Catalog best practices

Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.
Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

Migrate data to Unity Catalog

Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

Create the Databricks service principal

USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources.
USE SCHEMA on all schemas containing securables registered as Immuta data sources.
MODIFY and SELECT on all securables you want registered as Immuta data sources.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privileges on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privileges on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
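As a sketch, the privileges above could be granted with Databricks SQL GRANT statements like the ones this helper renders for one catalog and schema. The principal, catalog, and schema names are hypothetical placeholders (service principals are usually referenced by their application ID), and granting MODIFY and SELECT at the schema level relies on the inheritance described above. Review the statements before running them in a SQL editor or notebook.

```python
from typing import List


def service_principal_grants(principal: str, catalog: str, schema: str) -> List[str]:
    """Render Unity Catalog GRANT statements for the privileges listed above."""
    p = f"`{principal}`"
    return [
        # MANAGE is set directly on the parent catalog, as noted above.
        f"GRANT USE CATALOG, MANAGE ON CATALOG {catalog} TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {p}",
        # Schema-level grants are inherited by current and future tables.
        f"GRANT MODIFY, SELECT ON SCHEMA {catalog}.{schema} TO {p}",
    ]


for statement in service_principal_grants("immuta-sp-application-id", "analytics", "sales"):
    print(statement + ";")
```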

Opt to enable query audit for Unity Catalog

- USE CATALOG on the system catalog
- USE SCHEMA on the system.access schema
- SELECT on the following system tables:
  - system.access.audit
  - system.access.table_lineage
  - system.access.column_lineage
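The query audit privileges listed above can also be expressed as GRANT statements; the sketch below renders them for a hypothetical principal name.

```python
from typing import List

AUDIT_TABLES = (
    "system.access.audit",
    "system.access.table_lineage",
    "system.access.column_lineage",
)


def query_audit_grants(principal: str) -> List[str]:
    """Render the GRANT statements needed to read the audit system tables."""
    p = f"`{principal}`"
    grants = [
        f"GRANT USE CATALOG ON CATALOG system TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {p}",
    ]
    grants += [f"GRANT SELECT ON TABLE {table} TO {p}" for table in AUDIT_TABLES]
    return grants
```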

Configure the Databricks Unity Catalog integration

You have two options for configuring your Databricks Unity Catalog integration:

Automatic setup

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs:
1. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click Save.
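Before submitting the form, the constraints stated in these steps (catalog naming, comma-separated workspace IDs, and an hourly audit frequency between 1 and 24) could be pre-checked with a small helper like this. It is a hedged sketch: the helper names are hypothetical, "letters" is assumed to mean ASCII letters, and Immuta performs its own validation.

```python
import re
from typing import List

CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def valid_immuta_catalog_name(name: str) -> bool:
    """Letters, digits, and underscores only; must not start with a digit."""
    return CATALOG_NAME.fullmatch(name) is not None


def parse_workspace_ids(value: str) -> List[str]:
    """Split the comma-separated Workspace IDs field; empty means audit everything."""
    return [w.strip() for w in value.split(",") if w.strip()]


def valid_audit_frequency(hours: int) -> bool:
    """Audit ingestion frequency must be a whole number of hours from 1 to 24."""
    return isinstance(hours, int) and not isinstance(hours, bool) and 1 <= hours <= 24
```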

Manual setup

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.
Complete the following fields:
- Server Hostname is the hostname of your Databricks workspace.
- HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable by the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.
Opt to fill out the Exemption Group field with the name of a group in Databricks that will be excluded from having data policies applied and must not be changed from the default value. Create this account-level group for privileged users and service accounts that require an unmasked view of data before configuring the integration in Immuta.
Opt to scope the query audit ingestion by entering Unity Catalog Workspace IDs:
1. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.
2. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.
3. Continue with your integration configuration.
Select your authentication method from the dropdown:
- OAuth machine-to-machine (M2M):
  - AWS Databricks:
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.
Run the script in Databricks.
Click Save.
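For reference, the OAuth M2M option relies on a standard client-credentials exchange against the Token Endpoint. The sketch below builds (but does not send) such a request with the Python standard library, following the form Databricks documents for M2M tokens (`grant_type=client_credentials`, `scope=all-apis`, client ID and secret sent as HTTP Basic credentials). Immuta performs this exchange itself; the sketch is only a way to check a client ID and secret independently, and the workspace host and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(
    workspace_host: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    token_endpoint = f"https://{workspace_host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the exchange and return a JSON body containing the access token.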

Map Databricks users to Immuta

If the usernames in Immuta do not match the usernames in Databricks, map each Databricks username to the corresponding Immuta user account, using one of the methods linked below, so that Immuta properly enforces policies:

Opt to enable Databricks Unity Catalog tag ingestion

Design partner preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirements:

A configured Databricks Unity Catalog integration or connection
Fewer than 2,500 Databricks Unity Catalog data sources registered in Immuta

To allow Immuta to automatically import table and column tags from Databricks Unity Catalog, enable Databricks Unity Catalog tag ingestion in the external catalog section of the Immuta app settings page.

Navigate to the App Settings page.
Scroll to 2 External Catalogs, and click Add Catalog.
Enter a Display Name and select Databricks Unity Catalog from the dropdown menu.
Click Save and confirm your changes.

Register data

AWS Lake Formation

Design partner: This connection is available to select accounts. Contact your Immuta representative for details.

Amazon Athena
Amazon EMR Spark
Amazon Redshift Spectrum

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

What does Immuta do in my AWS environment?

Registering a connection

Once the connection is registered in Immuta, Immuta ingests and stores connection metadata in the Immuta metadata database.

In the example below, the Immuta application administrator connects the Glue Data Catalog that contains marketing-data, research-data, and cs-data metadata. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in the Glue Data Catalog. Beyond making the registration of your data more intuitive, connections provide more control: instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

Applying policies

The table below outlines how two different automatic subscription policies authored in Immuta are orchestrated in Lake Formation.

| Immuta actions | Example 1 | Example 2 |
| --- | --- | --- |
| Governor authors a global policy in Immuta. | "Users may subscribe to data sources tagged Research when they are members of group Research." | "Users may subscribe to data sources tagged CS when they have the attribute training.complete." |
| Immuta calculates data sources affected. | research-data, marketing-data | cs-data |
| Immuta calculates users affected. | Alex, Taylor, Deepu | Casey, Mary, Catherine |
| Immuta generates a group identifier for the users and data sources affected. | 1234 | 5678 |
| Immuta creates an LF-Tag key and value. | Immuta_policy=1234 | Immuta_policy=5678 |
| Immuta assigns the LF-Tag to the AWS resource in Lake Formation. | Assign tag Immuta_policy=1234 to research-data and marketing-data | Assign tag Immuta_policy=5678 to cs-data |
| Immuta grants the LF-Tag to users in Lake Formation. | GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Alex; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Taylor; GRANT (SELECT) on tag Immuta_policy=1234 TO arn:aws:iam::123456:user/Deepu | GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Casey; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Mary; GRANT (SELECT) on tag Immuta_policy=5678 TO arn:aws:iam::123456:user/Catherine |
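The orchestration steps for each example can be sketched as a small planning function. This is a hypothetical model, not Immuta's implementation: ResolvedPolicy and plan_lake_formation_changes are illustrative names, and the sketch assumes Immuta has already resolved the affected data sources and users and generated the group identifier. It only produces the human-readable steps; it does not call the Lake Formation API.

```python
from dataclasses import dataclass

# LF-Tag key used in the example policies.
TAG_KEY = "Immuta_policy"


@dataclass(frozen=True)
class ResolvedPolicy:
    """Hypothetical result of Immuta resolving one automatic subscription policy."""
    group_id: str                  # identifier generated for the policy, e.g. "1234"
    data_sources: tuple[str, ...]  # affected Glue tables
    iam_users: tuple[str, ...]     # affected IAM user names


def plan_lake_formation_changes(policy: ResolvedPolicy, account_id: str) -> list[str]:
    """List the tag assignments first, then the tag grants, one step per resource or user."""
    tag = f"{TAG_KEY}={policy.group_id}"
    steps = [f"Assign tag {tag} to {source}" for source in policy.data_sources]
    steps += [
        f"GRANT (SELECT) on tag {tag} TO arn:aws:iam::{account_id}:user/{user}"
        for user in policy.iam_users
    ]
    return steps


if __name__ == "__main__":
    research = ResolvedPolicy(
        group_id="1234",
        data_sources=("research-data", "marketing-data"),
        iam_users=("Alex", "Taylor", "Deepu"),
    )
    for step in plan_lake_formation_changes(research, account_id="123456"):
        print(step)
```

Because every affected user is granted the tag rather than each table, adding another table to the policy only requires one more tag assignment, not one grant per user.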

Maintaining state with Lake Formation

The following user actions trigger processes in the Lake Formation connection that keep Immuta data synchronized with data in Lake Formation. The list below provides an overview of each process:

Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.
Data source deleted: Immuta deletes the data source metadata from the metadata database and removes LF-Tags from that AWS resource.
Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and generates an LF-Tag key and value.
User manually subscribed to a data source: When a user is manually added to a data source by a data owner, Immuta grants the user direct access to the table in Lake Formation.
Automatic subscription policy deleted: Immuta deletes the LF-Tag key and values.
User removed from a data source: Immuta revokes the user's access to the table or the LF-Tag.
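The processes listed above can be modeled as state transitions. The sketch below is a hypothetical, in-memory stand-in (the LakeFormationState class and its method names are illustrative) that shows which tag and grant records each event touches; it does not call AWS, and Immuta's actual bookkeeping may differ.

```python
from collections import defaultdict


class LakeFormationState:
    """Hypothetical in-memory model of the tags and grants Immuta keeps in sync."""

    def __init__(self) -> None:
        self.resource_tags: dict[str, set[str]] = defaultdict(set)  # table -> LF-Tags
        self.tag_grants: dict[str, set[str]] = defaultdict(set)     # LF-Tag -> principals
        self.table_grants: dict[str, set[str]] = defaultdict(set)   # table -> principals

    def subscription_policy_applied(self, tag: str, tables: list[str], principals: list[str]) -> None:
        # Assign the policy's LF-Tag to each affected table and grant it to each affected user.
        for table in tables:
            self.resource_tags[table].add(tag)
        self.tag_grants[tag].update(principals)

    def data_source_deleted(self, table: str) -> None:
        # Remove the LF-Tags from the deleted table's AWS resource.
        self.resource_tags.pop(table, None)

    def user_manually_subscribed(self, table: str, principal: str) -> None:
        # Manual subscriptions grant direct access to the table rather than through a tag.
        self.table_grants[table].add(principal)

    def subscription_policy_deleted(self, tag: str) -> None:
        # Delete the LF-Tag: detach it from every table and drop its grants.
        for tags in self.resource_tags.values():
            tags.discard(tag)
        self.tag_grants.pop(tag, None)

    def user_removed(self, table: str, principal: str) -> None:
        # Revoke the direct table grant, plus the grant on any LF-Tag attached to the table.
        # Revoking a tag grant also removes access to every other table carrying that tag.
        self.table_grants[table].discard(principal)
        for tag in self.resource_tags.get(table, set()):
            self.tag_grants[tag].discard(principal)
```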

The image below illustrates these processes.

Security and compliance

Authentication methods

The Lake Formation connection supports the following authentication methods to register a connection:

Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Contact your Immuta representative for the AWS account to add to your trust policy.
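The trust relationship described above is a standard IAM role trust policy. A minimal sketch, assuming Immuta's principal is supplied as a role ARN: the ARN below is a placeholder (111122223333 is a dummy account ID), and your Immuta representative may require additional conditions (for example, an sts:ExternalId condition) that this sketch does not include.

```python
import json


def immuta_trust_policy(immuta_principal_arn: str) -> dict:
    """Trust policy allowing the given (Immuta-provided) principal to assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": immuta_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN; replace it with the principal your Immuta representative provides.
    placeholder = "arn:aws:iam::111122223333:role/example-immuta-role"
    print(json.dumps(immuta_trust_policy(placeholder), indent=2))
```

The printed JSON is what you would paste into the role's trust relationship in the IAM console.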

Supported policies

User provisioning

See the guidelines below for the best practices to avoid this behavior if you currently use IAM roles to manage access.

Mapping IAM principals in Immuta

Names are case-sensitive

Immuta supports mapping an Immuta user to AWS in one of the following ways:

Existing S3 integrations

Existing S3 integrations have no impact on Lake Formation connections; they can be used in tandem.

Limitations and known issues

Temporary limitation: Immuta will only synchronize policies on a 10-minute schedule, so it could be up to 10 minutes from you taking an action in Immuta until Immuta starts synchronizing policies. Note that this 10-minute schedule refers to Immuta starting to synchronize, not the time it will take to complete synchronization.
You cannot use the AWS Lake Formation connection if you are using data policies on Redshift Spectrum data sources, since granting access to the underlying Glue table via the AWS Lake Formation connection would allow a user to bypass the row- and column-level security of the Immuta-managed view by querying the Glue table directly.
User query audit
AWS Lake Formation has the following limitations:
- 50 tag limit per resource
- 1000 tag limit total
- 1000 values per tag

Google BigQuery Integration

Private preview: This integration is available to select accounts. Contact your Immuta representative for details.

The Google BigQuery integration allows users to query policy-protected data directly in BigQuery as secure views within an Immuta-created dataset. Immuta controls who can see what within the views, allowing data governors to create complex ABAC policies and data users to query the right data within the BigQuery console.

Configuration

Google BigQuery is configured through the Immuta console and a script provided by Immuta. While you can complete some steps within the BigQuery console, it is easiest to install using gcloud and the Immuta script.

Protect your data

Once Google BigQuery has been configured, BigQuery admins can start creating subscription and data policies to meet compliance requirements, and users can start querying policy-protected data directly in BigQuery.

Revoke user access to the original datasets and grant users access to the Immuta created datasets in BigQuery.
Users query data from the Immuta created datasets directly in BigQuery.

FAQs

What permissions will Immuta have in my BigQuery environment?
What integration features will Immuta support for BigQuery?
- For private preview, Immuta supports a basic version of the BigQuery integration where Immuta can enforce specific policies on data in a single BigQuery project. At this time, workspaces, tag ingestion, user impersonation, query audit, and multiple integrations are not supported.

Google BigQuery integration conceptual overview

In this policy push integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. Access controls are applied in the view, allowing users to leverage Immuta’s powerful set of attribute-based policies and query data directly in BigQuery.

BigQuery is organized by projects (which can be thought of as databases), datasets (which can be compared to schemas), tables, and views. When you enable the integration, an Immuta dataset is created in BigQuery that contains the Immuta-required user entitlements information. These objects within the Immuta dataset are intended to only be used and altered by the Immuta application.

After data sources are registered, Immuta uses the custom user and role, created before the integration is enabled, to push each Immuta data source as a view into a dataset that mirrors the original table's dataset. Immuta manages grants on the created view to ensure only users subscribed to the Immuta data source will see the data.

Secure views

Managing access

Following the principle of least privilege, Immuta does not have permission to manage Google Cloud Platform users, specifically to grant or deny access to a project and its datasets. This means that data governors should limit user access to the original datasets to ensure data users access the data through the Immuta-created views and not the backing tables. The only principals that need access to the backing tables are the accounts whose credentials are used to register the tables in Immuta.

Additionally, a data governor must grant users access to the mirrored datasets that Immuta will create and populate with views. The recommended best practice from both Immuta and BigQuery is to grant this access via groups in Google Cloud Platform. Because users must still be registered in Immuta and subscribed to an Immuta data source to query Immuta views, all Immuta users can be granted access to the mirrored datasets that Immuta creates.

Integration health status

Limitations

This integration can only be enabled through a manual bootstrap using the Immuta API.
This integration can only be enabled to work in a single region.

Supported policies

This integration supports the following policy types:

Column masking
- Mask using hashing (SHA256())
- Mask by making NULL
- Mask using constant
- Mask using a regular expression
- Mask by date rounding
- Mask by numeric rounding
- Mask using custom functions
Row-level masking
- Row visibility based on user attributes and/or object attributes
- Only show rows that fall within a given time window
- Minimize rows
- Filter rows using custom WHERE clause
- Always hide rows

Additional resources

See the resources below to start implementing and using the BigQuery integration:

Configure the Google BigQuery integration

Follow this guide to connect your Google BigQuery data warehouse to Immuta.

Prerequisites

Immuta SaaS, or Immuta v2023.1 or newer with the Google BigQuery integration (private preview) enabled.

Google Cloud service account and role used by Immuta to connect to Google BigQuery

The Google BigQuery integration requires you to create a Google Cloud service account and role that Immuta will use to:

create a Google BigQuery dataset that will be used to store a table of user entitlements, UDFs for policy enforcement, etc.
manage the table of user entitlements via updates when entitlements change in Immuta.
create datasets and secure views with access control policies enforced, which mirror tables inside of datasets you ingest as Immuta data sources.

You have two options to create the required Google Cloud service account and role:

The Immuta script

The bootstrap.sh script is a shell script provided by Immuta that creates prerequisite Google Cloud IAM objects for the integration to connect. When you run this script from your command line, it creates the following items:

A new Google Cloud IAM role
A new Google Cloud service account, which will be granted the newly-created role
A JSON keyfile for the newly-created service account

Google Cloud IAM roles required to run the script

To execute bootstrap.sh from your command line, you must be authenticated to the gcloud CLI utility as a user with all of the following roles:

roles/iam.roleAdmin
roles/iam.serviceAccountAdmin
roles/serviceusage.serviceUsageAdmin

Having these three roles is the least-privilege set of Google Cloud IAM roles required to successfully run the bootstrap.sh script from your command line. However, having either of the following Google Cloud IAM roles will also allow you to run the script successfully:

roles/editor
roles/owner

Create a service account and role by running the script provided by Immuta

Set the account property in the core section of the Google Cloud CLI configuration to the account gcloud should use for authentication (run gcloud auth list to see your currently available accounts):
```
gcloud config set account ACCOUNT
```
In Immuta, navigate to the App Settings page and click the Integrations tab.
Click Add Integration and select Google BigQuery from the dropdown menu.
Click Select Authentication Method and select Key File.
Click Download Script(s).
Before you run the script, update your permissions to execute it:
```
chmod 755 <path to downloaded script>
```
Run the script with the following arguments:
- PROJECT_ID: the Google Cloud Platform project to operate on.
- ROLE_ID: the name of the custom role to create.
- NAME: the name of the service account to create.
- OUTPUT_FILE: the path where the resulting private key should be written. File system write permission is checked on the specified path before the key is created.
- --undelete-role (optional): undeletes the custom role in the project. Roles that were deleted long ago can't be undeleted. This option can fail for the following reasons:
  - The role specified does not exist.
  - The active user does not have permission to access the given role.
- --enable-api (optional): enables the Google BigQuery API, provided you have been granted permission to enable it.
```
$ bootstrap.sh \
    --project PROJECT_ID \
    --role ROLE_ID \
    --service_account NAME \
    --keyfile OUTPUT_FILE \
    [--undelete-role] \
    [--enable-api]
```
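If you script the setup, a small wrapper can assemble the command line above so the flags are spelled exactly as shown. A minimal sketch, assuming the downloaded script is saved as ./bootstrap.sh in the working directory and has already been made executable with chmod; the project, role, service account, and keyfile values are hypothetical placeholders.

```python
import shlex
import subprocess


def bootstrap_args(script: str, project: str, role: str, service_account: str,
                   keyfile: str, undelete_role: bool = False,
                   enable_api: bool = False) -> list[str]:
    """Assemble the bootstrap.sh command line from the options documented above."""
    args = [
        script,
        "--project", project,
        "--role", role,
        "--service_account", service_account,
        "--keyfile", keyfile,
    ]
    if undelete_role:
        args.append("--undelete-role")
    if enable_api:
        args.append("--enable-api")
    return args


def run_bootstrap(args: list[str]) -> None:
    """Run the script; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = bootstrap_args("./bootstrap.sh", "my-project", "immuta_bigquery_role",
                         "immuta-bigquery-sa", "immuta-key.json", enable_api=True)
    print(shlex.join(cmd))  # review the command before calling run_bootstrap(cmd)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a path contains spaces.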

Create a service account and role by using Google Cloud console

Alternatively, you may use the Google Cloud Console to create the prerequisite role, service account, and private key file for the integration to connect to Google BigQuery. The custom role requires the following permissions:

- bigquery.datasets.create
- bigquery.datasets.delete
- bigquery.datasets.get
- bigquery.datasets.update
- bigquery.jobs.create
- bigquery.jobs.get
- bigquery.jobs.list
- bigquery.jobs.listAll
- bigquery.routines.create
- bigquery.routines.delete
- bigquery.routines.get
- bigquery.routines.list
- bigquery.routines.update
- bigquery.tables.create
- bigquery.tables.delete
- bigquery.tables.export
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.tables.setCategory
- bigquery.tables.update
- bigquery.tables.updateData
- bigquery.tables.updateTag
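If you prefer the gcloud CLI to the console, the permission list above can be written to a role definition file and passed to gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=immuta-role.json. A minimal sketch: the title, description, and file name are hypothetical, and you should confirm the file format against the gcloud reference before relying on it. The service account and key are still created separately.

```python
import json

# Permissions listed above for the custom role.
IMMUTA_BIGQUERY_PERMISSIONS = [
    "bigquery.datasets.create", "bigquery.datasets.delete",
    "bigquery.datasets.get", "bigquery.datasets.update",
    "bigquery.jobs.create", "bigquery.jobs.get",
    "bigquery.jobs.list", "bigquery.jobs.listAll",
    "bigquery.routines.create", "bigquery.routines.delete",
    "bigquery.routines.get", "bigquery.routines.list", "bigquery.routines.update",
    "bigquery.tables.create", "bigquery.tables.delete", "bigquery.tables.export",
    "bigquery.tables.get", "bigquery.tables.getData", "bigquery.tables.list",
    "bigquery.tables.setCategory", "bigquery.tables.update",
    "bigquery.tables.updateData", "bigquery.tables.updateTag",
]


def role_definition(title: str, description: str) -> dict:
    """Custom role definition using the fields gcloud reads from --file."""
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        "includedPermissions": sorted(IMMUTA_BIGQUERY_PERMISSIONS),
    }


if __name__ == "__main__":
    definition = role_definition("Immuta BigQuery integration",
                                 "Custom role for the Immuta Google BigQuery integration")
    with open("immuta-role.json", "w", encoding="utf-8") as f:
        json.dump(definition, f, indent=2)
```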

Enable the Google BigQuery integration

In Immuta, navigate to the App Settings page and click the Integrations tab.
Click Add Integration and select Google BigQuery from the dropdown menu.
Click Select Authentication Method and select Key File.
Complete the following fields:
- Project Id: The Google Cloud Platform project to operate on, where your Google BigQuery data warehouse is located. A new dataset will be provisioned in this Google BigQuery project to store the integration configuration.
- Immuta Dataset: The name of the Google BigQuery dataset to provision inside of the project. Important: if you are using multiple environments in the same Google BigQuery project, this dataset name must be unique across environments.
- Dataset Suffix: The suffix appended to the name of each dataset created to store secure views (one per dataset containing a table you register as an Immuta data source). Important: if you are using multiple environments in the same Google BigQuery project, this suffix must be unique across environments.
- GCP Location: The dataset’s location. After a dataset is created, the location can't be changed. Note that if you choose EU for the dataset location, your Core BigQuery Customer Data resides in the EU.
Click Test Google BigQuery Integration.
Click Save.
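The Dataset Suffix determines where users find the secure views. A minimal sketch of the naming, assuming the suffix is appended directly to the source dataset name and that each view keeps its source table's name (the second point is an assumption, not stated above); the project, dataset, and suffix values are hypothetical. It also checks the rule that suffixes must be unique across environments sharing a project.

```python
def mirrored_dataset(source_dataset: str, dataset_suffix: str) -> str:
    """Name of the dataset that holds the secure views for `source_dataset`."""
    return f"{source_dataset}{dataset_suffix}"


def secure_view_path(project: str, source_dataset: str, table: str, dataset_suffix: str) -> str:
    """Backtick-quoted BigQuery path of the view for `table` (view name assumed = table name)."""
    return f"`{project}.{mirrored_dataset(source_dataset, dataset_suffix)}.{table}`"


def suffixes_are_unique(suffix_by_environment: dict[str, str]) -> bool:
    """Environments sharing one project must not reuse a suffix."""
    suffixes = list(suffix_by_environment.values())
    return len(suffixes) == len(set(suffixes))


if __name__ == "__main__":
    print(secure_view_path("my-project", "sales", "orders", "_immuta"))
```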

GCP location must match dataset region

The region set for the GCP location must match the region of your datasets. Set GCP location to a general region (for example, US) to include child regions.

Disable the Google BigQuery integration

You can disable the Google BigQuery integration automatically or manually.

Automatically disable integration

Click the App Settings icon, and then click the Integrations tab.
Select the Google BigQuery integration you would like to disable, and select the Disable Integration checkbox.
Click Save.

Manually disable integration

The privileges required to run the cleanup script are the same as the Google Cloud IAM roles required to run the bootstrap.sh script.

Click the App Settings icon, and then click the Integrations tab.
Select the Google BigQuery integration you would like to disable, and click Download Scripts.
Click Save. Wait until Immuta has finished saving your configuration changes before proceeding.
Before you run the script, update your permissions to execute it:
```
chmod 755 <path to downloaded script>
```
Run the cleanup script.

Next steps

Databricks Unity Catalog Integration Reference Guide

Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.

Unity Catalog object model

Unity Catalog uses the following hierarchy of data objects:

Metastore: Created at the account level and attached to one or more Databricks workspaces. The metastore contains metadata for all the catalogs, schemas, and tables available to query. All clusters in a workspace use the configured metastore, and all workspaces configured to use the same metastore share those objects.
Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas.
Schema: Organizes tables and views.
Table, view, etc.: Tables (managed or external), views, volumes, models, and functions.

Feature support

The Databricks Unity Catalog integration supports:

- applying column masks and row filters on specific securable objects
- applying subscription policies on tables and views
- enforcing Unity Catalog access controls, even if Immuta becomes disconnected
- allowing non-Immuta reads and writes
- using Photon
- using a proxy server

What does Immuta do in my Databricks environment?

Immuta uses the Immuta service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta creates a catalog that contains these schemas:

immuta_system: Contains internal Immuta data.
immuta_policies_n: Contains policy UDFs.

When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

Workspace-catalog binding

Use cases

Typical use cases for binding a catalog to specific workspaces include

Ensuring users can only access production data from a production workspace environment.
For example, you may have production data in a prod_catalog, as well as a production workspace you are introducing to your organization. Binding the prod_catalog to the prod_workspace ensures that workspace admins and users can only access prod_catalog from the prod_workspace environment.
Ensuring users can only process sensitive data from a specific workspace. Limiting the environments from which users can access sensitive data helps better secure your organization’s data. Limiting access to one workspace also simplifies any monitoring, auditing, and understanding of which users are accessing specific data. This would entail a similar setup as the example above.
Giving users read-only access to production data from a developer workspace.
This enables your organization to effectively conduct development and testing, while minimizing risk to production data. All user access to this catalog from this workspace can be specified as read-only, ensuring developers can access the data they need for testing without risk of any unwanted updates.

Additional workspace connections

Limitations

Each additional workspace connection must be in the same metastore as the primary workspace used to set up the integration.
No two additional workspace connections can be responsible for the same catalog.

Databricks Unity Catalog privileges

The privileges the Databricks Unity Catalog integration requires align to the least privilege security principle. The table below describes each privilege required in Databricks Unity Catalog for the setup user and the Immuta service principal.

| Databricks Unity Catalog privilege | User requiring the privilege | Explanation |
| --- | --- | --- |
| Account admin | Setup user | This privilege allows the setup user to grant the Immuta service principal the necessary permissions to orchestrate Unity Catalog access controls and maintain state between Immuta and Databricks Unity Catalog. |
| CREATE CATALOG on the Unity Catalog metastore | Setup user | This privilege allows the setup user to create an Immuta-owned catalog and tables. |
| Metastore admin | Setup user | |
| USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources; USE SCHEMA on all schemas containing securables registered as Immuta data sources | Immuta service principal | These privileges allow the service principal to apply row filters and column masks on the securable. |
| MODIFY and SELECT on all securables registered as Immuta data sources | Immuta service principal | |
| OWNER on the Immuta catalog | Immuta service principal | The Immuta service principal must own the catalog Immuta creates during setup that stores the Immuta policy information. The Immuta setup script grants ownership of this catalog to the Immuta service principal when you configure the integration. |
| USE CATALOG on the system catalog; USE SCHEMA on the system.access schema; SELECT on the following system tables: system.access.audit, system.access.table_lineage, system.access.column_lineage | Immuta service principal | These privileges allow Immuta to audit user queries in Databricks Unity Catalog. |

Policy enforcement

Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.
Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.
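Row filters and column masks are native Unity Catalog features attached to tables as SQL UDFs. The sketch below generates illustrative Databricks SQL for one mask and one row filter; it is not the SQL Immuta emits (Immuta's UDFs live in the immuta_policies_n schemas and consult the entitlement data in immuta_system). Table, column, function, and group names are hypothetical, and the strings are not escaped, so use it only with trusted identifiers.

```python
def column_mask_sql(table: str, column: str, func: str, col_type: str,
                    unmasked_group: str, masked_expr: str) -> list[str]:
    """Create a mask UDF, then attach it to the column."""
    return [
        f"CREATE OR REPLACE FUNCTION {func}(val {col_type}) RETURNS {col_type} "
        f"RETURN CASE WHEN is_account_group_member('{unmasked_group}') THEN val "
        f"ELSE {masked_expr} END",
        f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {func}",
    ]


def row_filter_sql(table: str, column: str, func: str, col_type: str,
                   allowed_value: str, unfiltered_group: str) -> list[str]:
    """Create a boolean filter UDF, then attach it to the table on `column`."""
    return [
        f"CREATE OR REPLACE FUNCTION {func}(val {col_type}) RETURNS BOOLEAN "
        f"RETURN is_account_group_member('{unfiltered_group}') OR val = '{allowed_value}'",
        f"ALTER TABLE {table} SET ROW FILTER {func} ON ({column})",
    ]


if __name__ == "__main__":
    for stmt in column_mask_sql("main.hr.employees", "ssn", "main.hr.mask_ssn",
                                "STRING", "hr_admins", "'REDACTED'"):
        print(stmt + ";")
    for stmt in row_filter_sql("main.hr.employees", "region", "main.hr.us_rows",
                               "STRING", "US", "hr_admins"):
        print(stmt + ";")
```

Because the UDF is evaluated by Databricks at query time, the mask or filter applies on every cluster or SQL warehouse that reads the table, which is why enforcement continues even if Immuta is disconnected.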

The Unity Catalog integration supports the following policy types:

- Conditional masking
- Constant
- Custom masking
- Hashing
- Null (including on ARRAY, MAP, and STRUCT type columns)
- Rounding (date and numeric rounding)
- Matching (only show rows where)
  - Custom WHERE
  - Never
  - Where user
  - Where value in column
- Minimization
- Time-based restrictions

Project-scoped purpose exceptions for Databricks Unity Catalog

Databricks Unity Catalog views

If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.
Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).

Masked joins for Databricks Unity Catalog

Policy exemption groups

Some users may need to be exempt from masking and row-level policy enforcement. When you add user accounts to the configured exemption group in Databricks, Immuta will not enforce policies for those users. Exemption groups are created when the Unity Catalog integration is configured, and no policies will apply to these users' queries, despite any policies enforced on the tables they query.

The principal used to register data sources in Immuta is automatically added to this exemption group for that Databricks table. Consequently, both the accounts added to this group and the accounts used to register data sources in Immuta should be limited to service accounts.

Policy support with `hive_metastore`

When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

Authentication methods

The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

Integration health status

Immuta data sources in Unity Catalog

Supported object types

Table
View
Materialized view
Streaming table
External table
Foreign table

External data connectors and query-federated tables

Query audit

Access requirements

For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the following system tables:
- system.access.audit
- system.access.table_lineage
- system.access.column_lineage
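The minimum audit access above corresponds to three kinds of Unity Catalog grants. A sketch that generates them, assuming they are run by a user with permission to grant on the system catalog; the application ID is a placeholder. The Immuta setup script may already issue equivalent grants, so check before running these by hand.

```python
# System tables Immuta needs to read for query audit (listed above).
AUDIT_TABLES = ("audit", "table_lineage", "column_lineage")


def audit_grant_statements(principal: str) -> list[str]:
    """Databricks SQL GRANTs giving `principal` the minimum audit access listed above."""
    quoted = f"`{principal}`"  # service principals are referenced by application ID
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted}",
    ]
    statements += [f"GRANT SELECT ON TABLE system.access.{t} TO {quoted}" for t in AUDIT_TABLES]
    return statements


if __name__ == "__main__":
    # Hypothetical application ID for the Immuta service principal.
    for stmt in audit_grant_statements("00000000-0000-0000-0000-000000000000"):
        print(stmt + ";")
```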

Tag ingestion

Design partner preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.

You can enable tag ingestion to allow Immuta to ingest Databricks Unity Catalog table and column tags so that you can use them in Immuta policies to enforce access controls. When you enable this feature, Immuta uses the credentials and connection information from the Databricks Unity Catalog integration to pull tags from Databricks and apply them to data sources as they are registered in Immuta. Databricks data sources that were registered before tag ingestion was enabled automatically sync to the catalog, and their tags are applied. Immuta checks for tag changes in Databricks and syncs Immuta data sources to those changes every 24 hours.

Syncing tag changes

When syncing data sources to Databricks Unity Catalog tags, Immuta pulls the following information:

Table tags: These tags apply to the table and appear on the data source details tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Column tags: These tags are applied to data source columns and appear on the columns listed in the data dictionary tab. Databricks tags' key and value pairs are reflected in Immuta as a hierarchy with each level separated by a . delimiter. For example, the Databricks Unity Catalog tag Location: US would be represented as Location.US in Immuta.
Table comments field: This content appears as the data source description on the data source details tab.
Column comments field: This content appears as dictionary column descriptions on the data dictionary tab.
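The key/value mapping above is mechanical. A minimal sketch of where one table's Unity Catalog metadata lands in Immuta; the function names are hypothetical, and the sketch assumes every tag has a value (how key-only tags are represented is not covered above).

```python
def immuta_tag_names(uc_tags: dict[str, str]) -> list[str]:
    """Map Unity Catalog key/value tags to Immuta's dot-delimited tag names."""
    # Assumes every tag has a value; key-only tags are not handled here.
    return sorted(f"{key}.{value}" for key, value in uc_tags.items())


def preview_sync(table_tags: dict[str, str], column_tags: dict[str, dict[str, str]],
                 table_comment: str, column_comments: dict[str, str]) -> dict:
    """Show where each piece of Unity Catalog metadata appears in Immuta."""
    return {
        "data_source_tags": immuta_tag_names(table_tags),        # data source details tab
        "column_tags": {col: immuta_tag_names(tags)              # data dictionary tab
                        for col, tags in column_tags.items()},
        "data_source_description": table_comment,                # data source details tab
        "column_descriptions": dict(column_comments),            # data dictionary tab
    }
```

For example, immuta_tag_names({"Location": "US"}) returns ["Location.US"], matching the example above.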

Limitations

Only tags that apply to Databricks data sources in Immuta are available to build policies in Immuta. Immuta will not pull tags in from Databricks Unity Catalog unless those tags apply to registered data sources.
Cost implications: Tag ingestion in Databricks Unity Catalog requires compute resources. Therefore, having many Databricks data sources or frequently manually syncing data sources to Databricks Unity Catalog may incur additional costs.
Databricks Unity Catalog tag ingestion only supports tenants with fewer than 2,500 data sources registered.

Configuration requirements

Unity Catalog caveats

Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.
If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.
You must use the global regex flag (g) when creating a regex masking policy in this integration, and you cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the examples below for guidance:
- regex with a global flag (supported): /^ssn|social ?security$/g
- regex without a global flag (unsupported): /^ssn|social ?security$/
- regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi
- regex without a case insensitive flag (supported): /^ssn|social ?security$/g
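The flag rules above are easy to check before saving a policy. A minimal pre-flight check, assuming the regex is written in the /pattern/flags form used in the examples; the function name is hypothetical, and it validates only the flags, not the pattern syntax.

```python
import re

# Splits a /pattern/flags literal into its pattern and flag characters.
_REGEX_LITERAL = re.compile(r"^/(?P<pattern>.+)/(?P<flags>[a-z]*)$", re.DOTALL)


def masking_regex_supported(literal: str) -> bool:
    """True if the literal uses the global flag (g) and not the case-insensitive flag (i)."""
    match = _REGEX_LITERAL.match(literal)
    if match is None:
        return False  # not written as /pattern/flags
    flags = match.group("flags")
    return "g" in flags and "i" not in flags


if __name__ == "__main__":
    for example in ("/^ssn|social ?security$/g", "/^ssn|social ?security$/",
                    "/^ssn|social ?security$/gi"):
        print(example, masking_regex_supported(example))
```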

Azure Databricks Unity Catalog limitation

If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

Feature limitations

The following features are currently unsupported:

Databricks change data feed support
Immuta project workspaces
Multiple IAMs on a single cluster
Column masking policies on views
Mixing masking policies on the same column
Row-redaction policies on views
R and Scala cluster support
Scratch paths
User impersonation
Policy enforcement on raw Spark reads
Python UDFs for advanced masking functions
Direct file-to-SQL reads
Data policies (except for masking with NULL) on ARRAY, MAP, or STRUCT type columns
Shallow clones

Known issue

Snippets for Databricks data sources may be empty in the Immuta UI.

Snowflake Integration

Snowflake Enterprise Edition required

As with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove the policy management burden and significantly reduce role explosion.

How the integration works

Data flow

Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.
The Immuta web service calls a stored procedure that modifies the user entitlements or policies.
A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

Policy enforcement

For a user to query Immuta-protected data, they must meet two qualifications:

They must be subscribed to the Immuta data source.

After a user has met these qualifications they can query Snowflake tables directly.

Comply with column length and precision requirements in a Snowflake masking policy

Consider these columns in a data source that have the following masking policies applied:

Column A (VARCHAR(6)): Mask using hashing for everyone
Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone
Column C (VARCHAR(6)): Mask by making null for everyone
Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

Querying this data source in Snowflake would return the following values:

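| Column A | Column B | Column C | Column D |
| --- | --- | --- | --- |
| 5w4502 | REDAC | null | 990 |
| 6e3611 | REDAC | null | 750 |
| 9s7934 | REDAC | null | 380 |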

Hashing collisions

Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.
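The truncation described above, and why it raises the collision risk, can be sketched as follows. This is an illustration, not Immuta's masking code: the SHA-256 hex encoding is an assumption (Immuta's hash encoding may differ), and the estimate uses the standard birthday approximation p = 1 - e^(-n(n-1)/(2N)), where n is the number of distinct values and N = 16^length is the number of possible truncated hex values.

```python
import hashlib
import math


def fit_to_column(value: str, max_length: int) -> str:
    """Truncate a masked value to the column's declared length (e.g. REDACTED -> REDAC)."""
    return value[:max_length]


def truncated_hash(value: str, max_length: int) -> str:
    """Illustrative only: SHA-256 hex digest cut to the column length."""
    return fit_to_column(hashlib.sha256(value.encode("utf-8")).hexdigest(), max_length)


def collision_probability(distinct_values: int, hex_chars: int) -> float:
    """Birthday-bound approximation, treating truncated digests as uniform hex strings."""
    space = 16 ** hex_chars
    exponent = distinct_values * (distinct_values - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - e^(-x), accurate even when x is tiny


if __name__ == "__main__":
    print(fit_to_column("REDACTED", 5))
    for length in (5, 20):
        print(length, collision_probability(10_000, length))
```

With 10,000 distinct values, a 5-character truncation makes at least one collision almost certain, while a 20-character truncation keeps the probability negligible.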

Query performance

Snowflake privileges

The privilege grants the Snowflake integration requires align to the least privilege security principle. The table below describes each privilege required in Snowflake and the user that requires it. The references to IMMUTA_DB, IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table can be replaced with the names you chose for your Immuta database, warehouse, and impersonation role, respectively, when setting up the integration.

| Snowflake privilege | User requiring privilege | Features | Explanation |
|---|---|---|---|
| CREATE DATABASE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates an Immuta database in your organization's Snowflake account where all Immuta-managed objects (UDFs, masking policies, row access policies, and user entitlements) are written and stored. |
| CREATE ROLE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates a role for Immuta that is used to manage the integration once it has been initialized. |
| CREATE USER ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta uses to manage the integration. |
| MANAGE GRANTS ON ACCOUNT | Setup user | All | The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. All privileges documented here are granted to the IMMUTA_SYSTEM_ACCOUNT user by this setup user. |
| OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE | IMMUTA_SYSTEM_ACCOUNT user | Impersonation | If impersonation is enabled, Immuta must be able to manage the Snowflake role used for impersonation, which is created when the setup script runs. |
| ALL PRIVILEGES ON DATABASE IMMUTA_DB; ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB; USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES | IMMUTA_SYSTEM_ACCOUNT user | All | The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed. |
| USAGE ON WAREHOUSE IMMUTA_WH | IMMUTA_SYSTEM_ACCOUNT user | All | To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML operations that require compute. |
| IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE | IMMUTA_SYSTEM_ACCOUNT user | Audit | |
| APPLY MASKING POLICY ON ACCOUNT; APPLY ROW ACCESS POLICY ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Snowflake integration with governance features enabled | |
| MANAGE GRANTS ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Table grants | Immuta must be able to MANAGE GRANTS on objects throughout your organization's Snowflake account. |
| CREATE ROLE ON ACCOUNT | IMMUTA_SYSTEM_ACCOUNT user | Table grants | When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in your organization's Snowflake account. |
| USAGE on all databases and schemas with registered data sources; REFERENCES on all tables and views registered in Immuta | Metadata registration user | Data source registration | Immuta must be able to see metadata on securables to register them as data sources and populate the data dictionary. |
| SELECT on all tables and views registered in Immuta | Metadata registration user | Sensitive data discovery and specialized masking policies that require fingerprinting | |
| APPLY TAG ON ACCOUNT | Metadata registration user | Tag ingestion | |
| IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE | Metadata registration user | Tag ingestion | |
| USAGE ON DATABASE IMMUTA_DB; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS; USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM; SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE | PUBLIC role | All | Immuta has stored procedures and functions that are used for policy enforcement and do not expose or contain any sensitive information. These objects must be accessible by all users to facilitate the use and creation of policies or views to enforce Immuta policies in Snowflake. |
| SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST | PUBLIC role | All | Immuta retains a list of excepted roles and users when using the Snowflake integration. The roles and users in this list are exempt from policies applied to tables in Snowflake, which gives organizations flexibility when some entities should not be bound to Immuta policies in Snowflake (for example, a system or application role or user). |
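The table uses default object names, so it can help to render the PUBLIC-role grants with the names you actually chose. The hypothetical Python helper below does that for the USAGE grants only, using standard Snowflake GRANT syntax. It is a review aid under stated assumptions (unquoted identifiers, schema names as listed in the table), not a replacement for the Immuta setup script. It omits the SELECT grants because the object type those GRANT statements need is not stated here.

```python
import re
from typing import List

# Unquoted Snowflake identifiers: a letter or underscore first, then letters, digits, _ or $.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def public_role_usage_grants(db: str = "IMMUTA_DB") -> List[str]:
    """Render the PUBLIC-role USAGE grants from the table for a custom database name.

    Hypothetical review helper: the Immuta setup script is what actually issues grants.
    """
    if not _IDENTIFIER.fullmatch(db):
        # Refuse anything that is not a plain identifier, to avoid emitting unsafe SQL.
        raise ValueError(f"not a plain unquoted identifier: {db!r}")
    return [
        f"GRANT USAGE ON DATABASE {db} TO ROLE PUBLIC;",
        f"GRANT USAGE ON SCHEMA {db}.IMMUTA_PROCEDURES TO ROLE PUBLIC;",
        f"GRANT USAGE ON SCHEMA {db}.IMMUTA_FUNCTIONS TO ROLE PUBLIC;",
        f"GRANT USAGE ON FUTURE FUNCTIONS IN SCHEMA {db}.IMMUTA_FUNCTIONS TO ROLE PUBLIC;",
        f"GRANT USAGE ON SCHEMA {db}.IMMUTA_SYSTEM TO ROLE PUBLIC;",
    ]


# ACME_IMMUTA_DB is a made-up database name chosen at setup time.
for statement in public_role_usage_grants("ACME_IMMUTA_DB"):
    print(statement)
```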

Integration health status

Registering data sources

Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that account, ensuring that your integration works with the following use cases:

Snowflake bulk data source creation

Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

Resource allocations

Based on performance tests that create 100,000 data sources, Immuta recommends a SaaS XL environment.

Limitations

Performance gains are limited when enabling sensitive data discovery at the time of data source creation.
External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

Excepted roles/users

Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

Immuta excludes the listed roles and users from policies by wrapping every policy in a CASE statement that checks whether the querying user is acting under one of the listed usernames or roles. If so, the policy is not applied to the queried table; if not, the policy is enforced as normal. Immuta does not distinguish between roles and usernames. If a role and a user have the exact same name, both that user and any user acting under that role will have full access to the data sources, and no policies will be enforced for them.
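The exception check can be modeled in a few lines of Python. This is a conceptual sketch, not the SQL Immuta generates. `is_excepted` and `enforce` are hypothetical names, and names are compared exactly (Snowflake identifier case rules are not modeled). The usage lines reproduce the role/user name collision described above.

```python
from typing import Collection


def is_excepted(username: str, current_role: str, excepted_names: Collection[str]) -> bool:
    """One list of names is checked against both the querying user's name and the
    role they are acting under; it does not record whether an entry is a user or a role."""
    return username in excepted_names or current_role in excepted_names


def enforce(raw_value: str, policy_value: str, username: str, current_role: str,
            excepted_names: Collection[str]) -> str:
    """Return the raw value for excepted principals, otherwise the policy's result
    (the two branches of the CASE statement)."""
    return raw_value if is_excepted(username, current_role, excepted_names) else policy_value


excepted = {"ETL_SVC"}
# The service account itself bypasses policies, as intended.
print(enforce("alice@example.com", "REDACTED", "ETL_SVC", "LOADER", excepted))
# Name collision: a user acting under a role that is also named ETL_SVC bypasses them too.
print(enforce("alice@example.com", "REDACTED", "BOB", "ETL_SVC", excepted))
# Everyone else gets the policy-enforced value.
print(enforce("alice@example.com", "REDACTED", "CAROL", "ANALYST", excepted))
```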

Authentication methods

The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

Username and password: Users can authenticate with their Snowflake username and password.

Snowflake External OAuth

Workflow

An Immuta application administrator configures the Snowflake integration or creates a data source.
Immuta creates a custom token and sends it to the authorization server.
The authorization server confirms the information sent from Immuta and issues an access token to Immuta.
Immuta sends the access token it received from the authorization server to Snowflake.
Snowflake authenticates the token and grants access to the requested resources from Immuta.
The integration is connected and users can query data.

Supported Snowflake features

The Immuta Snowflake integration supports the following Snowflake features:

Supported Immuta features

The Snowflake integration supports the Immuta features outlined below. Click the links provided for more details.

Immuta project workspaces

Immuta system account required Snowflake privileges

CREATE [OR REPLACE] PROCEDURE
DROP ROLE
REVOKE ROLE

Caveat

To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

Tag ingestion

You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

Snowflake tag key and value pairs are reflected in Immuta as two levels: the key is the top level and the value is the second. Because Snowflake tags are inherited down the object hierarchy, a tag applied to a database is also applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example, if a database is tagged PII, all of the tables and columns in that database will also be tagged PII.
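Here is a small Python model of this inheritance, assuming each object is identified by its (database, schema, table, column) path. The helper names are hypothetical, and the `key.value` rendering of the two Immuta tag levels is only illustrative. The rule that a tag set directly on a deeper object overrides an inherited value for the same key is an assumption about Snowflake tag lineage that you should verify.

```python
from typing import Dict, Set, Tuple

ObjectPath = Tuple[str, ...]  # (database,), (database, schema), ... down to a column


def effective_tags(path: ObjectPath, assigned: Dict[ObjectPath, Dict[str, str]]) -> Dict[str, str]:
    """Collect tags from the object and every ancestor, walking from the database down.
    Assumption: a key set directly on a deeper object overrides the inherited value."""
    tags: Dict[str, str] = {}
    for depth in range(1, len(path) + 1):
        tags.update(assigned.get(path[:depth], {}))
    return tags


def as_immuta_tags(tags: Dict[str, str]) -> Set[str]:
    """Render key/value pairs as two-level names; the '.' separator is illustrative."""
    return {f"{key}.{value}" for key, value in tags.items()}


assigned = {
    ("SALES",): {"PII": "Confidential"},                             # tag on the database
    ("SALES", "CRM", "CUSTOMERS", "EMAIL"): {"DATA_TYPE": "Email"},  # tag on one column
}
column = ("SALES", "CRM", "CUSTOMERS", "EMAIL")
print(sorted(as_immuta_tags(effective_tags(column, assigned))))
```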

Credentials

Caveats

Query audit

Multiple Snowflake instances

Caveats

There can only be one integration connection with Immuta per host.
The host of the data source must match the host of the integration for the view to be created.
Projects can only be configured to use one Snowflake host.

Limitations

Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.
Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.
When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.
You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.
Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.

Custom WHERE clause limitations

Requirements for a custom WHERE policy

All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

Subquery limitations

Any subqueries that error in Snowflake will also error in Immuta.

Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. If an error occurs, it may happen during policy creation or at query-time. To avoid these errors, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.
For more information on the Snowflake subquery limitations see

Starburst (Trino) Integration Reference Guide

Starburst and Trino

The Starburst (Trino) integration allows you to access policy-enforced data directly in your Starburst catalogs without rewriting queries or changing workflows. Instead of generating policy-enforced views and adding them to an Immuta catalog that users have to query (like in the legacy Starburst (Trino) integration), Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.

Architecture

Rotating the Immuta API key

When you configure the integration, Immuta generates an API key for you to add to your Immuta access control properties file for API authentication between Starburst (Trino) and Immuta. You can rotate this shared secret to mitigate potential security risks and comply with your organizational policies.

Policy enforcement

When a user queries a table in Starburst, the Trino Execution Engine reaches out to the Immuta plugin to determine what the user is allowed to see:

masking policies: For each column, Starburst (Trino) requests a view expression from the Immuta plugin. If there is a masking policy on the column, the Immuta plugin returns the corresponding view expression for that column. Otherwise, nothing is returned.
row-level policies: For each table, Starburst (Trino) requests the rows a user can see in a table from Immuta. If there is a WHERE clause policy on the data source, Immuta returns the corresponding view expression as a WHERE clause. Otherwise, nothing is returned.

The Immuta plugin then requests policy information about the tables being queried from the Immuta Web Service and sends this information to the Trino Execution Engine. Finally, the Trino Execution Engine constructs the SQL statement, executes it on the backing tables to apply the policies, and returns the response to the user.
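The Python sketch below makes this flow concrete by showing the combined effect of the plugin's answers as if they were written out as SQL text. It is a conceptual model, and the `build_policy_query` helper and table name are hypothetical. Trino applies masks and row filters inside its query plan rather than by rewriting your SQL string.

```python
from typing import Dict, List, Sequence


def build_policy_query(table: str, columns: Sequence[str],
                       column_masks: Dict[str, str], row_filters: List[str]) -> str:
    """Show the effect of the plugin's answers as SQL text.

    column_masks: view expression returned for each masked column (absent = no mask).
    row_filters: WHERE-clause expressions returned for the table (empty = no filter).
    """
    select_items = [
        f"{column_masks[col]} AS {col}" if col in column_masks else col
        for col in columns
    ]
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if row_filters:
        sql += " WHERE " + " AND ".join(f"({expr})" for expr in row_filters)
    return sql


print(build_policy_query(
    "hive.sales.customers",   # hypothetical catalog.schema.table
    ["id", "email", "region"],
    {"email": "'REDACTED'"},  # masking policy on the email column
    ["region = 'US'"],        # row-level policy on the table
))
```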

System access control providers

Users cannot bypass Immuta controls by changing roles in their system access control provider.

Multiple system access control providers can be configured in the Starburst (Trino) integration. This approach allows Immuta to work with existing Starburst (Trino) installations that already have an access control provider configured.

Immuta does not manage all permissions in Starburst (Trino) and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

If you have multiple access control providers configured, those providers interact in the following ways:

For a user to have access to a resource (catalog, schema, or a table), that user must have access in all of the configured access control providers.
In catalog, schema, or table filtering (such as show catalogs, show schemas, or show tables), the user will see the intersection of all access control providers. For example, if a Starburst (Trino) environment includes the catalogs public, demo, and restricted and one provider restricts a user from accessing the restricted catalog and another provider restricts the user from accessing the demo catalog, running show catalogs will only return the public catalog for that user.
Only one column masking policy can be applied per column across all system access control providers. If two or more access control providers return a mask for a column, Starburst (Trino) will throw an error at query time.
For row filtering policies, the expression for each system access control provider is applied one after the other.
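The four interaction rules can be expressed as a short Python model. `ProviderAnswer` and the combine functions are hypothetical. They reproduce the catalog example above and raise an error when two providers mask the same column, mirroring the query-time failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProviderAnswer:
    """What one system access control provider returns for a given user (hypothetical shape)."""
    visible_catalogs: Set[str]
    column_masks: Dict[str, str] = field(default_factory=dict)
    row_filters: List[str] = field(default_factory=list)


def combined_catalogs(providers: List[ProviderAnswer]) -> Set[str]:
    """A catalog is visible only if every provider allows it (requires at least one provider)."""
    result = set(providers[0].visible_catalogs)
    for provider in providers[1:]:
        result &= provider.visible_catalogs
    return result


def combined_masks(providers: List[ProviderAnswer]) -> Dict[str, str]:
    """At most one mask per column across all providers."""
    masks: Dict[str, str] = {}
    for provider in providers:
        for column, expr in provider.column_masks.items():
            if column in masks:
                # Mirrors the error Starburst (Trino) raises at query time.
                raise ValueError(f"multiple masks returned for column {column!r}")
            masks[column] = expr
    return masks


def combined_filters(providers: List[ProviderAnswer]) -> List[str]:
    """Row filters from each provider are applied one after the other."""
    return [expr for provider in providers for expr in provider.row_filters]


immuta = ProviderAnswer({"public", "demo"}, {"email": "'REDACTED'"}, ["region = 'US'"])
other = ProviderAnswer({"public", "restricted"}, {}, ["active = true"])
print(combined_catalogs([immuta, other]))  # {'public'}
```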

Starburst (Trino) query passthrough

Starburst (Trino) query passthrough is available in most connectors using the query table function or raw_query in the Elasticsearch connector. Consequently, Immuta blocks functions named raw_query or query, as those table functions would completely bypass Immuta’s access controls.

For example, without blocking those functions, this query would access the public.customer table directly:

```
select * from table(postgres.system.query(query => 'select * from public.customer limit 10'));
```
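Here is a minimal Python sketch of blocking by function name. It assumes the check looks only at the last component of the qualified name, so the function is blocked regardless of catalog. `is_blocked_table_function` and the catalog names in the usage lines are hypothetical.

```python
# Table function names that would pass a query straight to the underlying data source.
BLOCKED_TABLE_FUNCTIONS = {"query", "raw_query"}


def is_blocked_table_function(qualified_name: str) -> bool:
    """Block by function name alone, whatever catalog or schema it is called through.
    Lowercasing assumes unquoted (case-insensitive) Trino identifiers."""
    function_name = qualified_name.rsplit(".", 1)[-1].strip().lower()
    return function_name in BLOCKED_TABLE_FUNCTIONS


print(is_blocked_table_function("postgres.system.query"))
print(is_blocked_table_function("elasticsearch.system.raw_query"))
print(is_blocked_table_function("example_catalog.system.my_function"))
```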

Data flow

An Immuta Application Administrator configures the Starburst (Trino) integration, adding the ImmutaSystemAccessControl plugin on their Starburst (Trino) node.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
The Trino Execution Engine calls methods on the SystemAccessControl interface to ask the ImmutaSystemAccessControl plugin where policies should be applied. The masking and row-level security methods apply the actual policy expressions.
The Immuta System Access Control plugin calls the Immuta Web Service to retrieve policy information for that data source for the querying user, using the querying user's project, purpose, and entitlements.
The Immuta System Access Control plugin provides the SQL view expression (for masked columns) or WHERE clause SQL view expression (for row filtering) to the Trino Execution Engine.
The Trino Execution Engine constructs and executes the SQL statement on the backing catalogs and retrieves the data with appropriate policy enforcement.
User sees policy-enforced data.

Authentication methods

The Starburst (Trino) integration supports the following authentication methods to create data sources in Immuta:

Username and password: You can authenticate with your Starburst (Trino) username and password.

OAuth authentication for creating data sources

Configure JWT authentication method in Starburst (Trino)

When using OAuth authentication to create data sources in Immuta, configure your Starburst (Trino) cluster to use JWT authentication, not OpenID Connect or OAuth.

When users query a Starburst data source, Immuta sends a username with the view SQL so that policies apply in the right context. Since OAuth authentication does not require a username to be associated with a data source upon data source creation, Immuta does not send a username and Starburst queries fail. To avoid this error, you must configure a global admin username.

If you are using OAuth or asynchronous authentication to create Starburst data sources, work with your Immuta representative to configure the globalAdminUsername property.

Integration health status

Supported Starburst (Trino) features

Starburst (Trino)-created logical view support

The descriptions below provide guidance for applying policies to Starburst (Trino)-created logical views in the

However, there are other approaches you can use to apply policies to Starburst (Trino)-created logical views. The examples below are the simplest approaches.

Views created in the `DEFINER` security mode

For views created using the DEFINER security mode,

ensure the user who created the view is configured as an admin user in the Immuta plugin so that policies are never applied to the underlying tables.
create Immuta data sources and apply policies to logical views exposing those tables.
lock down access to the underlying tables in Starburst (Trino) so that all end user access is provided through the views.

Views created in the `INVOKER` security mode

Applying policies to views or tables

Avoid creating data policies for both a logical view and its underlying tables. Instead, apply policies to either the logical view or the underlying tables.

For views created using the INVOKER security mode, the querying user needs access to the logical view and underlying tables.

If non-Immuta table reads are disabled, provide access to the views and tables through Immuta. To do so, create Immuta data sources for the view and underlying tables, and grant access to the querying user in Immuta. If creating data policies, apply the policies to either the view or underlying tables, not both.
If non-Immuta table reads are enabled, the user already has access to the table and view. Create Immuta data sources and apply policies to the underlying table; this approach will enforce access controls for both the table and view in Starburst (Trino).

Supported Immuta features

Query audit

In addition to the information included on the Starburst (Trino) Audit Logs page, the audit logs payload in the Starburst (Trino) integration includes immutaPlanningDuration, which represents the planning overhead in Immuta.

Multiple Starburst (Trino) integrations

You can configure multiple Starburst (Trino) integrations with a single Immuta tenant and use them dynamically. Configure the integration once in Immuta to use it in multiple Starburst (Trino) clusters. However, consider the following limitations:

Names of catalogs cannot overlap because Immuta cannot distinguish among them.
A combination of cluster types on a single Immuta tenant is supported unless your Trino cluster is configured to use a proxy. In that case, you can only connect either Trino clusters or Starburst clusters to the same Immuta tenant.

Policy caveat

Limit your masked joins to columns with matching column types. Starburst truncates the result of the masking expression to conform to the column type when performing the join, so joining two masked columns with different data types produces invalid results when one of the columns' lengths is less than the length of the masked value.

For example, if the value of a hashed column is 64 characters, a hashed varchar(50) column and a hashed varchar(255) column will not join correctly, since the varchar(50) value is truncated and doesn't match the varchar(255) value.
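The truncation problem can be reproduced in a few lines of Python. SHA-256 is only an assumed stand-in, chosen because its hex digest is 64 characters long, matching the example. `hashed_in_varchar` is a hypothetical helper.

```python
import hashlib


def hashed_in_varchar(value: str, varchar_length: int) -> str:
    """Hash a value and truncate the result to the column's VARCHAR length.
    SHA-256 is an assumption; its hex digest is 64 characters."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:varchar_length]


left = hashed_in_varchar("alice@example.com", 50)    # varchar(50): truncated to 50 chars
right = hashed_in_varchar("alice@example.com", 255)  # varchar(255): all 64 chars kept
print(len(left), len(right), left == right)  # 50 64 False
```

The varchar(50) value is a 50-character prefix of the 64-character value, so an equality join on the two masked columns never matches, even though both came from the same input.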

Customize Read and Write Access Policies for Starburst (Trino)

Private preview: Write policies are available to select accounts. Contact your Immuta representative to enable this feature.

Requirements

Starburst (Trino) version 438 or newer
Write policies for Starburst (Trino) enabled. Contact your Immuta representative to get this feature enabled on your account.

Configuration options

Immuta web service: Configure write policies in the Immuta web service to allow all Starburst (Trino) clusters targeting that Immuta tenant to receive the same write policy configuration for data sources. This configuration will only affect tables or views registered as Immuta data sources.

Immuta web service configuration

Contact your Immuta representative to configure read and write access in the Immuta web service if all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to your Immuta tenant. A configuration example is provided below.

Configuration example

The following example maps WRITE to READ, WRITE and OWN permissions and READ to just READ. Both READ and WRITE permissions should always include READ:
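The mapping described above can be illustrated with a small Python model. It is hypothetical and does not reflect Immuta's configuration format. `PERMISSION_TO_OPERATIONS` maps WRITE to READ, WRITE, and OWN and maps READ to READ, and a helper checks the rule that every mapping includes READ.

```python
from typing import Dict, Iterable, Set

# Illustration of the permission-to-operation mapping; not Immuta's configuration format.
PERMISSION_TO_OPERATIONS: Dict[str, Set[str]] = {
    "WRITE": {"READ", "WRITE", "OWN"},
    "READ": {"READ"},
}


def operations_for(permissions: Iterable[str]) -> Set[str]:
    """Union of the operations granted by each of a user's permissions."""
    granted: Set[str] = set()
    for permission in permissions:
        granted |= PERMISSION_TO_OPERATIONS.get(permission, set())
    return granted


def mapping_always_includes_read(mapping: Dict[str, Set[str]]) -> bool:
    """Check the rule that every mapped permission includes READ."""
    return all("READ" in ops for ops in mapping.values())


print(sorted(operations_for(["WRITE"])))
print(mapping_always_includes_read(PERMISSION_TO_OPERATIONS))
```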

Starburst cluster configuration

Configure the integration to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a Starburst cluster.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
  - CREATE: Grants CREATE on catalogs, schemas, tables, and views. CREATE can only be set in this property, because CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).
For example, setting immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN and immuta.allowed.non.immuta.datasource.operations=READ,WRITE,OWN,CREATE authorizes READ, WRITE, and OWN operations on data sources registered in Immuta and permits all operations on data that is not registered in Immuta.
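Enable the Immuta access control plugin in the Starburst cluster's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations) by pointing the access-control.config-files property at the Immuta access control configuration file. For example, a Docker installation would set access-control.config-files=/etc/starburst/immuta-access-control.properties.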

Trino cluster configuration

Create the Immuta access control configuration file in the Trino configuration directory (/etc/trino/immuta-access-control.properties for Docker installations or <trino_install_directory>/etc/immuta-access-control.properties for standalone installations).
Modify one or both properties below to customize the behavior of read or write access policies for all users:
- immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
- immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:
  - READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns
  - WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.
  - OWN: Grants ALTER and DROP on tables; grants SET on comments and properties
  - CREATE: Grants CREATE on catalogs, schemas, tables, and views. CREATE can only be set in this property, because CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).
For example, setting immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN and immuta.allowed.non.immuta.datasource.operations=READ,WRITE,OWN,CREATE authorizes READ, WRITE, and OWN operations on data sources registered in Immuta and permits all operations on data that is not registered in Immuta.
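Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations) by setting the access-control.config-files property to the path of the Immuta access control configuration file you created.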

Configure Starburst (Trino) Integration

The plugin comes pre-installed with Starburst Enterprise, so this page provides separate configuration guidelines for Starburst clusters and for Trino clusters.

Starburst Cluster Configuration

Requirements

Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control provider, such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on the cluster.

1 - Enable the Integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click Add Integration and select Trino from the Integration Type dropdown menu.
Click Save.

OAuth Authentication

If you are using OAuth or asynchronous authentication to create Starburst data sources, work with your Immuta representative to configure the globalAdminUsername property.

2 - Configure the Immuta System Access Control Plugin in Starburst

Default configuration property values

If you use the default property values in the configuration file described in this section,

- users will have read and write access to tables that are not registered in Immuta, and
- results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

Additionally, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that already have an access control provider configured. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).
The table below describes the properties that can be set during configuration.

| Property | Starburst version | Required or optional | Description |
|---|---|---|---|
| access-control.name | 392 and newer | Required | This property enables the integration. |
| access-control.config-files | 392 and newer | Optional | |
| immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | |
| immuta.allowed.non.immuta.datasource.operations | 392 and newer | Optional | |
| immuta.apikey | 392 and newer | Required | |
| immuta.ca-file | 392 and newer | Optional | This property allows you to specify a path to your CA file. |
| immuta.cache.views.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Starburst. By default, the cache expires after 30 seconds. |
| immuta.cache.datasource.seconds | 392 and newer | Optional | The amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after changes to projects or subscriptions. By default, the cache expires after 30 seconds. |
| immuta.endpoint | 392 and newer | Required | The protocol and fully qualified domain name (FQDN) of the Immuta tenant used by Starburst (for example, https://my.immuta.tenant.io). Set this to the endpoint displayed when enabling the integration on the app settings page. |
| immuta.filter.unallowed.table.metadata | 392 and newer | Optional | When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. When set to true, running show catalogs, for example, reflects what that user has access to instead of returning all catalogs. By default, this property is set to false. |
| immuta.group.admin | 420 and newer | Required if immuta.user.admin is not set | This property identifies the Starburst group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them, so data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (delimiting each expression with the pipe character) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
| immuta.user.admin | 392 and newer | Required if immuta.group.admin is not set | This property identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account runs the subqueries, so data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (delimiting each expression with the pipe character) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/starburst/immuta-access-control.properties
```
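A common mistake is leaving `+` unescaped. In a username such as john.doe+svcacct@immuta.com, an unescaped `+` acts as a regex quantifier, so the pattern no longer matches that username. The Python sketch below uses hypothetical helpers to build an admin regex that matches usernames literally and doubles the backslashes for the properties file. It assumes the regex must match the whole username. Its properties unescape is simplified and handles only doubled backslashes.

```python
import re
from typing import Iterable


def unescape_properties_value(raw: str) -> str:
    """Simplified: turn the doubled backslashes written in a .properties file into single
    ones. Full Java properties parsing handles more escape sequences than this."""
    return raw.replace("\\\\", "\\")


def admin_property_value(usernames: Iterable[str]) -> str:
    """Hypothetical helper: build an admin regex that matches each username literally,
    joined with '|', then double every backslash for the properties file."""
    pattern = "|".join(re.escape(name) for name in usernames)
    return pattern.replace("\\", "\\\\")


def is_admin(username: str, property_value: str) -> bool:
    """Assumes the configured regex must match the whole username."""
    return re.fullmatch(unescape_properties_value(property_value), username) is not None


user = "john.doe+svcacct@immuta.com"
# '+' left unescaped: it acts as a quantifier, so the literal '+' in the name never matches.
print(is_admin(user, r"john\\.doe+svcacct@immuta\\.com"))
# Fully escaped value built by the helper.
print(is_admin(user, admin_property_value([user])))
print(admin_property_value([user, "immuta_system_account"]))
```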

Example Immuta System Access Control Configuration

```
# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Starburst usernames matching this regex will not be subject to
# Immuta policies. This regex should match the username provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but that the user cannot access through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user can access through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
```
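Before deploying a configuration like the one above, you could check it with a short script. The following Python sketch applies the defaults and requirements described on this page: access-control.name, immuta.endpoint, and immuta.apikey are required, and at least one of immuta.user.admin or immuta.group.admin must be set. The parser is deliberately simplified and is an assumption, not Starburst's loader: it handles only key=value lines and comments, with no ':' separators, line continuations, or escapes. The endpoint and API key in the example are placeholders.

```python
# Documented defaults for the optional properties.
DEFAULTS = {
    "immuta.allowed.immuta.datasource.operations": "READ",
    "immuta.allowed.non.immuta.datasource.operations": "READ,WRITE",
    "immuta.filter.unallowed.table.metadata": "false",
}
REQUIRED = ("access-control.name", "immuta.endpoint", "immuta.apikey")


def parse_properties(text: str) -> dict[str, str]:
    """Simplified parser: key=value lines, '#' or '!' comments, blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def effective_config(text: str) -> dict[str, str]:
    """Merge documented defaults with explicit settings and check required keys."""
    props = {**DEFAULTS, **parse_properties(text)}
    missing = [key for key in REQUIRED if not props.get(key)]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    if props["access-control.name"] != "immuta":
        raise ValueError("access-control.name must be 'immuta'")
    if not (props.get("immuta.user.admin") or props.get("immuta.group.admin")):
        raise ValueError("set immuta.user.admin, immuta.group.admin, or both")
    return props


def operations(props: dict[str, str], key: str) -> set[str]:
    """Split a CSV operations list; an empty value means no operations."""
    return {op.strip() for op in props[key].split(",") if op.strip()}


example = """
access-control.name=immuta
immuta.endpoint=https://my.immuta.tenant.io
immuta.apikey=<api-key>
immuta.user.admin=immuta_system_account
immuta.allowed.non.immuta.datasource.operations=
"""
cfg = effective_config(example)
print(operations(cfg, "immuta.allowed.immuta.datasource.operations"))      # {'READ'}
print(operations(cfg, "immuta.allowed.non.immuta.datasource.operations"))  # set()
```

Setting the non-Immuta operations to an empty value overrides the READ,WRITE default, which matches the "set to empty" note in the example configuration.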

3 - Add Starburst Users to Immuta

- Every Starburst user must either match the immuta.user.admin regex configured on the cluster or have their Starburst username mapped to an Immuta user. Users need this mapping to query policy-enforced data.
- A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.
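The two rules above can be expressed as a small decision function. This Python sketch is one reading of those rules, not Immuta's implementation. It assumes that accounts matching the admin regex bypass both the mapping and the IMPERSONATE_USER checks, and that the regex must match the whole username. UserDirectory and may_query are hypothetical names, and the users are made up.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserDirectory:
    """Hypothetical stand-in for Immuta user mappings and permissions."""
    mapped_users: dict[str, str]  # Starburst username -> Immuta user
    permissions: dict[str, set[str]] = field(default_factory=dict)  # Immuta user -> permissions


def matches_admin(username: str, admin_regex: str) -> bool:
    # Assumption: the admin regex must match the whole username.
    return re.fullmatch(admin_regex, username) is not None


def may_query(directory: UserDirectory, admin_regex: str, querying_user: str,
              impersonated_user: Optional[str] = None) -> bool:
    """One reading of the rules above; not Immuta's actual implementation."""
    if matches_admin(querying_user, admin_regex):
        return True  # assumed: admin accounts bypass mapping and permission checks
    if querying_user not in directory.mapped_users:
        return False
    if impersonated_user is None or impersonated_user == querying_user:
        return True
    if impersonated_user not in directory.mapped_users:
        return False  # both users must be mapped
    immuta_user = directory.mapped_users[querying_user]
    return "IMPERSONATE_USER" in directory.permissions.get(immuta_user, set())


directory = UserDirectory(
    mapped_users={"alice": "alice@example.com", "bob": "bob@example.com"},
    permissions={"alice@example.com": {"IMPERSONATE_USER"}},
)
admin = "immuta_system_account"
print(may_query(directory, admin, "alice"))         # True
print(may_query(directory, admin, "mallory"))       # False: not mapped
print(may_query(directory, admin, "alice", "bob"))  # True
print(may_query(directory, admin, "bob", "alice"))  # False: no IMPERSONATE_USER
```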

4 - Register data

Trino Cluster Configuration

1 - Enable the Integration

Click the App Settings icon in the navigation menu.
Click the Integrations tab.
Click Add Integration and select Trino from the dropdown menu.
Click Save.

OAuth Authentication

If you are using OAuth or asynchronous authentication to create Starburst data sources, work with your Immuta representative to configure the globalAdminUsername property.

2 - Configure the Immuta System Access Control Plugin in Trino

Default configuration property values

If you use the default property values in the configuration file described in this section, you will give users read and write access to tables that are not registered in Immuta, and results for SHOW queries will not be filtered on table metadata.

These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

If you need tighter controls, the access-control.config-files property lets Immuta work alongside an access control provider that is already configured on your Trino installation. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.
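The interaction between the default operation settings and an additional access control provider can be modeled in a few lines. This Python sketch assumes that Trino grants an operation only when every configured system access control allows it. It reduces the Immuta plugin to the two immuta.allowed.*.datasource.operations lists and ignores per-user policies. The table names and the deny_writes_to provider are hypothetical.

```python
from typing import Callable

Provider = Callable[[str, str], bool]  # (table, operation) -> allowed?


def immuta_provider(registered: set[str], immuta_ops: set[str],
                    non_immuta_ops: set[str]) -> Provider:
    """Coarse model of the Immuta plugin's operation check (ignores per-user policies)."""
    def check(table: str, op: str) -> bool:
        allowed = immuta_ops if table in registered else non_immuta_ops
        return op in allowed
    return check


def deny_writes_to(tables: set[str]) -> Provider:
    """Hypothetical second provider that blocks writes to specific tables."""
    def check(table: str, op: str) -> bool:
        return not (op == "WRITE" and table in tables)
    return check


def is_allowed(providers: list[Provider], table: str, op: str) -> bool:
    """Assumption: access is granted only if every configured provider allows it."""
    return all(provider(table, op) for provider in providers)


# Defaults: READ on Immuta data sources, READ,WRITE on everything else.
immuta = immuta_provider({"sales.orders"}, {"READ"}, {"READ", "WRITE"})
providers = [immuta, deny_writes_to({"finance.ledger"})]

print(is_allowed(providers, "staging.events", "WRITE"))  # True
print(is_allowed(providers, "finance.ledger", "WRITE"))  # False: blocked by second provider
print(is_allowed(providers, "sales.orders", "WRITE"))    # False: Immuta allows only READ
```

Under this model, the Immuta defaults stay permissive for unregistered tables, and the second provider narrows access only where you list specific tables.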

Download the assets for the release.
Enable Immuta on your cluster:
- Docker (Trino 413 and older)
  2. Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.
- Docker (Trino 414 and newer)
  1. Pull the image and start the container. The example below specifies the Immuta Trino plugin version 414 with the 414 tag, but any supported Trino version newer than 414 can be used:
     docker run registry.immuta.com/immuta/immuta-trino:414
  2. Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.
- Standalone installations
  2. Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.
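If you provision clusters with scripts, you can automate creating the file. This Python sketch writes immuta-access-control.properties into a given configuration directory. The helper names render_properties and write_access_control_file are hypothetical. The endpoint and API key are placeholders that you would replace with the values Immuta displays when you enable the integration.

```python
from pathlib import Path

# Placeholder values based on the example configuration in this section.
SETTINGS = {
    "access-control.name": "immuta",
    "immuta.endpoint": "https://my.immuta.tenant.io",  # your tenant's endpoint
    "immuta.apikey": "<api-key-from-immuta>",          # placeholder, not a real key
    "immuta.user.admin": "immuta_system_account",
}


def render_properties(settings: dict[str, str]) -> str:
    """Render key=value lines. Values are written verbatim, so double any
    backslashes (for example, in admin regexes) before passing them in."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_access_control_file(config_dir: Path, settings: dict[str, str]) -> Path:
    """Write immuta-access-control.properties into the given Trino configuration directory."""
    path = config_dir / "immuta-access-control.properties"
    path.write_text(render_properties(settings), encoding="utf-8")
    return path


# Example for a Docker installation: write_access_control_file(Path("/etc/trino"), SETTINGS)
```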
Configure the properties described in the table below.
| Property | Trino version | Required or optional | Description |
| --- | --- | --- | --- |
| access-control.name | 392 and newer | Required | This property enables the integration. |
| access-control.config-files | 392 and newer | Optional | Trino allows you to enable multiple system access control providers at the same time. To do so, list each provider's configuration file in this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and defaults to allowing access to anything it does not manage, so the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider. |
| immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | A comma-separated list of operations allowed on schemas and tables registered as Immuta data sources. Defaults to READ. |
| immuta.allowed.non.immuta.datasource.operations | 392 and newer | Optional | A comma-separated list of operations allowed on schemas and tables not registered as Immuta data sources. Set it to empty to allow no operations on non-Immuta data sources. Defaults to READ,WRITE. |
| immuta.apikey | 392 and newer | Required | The Immuta API key that was displayed when enabling the integration in Immuta. |
| immuta.ca-file | 392 and newer | Optional | The path to your CA file. |
| immuta.cache.views.seconds | 392 and newer | Optional | The number of seconds for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Trino. Defaults to 30 seconds. |
| immuta.cache.datasource.seconds | 392 and newer | Optional | The number of seconds for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after project or subscription changes. Defaults to 30 seconds. |
| immuta.endpoint | 392 and newer | Required | The protocol and fully qualified domain name (FQDN) for the Immuta tenant used by Trino (for example, https://my.immuta.tenant.io). Set this to the endpoint displayed when enabling the integration on the app settings page. |
| immuta.filter.unallowed.table.metadata | 392 and newer | Optional | When set to false, Immuta does not filter unallowed table metadata, which helps keep Immuta noninvasive and performant. When set to true, running show catalogs, for example, returns only the catalogs the user has access to instead of all catalogs. Defaults to false. |
| immuta.group.admin | 420 and newer | Required if immuta.user.admin is not set | This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used to assign multiple groups as the Immuta administrator (separate the expressions with the `\|` delimiter). Note that you must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
| immuta.user.admin | 392 and newer | Required if immuta.group.admin is not set | This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used to assign multiple users as the Immuta administrator (separate the expressions with the `\|` delimiter). Note that you must escape regex special characters (for example, `john\\.doe\\+svcacct@immuta\\.com`). |
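The two cache properties determine how stale a user's view can become. This Python sketch uses a generic time-to-live cache with a simulated clock to show why, under this model, a policy change can take up to the configured number of seconds (30 by default) to reach a user who is actively querying. It is not Immuta's cache implementation. The fetch_view loader and the cache key are hypothetical.

```python
import time
from typing import Callable, Hashable


class TtlCache:
    """Generic time-to-live cache; a stand-in for the behavior the
    immuta.cache.* properties describe, not Immuta's implementation."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict = {}  # key -> (value, time loaded)

    def get(self, key: Hashable, load: Callable[[], object]) -> object:
        """Return the cached value, calling load() only if the entry is missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]
        value = load()
        self._entries[key] = (value, now)
        return value


# Simulated clock so the example runs instantly.
now = [0.0]
cache = TtlCache(ttl_seconds=30, clock=lambda: now[0])
policy = {"version": 1}


def fetch_view() -> str:
    # Hypothetical loader standing in for a request to Immuta.
    return f"view v{policy['version']}"


key = ("alice", "sales.orders")
print(cache.get(key, fetch_view))  # view v1
policy["version"] = 2              # a policy changes in Immuta
now[0] = 29.0
print(cache.get(key, fetch_view))  # view v1 (still cached)
now[0] = 30.0
print(cache.get(key, fetch_view))  # view v2 (entry expired and reloaded)
```

Lowering the TTL shortens this window, but it also increases how often Trino has to ask Immuta for fresh data.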
Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,
```
access-control.config-files=/etc/trino/immuta-access-control.properties
```

Example Immuta System Access Control Configuration

```
# Enable the Immuta System Access Control (v2) implementation.
access-control.name=immuta

# The Immuta endpoint that was displayed when enabling the Starburst (Trino) integration in Immuta.
immuta.endpoint=http://service.immuta.com:3000

# The Immuta API key that was displayed when enabling the Starburst (Trino) integration in Immuta.
immuta.apikey=45jdljfkoe82b13eccfb9c

# The administrator user regex. Trino usernames matching this regex will not be subject to
# Immuta policies. This regex should match the username provided at Immuta data source
# registration.
immuta.user.admin=immuta_system_account

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
immuta.allowed.immuta.datasource.operations=READ

# Optional argument (default is shown).
# A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
# Set to empty to allow no operations on non-Immuta data sources.
immuta.allowed.non.immuta.datasource.operations=READ,WRITE

# Optional argument (default is shown).
# Controls table metadata filtering for inaccessible tables.
#   - When this property is enabled and non-Immuta reads are also enabled, a user performing
#     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
#     an Immuta data source but that the user cannot access through Immuta.
#   - When this property is enabled and non-Immuta reads and writes are disabled, a user
#     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
#     user can access through Immuta.
#   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
#     all metadata.
immuta.filter.unallowed.table.metadata=false
```
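The metadata-filtering cases described in the comments above can be summarized as a set computation. This Python sketch follows those cases and is not Immuta's implementation. The comments only cover "non-Immuta reads enabled" and "non-Immuta reads and writes disabled". For any other combination, the sketch assumes that unregistered tables are listed exactly when READ is allowed on non-Immuta data sources. The table names are hypothetical.

```python
def visible_tables(all_tables: set[str], registered: set[str], accessible: set[str],
                   filter_enabled: bool, non_immuta_ops: set[str]) -> set[str]:
    """Sketch of the documented metadata filtering; not Immuta's implementation.

    all_tables: every table a SHOW query could list
    registered: tables registered as Immuta data sources
    accessible: registered tables this user can access through Immuta
    """
    if not filter_enabled:
        return set(all_tables)  # filtering disabled: all metadata is visible
    visible = all_tables & registered & accessible
    # Assumption: unregistered tables are listed whenever non-Immuta reads are allowed.
    if "READ" in non_immuta_ops:
        visible |= all_tables - registered
    return visible


tables = {"sales.orders", "hr.salaries", "staging.events"}
registered = {"sales.orders", "hr.salaries"}
accessible = {"sales.orders"}

print(sorted(visible_tables(tables, registered, accessible, False, {"READ", "WRITE"})))
# ['hr.salaries', 'sales.orders', 'staging.events']
print(sorted(visible_tables(tables, registered, accessible, True, {"READ", "WRITE"})))
# ['sales.orders', 'staging.events']
print(sorted(visible_tables(tables, registered, accessible, True, set())))
# ['sales.orders']
```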

3 - Add Trino Users to Immuta

- Every Trino user must either match the immuta.user.admin regex configured on the cluster or have their Trino username mapped to an Immuta user. Users need this mapping to query policy-enforced data.
- A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.