Immuta Documentation - 2024.2

Your guide to discovering, securing, and monitoring your data with Immuta.


Discover your data
Secure your data
Detect your data
Snowflake
Databricks Unity Catalog
Starburst (Trino)
All integrations
Understand Discover
Understand Secure
Understand Detect
Open support ticket
Training
Release information
Developer guides

Release Notes

Immuta Enterprise Helm chart

2024.2.0

Added

  • Initial release of the chart.

2024.2.1

Changed

  • Updated to 2024.2.1

Reference Guides

How-to Guides

Integration Settings

How-to Guides

Reference Guides

Configuration Settings

How-to Guides

Cluster Policies

What is Immuta?

Immuta helps you achieve the following outcomes in your data platform:

  • Simplify operations: Immuta’s dynamic access control and policy management require 93x fewer data policies to manage access control in your data platform, according to the GigaOm study. It is simple and scalable, which improves change management and lowers the total cost of ownership of cloud data management.

  • Improve data security: Immuta helps prove compliance with rules and regulations, even when securing hundreds of thousands of tables. An Immuta customer, Swedbank, migrated all critical analytics workloads to the cloud in less than 12 months, including over 100 terabytes from more than 2,500 sources.

  • Unlock data’s value: Immuta helps organizations get access to more data 100x faster, which translates to improved productivity. An Immuta customer, Thomson Reuters, enabled faster access to data, resulting in a 60x increase in data usage and greater productivity.

How does Immuta do it?

Immuta provides three modules to create a full data security platform suite.

Sensitive data discovery and classification:

Discover sensitive data from millions of fields without manual effort. With over 60 pre-built and domain-specific identifiers, you can tailor data classification to your unique business needs based on your desired confidence level.

Continuous data security monitoring:

Leverage timely insights into data access and user activity with anomaly indicators for faster analysis and proactive actions.

Data security and access control:

Immuta’s attribute-based access control (ABAC) delivers scalable data access without role explosion, and dynamic data masking ensures the right users can access the right data.

How to get started

After installing Immuta, begin with Immuta Detect. This section guides you through Immuta configuration and uses the capabilities of Immuta Discover to show where you have gaps in security and to give you a complete understanding of your data ecosystem.

From there, you can move on to Immuta Secure to mitigate (and continue to mitigate) the findings from Immuta Detect. This section includes three separate use cases that are common across customers, along with recommendations for how best to solve them. Consult the use cases to determine which path is best for you.

Install

The guides in this section illustrate how to install and deploy Immuta in your Kubernetes environment. If your distribution is not listed below (such as K3s or RKE2), follow the generic installation instructions.

  • Managed public cloud: This guide includes instructions for

    • Amazon Elastic Kubernetes Service (EKS)

    • Google Kubernetes Engine (GKE)

    • Microsoft Azure Kubernetes Service (AKS)

Upgrade Immuta

This guide demonstrates how to upgrade Immuta. The Immuta Enterprise Helm chart (IEHC) shares the same version with the Immuta product, so upgrading the Immuta version entails upgrading the IEHC. Failure to upgrade the underlying Helm chart will lead to an unsupported configuration.

Kubernetes namespace

The following steps presume the IEHC was deployed into namespace immuta, and that the current namespace is immuta.

Immuta Enterprise Helm chart

Helm chart deprecation notice

As of Immuta version 2024.2, the Immuta Helm chart (IHC) has been deprecated in favor of the IEHC. The immuta-values.yaml Helm values files are not cross-compatible.

Upgrade Immuta.
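
The upgrade itself is a single Helm command against the IEHC. The following is a minimal sketch, assuming a release named immuta (a hypothetical value; substitute your own release name) and the chart version 2024.2.20 used elsewhere in this guide. It only assembles and prints the command so that it can be reviewed before it is run.

```shell
# Sketch only: RELEASE_NAME is a hypothetical value; substitute your own release name.
RELEASE_NAME="immuta"
# The chart version equals the Immuta version you are upgrading to.
CHART_VERSION="2024.2.20"
CHART="oci://ocir.immuta.com/stable/immuta-enterprise"

# Assemble the upgrade command so it can be reviewed before it is run.
UPGRADE_CMD="helm upgrade $RELEASE_NAME $CHART --values immuta-values.yaml --version $CHART_VERSION"
echo "$UPGRADE_CMD"

# After reviewing the printed command, run it from the immuta namespace:
#   eval "$UPGRADE_CMD"
```

Keeping the chart version in one variable makes it harder to upgrade the Immuta version without also upgrading the chart, which is the unsupported configuration this guide warns about.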

Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

This upgrade step is necessary if you meet both of the following criteria:

  • You have the Snowflake low row access policy mode enabled in private preview.

  • You have user impersonation enabled.

If you do not meet these criteria, follow the instructions in the configuration guide.

Upgrade to Snowflake low row access policy mode

To upgrade to the generally available version of the feature, disable your Snowflake integration on the app settings page and then re-enable it.

Configure

Introduced in 2024.2, the Immuta Enterprise Helm chart (IEHC) is an entirely new Helm chart used to deploy Immuta. This section guides you through configuring the IEHC to finish and prepare your installation for a production environment.

Cosign verification

Verify artifacts hosted on the ocir.immuta.com OCI registry.

Cosign Verification

This guide demonstrates how to verify signed artifacts (i.e., container images, Helm charts) hosted on ocir.immuta.com using Cosign from Sigstore.

Cosign installation

To verify a signed artifact or blob, install Cosign before proceeding.

Upgrade

Introduced in 2024.2, the Immuta Enterprise Helm chart (IEHC) is an entirely new Helm chart used to deploy Immuta. Unlike the previous Immuta Helm chart (IHC), the IEHC shares the same version as the Immuta product. Each version of the chart supports a singular version of Immuta. Upgrading the Immuta version now entails upgrading the underlying Helm chart. Failure to do so will lead to an unsupported configuration.

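Chart name | Common name | Immuta versions | Registry | Description
immuta-enterprise | Immuta Enterprise Helm chart (IEHC) | 2024.2 | ocir.immuta.com | Version shared with the Immuta product
immuta | Immuta Helm chart (IHC) | <2024.2 | ocir.immuta.com | Version independent of the Immuta product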

Disaster Recovery

Planning a disaster recovery strategy

As of 2024.2 LTS, there is no longer a backup/restore mechanism built into the Immuta Enterprise Helm chart. Customers are now solely responsible for creating and enacting an effective disaster recovery strategy.

All application state is stored in the PostgreSQL metadata database; therefore, recovering from a disaster event only entails restoring the aforementioned PostgreSQL database. Consult each cloud provider's point-in-time recovery (PITR) documentation for guidance:

Data and Integrations

Immuta integrates with your data platforms and external catalogs so you can register your data and effectively manage access controls on that data.

This section includes concept, reference, and how-to guides for configuring your data platform integration, registering data sources, and connecting your external catalog so that you can discover, monitor, and protect sensitive data.

Integrations

Enable Snowflake Table Grants

  1. Navigate to the App Settings page.

  2. Scroll to the Global Integrations Settings section.

  3. Optionally, change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements.

Snowflake Data Sharing

Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This integration gives data consumers a live connection to the data and relieves data providers of the legal and technical burden of creating static data copies that leave their Snowflake environment.

Requirements:

  • Snowflake Enterprise Edition or higher

  • Immuta's table grants feature

Install a Trusted Library

1 - Install the Library

  1. In the Databricks Clusters UI, install your third-party library .jar or Maven artifact with Library Source Upload, DBFS, DBFS/S3

Scala

Scala clusters: This configuration is for Scala-only clusters.

Where Scala language support is needed, this configuration can be used in the Custom access mode.

According to Databricks’ cluster type support documentation, Scala clusters are intended for single users only. However, nothing inherently prevents a Scala cluster from being configured for multiple users. Even with the Immuta SecurityManager enabled, there are limitations to user isolation within a Scala job.

For a secure configuration, it is recommended that clusters intended for Scala workloads are limited to Scala jobs only and are made homogeneous through the use of project equalization or externally via convention/cluster ACLs. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

Hide the Immuta Database in Databricks

Hiding the database does not disable access to it

Queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.

The immuta database on Immuta-enabled clusters allows Immuta to track Immuta-managed data sources separately from remote Databricks tables so that policies and other security features can be applied. However, Immuta supports raw tables in Databricks, so table-backed queries do not need to reference this database. When configuring a Databricks cluster, you can hide immuta from any calls to SHOW DATABASES so that users are not confused or misled by that database.

Starburst (Trino)

In this integration, Immuta policies are translated into Starburst rules and permissions and applied directly to tables within users’ existing catalogs.

Getting started

This guide outlines how to integrate Starburst with Immuta.

External Metastores

Local or remote mode

Immuta supports the use of external metastores in local or remote mode, following the same configuration detailed in the Databricks documentation.

  • Amazon RDS for PostgreSQL

  • Azure Database for PostgreSQL

  • Google Cloud SQL for PostgreSQL

  • For more details about point-in-time recovery, see the PostgreSQL documentation.

  • Immuta integrations: This reference guide outlines the features, policies, and audit capabilities supported by each integration.

  • Snowflake: This section includes how-to and reference guides for Snowflake and how it integrates with Immuta.

  • Databricks Unity Catalog: This section includes how-to and reference guides for Databricks Unity Catalog and how it integrates with Immuta.

  • Databricks Spark: This section includes how-to and reference guides for Databricks Spark and how it integrates with Immuta.

  • Starburst (Trino): This section includes how-to and reference guides for Starburst (Trino) and how it integrates with Immuta.

  • Redshift: This section includes how-to and reference guides for Redshift and how it integrates with Immuta.

  • Azure Synapse Analytics: This section includes how-to and reference guides for Azure Synapse Analytics and how it integrates with Immuta.

  • Amazon S3: This page includes how-to and reference content for Amazon S3 and how it integrates with Immuta.

  • Google BigQuery: This page includes how-to and reference content for Google BigQuery and how it integrates with Immuta.

    Registering data

    This section covers concepts related to registering your data with Immuta.

    Catalogs

    This section covers the various data catalogs Immuta integrates with.

    Tags

    This section covers concepts related to tags and how to use them in Immuta.

    Thomson Reuters
    Immuta Discover
    Immuta Detect
    Immuta Secure
    installing Immuta
    Immuta Detect
    Immuta Secure
    Red Hat OpenShift
    Generic installation
    Immuta in an air-gapped environment
    configuration guide
    disable your Snowflake integration

    Configuration

    This method requires that the data consumer account is registered as an Immuta user with the Snowflake user name equal to the consuming account.

    At that point, the user that represents the account being shared with can be assigned the attributes and groups relevant to the data policies that need to be enforced. Once that user has access to the share in the consuming account (not managed by Immuta), they can query the share with the producer account's data policies enforced, because Immuta treats that account as a single Immuta user.

    For a tutorial on this workflow, see the Using Snowflake Data Sharing page.

    Benefits

    Using Immuta with Snowflake Data Sharing allows the sharer to

    • Work with only limited knowledge of the context or goals of the existing policies: Because the sharer is not editing or creating policies to share their data, they only need a limited knowledge of how the policies work. Their main responsibility is making sure they properly represent the attributes of the data consumer (the account being shared to).

    • Leave policies untouched.

    Snowflake Secure Data Sharing
    table grants feature

    For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.

    access mode
    single users only
    project equalization
    How-to guides
    • Starburst (Trino) integration configuration guide: Configure the integration in Immuta.

    • Map read and write access policies to Starburst (Trino) privileges: Configure how read and write access subscription policies translate to Starburst (Trino) privileges and apply to Starburst (Trino) data sources.

    Reference guide

    Starburst (Trino) integration reference guide: This guide describes the design and components of the integration.

    Getting started
    Configure external Hive metastore

    Download the metastore jars and point to them as specified in Databricks documentation. Metastore jars must end up on the cluster's local disk at this explicit path: /databricks/hive_metastore_jars.

    If using DBR 7.x with Hive 2.3.x, either

    • Set spark.sql.hive.metastore.version to 2.3.7 and spark.sql.hive.metastore.jars to builtin or

    • Download the metastore jars and set spark.sql.hive.metastore.jars to /databricks/hive_metastore_jars/* as before.
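
The two options above map to a small set of Spark configuration lines. The following is a minimal sketch that prints them in the space-separated "key value" form used by the Databricks cluster Spark config field; hive_metastore_conf is a hypothetical helper name used only for illustration, and the lines it prints are limited to the settings named in this section.

```shell
# hive_metastore_conf is a hypothetical helper that prints the Spark config
# lines for one of the two options described above (DBR 7.x with Hive 2.3.x).
hive_metastore_conf() {
  case "$1" in
    builtin)
      # Option 1: use the Hive 2.3.7 client bundled with the runtime.
      echo "spark.sql.hive.metastore.version 2.3.7"
      echo "spark.sql.hive.metastore.jars builtin"
      ;;
    downloaded)
      # Option 2: point at metastore jars already placed on the cluster's local disk.
      echo "spark.sql.hive.metastore.jars /databricks/hive_metastore_jars/*"
      ;;
    *)
      echo "usage: hive_metastore_conf builtin|downloaded" >&2
      return 1
      ;;
  esac
}

# Print the lines for the builtin option; paste them into the cluster's Spark config.
hive_metastore_conf builtin
```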

    Configure AWS Glue Data Catalog

    To use AWS Glue Data Catalog as the metastore for Databricks, see the Databricks documentation.

    local or remote mode
    Databricks documentation

    Databricks Security Configuration for Performance

    This page describes how the Security Manager is disabled for Databricks clusters that do not allow R or Scala code to be executed. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).

    Automatic Disabling of the Security Manager

    The Immuta Security Manager is an essential element of the Databricks deployment that ensures users can't perform unauthorized actions when using Scala and R, since those languages have features that allow users to circumvent policies without the Security Manager enabled. However, the Security Manager must inspect the call stack every time a permission check is triggered, which adds overhead to queries. To improve Immuta's query performance on Databricks, Immuta disables the Security Manager when Scala and R are not being used.

    The cluster init script checks the cluster’s configuration and automatically removes the Security Manager configuration when

    • spark.databricks.repl.allowedlanguages is a subset of {python, sql}

    • IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED is true
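
The check described above can be pictured as a small predicate. The following is an illustrative sketch only, not the Immuta init script: security_manager_removed is a hypothetical helper, and it assumes that both conditions must hold, that language names are lowercase and comma-separated, and that an unset or empty language list does not qualify.

```shell
# security_manager_removed is a hypothetical helper that mirrors the two
# conditions listed above; it is not the actual Immuta init script.
# $1: comma-separated value of spark.databricks.repl.allowedlanguages
# $2: value of IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED
security_manager_removed() {
  # Py4J strict mode must be enabled.
  [ "$2" = "true" ] || return 1
  # Assumption: an empty language list does not qualify.
  [ -n "$1" ] || return 1
  # Every allowed language must be python or sql.
  for lang in $(printf '%s' "$1" | tr ',' ' '); do
    case "$lang" in
      python|sql) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

security_manager_removed "python,sql" "true" && echo "Security Manager configuration removed"
security_manager_removed "python,scala" "true" || echo "Security Manager configuration kept"
```

A cluster that allows Scala or R, or that turns off Py4J strict mode, fails the check and keeps the Security Manager.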

    When the cluster is configured this way, Immuta can rely on Databricks' process isolation and Py4J security to prevent user code from performing unauthorized actions.

    Note: Immuta still expects the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to be set and pointing at the Security Manager.

    Beyond disabling the Security Manager, Immuta will skip several startup tasks that are required to secure the cluster when Scala and R are configured, and fewer permission checks will occur on the Driver and Executors in the Databricks cluster, reducing overhead and improving performance.

    Caveats

    • There are still cases that require the Security Manager; in those instances, Immuta creates a fallback Security Manager to check the code path, so the IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI environment variable must always point to a valid calling class file.

    • Databricks’ dbutils.fs is blocked by their PY4J security; therefore, it can’t be used to access scratch paths.

    Py4j Security Error

    • Error Message: py4j.security.Py4JSecurityException: Constructor <> is not whitelisted

    • Explanation: This error indicates you are being blocked by Py4j security rather than the Immuta Security Manager. Py4j security is strict and generally ends up blocking many ML libraries.

    • Solution: Turn off Py4j security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the environment variables section. Additionally, because there are limitations to the security mechanisms Immuta employs on-cluster when Py4j security is disabled, ensure that all users on the cluster have the same level of access to data, as users could theoretically see (policy-enforced) data that other users have queried.

    Reference Guides

    How-to Guides

    How-to Guides

    immuta-enterprise

    Immuta Enterprise Helm chart (IEHC)

    2024.2

    ocir.immuta.com

    Version shared with the Immuta product

    Helm chart deprecation notice

    As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. The immuta-values.yaml Helm values files are not cross-compatible.

    This section provides upgrade guides for two scenarios:

    • Upgrade to Immuta v2024.2 LTS: Follow this guide if you are upgrading from Immuta 2024.1.x or older. Since these older versions of Immuta use the legacy IHC, this page instructs you to upgrade from the IHC to the IEHC.

    • Upgrade Immuta: Follow this guide if you are upgrading from the 2024.2 LTS to a newer version. This page instructs you to upgrade the IEHC.

    immuta

    Immuta Helm chart (IHC)

    <2024.2

    ocir.immuta.com

    Version independent of the Immuta product

    The role prefix must be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
  • Finish configuring your integration by following one of these guidelines:

    • New Snowflake integration: Set up a new Snowflake integration by following the configuration guide.

    • Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have Snowflake privileges to run these privilege grants.

    • Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.

    Snowflake table grants private preview migration

    To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the migration guide.
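
Because the role prefix cannot be edited once the configuration is saved, it can help to check a candidate prefix before entering it. The following is a minimal sketch, assuming the prefix is treated as an unquoted Snowflake identifier (starts with a letter or underscore, followed by letters, digits, underscores, or dollar signs); is_valid_role_prefix and the sample prefixes are hypothetical.

```shell
# Returns 0 when the prefix is shorter than 50 characters and looks like a
# valid unquoted Snowflake identifier. Assumption: quoted identifiers, which
# allow more characters, are not considered here.
is_valid_role_prefix() {
  prefix="$1"
  # Reject empty prefixes and those of 50 characters or more.
  [ -n "$prefix" ] || return 1
  [ "${#prefix}" -lt 50 ] || return 1
  # grep -E exits 0 only when the whole string matches the identifier pattern.
  printf '%s\n' "$prefix" | grep -Eq '^[A-Za-z_][A-Za-z0-9_$]*$'
}

# Hypothetical sample prefixes.
is_valid_role_prefix "IMMUTA_PROD" && echo "IMMUTA_PROD: ok"
is_valid_role_prefix "1bad-prefix" || echo "1bad-prefix: rejected"
```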

    Snowflake identifier requirements
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    Verify
    1. Create a file named immuta-cosign.pub with the following content:

    2. Verify artifact signature.
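
The two steps above can be sketched as a short script. The public key is the one published in this guide; verify_immuta_artifact is a hypothetical wrapper name (not part of Cosign), and the artifact name and version passed to it are placeholders you supply.

```shell
# Step 1: write the Immuta public key (copied from this guide) to immuta-cosign.pub.
cat > immuta-cosign.pub <<'EOF'
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEIGUDdu5dgqxQTlbNt0bCIl+zCN65
JC/PmmaC08Eb/UbpkSDmcn/t9Jh+w6Chwkkcp1olcOS1BqCaWrbtViu6Xg==
-----END PUBLIC KEY-----
EOF

# Step 2: verify_immuta_artifact is a hypothetical helper name, not part of Cosign.
# Usage: verify_immuta_artifact <artifact-name> <version>
verify_immuta_artifact() {
  cosign verify \
    --key ./immuta-cosign.pub \
    "ocir.immuta.com/stable/$1:$2"
}

# Example (requires cosign to be installed):
#   verify_immuta_artifact <artifact-name> 2024.2.20
```

The function is only defined here, so the script can be sourced without contacting the registry; call it once Cosign is installed.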

    Frequently asked question

    How can I list all container images referenced in the IEHC?

    Yq installation

    The following step presumes that the yq command-line tool is installed.

    List all container images by rendering the chart templates locally.

    Cosign
    Sigstore
    Cosign
    immuta from any calls to SHOW DATABASES so that users are not confused or misled by that database.

    Hide the immuta Database

    When configuring a Databricks cluster, hide immuta by using the following environment variable in the Spark cluster configuration:

    Then, Immuta will not show this database when a SHOW DATABASES query is performed.

    IMMUTA_SPARK_SHOW_IMMUTA_DATABASE=false

    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEIGUDdu5dgqxQTlbNt0bCIl+zCN65
    JC/PmmaC08Eb/UbpkSDmcn/t9Jh+w6Chwkkcp1olcOS1BqCaWrbtViu6Xg==
    -----END PUBLIC KEY-----

    cosign verify \
        --key ./immuta-cosign.pub \
        ocir.immuta.com/stable/<artifact-name>:2024.2.20

    helm template <release-name> oci://ocir.immuta.com/stable/immuta-enterprise \
        --values immuta-values.yaml \
        --version 2024.2.20 \
    | yq '..|.image? | select(.)' | sort -u

    Ingress configuration

    Configure Ingress to complete your installation and access your Immuta application.

    TLS configuration

    Configure TLS termination for an Ingress resource.

    Immuta in production

    Follow these best practices when deploying Immuta in your production environment.

    External cache configuration

    Configure an external key-value cache (such as Redis or Memcached) with the Immuta Enterprise Helm chart.

    Rotating credentials

    Update the credentials referenced in the Immuta Enterprise Helm chart.

    Enabling legacy query engine and fingerprint

    Enable these legacy services for your deployment if they are required for your business use case:

    • If you are using any of the data platforms below, you must enable the query engine:

      • Amazon Redshift

      • Azure Synapse Analytics

      • Google BigQuery

    • If you are using the legacy sensitive data discovery (SDD) feature, you must enable the query engine and fingerprint services.

    Cosign verification
    , or
    Maven
    . Alternatively, use the Databricks libraries API.
  • In the Databricks Clusters UI, add the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS property as a Spark environment variable and set it to your artifact's URI:

  • For Maven artifacts, the URI is maven:/<maven_coordinates>, where <maven_coordinates> is the Coordinates field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. Here's an example of an installed artifact:

    In this example, you would add the following Spark environment variable:

    For jar artifacts, the URI is the Source field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. For artifacts installed from DBFS or S3, this ends up being the original URI to your artifact. For uploaded artifacts, Databricks will rename your .jar and put it in a directory in DBFS. Here's an example of an installed artifact:

    In this example, you would add the following Spark environment variable:

    Once you've finished making your changes, restart the cluster.

    Specifying more than one trusted library

    To specify more than one trusted library, comma delimit the URIs:
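
As a minimal sketch of the comma-delimited form, the following builds the value from one Maven artifact and one jar artifact. Both URIs are hypothetical; use the Coordinates or Source values shown for your own artifacts on the Libraries tab.

```shell
# Hypothetical artifact URIs; replace them with the values for your own libraries.
MAVEN_LIB="maven:/com.example:my-library:1.0.0"
JAR_LIB="dbfs:/FileStore/jars/my_library.jar"

# Comma-delimit the URIs to trust both libraries.
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS="${MAVEN_LIB},${JAR_LIB}"

# Print the line to add to the cluster's Spark environment variables.
echo "IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=${IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS}"
```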

    2 - Execute a Command in a Notebook

    Once the cluster is up, execute a command in a notebook. If the trusted library installation is successful, you should see driver log messages like this:

    Snowflake

    Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

    Getting started

    This getting started guide outlines how to integrate your Snowflake account with Immuta.

    How-to guides

    • Configure the Snowflake integration.

    • Migrate to using Snowflake table grants in your Snowflake integration.

    • Manage integration settings or delete your existing Snowflake integration.

    Reference guides

    • A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and policies. This guide describes the settings and requirements for implementing this phased approach.

    • This reference guide describes the design and features of the Snowflake integration.

    • Organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This guide describes the components of using Immuta with Snowflake data shares.

    Use Snowflake Data Sharing with Immuta

    Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time.

    Prerequisites:

    • Snowflake integration enabled

    • Snowflake tables registered in Immuta as data sources

    Create Immuta Policies to Protect the Data

    Required Permission: Immuta: GOVERNANCE

    Create Immuta policies to fit your organization's compliance requirements.

    It's important to understand that subscription policies are not relevant to Snowflake data shares, because the act of sharing the data is the subscription policy. Data policies can be enforced on the consuming account from the producer account on a share following these instructions.

    Register the Snowflake Data Consumer with Immuta

    Required Permission: Immuta: USER_ADMIN

    To register the Snowflake data consumer in Immuta,

    1. Create a new Immuta user to represent the data consumer account.

    2. Update that user's Snowflake user name to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.

    3. Assign the user the attributes and groups required for your organization's policies.

    Create the Snowflake Data Share

    Required Permission: Snowflake ACCOUNTADMIN

    To share the policy-protected data source,

    1. Create a Snowflake data share of the Snowflake table that has been registered in Immuta.

    2. Grant reference usage on the Immuta database to the share you created:

      Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.
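
The grant in step 2 can be sketched as follows, assuming Snowflake's standard GRANT REFERENCE_USAGE ON DATABASE ... TO SHARE syntax. The placeholder names are illustrative, and the script only prints the statement so that it can be run in Snowflake by a user with ACCOUNTADMIN.

```shell
# Placeholders: substitute the names of your Immuta database and Snowflake data share.
IMMUTA_DATABASE="<immuta-database>"
DATA_SHARE="<data-share>"

# Build the statement; double quotes keep the identifiers case-sensitive in Snowflake.
GRANT_SQL=$(printf 'GRANT REFERENCE_USAGE ON DATABASE "%s" TO SHARE "%s";' "$IMMUTA_DATABASE" "$DATA_SHARE")

# Print the statement for review before running it in Snowflake.
echo "$GRANT_SQL"
```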

    Snowflake Low Row Access Policy Mode

    The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates and by using table grants to manage user access.

    Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query them directly in Snowflake while policies are enforced.

    Without Snowflake low row access policy mode enabled, row access policies are created and administered by Immuta in the following scenarios:

    • Table grants are disabled and a subscription policy that does not automatically subscribe everyone to the data source is applied. Immuta administers Snowflake row access policies to filter out all the rows to restrict access to the entire table when the user doesn't have privileges to query it. However, if table grants are disabled and a subscription policy is applied that grants everyone access to the data source automatically, Immuta does not create a row access policy in Snowflake. See the for details about these policy types.

    • A purpose-based policy is applied to a data source. A row access policy filters out all the rows of the table if users aren't acting under the purpose specified in the policy when they query the table.

    • A row-level policy is applied to a data source. A row access policy filters out rows querying users don't have access to.

    • User impersonation is enabled. A row access policy is created for every Snowflake table registered in Immuta.

    Deprecation notice

    Support for using the Snowflake integration with low row access policy mode disabled has been deprecated. You must enable this feature and table grants for your integration to continue working in future releases. See the for EOL dates.

    Reducing row access policies

    Snowflake low row access policy mode is enabled by default to reduce the number of row access policies Immuta creates and improve query performance. Snowflake low row access policy mode requires

    • table grants to be enabled.

    • user impersonation to be disabled. User impersonation diminishes the performance of interactive queries because of the number of row access policies Immuta creates when it's enabled.

    Requirements

    Project-scoped purpose exceptions for Snowflake with low row access policy mode enabled

    Public preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.

    Project-scoped purpose exceptions for Snowflake integrations allow you to apply to Snowflake data sources in a project. As a result, users can only access that data when they are working within that specific project.

    Limitations and considerations

    • Project workspaces are not compatible with this feature.

    • Impersonation is not supported when the Snowflake low row access policy mode is enabled.

    • When a project member acts under a project's purposes, any matching purpose exceptions on tables will be honored, even if those tables exist outside the project. Project managers cannot assume approving a purpose means that the purposes of that project are limited to the tables in the project. Enable the to remove this limitation.

    Conventions

    The following conventions are used throughout the installation material.

    hashtag
    Angle brackets ( < and > )

    Phrases wrapped in angle brackets (< and >) are placeholders that must be replaced with user-provided values. Placeholders are typically written in either kebab case or snake case; the following placeholders are equivalent:

    • <the-quick-brown-fox>

    • <the_quick_brown_fox>
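The equivalence of the two spellings can be illustrated with a minimal Python sketch. It treats kebab case and snake case as the same placeholder name and substitutes a user-provided value. The helper names (`normalize_placeholder`, `fill_placeholders`) and the sample template are hypothetical and are not part of any Immuta tooling.

```python
import re

# Matches a placeholder such as <the-quick-brown-fox> or <the_quick_brown_fox>.
PLACEHOLDER = re.compile(r"<([A-Za-z0-9_-]+)>")


def normalize_placeholder(name):
    """Treat kebab case and snake case as equivalent by mapping '-' to '_'."""
    return name.replace("-", "_").lower()


def fill_placeholders(template, values):
    """Replace every <placeholder> in template with the matching value.

    Raises KeyError if a placeholder has no user-provided value.
    """
    normalized = {normalize_placeholder(k): v for k, v in values.items()}

    def substitute(match):
        return str(normalized[normalize_placeholder(match.group(1))])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template: both spellings resolve to the same value.
print(fill_placeholders(
    "echo <the-quick-brown-fox> <the_quick_brown_fox>",
    {"the-quick-brown-fox": "value"},
))
```

A missing value raises `KeyError`, which makes an unsubstituted placeholder fail loudly rather than end up in a command or values file.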

    hashtag
    Example

    hashtag
    Input

    hashtag
    Output

    Manually Update Your Databricks Cluster

    If a Databricks cluster needs to be manually updated to reflect changes in the Immuta init script or cluster policies, you can remove and set up your integration again to get the updated policies and init script.

    1. Log in to Immuta as an Application Admin.

    2. Click the App Settings icon in the left sidebar and click the Integrations tab.

    3. Your existing Databricks integration should be listed here; expand it and note the configuration values. Now select Remove to remove your integration.

    4. Click Add Integration and select Databricks Integration to add a new integration.

    5. Enter your Databricks integration settings again as configured previously.

    6. Click Add Integration to add the integration, and then select Configure Cluster Policies to set up the updated cluster policies and init script.

    7. Select the cluster policies you wish to use for your Immuta-enabled Databricks clusters.

    8. Use the options below to view instructions for automatically pushing cluster policies and the init script (recommended) or manually updating your cluster policies.

      • Automatically push cluster policies

        1. Select Automatically Push Cluster Policies and enter your privileged Databricks access token. This token must have privileges to write to cluster policies.

    9. Restart any Databricks clusters using these updated policies for the changes to take effect.

    Migrate to Unity Catalog

    When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's three-level hierarchy. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.

    Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them. Data sources in the Hive Metastore must be managed by the Databricks Spark integration.

    To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.

    1. Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

    2. Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

    3. Enable Unity Catalog.

    Private Container Registries

    This guide demonstrates how to configure a private container registry with the Immuta Enterprise Helm chart (IEHC).

    circle-info

    Image availability

    This guide assumes that you have already copied all Immuta container images to your private registry. The process of copying images to a private registry can vary significantly depending on your specific environment and tools and is therefore outside the scope of this document.

    hashtag
    Helm values

    circle-info

    Image repository overrides

    Each image.repository field defined in the default Helm values must be overridden. For the purposes of this guide, only the configuration for Secure is shown.

    1. Examine the default Helm values in the chart; this will include all relevant values required to override the registry and images.

    2. Edit the immuta-values.yaml file to include the following Helm values. Update all placeholder values with your own values.

    Redshift

    In this integration, Immuta generates policy-enforced views in your configured Redshift schema for tables registered as Immuta data sources.

    hashtag
    Getting started

    This guide outlines how to integrate Redshift with Immuta.

    hashtag
    How-to guides

    • Redshift integration configuration: Configure the integration in Immuta.

    • Redshift Spectrum configuration: Configure Redshift Spectrum in Immuta.

    hashtag
    Reference guide

    Redshift integration reference guide: This guide describes the design and components of the integration.

    Databricks Change Data Feed

    CDF shows the row-level changes between versions of a Delta table. The changes displayed include row data and metadata that indicates whether the row was inserted, deleted, or updated.

    Immuta does not support applying policies to the changed data, and the CDF cannot be read for data source tables if the user does not have access to the raw data in Databricks. However, the CDF can be read if the querying user is allowed to read the raw data and one of the following statements is true:

    • the table is in the current workspace,

    • the table is in a scratch path,

    • non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting, or

    • non-Immuta reads are enabled AND the table is not part of an Immuta data source.
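The conditions above can be read as a single boolean rule. The following Python sketch encodes that rule for illustration only; the `CdfReadContext` class and its field names are hypothetical stand-ins for the conditions in the list and do not correspond to any Immuta API.

```python
from dataclasses import dataclass


@dataclass
class CdfReadContext:
    """Hypothetical inputs; each field mirrors one condition in the list above."""
    user_can_read_raw_data: bool
    table_in_current_workspace: bool
    table_in_scratch_path: bool
    non_immuta_reads_enabled: bool
    intersects_workspace_user_not_acting_under: bool
    table_is_immuta_data_source: bool


def can_read_cdf(ctx: CdfReadContext) -> bool:
    """Return True when the CDF can be read under the rules described above."""
    # Access to the raw data in Databricks is always required.
    if not ctx.user_can_read_raw_data:
        return False
    # At least one of the four listed statements must also hold.
    return (
        ctx.table_in_current_workspace
        or ctx.table_in_scratch_path
        or (ctx.non_immuta_reads_enabled
            and not ctx.intersects_workspace_user_not_acting_under)
        or (ctx.non_immuta_reads_enabled
            and not ctx.table_is_immuta_data_source)
    )
```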

    hashtag
    Configure Change Data Feed

    There are no configuration changes necessary to use this feature.

    hashtag
    Limitation

    Immuta does not support reading changes in streaming queries.

    Python & SQL & R with Library Support

    circle-info

    Py4J security disabled: In addition to support for Python, SQL, and R, this configuration adds support for additional Python libraries and utilities by disabling Databricks-native Py4J security.

    This configuration does not rely on Databricks-native Py4j security to secure the cluster, while process isolation is still enabled to secure filesystem and network access from within Python processes. On an Immuta-enabled cluster, once Py4J security is disabled the Immuta SecurityManager is installed to prevent nefarious actions from Python in the JVM. Disabling Py4J security also allows for expanded Python library support, including many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs.

    By default, all actions in R will execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data). And without iptable restrictions, a user may freely access the cluster’s cloud storage credentials. To properly support the use of the R language, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user. This user has limited filesystem and network access. The Immuta SecurityManager is also installed to prevent users from bypassing policies and protects against the above vulnerabilities from within the JVM.

    The SecurityManager will incur a small increase in performance overhead; average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

    When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

    A homogeneous cluster is recommended for configurations where Py4J security is disabled. If all users have the same level of authorization, there would not be any data leakage, even if a nefarious action was taken.

    For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.

    Databricks Unity Catalog

    This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

    hashtag
    Getting started

    This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.

    hashtag
    How-to guides

    • Databricks Unity Catalog configuration: Configure the Databricks Unity Catalog integration.

    • Migrate to Databricks Unity Catalog: Migrate from the legacy Databricks integrations to the Databricks Unity Catalog integration.

    hashtag
    Reference guide

    Databricks Unity Catalog integration reference guide: This guide describes the design and components of the integration.

    Enable Snowflake Low Row Access Policy Mode

    circle-exclamation

    If you have Snowflake low row access policy mode enabled in private preview and have impersonation enabled, see these upgrade instructions. Otherwise, query performance will be negatively affected.

    1. Click the App Settings icon in the sidebar and scroll to the Global Integration Settings section.

    2. Click the Enable Snowflake Low Row Access Policy Mode checkbox to enable the feature.

    3. Confirm to allow Immuta to automatically disable impersonation for the Snowflake integration. If you do not confirm, you will not be able to enable Snowflake low row access policy mode.

    4. Click Save.

    hashtag
    Configure your Snowflake integration

    If you already have a Snowflake integration configured, you don't need to reconfigure your integration. Your Snowflake policies automatically refresh when you enable Snowflake low row access policy mode.

    1. Configure your Snowflake integration. Note that you will not be able to enable project workspaces or user impersonation with Snowflake low row access policy mode enabled.

    2. Click Save.

    Configure Project UDFs Cache Settings

    This page outlines the configuration for setting up project UDFs, which allow users to set their current project in Immuta through Spark. For details about the specific functions available and how to use them, see the Use Project UDFs (Databricks) page.

    circle-info

    Use project UDFs in Databricks

    Currently, caches are not all invalidated outside of Databricks because Immuta caches information pertaining to a user's current project in the NameNode plugin and in Vulcan. Consequently, this feature should only be used in Databricks.

    hashtag
    Web Service and On-Cluster Caches

    Immuta caches a mapping of user accounts and users' current projects in the Immuta Web Service and on-cluster. When users change their project with UDFs instead of the Immuta UI, Immuta invalidates all the caches on-cluster (so that everything changes immediately) and the cluster submits a request to change the project context to a web worker. Immediately after that request, another call is made to a web worker to refresh the current project.

    To allow use of project UDFs in Spark jobs, raise the cache timeouts on-cluster and lower the cache timeouts for the Immuta Web Service. Otherwise, caching could cause inconsistencies among the requests and calls to multiple web workers when users try to change their project contexts.

    hashtag
    Recommended Configuration

    hashtag
    1 - Lower Web Service Cache Timeout

    1. Click the App Settings icon in the left sidebar and scroll to the HDFS Cache Settings section.

    2. Lower the Cache TTL of HDFS user names (ms) to 0.

    3. Click Save.

    hashtag
    2 - Raise Cache Timeout On-Cluster

    In the Spark environment variables section, set the IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS and IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS to high values (like 10000).

    Note: These caches will be invalidated on cluster when a user calls immuta.set_current_project, so they can effectively be cached permanently on cluster to avoid periodically reaching out to the web service.
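To show why a very long on-cluster timeout is safe when the cache is explicitly invalidated, here is a toy Python model of a cache with a time-to-live and an invalidation hook. It is not Immuta's implementation: the `ProjectCache` class, its methods, and the `fetch` callback are hypothetical, and only the idea of invalidating on a project change comes from the note above.

```python
import time


class ProjectCache:
    """Toy cache of user -> current project with a TTL and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # user -> (project, stored_at)

    def get(self, user, fetch):
        """Return the cached project, calling fetch(user) only on a miss or expiry."""
        entry = self._entries.get(user)
        now = self._clock()
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]
        project = fetch(user)  # stands in for a call to the web service
        self._entries[user] = (project, now)
        return project

    def invalidate(self, user):
        """Drop the entry, analogous to what happens when a user changes projects."""
        self._entries.pop(user, None)
```

With a timeout such as 10000 seconds, repeated lookups are served from the cache, and the stand-in web service call is only made again after `invalidate` runs.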

    hashtag
    Blocking UDFs

    If your compliance requirements restrict users from changing projects within a session, you can block the use of Immuta's project UDFs on a Databricks Spark cluster. To do so, configure the immuta.spark.databricks.disabled.udfs option as described on the Databricks environment variables page.

    Snowflake Table Grants Migration

    To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.

    1. Navigate to the App Settings page.

    2. Click Integration Settings in the left panel, and scroll to the Global Integrations Settings section.

    3. Uncheck the Snowflake Table Grants checkbox to disable the feature.

    4. Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.

    5. Use the Enable Snowflake table grants tutorial to re-enable the feature.
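The wait in step 4 is simple arithmetic: about 1 minute per 1000 users. The Python sketch below works it out; the function name is hypothetical, and rounding up to a whole minute with a minimum of one minute is an assumption rather than documented guidance.

```python
import math


def estimated_wait_minutes(user_count, minutes_per_thousand=1.0):
    """Estimate how long to wait after disabling table grants.

    Uses the rule of thumb of about 1 minute per 1000 users and rounds up
    (the rounding and the one-minute floor are assumptions).
    """
    if user_count < 0:
        raise ValueError("user_count must be non-negative")
    return max(1, math.ceil(user_count / 1000 * minutes_per_thousand))


# For example, 2500 users works out to 2.5 minutes, rounded up to 3.
print(estimated_wait_minutes(2500))
```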

    Python & SQL

    circle-info

    Performance: This is the most performant policy configuration.

    In this configuration, Immuta is able to rely on Databricks-native security controls, reducing overhead. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes. This Immuta cluster configuration relies on Py4J security being enabled.

    Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs are unfortunately not supported with Py4J security enabled. Users will also be unable to use the Databricks Connect client library. Additionally, only Python and SQL are available as supported languages.

    For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.

    Python & SQL & R

    circle-info

    Additional overhead: In relation to the Python & SQL cluster policy, this configuration trades some additional overhead for added support of the R language.

    In this configuration, you are able to rely on the Databricks-native security controls. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes.

    Like the Python & SQL configuration, Py4J security is enabled for the Python & SQL & R configuration. However, because R has been added, Immuta enables the SecurityManager, in addition to Py4J security, to provide more security guarantees. For example, by default all actions in R execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data), and, without iptable restrictions, a user may freely access the cluster’s cloud storage credentials. To address these security issues, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user with limited filesystem and network access. It also installs the Immuta SecurityManager, which prevents users from bypassing policies and protects against the above vulnerabilities from within the JVM.

    Consequently, the cost of introducing R is that the SecurityManager incurs a small increase in performance overhead; however, average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)

    Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs are unfortunately not supported with Py4J security enabled. Users will also be unable to use the Databricks Connect client library.

    When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

    For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.

    External Cache Configuration

    This guide demonstrates how to configure an external key-value cache (such as Redis or Memcached) with the Immuta Enterprise Helm chart (IEHC).

    circle-info

    Kubernetes namespace

    The following section(s) presume the IEHC was deployed into namespace immuta and that the current namespace is immuta.

    Troubleshooting

    hashtag
    Frequently asked questions

    hashtag
    How can I ensure the fully qualified domain name (FQDN) is resolvable from within the Kubernetes cluster?

    Snowflake Table Grants

    Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of having to manually grant users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. Then, users subscribed to a data source in Immuta can view and query the Snowflake table, while users who are not subscribed to the data source cannot view or query the Snowflake table.

    hashtag
    Snowflake privileges

    Enabling Snowflake table grants gives the following privileges to the Immuta Snowflake role:

    DBFS Access

    This page outlines how to access DBFS in Databricks for non-sensitive data. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or the immuta_conf.xml file (not recommended).

    hashtag
    DBFS FUSE Mount

    circle-info

    Scala Cluster Security Details

    It is most secure to leverage an equalized project when working in a Scala cluster; however, it is not required to limit Scala to equalized projects. This document outlines security recommendations for Scala clusters and discusses the security risks involved when equalized projects are not used.

    circle-info

    Language support: R and Scala are both supported, but require advanced configuration; work with your Immuta support professional to use these languages.

    Phased Snowflake Onboarding Concept Guide

    While you're onboarding Snowflake data sources and designing policies, you don't want to disrupt your Snowflake users' existing workflows. Instead, you want to gradually onboard Immuta through a series of successive changes that will not impact your existing Snowflake users.

    A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and configure policies.

    Several features allow you to gradually onboard data sources and policies in Immuta:

    • Subscription policy of “None” by default: By default, no policy is applied at registration time; instead of applying a restrictive policy immediately upon registration, the table is registered in Immuta and waits for a policy to be applied, if ever.

    Spark Direct File Reads

    In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.

    When reading from a path in Spark, the Immuta Databricks plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.

    hashtag
    Read Data

    Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a

    Getting Started

    The how-to guides linked on this page illustrate how to integrate Redshift with Immuta.

    Requirement: Redshift cluster with an RA3 node is required for the multi-database integration. For other instance types, you may configure a single-database integration using one of the .

    hashtag
    Configure your Redshift integration

    Configuring is required for Secure. These guides provide information on the recommended feature to enable with Redshift.

    Warehouse Sizing Recommendations

    The warehouse you select when configuring the Snowflake integration uses compute resources to set up the integration, register data sources, orchestrate policies, and run jobs like sensitive data discovery. Snowflake credit charges are based on the size of the warehouse and the amount of time it is active, not the number of queries run.

    This document prescribes how and when to adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources as cost-effectively as possible.

    In general, increase the size of and number of clusters for the warehouse to handle heavy workloads and multiple queries. Workloads are typically lighter after data sources are onboarded and policies are established in Immuta, so compute resources can be reduced after those workloads complete.

    Configure Snowflake Lineage Tag Propagation

    circle-info

    Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

    hashtag
    Prerequisite

    Contact your Immuta representative to enable this feature in your Immuta tenant.

    IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644
    IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/my.group.id:my-package-id:1.2.3,dbfs:/path/to/my/library.jar
    TrustedLibraryUtils: Successfully found all configured Immuta configured trusted libraries in Databricks.
    TrustedLibraryUtils: Wrote trusted libs file to [/databricks/immuta/immutaTrustedLibs.json]: true.
    TrustedLibraryUtils: Added trusted libs file with 1 entries to spark context.
    TrustedLibraryUtils: Trusted library installation complete.
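The second IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS example above holds a comma-separated list of URIs. As a rough aid for checking such a value before setting it, the Python sketch below splits it into scheme and remainder. How Immuta itself parses the value is not shown in this documentation, so the helper `parse_trusted_lib_uris` is purely illustrative.

```python
def parse_trusted_lib_uris(value):
    """Split a comma-separated URI list into (scheme, remainder) pairs.

    Illustrative only: this mirrors the visible format of the example
    values, not Immuta's own parsing logic.
    """
    entries = []
    for raw in value.split(","):
        uri = raw.strip()
        if not uri:
            continue  # tolerate stray commas or whitespace
        scheme, separator, remainder = uri.partition(":")
        if not separator or not scheme:
            raise ValueError(f"missing scheme in {uri!r}")
        entries.append((scheme, remainder))
    return entries


# Example value taken from the environment variable sample above.
for scheme, remainder in parse_trusted_lib_uris(
    "maven:/my.group.id:my-package-id:1.2.3,dbfs:/path/to/my/library.jar"
):
    print(scheme, remainder)
```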

    Select Apply Policies to push the cluster policies and init script again.

  • Click Save and Confirm to deploy your changes.

  • Manually update cluster policies

    1. Download the init script and the new cluster policies to your local computer.

    2. Click Save and Confirm to save your changes in Immuta.

    3. Log in to your Databricks workspace with your administrator account to set up cluster policies.

    4. Get the path you will upload the init script (`immuta_cluster_init_script_proxy.sh`) to by opening one of the cluster policy `.json` files and looking for the `defaultValue` of the field `init_scripts.0.dbfs.destination`. This should be a DBFS path in the form of `dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh`.

    5. Click Data in the left pane to upload your init script to DBFS to the path you found above.

    6. To find your existing cluster policies you need to update, click Compute in the left pane and select the Cluster policies tab.

    7. Edit each of these cluster policies that were configured before and overwrite the contents of the JSON with the new cluster policy JSON you downloaded.
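The lookup in step 4 can also be scripted. The Python sketch below reads the `defaultValue` of `init_scripts.0.dbfs.destination` from a cluster policy definition. The sample policy is a hypothetical, trimmed-down stand-in for a downloaded `.json` file, and its `type` value is an assumption; only the field name and the path form come from the steps above.

```python
import json

# Field named in step 4 above.
INIT_SCRIPT_FIELD = "init_scripts.0.dbfs.destination"


def init_script_destination(policy_json):
    """Return the DBFS path the init script should be uploaded to."""
    policy = json.loads(policy_json)
    return policy[INIT_SCRIPT_FIELD]["defaultValue"]


# Hypothetical policy content; a real downloaded policy has many more fields.
sample_policy = json.dumps({
    INIT_SCRIPT_FIELD: {
        "type": "unlimited",  # assumed for illustration
        "defaultValue": "dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh",
    }
})

print(init_script_destination(sample_policy))
```

To use it on a real file, read the downloaded policy with `open(...).read()` and pass the text to `init_script_destination`.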

  • Any legacy database

    Project owners cannot limit masked joins to a single project. Turning masked joins on in a single project in Immuta enables masked joins across all of a subscriber's data sources, regardless of which projects the data sources belong to.

    subscription policies page
    Purpose-based policy
    Row-level security policy
    User impersonation
    release notes
    table grants to be enabled
    Snowflake integration enabled
    Snowflake table grants enabled
    purpose-based policies
    project-scoped purpose exceptions feature
    Enable Unity Catalog
    Redshift integration configuration
    Redshift Spectrum configuration
    Redshift integration reference guide
    streaming queriesarrow-up-right
    trusted
    governance documentationarrow-up-right
    Databricks Unity Catalog configuration
    Migrate to Databricks Unity Catalog
    Databricks Unity Catalog integration reference guide
    Snowflake integration
    Configure your Snowflake integration
    Databricks environment variables page
    Enable Snowflake table grants tutorial
    unablearrow-up-right
    governance documentationarrow-up-right
    unablearrow-up-right
    trusted
    governance documentationarrow-up-right
    There are several benefits to this design:
    • All existing roles maintain access to the data and registration of the table or view with Immuta has zero impact on your data platform.

    • It gives you time to configure tags on the Immuta-registered tables and views, either manually or through automatic means, such as Immuta’s sensitive data discovery (SDD) or an external catalog integration that includes Snowflake tags.

    • It gives you time to assess and validate the sensitive data tags that were applied.

    • You can build only row and column controls with Immuta and let your existing roles manage table access instead of using Immuta subscription policies for table access.

  • Snowflake table grants coupled with Snowflake low row access policy mode: With these features enabled, Immuta manages access to tables (subscription policies) through GRANTs. Immuta assigns each user a unique, Immuta-created role, and all of that user's table access is managed through that single role.

    Without these two features enabled, Immuta uses a Snowflake row access policy (RAP) to manage table access. A RAP only allows users to access rows in the table if they were explicitly granted access through an Immuta subscription policy; otherwise, the user sees no rows. This behavior means all existing Snowflake roles lose access to the table contents until explicitly granted access through Immuta subscription policies. Essentially, roles outside of Immuta don't control access anymore.

    By using table grants and the low row access policy mode, users and roles outside Immuta continue to work.

    There are two benefits to this approach:

    • All pre-existing Snowflake roles retain access to the data until you explicitly revoke access (outside Immuta).

    • It provides a way to test that Immuta GRANTs are working without impacting production workloads.

    hashtag
    Requirements

    The following configuration is required for phased Snowflake onboarding:

    • Impersonation is disabled

    • Project workspaces are disabled

    If either of these capabilities is necessary for your use case, you cannot do phased Snowflake onboarding as described below.

    hashtag
    Next

    See the Getting started page for step-by-step guidance to implement phased Snowflake onboarding.

    Subscription policy of “None” by default

    Subscribe the Immuta user to the data sources.

    Build Immuta data policies
    Create a new Immuta user
    Update the Immuta user's Snowflake username
    Give the Immuta user the appropriate attributes and groups
    Create a Snowflake Data Sharearrow-up-right
    helm show values oci://ocir.immuta.com/stable/immuta-enterprise --version 2024.2.16
    placeholder values
    hashtag
    Prerequisite

    The Immuta in production guide must be completed before proceeding.

    hashtag
    Redis

    1. Edit secret immuta-secret that was created in the Immuta in production guide.

    2. Add key-value IMMUTA_SERVER_CACHE_PROVIDER_OPTIONS_PASSWORD: <cache-password>.

    hashtag
    Edit Helm values

    Edit the immuta-values.yaml file to include the relevant Helm values listed below. Update all placeholder values with your own values.

    hashtag
    Redis

    circle-info

    TLS configuration

    TLS must be configured both client-side and server-side. The following Helm values demonstrate connecting to Redis with TLS enabled.

    hashtag
    Memcached

    hashtag
    Apply Helm values

    Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Create a pod named debug-dns and spawn an interactive shell.

  • Install package bind-utils.

  • Perform DNS lookups on a given FQDN.

    hashtag
    I'm unsure which Kubernetes namespace or Helm release is associated with my Immuta installation. How can I find this out?

    hashtag
    I no longer have my immuta-values.yaml Helm values file. How do I recover this file?

    hashtag
    I don't want to keep passing option --namespace every time I run a Helm command. How do I set a default?

    hashtag
    PostgreSQL

    hashtag
    How do I determine if the database is accepting connections?

    1. Create a pod named debug-postgres and spawn an interactive shell.

    2. Validate that the database is listening.

    hashtag
    Redis

    hashtag
    How can a TCP connection be established without using Redis CLI?

    1. Create a pod named debug-redis and spawn an interactive shell.

    2. Send a raw TCP message to the database using Netcat.

    hashtag
    How do I establish a TCP connection?

    1. Create a pod named debug-redis and spawn an interactive shell.

    2. Establish a connection to the database using the Redis client. If a connection can be established with Netcat and the redis-cli command does not return, then Redis could be expecting a TLS connection. Pass option --tls.

    hashtag
    Elasticsearch

    hashtag
    How do I query the API using cURL?

    1. Create a pod named debug-elasticsearch and spawn an interactive shell.

    2. Install package curl.

    3. Check the cluster health.

    circle-info

    Basic authentication

    Depending on the cluster's configuration, it might be necessary to use basic auth. Pass option --header "Authorization: Basic $token", where token equals $(printf '%s:%s' "<username>" "<password>" | base64).
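For readers who prefer to build the header outside the shell, the following Python sketch computes the same value as the printf and base64 pipeline above. The function name `basic_auth_header` is hypothetical, and the credentials are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header line from a username and password."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


# Placeholder credentials; substitute your own values.
print(basic_auth_header("user", "pass"))  # Authorization: Basic dXNlcjpwYXNz
```

Unlike GNU `base64`, which wraps its output at 76 characters by default, `base64.b64encode` returns the token on a single line, which is what an HTTP header needs.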

    DBFS FUSE Mount limitation: This feature cannot be used in environments with E2 Private Link enabled.

    This feature (provided by Databricks) mounts DBFS to the local cluster filesystem at /dbfs. Although disabled when using process isolation, this feature can safely be enabled if raw, unfiltered data is not stored in DBFS and all users on the cluster are authorized to see each other’s files. When enabled, the entirety of DBFS essentially becomes a scratch path where users can read and write files in /dbfs/path/to/my/file as though they were local files.

    For example,

    In Python,

    Note: This solution also works in R and Scala.
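Because the FUSE mount exposes DBFS as ordinary local paths, standard file APIs are enough to work with it. The Python sketch below writes and reads a file through a plain path. The helper name is hypothetical, and the `/dbfs/path/to/my/file` path in the comment only exists on a cluster where the mount is enabled.

```python
from pathlib import Path


def write_then_read(path, text):
    """Write text to path with ordinary file APIs, then read it back."""
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


# On a cluster with the DBFS FUSE mount enabled, this could be called as:
#   write_then_read("/dbfs/path/to/my/file", "hello")
```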

    hashtag
    Enable DBFS FUSE Mount

    To enable the DBFS FUSE mount, set this configuration: immuta.spark.databricks.dbfs.mount.enabled=true.

    circle-info

    Mounting a bucket

    • Users can mount additional buckets to DBFS that can also be accessed using the FUSE mount.

    • Mounting a bucket is a one-time action, and the mount will be available to all clusters in the workspace from that point on.

    • Mounting must be performed from a non-Immuta cluster.

    hashtag
    Scala DBUtils (and %fs magic) with Scratch Paths

    Scratch paths will work when performing arbitrary remote filesystem operations with %fs magic or Scala dbutils.fs functions. For example,

    hashtag
    Configure Scala DBUtils (and %fs magic) with Scratch Paths

    To support %fs magic and Scala DBUtils with scratch paths, configure

    hashtag
    Configure DBUtils in Python

    To use dbutils in Python, set this configuration: immuta.spark.databricks.py4j.strict.enabled=false.

    hashtag
    Example Workflow

    This section illustrates the workflow for getting a file from a remote scratch path, editing it locally with Python, and writing it back to a remote scratch path.

    1. Get the file from remote storage:

    2. Make a copy if you want to explicitly edit localScratchFile, as it will be read-only and owned by root:

    3. Write the new file back to remote storage:

    Recommendations

    There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta’s SecurityManager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to tell which user’s data each file or block contains. If you are concerned about this vulnerability, Immuta suggests that Scala clusters

    • be limited to Scala jobs only.

    • use project equalization, which forces all users to act under the same set of attributes, groups, and purposes with respect to their data access.

    Context for Security: Why Project Equalization is Recommended

    When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the leaf level of the physical Spark plan, so a policy such as "Mask using hashing the column social_security_number for everyone" would be implemented as an expression on a project node right above the FileSourceScanExec/LeafExec node at the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently, from ending up in SPARK_LOCAL_DIR.

    This policy implementation coupled with an equalized project guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.

    Configuration for Requiring Equalized Projects with Scala

    To require that Scala clusters be used in equalized projects and avoid the risk described above, change the immuta.spark.require.equalization value to true in your Immuta configuration file when you spin up Scala clusters:

    Once this configuration is complete, users on the cluster will need to switch to an Immuta equalized project before running a job. (Remember that when working under an Immuta Project, only tables within that project can be seen.) Once the first job is run using that equalized project, all subsequent jobs, no matter the user, must also be run under that same equalized project. If you need to change a cluster's project, you must restart the cluster.

    IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/immuta/bstabile/jars/immuta-spark-third-party-lib-test.jar
    GRANT REFERENCE_USAGE ON DATABASE "<Immuta database of the provider account>" TO SHARE "<DATA_SHARE>";
    computerScientists:
    - Alan Turing
    - Grace Hopper
    - Donald Knuth
    - Tim Berners-Lee
    - John McCarthy
    - <first-name> <last-name>
    computerScientists:
    - Alan Turing
    - Grace Hopper
    - Donald Knuth
    - Tim Berners-Lee
    - John McCarthy
    - Margaret Hamilton
    global:
      imageRegistry: <private-registry-fqdn>
    secure:
      backgroundWorker:
        image:
          repository: <prefix>/immuta-service
      web:
        image:
          repository: <prefix>/immuta-service
    kubectl edit secret/immuta-secret
    cache:
      enabled: false
    
    secure:
      extraConfig:
        server:
          cache:
            provider:
              constructor: catbox-redis
              options:
                host: <redis-fqdn>
                port: <port>
                # Setting options.tls to an empty dict enables TLS without configuring any other options.
                tls: {}
    
                # Dict representation of TLS config options json-object for package ioredis
                # https://github.com/redis/ioredis
                #
                # tls:
                #   ca:
                #   key:
                #   cert:
    
      extraEnvVars:
      - name: IMMUTA_SERVER_CACHE_PROVIDER_OPTIONS_PASSWORD
        valueFrom:
          secretKeyRef:
            key: IMMUTA_SERVER_CACHE_PROVIDER_OPTIONS_PASSWORD
            name: immuta-secret
    cache:
      enabled: false
    
    secure:
      extraConfig:
        server:
          cache:
            provider:
              constructor: catbox-memcached
              options:
                host: <memcached-fqdn>
                port: <port>
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    kubectl run debug-dns --stdin --tty --rm --image docker.io/rockylinux/rockylinux:9 -- sh
    dnf install bind-utils
    dig <fqdn>
    kubectl run debug-postgres --stdin --tty --rm --image docker.io/bitnami/postgresql:latest -- sh
    pg_isready --host <postgres-fqdn> --port 5432
    kubectl run debug-redis --stdin --tty --rm --image docker.io/rockylinux/rockylinux:9 -- sh
    nc -zv <redis-fqdn> 6379
    kubectl run debug-redis --stdin --tty --rm --image docker.io/bitnami/redis:latest -- sh
    redis-cli -h <redis-fqdn> -p 6379
    kubectl run debug-elasticsearch --stdin --tty --rm --image docker.io/rockylinux/rockylinux:9 -- sh
    dnf install curl
    curl --fail --request GET "http://<elasticsearch-fqdn>:9200/_cluster/health?pretty"
    helm list --all-namespaces --output json | jq '.[]|select(.chart | startswith("immuta"))'
    helm get values <release-name> > immuta-values.yaml
    kubectl config set-context --current --namespace=<name>
    dbutils.fs.cp(s3ScratchFile, "file://{}".format(localScratchFile))
    shutil.copy(localScratchFile, localScratchFileCopy)
    with open(localScratchFileCopy, "a") as f:
        f.write("Some appended file content")
    dbutils.fs.cp("file://{}".format(localScratchFileCopy), s3ScratchFile)
    %sh echo "I'm creating a new file in DBFS" > /dbfs/my/newfile.txt
    %python
    with open("/dbfs/my/newfile.txt", "w") as f:
      f.write("I'm creating a new file in DBFS")
    %fs put -f s3://my-bucket/my/scratch/path/mynewfile.txt "I'm creating a new file in S3"
    %scala dbutils.fs.put("s3://my-bucket/my/scratch/path/mynewfile.txt", "I'm creating a new file in S3")
           <property>
               <name>immuta.spark.databricks.scratch.paths</name>
               <value>s3://my-bucket/my/scratch/path</value>
           </property>
    %python
    import os
    import shutil
    
    s3ScratchFile = "s3://some-bucket/path/to/scratch/file"
    localScratchDir = os.environ.get("IMMUTA_LOCAL_SCRATCH_DIR")
    localScratchFile = "{}/myfile.txt".format(localScratchDir)
    localScratchFileCopy = "{}/myfile_copy.txt".format(localScratchDir)
    <property>
        <name>immuta.spark.require.equalization</name>
        <value>true</value>
    </property>
    Integration settings:
    • Enable Snowflake table grants: Enable Snowflake table grants and configure the Snowflake role prefix.

    • Use Snowflake data sharing with Immuta: Use Snowflake data sharing with table grants or project workspaces.

    • Snowflake low row access policy mode: Enable Snowflake low row access policy mode.

    • Snowflake lineage tag propagation: Configure your Snowflake integration to automatically apply tags added to a Snowflake table to its descendant data source columns in Immuta.

    Snowflake lineage tag propagation: Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

  • Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration. To do so, this mode decreases the number of Snowflake row access policies Immuta creates and uses table grants to manage user access. This guide describes the design and requirements of this mode.

  • Snowflake table grants: Snowflake table grants simplifies the management of privileges in Snowflake when using Immuta. Instead of manually granting users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. This guide describes the components of Snowflake table grants and how they are used in Immuta's Snowflake integration.

  • Warehouse sizing recommendations: Adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.

  • Configure a Snowflake integration
    Snowflake table grants migration
    Edit or remove an existing integration
    Phased Snowflake onboarding approach
    Snowflake integration reference guide
    Snowflake data sharing with Immuta

    MANAGE GRANTS ON ACCOUNT allows the Immuta Snowflake role to grant and revoke SELECT privileges on Snowflake tables and views that have been added as data sources in Immuta.

  • CREATE ROLE ON ACCOUNT allows for the creation of a Snowflake role for each user in Immuta, enabling fine-grained, attribute-based access controls to determine which tables are available to which individuals.

    Table grants role

    Since table privileges are granted to roles and not to users in Snowflake, Immuta's Snowflake table grants feature creates a new Snowflake role for each Immuta user. This design allows Immuta to manage table grants through fine-grained access controls that consider the individual attributes of users.

    Each Snowflake user with an Immuta account will be granted a role that Immuta manages. The naming convention for this role is <IMMUTA>_USER_<username>, where

    • <IMMUTA> is the prefix you specified when enabling the feature on the Immuta app settings page.

    • <username> is the user's Immuta username.

    Querying Snowflake tables managed by Immuta

    Users are granted access to each Snowflake table or view automatically when they are subscribed to the corresponding data source in Immuta.

    Users have two options for querying Snowflake tables that are managed by Immuta:

    • Use the role that Immuta creates and manages. (For example, USE ROLE IMMUTA_USER_<username>. See the section above for details about the role and naming conventions.) If the current active primary role is used to query tables, USAGE on a Snowflake warehouse must be granted to the Immuta-managed Snowflake role for each user.

    • USE SECONDARY ROLES ALL, which allows users to use the privileges from all roles that they have been granted, including IMMUTA_USER_<username>, in addition to the current active primary role. Users may also set a value for DEFAULT_SECONDARY_ROLES as an object property on a Snowflake user. To learn more about primary roles and secondary roles in Snowflake, see the Snowflake documentation.

    Applying GRANTs and REVOKEs at scale

    Immuta uses an algorithm to determine the optimal way to group users in a role hierarchy in order to minimize the number of GRANTs (or REVOKEs) executed in Snowflake. It does this by determining the smallest possible number of permutations of access across tables and users based on the policies in place; those permutations become intermediate roles in the hierarchy, and each user is added to the intermediate roles that match their access.

    As an example, consider a set of users and the data sources they have access to. Naively granting each user access to each table individually would result in 37 grants.

    Conversely, using the Immuta algorithm, the number of grants in the same scenario can be reduced to 29.

    It’s important to consider a few things here:

    1. If the number of permutations of access is small, the optimization will be large (very few intermediate roles). If every user has their own unique permutation of access, the optimization will be negligible (an intermediate role per user). Most commonly, the number of permutations of access is many multiples smaller than the actual user count, so the optimization should be large: there are far fewer intermediate roles, and the total number of grants is reduced because tables are granted to roles and roles are granted to users.

    2. This only happens once up front. After that, changes are incremental based on policy changes and user attribute changes (smaller updates), unless a policy makes a sweeping change across all users. Adding new users who have access also becomes much more straightforward for the same reason: a user’s access is granted via the intermediate role, so much of the work is front-loaded in the intermediate role creation.
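The effect of grouping users by their permutation of access can be illustrated with a small sketch. This is not Immuta's algorithm; it is a simplified model that assumes one intermediate role per distinct set of tables, and the user and table names are hypothetical.

```python
from collections import defaultdict


def count_grants(access: dict[str, set[str]]) -> tuple[int, int]:
    """Return (naive_grants, grouped_grants) for a user -> tables mapping."""
    # Naive approach: one grant per (user, table) pair.
    naive = sum(len(tables) for tables in access.values())

    # Group users that share exactly the same set of tables.
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for user, tables in access.items():
        groups[frozenset(tables)].append(user)

    # Grouped approach: grant each table once to the intermediate role,
    # then grant that role once to each user in the group.
    grouped = sum(len(tables) + len(users) for tables, users in groups.items())
    return naive, grouped


if __name__ == "__main__":
    access = {
        "alice": {"t1", "t2", "t3"},
        "bob": {"t1", "t2", "t3"},
        "carol": {"t1", "t2", "t3"},
        "dave": {"t4"},
    }
    print(count_grants(access))
```

In this model the saving grows as more users share the same permutation of access, and it disappears when every user has a unique permutation, which matches the first consideration above.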

    Limitations

    • Project workspaces are not supported when Snowflake table grants is enabled.

    • If an Immuta tenant is connected to an external IAM and that external IAM has a username identical to another username in Immuta's built-in IAM, those users will have the same Snowflake role, leading both to see the same data.

    • Sometimes the role generated can contain special characters such as @ because it's based on the user name configured from your identity manager. Because of this, it is recommended that any code references to the Immuta-generated role be enclosed with double quotes.
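The quoting recommendation above can be sketched as a small helper that builds a double-quoted role identifier following the <IMMUTA>_USER_<username> convention. The function names and the sample username are hypothetical. Because quoted identifiers are case-sensitive in Snowflake, confirm the exact role name (for example, with SHOW ROLES) before relying on a generated string.

```python
def immuta_role_identifier(prefix: str, username: str) -> str:
    """Build a double-quoted Snowflake identifier for an Immuta-managed role."""
    role = f"{prefix}_USER_{username}"
    # Inside a double-quoted Snowflake identifier, a literal double quote
    # is written as two double quotes.
    return '"' + role.replace('"', '""') + '"'


def use_role_statement(prefix: str, username: str) -> str:
    """Return a USE ROLE statement that references the quoted role name."""
    return f"USE ROLE {immuta_role_identifier(prefix, username)};"


if __name__ == "__main__":
    # The @ in the username is why the identifier must be quoted.
    print(use_role_statement("IMMUTA", "jane.doe@example.com"))
```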

    where
    predicate). Use the tabs below to view examples of reading data using these methods.

    To read from an individual file, load a partition file from a sub-directory:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01/my_file.parquet")

    To read partitioned data, load a parquet partition from a sub-directory:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")

    Alternatively, load a parquet partition using a where predicate:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")

    Limitations

    • Direct file reads for Immuta data sources only apply to table-backed Immuta data sources, not data sources created from views or queries.

    • If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.

    • In Databricks, multiple input paths are supported as long as they belong to the same data source.

    • CSV-backed tables are not currently supported.

    • Loading a delta partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a where predicate:

  • Configure your Redshift integration or configure your Redshift Spectrum integration.

  • Select None as your default subscription policy.

  • Integrate an IAM with Immuta.

  • .

    Discover your data


    Private preview: SDD for Redshift is currently in private preview and available to all accounts.

    These guides provide step-by-step instructions for discovering, classifying, and tagging your data.

    1. Enable sensitive data discovery (SDD).

    2. Register a subset of your tables to configure and validate SDD.

    3. Configure SDD to discover entities of interest for your policy needs.

    4. .

    5. Register your remaining tables at the with .

    6. .

    Secure your data

    These guides provide step-by-step instructions for configuring and securing your data with governance policies, or see the Secure use cases for a comprehensive guide on creating policies to fit your organization's use case.

    1. Create a global subscription policy.

    2. Create a global data policy.

    3. Validate the policies. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.

    4. Once all Immuta policies are in place, remove or alter old permissions and revoke access to the ungoverned tables.

    Redshift Spectrum options
    a Redshift integration
    Integration and data source registration warehouse use

    The Snowflake integration uses warehouse compute resources to sync policies created in Immuta to the Snowflake objects registered as data sources and, if enabled, to run sensitive data discovery and schema monitoring. Follow the guidelines below to adjust the warehouse size and scale according to your needs.

    • Increase the size and number of clusters for the warehouse during large policy syncs, updates, and changes.

    • Enable auto-suspend and auto-resume to optimize resource use in Snowflake. In the Snowflake UI, the lowest auto-suspend time setting is 5 minutes. However, through a SQL query, you can set auto_suspend to 61 seconds (since the minimum uptime for a warehouse is 60 seconds). For example,

    • Sensitive data discovery uses compute resources for each table registered if it is enabled. Consider disabling sensitive data discovery when registering data sources if you have an or a tagging strategy in place.

    • Register data before creating global policies. By default, Immuta on registered data (unless an existing global policy applies to it), which allows Immuta to only pull metadata instead of also applying policies when data sources are created. Registering data before policies are created reduces the workload and the Snowflake compute resources needed.

    • Begin onboarding with a small dataset of tables, and then review and monitor query performance in the . Adjust the virtual warehouse accordingly to handle heavier loads.

    • uses the compute warehouse that was employed during the initial ingestion. If you expect a low number of new tables or minimal changes to the table structure, consider scaling down the warehouse size.

    • Resize the warehouse after data sources are registered and policies are established. For example,

    For more details and guidance about warehouse sizing, see the Snowflake Warehouse Considerations documentation.
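The auto-suspend and resize guidelines above translate into Snowflake ALTER WAREHOUSE statements. The sketch below only builds the SQL strings; it does not connect to Snowflake. The helper names and the warehouse name IMMUTA_WH are hypothetical, and the size you choose depends on your workload.

```python
import re

# Snowflake warehouse sizes accepted by WAREHOUSE_SIZE.
WAREHOUSE_SIZES = {
    "XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE",
    "XXLARGE", "XXXLARGE", "X4LARGE", "X5LARGE", "X6LARGE",
}


def _check_identifier(name: str) -> str:
    # Accept only simple unquoted identifiers to keep the sketch safe.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"unsupported warehouse name: {name!r}")
    return name


def auto_suspend_statement(warehouse: str, seconds: int = 61) -> str:
    """Build a statement that sets the auto-suspend time in seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return f"ALTER WAREHOUSE {_check_identifier(warehouse)} SET AUTO_SUSPEND = {seconds};"


def resize_statement(warehouse: str, size: str) -> str:
    """Build a statement that resizes the warehouse."""
    size = size.upper()
    if size not in WAREHOUSE_SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return f"ALTER WAREHOUSE {_check_identifier(warehouse)} SET WAREHOUSE_SIZE = '{size}';"


if __name__ == "__main__":
    print(auto_suspend_statement("IMMUTA_WH"))
    print(resize_statement("IMMUTA_WH", "xsmall"))
```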

    Identifying bulk jobs and heavy workloads

    Even after your integration is configured, data sources are registered, and policies are established, changes to those data sources or policies may initiate heavy workloads. Follow the guidelines below to adjust your warehouse size and scale according to your needs.

    • Review your Snowflake query history to identify query performance and bottlenecks.

    • Check how many credits queries have consumed:

    • After reviewing query performance and cost, implement strategies above to adjust your warehouse.

    Configure the Snowflake integration

    1. Navigate to the App Settings page and click the Integrations tab.

    2. Click +Add Integration and select Snowflake from the dropdown menu.

    3. Complete the Host, Port, and Default Warehouse fields.

    4. Enable Query Audit.

    5. Enable Lineage and complete the following fields:

      • Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.

      • Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.

    6. Select Manual or Automatic Setup and

    Trigger Snowflake lineage sync job

    Prerequisite

    Authenticate with the Immuta API.

    Trigger the lineage job

    The Snowflake lineage sync endpoint triggers the lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.

    1. Copy the example and replace the Immuta URL and API key with your own.

    2. Change the payload attribute values to your own, where

      • tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.

      • batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.

      • lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.
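The payload attributes described above can be assembled and checked before sending the request. The following is a sketch under assumptions: the helper name is hypothetical, all three attributes are treated as optional, and the endpoint URL and authentication are left out because they are not shown here.

```python
from datetime import datetime
from typing import Optional


def build_lineage_payload(
    table_filter: Optional[str] = None,
    batch_size: Optional[int] = None,
    last_timestamp: Optional[str] = None,
) -> dict:
    """Assemble and validate a payload for the lineage sync job."""
    payload: dict = {}
    if table_filter is not None:
        # The regular expression must not be wrapped in / delimiters.
        if len(table_filter) >= 2 and table_filter.startswith("/") and table_filter.endswith("/"):
            raise ValueError("tableFilter must exclude / from the beginning and end")
        payload["tableFilter"] = table_filter
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batchSize must be at least 1")
        payload["batchSize"] = batch_size
    if last_timestamp is not None:
        # Raises ValueError if the string is not an ISO 8601 timestamp.
        datetime.fromisoformat(last_timestamp)
        payload["lastTimestamp"] = last_timestamp
    return payload


if __name__ == "__main__":
    print(build_lineage_payload("MYDB\\..*", 1000, "2022-06-29T09:47:06.012-07:00"))
```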

    Next steps

    Once the sync job is complete, you can complete the following steps:

    • Register Snowflake data sources

    • Build policies

    Getting Started

    The instructions and how-to guides on this page illustrate how to install Immuta in your Kubernetes environment. If you are upgrading Immuta, navigate to the Upgrade section instead.

    Prerequisites and requirements

    • Use a supported version of Kubernetes.

    • Use Helm 3.2.0 or newer. (When using a Helm version older than 3.8.0, enable OCI experimental mode by exporting the environment variable HELM_EXPERIMENTAL_OCI=1.)

    • Deploy the services listed on the Deployment requirements guide. See the for guidance for specific cloud providers.

    • Grant to create Kubernetes resources in the cluster.
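The Helm version requirement above can be checked programmatically. This is a minimal sketch with hypothetical function names; it assumes a version string such as the one printed by helm version --short, and it ignores pre-release and build suffixes.

```python
def parse_helm_version(version: str) -> tuple[int, int, int]:
    """Parse a string such as 'v3.12.0+gc9f554d' into (3, 12, 0)."""
    # Drop the leading "v" and any build (+...) or pre-release (-...) suffix.
    core = version.strip().lstrip("v").split("+")[0].split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def helm_oci_environment(version: str) -> dict[str, str]:
    """Return the environment variables needed for OCI support, if any."""
    parsed = parse_helm_version(version)
    if parsed < (3, 2, 0):
        raise ValueError("Helm 3.2.0 or newer is required")
    if parsed < (3, 8, 0):
        # Older Helm versions need OCI experimental mode enabled.
        return {"HELM_EXPERIMENTAL_OCI": "1"}
    return {}


if __name__ == "__main__":
    print(helm_oci_environment("v3.7.2"))
    print(helm_oci_environment("v3.12.0+gc9f554d"))
```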

    Pull the Helm chart

    Consult the if unsure which Helm chart to use.

    ocir.immuta.com


    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your Immuta support professional:

    Install Immuta

    Immuta can be installed on any Kubernetes cluster. Select a guide below that corresponds to your Kubernetes distribution to install Immuta. If your distribution is not listed below (such as or ), follow the generic installation instructions:

    • : This guide includes instructions for

      • Amazon Elastic Kubernetes Service (EKS)

      • Google Kubernetes Engine (GKE)

    Configure Ingress

    To complete your installation and access the Immuta application, .

    Additional recommendations

    The includes guidance for various scenarios you may encounter during and post-deployment. Below are several guides from that section that most customers follow to complete their deployment of Immuta, but none of these is a requirement for the Immuta installation to work.

    • : Secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • : Follow these best practices for configuring your deployment for a production environment.

    • : The Immuta Enterprise Helm chart manages its own Memcached deployment inside the cluster. However, you can opt to externalize the key-value cache post-installation.

    Immuta in Production

    This guide highlights best practices when deploying Immuta in a production environment.


    Kubernetes namespace

    The following section(s) presume the Immuta Enterprise Helm chart was deployed into namespace immuta and that the current namespace is immuta.

    Helm values

    Back up or source control your immuta-values.yaml Helm values file.

    Kubernetes resource requests and limits

    Assign resource requests and limits to pods.

    Edit Helm values

    Edit immuta-values.yaml to include the following recommended resource requests and limits for most Immuta deployments.


    Increase replica count to 3 on web and backgroundWorker for large deployments.

    Kubernetes secrets

    Use Kubernetes secrets in the immuta-values.yaml file instead of passwords and tokens. The following section demonstrates how to create a secret and reference it in the Helm values file.

    Create secret

    1. Create a file named secret-data.env with the following content.

    2. Create secret named immuta-secret from file secret-data.env.

    3. Delete the file secret-data.env.

    Edit Helm values

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Remove any sensitive key-value pairs from the immuta-values.yaml Helm values that were made redundant after the secret was created.

    Apply Helm values

    Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Edit or Remove Your Snowflake Integration

    To edit or remove a Snowflake integration, you have two options:

    • Automatic: Grant Immuta one-time use of credentials with the following privileges to automatically edit or remove the integration:

      • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

      • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

      • CREATE USER ON ACCOUNT WITH GRANT OPTION

      • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

    • Manual: Run the Immuta script in your Snowflake environment as a user with the following privileges to edit or remove the integration:

      • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

      • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    Edit a Snowflake integration

    Select one of the following options for editing your integration:

    • Automatic edit: Grant Immuta one-time use of credentials to automatically edit the integration.

    • Manual edit: Run the Immuta script in your Snowflake environment yourself to edit the integration.

    Automatic edit

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Edit the field you want to change or select the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.

    Manual edit

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Edit the field you want to change or select the checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.

    Remove a Snowflake integration

    Select one of the following options for deleting your integration:

    • Automatic removal: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.

    • Manual removal: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.

    Automatic removal

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Click the checkbox to disable the integration.

    Manual removal

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Click the checkbox to disable the integration.

    Self-Managed Deployment

    This section illustrates how to install Immuta on Kubernetes using the Immuta Enterprise Helm chart.

    Getting started

    This how-to guide includes instructions and links for installing Immuta in any Kubernetes environment.

    Requirements

    This reference guide provides an overview of the Immuta Enterprise Helm chart version requirements and infrastructure recommendations.

    Install

    The guides in this section illustrate how to install and deploy Immuta in your Kubernetes environment. If your distribution is not listed below (such as or ), follow the generic installation instructions:

    • : This guide includes instructions for

      • Amazon Elastic Kubernetes Service (EKS)

      • Google Kubernetes Engine (GKE)


    The guides in this section illustrate how to configure your Immuta Enterprise Helm chart for various scenarios, including optimizing your deployment for production environments.

    Upgrade

    The guides in this section illustrate how to upgrade the Immuta Enterprise Helm chart:

    • : Upgrade from Immuta v2024.1.x or older to Immuta v2024.2 LTS.

    • : Upgrade from Immuta v2024.2 LTS or newer.


    This guide provides links to additional resources for disaster recovery strategies.


    This page provides troubleshooting guidance and outlines frequently asked questions for the Immuta installation.


    This guide outlines the updates and bug fixes to the Immuta Enterprise Helm chart.

    Ephemeral Overrides


    Ephemeral overrides best practices

    1. Disable ephemeral overrides for clusters when using multiple workspaces and dedicate a single cluster to serve queries from Immuta in a single workspace.

    2. If you use multiple E2 workspaces without disabling ephemeral overrides, avoid applying the where user row-level policy to data sources.

    Overview

    In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.

    Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.

    When a user runs a Spark job in Databricks, the Immuta plugins automatically submit ephemeral overrides for that user to Immuta for all applicable data sources. These overrides direct Immuta to use the current cluster as compute for all subsequent metadata operations for that user against those data sources.

    Example Query and Ephemeral Override Request

    1. A user runs a query on cluster B.

    2. The Immuta plugins on the cluster check if there is a source in the Metastore with a matching database, table name, and location for its underlying data. Note: If tables are dynamic or change over time, users can disable the comparison of the location of the underlying data by setting immuta.ephemeral.table.path.check.enabled to false; disabling allows users to avoid keeping the relevant data sources in Immuta up-to-date (which would require API calls and automation).

    If the user attempts to query data source 2 and they have not enabled JDBC sources, they will be presented with an error message telling them to do so:

    com.immuta.spark.exceptions.ImmutaConfigurationException: This query plan will cause data to be pulled over JDBC. This spark context is not configured to allow this. To enable JDBC set immuta.enable.jdbc=true in the spark context hadoop configuration.

    Immuta Operations that Use Ephemeral Overrides

    Ephemeral overrides are enabled by default because Immuta must be aware of a cluster that is running to serve metadata queries. The operations that use the ephemeral overrides include

    • Visibility checks on the data source for a particular user. These checks assess how to apply row-level policies for specific users.

    • Stats collection triggered by a specific user.

    • Validating a custom WHERE clause policy against a data source. When owners or governors create custom WHERE clause policies, Immuta uses compute resources to validate the SQL in the policy. In this case, the ephemeral overrides for the user writing the policy are used to contact a cluster for SQL validation.

    However, ephemeral overrides can be problematic in environments that have a dedicated cluster to handle maintenance activities, since ephemeral overrides can cause these operations to execute on a different cluster than the dedicated one.

    Configure Overrides in Immuta-Enabled Clusters

    To reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up,

    • direct all clusters' HTTP paths for overrides to a cluster dedicated for metadata queries or

    • disable overrides completely.

    Disable Ephemeral Overrides

    To disable ephemeral overrides, set immuta.ephemeral.host.override in spark-defaults.conf to false.

    Databricks Spark

    This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for users' queries, all executed directly against Databricks tables.

    The guides in this section outline how to integrate Databricks with Immuta.

    How-to guides

    • Databricks configuration: Configure the Databricks Spark integration.

    • DBFS access: Access DBFS in Databricks for non-sensitive data.

    • Limited enforcement in Databricks: Allow Immuta users to access tables that are not protected by Immuta.

    • Hiding the Immuta database in Databricks: Hide the Immuta database from users in Databricks, since user queries do not need to reference it.

    • Run spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.

    • Project UDFs cache settings: Raise the caching on-cluster and lower the cache timeouts for the Immuta web service to allow use of project UDFs in Spark jobs.

    • External metastores: Use an existing Hive external metastore instead of the built-in metastore.

    Reference guides

    • Databricks Spark integration reference guide: This guide describes the design and components of the integration.

    • Configuration settings: These guides describe various integration settings that can be configured, including environment variables, cluster policies, and performance.

    • Databricks change data feed: This guide describes Immuta's support of Databricks change data feed.

    Delta Lake API

    Delta Lake API reference guide

    When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.

    Spark SQL can be used instead to give the same functionality with all of Immuta's data protections.

    Requests

    Below is a table of the Delta Lake API with the Spark SQL that may be used instead.

    Delta Lake API
    Spark SQL

    See the Delta SQL Commands documentation for a complete list of the Delta SQL commands.

    Merging tables in project workspaces

    After you create a table in a project workspace, you can merge a different Immuta data source from that workspace into the table:

    1. Create a table in the project workspace.

    2. Create a temporary view of the Immuta data source you want to merge into that table.

    3. Use that temporary view as the data source you add to the project workspace.

    TLS Configuration

    This guide demonstrates how to configure TLS termination for an Ingress resource.


    Kubernetes namespace

    The following section(s) presume the Immuta Enterprise Helm chart was deployed into namespace immuta and that the current namespace is immuta.

    Getting Started

    The how-to guides linked on this page illustrate how to integrate Snowflake with Immuta to secure your data with governance policies, discover what data types and sensitive data should be secured, and observe your users' activity to ensure risky user access is caught and addressed.

    Requirement: Snowflake Enterprise Edition

    Configure your Snowflake integration

    Configuring a Snowflake integration is required for Detect, Discover, and Secure. These guides provide information on the recommended features to enable with Snowflake, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

    Immuta in an Air-Gapped Environment

    This page provides one possible way to download and package Immuta artifacts for consumption on a separate network with no Internet access.

    1. Copy the snippet below and replace the placeholder text with the credentials provided by your Immuta representative:

    2. Download the IEHC for the current Immuta release:

    Rotating Credentials

    This guide demonstrates how to update credentials referenced in the Immuta Enterprise Helm chart (IEHC).


    Kubernetes namespace

    The following section(s) presume the IEHC was deployed into namespace immuta and that the current namespace is immuta.

    Limited Enforcement in Databricks

    Databricks non-admin users will only see sources to which they are subscribed in Immuta, and this can present problems if organizations have a data lake full of non-sensitive data and Immuta removes access to all of it. The Limited Enforcement Scope feature addresses this challenge by allowing Immuta users to access any tables that are not protected by Immuta (i.e., not registered as a data source or a table in a project workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.

    This feature is composed of two configurations:

    • Allowing non-Immuta reads: Immuta users with regular (unprivileged) Databricks roles may SELECT from tables that are not registered in Immuta.

    Databricks Libraries Introduction

    This page provides an overview of Immuta's Databricks Trusted Libraries feature and support of Notebook-Scoped Libraries on Machine Learning Clusters.

    Databricks Libraries and Immuta's Security Manager

    The Immuta security manager blocks users from executing code that could allow them to gain access to sensitive data by only allowing select code paths to access sensitive files and methods. These select code paths provide Immuta's code access to sensitive resources while blocking end users from these sensitive resources directly.

    Similarly, when users install third-party libraries, those libraries will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.

    ALTER WAREHOUSE "WH_NAME" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 61 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD' COMMENT = '';
    SELECT h.* FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" h
    INNER JOIN "SNOWFLAKE"."ACCOUNT_USAGE"."SESSIONS" s
    ON s.session_id = h.session_id
    WHERE GET(parse_json(s.client_environment), 'APPLICATION') = 'IMMUTA' limit 25;
    Snowflake lineage tag propagation
    Map external user IDs from Redshift to Immuta
    Validate that the SDD tags are applied correctly
    schema level
    schema monitoring turned on
    Implement classification to categorize and tag sensitive data

    CREATE USER ON ACCOUNT WITH GRANT OPTION

  • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

  • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

  • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

  • From the Select Authentication Method Dropdown, select either Username and Password or Key Pair Authentication:

    • Username and Password option: Complete the Username, Password, and Role fields.

    • Key Pair Authentication option:

      1. Complete the Username field.

      2. When using a private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>

      3. Click Key Pair (Required), and upload a Snowflake key pair file.

      4. Complete the Role field.

  • Click Save.

  • Click edit script to download the script, and then run it in Snowflake.

  • Click Save.

  • Enter the Username, Password, and Role that was entered when the integration was configured.
  • Click Save.

  • Click cleanup script to download the script.
  • Click Save.

  • Run the cleanup script in Snowflake.


    3. The Immuta plugins on the cluster detect that the user is subscribed to data sources 1, 2, and 3 and that data sources 1 and 3 are both present in the Metastore for cluster B, so the plugins submit ephemeral override requests for data sources 1 and 3 to override their connections with the HTTP path from cluster B.

    4. Since data source 2 is not present in the Metastore, it is marked as a JDBC source.

  • High Cardinality Column detection. Certain advanced policy types (e.g., minimization and randomized response) in Immuta require a High Cardinality Column, and that column is computed on data source creation. It can be recomputed on demand and, if so, will use the ephemeral overrides for the user requesting computation.

    this configuration

    Databricks libraries: The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.

  • Delta Lake API: When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked. This reference guide outlines the Spark SQL options that can be substituted for the Delta Lake API.

  • Spark direct file reads: Immuta allows direct file reads in Spark for file paths. This guide describes that process.

    Prerequisite

    The Ingress configuration must be completed before proceeding.

    Ingress-NGINX Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Create a TLS secret from a given public/private PEM formatted key pair.

    3. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the Ingress-Nginx Controller documentation for further assistance.

    GKE Ingress Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the GKE Ingress Controller documentation for further assistance.

    AWS Load Balancer Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the AWS Load Balancer Controller documentation for further assistance.

    AKS Application Gateway Ingress Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the Application Gateway Ingress Controller documentation for further assistance.

    Traefik

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Create a TLS secret from a given public/private PEM formatted key pair.

    3. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the Traefik documentation for further assistance.

    external catalog available
    does not apply a subscription policy
    Snowflake Query Monitorarrow-up-right
    Schema monitoring
  • Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression to filter tags, omitting the leading and trailing / delimiters. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.

  • follow the steps in this guide to configure the Snowflake integration
    Run the following command:

    Delta Lake API               Spark SQL

    DeltaTable.convertToDelta    CONVERT TO DELTA parquet.`/path/to/parquet/`

    DeltaTable.delete            DELETE FROM [table_identifier | delta.`/path/to/delta/`] WHERE condition

    DeltaTable.generate          GENERATE symlink_format_manifest FOR TABLE [table_identifier | delta.`/path/to/delta`]

    DeltaTable.history           DESCRIBE HISTORY [table_identifier | delta.`/path/to/delta`] (LIMIT x)

    DeltaTable.merge             MERGE INTO

    DeltaTable.update            UPDATE [table_identifier | delta.`/path/to/delta/`] SET column = value WHERE (condition)

    DeltaTable.vacuum            VACUUM [table_identifier | delta.`/path/to/delta`]
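    As a rough illustration of swapping a Delta Lake API call for its Spark SQL equivalent, the sketch below builds two of the statements listed above as strings. The helper names (delta_path, vacuum_sql, history_sql) are hypothetical and not part of Immuta or Delta Lake; the sketch only assembles SQL text and does not start a Spark session.

```python
def delta_path(path: str) -> str:
    """Return the delta.`<path>` form of a path-based table reference."""
    if "`" in path:
        raise ValueError("path must not contain a backtick")
    return f"delta.`{path}`"


def vacuum_sql(path: str) -> str:
    """Spark SQL to use in place of DeltaTable.vacuum."""
    return f"VACUUM {delta_path(path)}"


def history_sql(path: str, limit: int) -> str:
    """Spark SQL to use in place of DeltaTable.history."""
    return f"DESCRIBE HISTORY {delta_path(path)} LIMIT {int(limit)}"


print(vacuum_sql("/path/to/delta"))
print(history_sql("/path/to/delta", 5))
# On an Immuta-enabled cluster these strings would be passed to
# spark.sql(...), where `spark` is the notebook's existing SparkSession
# (not created in this sketch).
```

    Because the statements run through Spark SQL, they follow the normal Spark execution path that Immuta's extensions protect.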

    • Allowing non-Immuta writes: Immuta users with regular (unprivileged) Databricks roles can run DDL commands and data-modifying commands against tables or spaces that are not registered in Immuta.

    Additionally, Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not. To configure Immuta to do so, navigate to the Enable Auditing of All Queries in Databricks section.
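    The effect of the two configurations can be summarized with a small decision sketch. This is a simplified model of the behavior described on this page, not Immuta's code: the function name, its parameters, and the three outcome labels are hypothetical, and the two flags stand in for the non-Immuta reads and non-Immuta writes settings, which both default to disabled.

```python
def enforcement_outcome(action, registered_in_immuta,
                        allow_non_immuta_reads=False,
                        allow_non_immuta_writes=False):
    """Model how a non-privileged user's statement is treated.

    action: "read" (for example SELECT) or "write" (DDL or data-modifying).
    registered_in_immuta: True if the table is an Immuta data source or a
        table in a project workspace.
    Returns a hypothetical label: "immuta-enforced", "allowed", or "blocked".
    """
    if action not in ("read", "write"):
        raise ValueError(f"unknown action: {action!r}")
    if registered_in_immuta:
        # Protected tables always go through Immuta's policies.
        return "immuta-enforced"
    enabled = allow_non_immuta_reads if action == "read" else allow_non_immuta_writes
    return "allowed" if enabled else "blocked"


print(enforcement_outcome("read", registered_in_immuta=False))
print(enforcement_outcome("read", False, allow_non_immuta_reads=True))
print(enforcement_outcome("write", False, allow_non_immuta_reads=True))
```

    The model shows that enabling reads does not enable writes, and that neither setting changes how Immuta-protected tables are handled.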

    Enable Non-Immuta Reads


    Non-Immuta reads

    • This setting does not allow reading data directly with commands like spark.read.format("x"). Users are still required to read data and query tables using Spark SQL.

    • When non-Immuta reads are enabled, Immuta users will see all databases and tables when they run show databases and/or show tables. However, this does not mean they will be able to query all of them.

    1. Enable non-Immuta Reads by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    2. Optionally, adjust the cache duration by changing the default value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). Immuta caches whether a table has been exposed as an Immuta source to improve performance; the default caching duration is 1 hour.

    Enable Non-Immuta Writes


    Non-Immuta writes

    • These non-protected tables/spaces have the same exposure as detailed in the read section, but with the distinction that users can write data directly to these paths.

    • With non-Immuta writes enabled, it will be possible for users on the cluster to mix any policy-enforced data they may have access to via any registered data sources in Immuta with non-Immuta data, and write the ensuing result to a non-Immuta write space where it would be visible to others. If this is not a desired possibility, the cluster should instead be configured to only use Immuta’s project workspaces.

    1. Enable non-Immuta Writes by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    2. Optionally, adjust the cache duration by changing the default value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). Immuta caches whether a table has been exposed as an Immuta source to improve performance; the default caching duration is 1 hour.

    Enable Auditing of All Queries in Databricks

    Enable support for auditing all queries run on a Databricks cluster (regardless of whether users touch Immuta-protected data or not) by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    Default Configuration Values

    The controls and default values associated with non-Immuta reads, non-Immuta writes, and audit functionality are outlined below.

    secure:
      ingress:
        hostname: <immuta-fqdn>
        annotations:
          nginx.ingress.kubernetes.io/auth-tls-secret: <namespace>/<secret-name>
    kubectl create secret tls <secret-name> --cert=path/to/tls.cert --key=path/to/tls.key
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        annotations:
          ingress.gcp.kubernetes.io/pre-shared-cert: <certificate-name>
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        annotations:
          alb.ingress.kubernetes.io/certificate-arn: <certificate-arn>
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        annotations:
          appgw.ingress.kubernetes.io/appgw-ssl-certificate: <certificate-name>
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        annotations:
          traefik.ingress.kubernetes.io/router.tls: "true"
        hostname: <immuta-fqdn>
        tls: true
        # If left unset the TLS secret name defaults to <hostname>-tls
        secretName: <secret-name>
    kubectl create secret tls <secret-name> --cert=path/to/tls.cert --key=path/to/tls.key
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    # Not recommended by Spark and not supported in Immuta
    spark.read.format("delta").load("s3:/my_bucket/path/to/my_delta_table/partition_column=01")
    
    # Recommended by Spark and supported in Immuta.
    spark.read.format("delta").load("s3:/my_bucket/path/to/my_delta_table").where("partition_column=01")
    ALTER WAREHOUSE "INTEGRATION_WH" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 120 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD'; 
    curl -X 'POST' \
        'https://www.organization.immuta.com/lineage/ingest/snowflake' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -H 'Authorization: 846e9e43c86a4ct1be14290d95127d13f' \
        -d '{
        "tableFilter": "MY_DATABASE\\MY_SCHEMA\\..*",
        "batchSize": 1,
        "lastTimestamp": "2022-06-29T09:47:06.012-07:00"
        }'
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    audit:
      worker:
        replicaCount: 1
        resources:
          requests:
            cpu: 1000m
            memory: 1024Mi
          limits:
            cpu: 1000m
            memory: 2048Mi  
      deployment:
        replicaCount: 1
        resources:
          requests:
            cpu: 1000m
            memory: 4096Mi
          limits:
            cpu: 3000m
            memory: 8192Mi
    secure:
      backgroundWorker:
        replicaCount: 2
        resources:
          requests:
            cpu: 1000m
            memory: 4096Mi
          limits:
            cpu: 4000m
            memory: 4096Mi  
      web:
        replicaCount: 2 
        resources:
          requests:
            cpu: 1000m
            memory: 4096Mi
          limits:
            cpu: 4000m
            memory: 4096Mi
    discover:
      deployment:
        replicaCount: 1
        resources:
          requests:
            cpu: 500m
            memory: 4096Mi
          limits:
            cpu: 3000m
            memory: 4096Mi
    cache:
      deployment:
        replicaCount: 1
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 512Mi
    # audit
    ELASTICSEARCH_USERNAME=<elasticsearch-username>
    ELASTICSEARCH_PASSWORD=<elasticsearch-password>
    
    # PostgreSQL connection string used by audit for the metadata database
    #   postgresql://<user>:<password>@<postgres-fqdn>:5432/<database>?schema=audit
    #
    # More info
    #   https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING
    DATABASE_CONNECTION_STRING=postgresql://immuta:<postgres-password>@<postgres-fqdn>:5432/immuta?schema=audit
    
    # secure
    IMMUTA_DATABASES_IMMUTA_CONNECTIONS_IMMUTADB_PASSWORD=<postgres-password>
    kubectl create secret generic immuta-secret --from-env-file=secret-data.env
    audit:
      #...
      deployment:
        existingSecret: immuta-secret
      export:
        cronJob:
          existingSecret: immuta-secret
    
    secure:
      #...
      existingSecret:
        name: immuta-secret
        # Optional. Map expected keys with keys in existing secret
        # keyMapping: {}
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    rm -i secret-data.env
    MERGE INTO delta_native.target_native as target
    USING immuta_temp_view_data_source as source
    ON target.dr_number = source.dr_number
    WHEN MATCHED THEN
    UPDATE SET target.date_reported = source.date_reported
    <property>
        <name>immuta.spark.databricks.allow.non.immuta.reads</name>
        <value>true</value>
    </property>
    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
    <property>
        <name>immuta.spark.databricks.allow.non.immuta.writes</name>
        <value>true</value>
    </property>
    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
    <property>
        <name>immuta.spark.audit.all.queries</name>
        <value>true</value>
    </property>
    <property>
        <name>immuta.spark.databricks.allow.non.immuta.reads</name>
        <value>false</value>
    </property>
    <property>
        <name>immuta.spark.databricks.allow.non.immuta.writes</name>
        <value>false</value>
    </property>
    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
    <property>
        <name>immuta.spark.audit.all.queries</name>
        <value>false</value>
    </property>
    1. Configure your Snowflake integration with the following features enabled:

      • Snowflake table grants (enabled by default)

      • Snowflake low row access policy mode (enabled by default)

      • (enabled by default)

    2. Select None as your .

    3. .

    4. .

    Detect your user activity

    These guides provide step-by-step instructions for auditing and detecting your users' activity, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

    1. Set up audit export to S3 or ADLS Gen2 for your Snowflake audit logs.

    2. View the Detect dashboards to see the activity of your registered users on registered tables.

    Discover your data

    These guides provide step-by-step instructions for discovering, classifying, and tagging your data.

    1. Enable sensitive data discovery (SDD).

    2. Register a subset of your tables to configure and validate SDD.

    3. Configure SDD to discover entities of interest for your policy needs.

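    4. Validate that the SDD tags are applied correctly.

    5. Register your remaining tables at the schema level with schema monitoring turned on.

    6. Implement classification to categorize and tag sensitive data.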

    Secure your data

    These guides provide step-by-step instructions for configuring and securing your data with governance policies, or see the Secure use cases for a comprehensive guide on creating policies to fit your organization's use case.

    1. Create a global subscription policy.

    2. Validate the policy. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see:

      1. Validate that the Immuta users impacted now have an Immuta role in Snowflake dedicated to them.

      2. Validate that when acting under the Immuta role those users have access to the table(s) in question.

      3. Validate that users without access in Immuta can still access the table with a different Snowflake role that has access.

      4. Validate that a user with enabled retains access if

        • they were not granted access by Immuta and

        • they have a role that provides them access, even if they are not currently acting under that role.

    3. .

    4. Validate that a user with a role that can access the table in question (whether it's an Immuta role or not) sees the impact of that data policy.

    5. Once all Immuta policies are in place, remove or alter old roles.

    Prerequisite

    Skopeo installation

    This guide uses the skopeo command to copy container images; ensure it's installed before proceeding. Refer to the skopeo documentation for further assistance.

    Checklist

    Skopeo

    Helm

    Download artifacts

    This section demonstrates how to download the Helm chart and container images to your local machine. These artifacts will be packaged and transferred to the air-gapped environment later.


    Upon completion of these steps, the saved artifacts can be found in local directory offline-kit.

    1. Create a directory named offline-kit.

    2. Download the Helm chart into directory offline-kit.

    3. Extract file DIGESTS.md from the Helm chart archive.

    4. Open file ./offline-kit/DIGESTS.md. This file includes the name and digest of every container image referenced by the Helm chart.

    5. Download each image listed in file DIGESTS.md using skopeo. Each image will be saved to directory offline-kit with the filename <name>-<tag>.tar.
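    The <name>-<tag>.tar naming in the last step can be sketched in Python as follows. This is a sketch under stated assumptions rather than the documented procedure: it assumes each image reference has the form registry/path/name:tag, the helper names and the example registry are hypothetical, and the use of skopeo's docker-archive transport is an assumption about how the archives are written. The layout of DIGESTS.md is not assumed, so the sketch takes an image reference as plain input.

```python
def archive_name(image_ref: str) -> str:
    """Map an image reference to the <name>-<tag>.tar filename."""
    if "@" in image_ref:
        raise ValueError("digest references are outside the scope of this sketch")
    repo, sep, tag = image_ref.rpartition(":")
    # A "/" after the last colon means the colon belonged to a registry port.
    if not sep or not repo or "/" in tag or not tag:
        raise ValueError(f"expected <registry>/<name>:<tag>, got {image_ref!r}")
    name = repo.rsplit("/", 1)[-1]
    return f"{name}-{tag}.tar"


def skopeo_copy_args(image_ref: str, directory: str = "offline-kit") -> list:
    """Build an argument list that copies one image into a local archive."""
    return [
        "skopeo", "copy",
        f"docker://{image_ref}",
        f"docker-archive:{directory}/{archive_name(image_ref)}",
    ]


# Hypothetical image reference, for illustration only.
ref = "registry.example.com/stable/example-service:2024.2.20"
print(archive_name(ref))
print(" ".join(skopeo_copy_args(ref)))
```

    The resulting argument list could be run with subprocess.run once skopeo is installed and registry credentials are in place.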

    Transfer artifacts

    This section demonstrates how to push the previously archived container images to a private registry that's accessible from within your air-gapped environment.


    The exact process for transferring files into an air-gapped network can vary significantly depending on your specific security policies and infrastructure.

    1. Transfer directory offline-kit (created in the previous section) onto a machine that's within your air-gapped environment.

    2. Push each image to your private registry using skopeo.

    Chart installation

    A Helm chart can be referenced from a local file path instead of a remote registry. When following other documentation, replace any reference to oci://ocir.immuta.com/stable/immuta-enterprise with the path to the chart archive (.tgz) file.

    Edit the immuta-values.yaml to reference the private container registry and images.

    Kubernetes secrets

    Edit secrets


    Using an alternative editor

    Set environment variable KUBE_EDITOR to specify an alternative text editor.

    1. Validate that secret immuta-secret exists in the current namespace.

    2. Edit secret immuta-secret in place.

    3. Edit secret immuta-legacy-secret in place. Skip this step if the legacy query engine and fingerprint services are disabled (the default).

    4. Restart pods.
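    When a secret is edited in place, Kubernetes presents the values under its data field as base64-encoded strings, so a new credential has to be encoded before it is pasted in. The sketch below shows one way to prepare and check such a value in Python; the helper names and the sample password are hypothetical, and the key mentioned in the comment is the one used in the secret-data.env example in this guide.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a credential for the `data` field of a Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a value copied from the `data` field of a Secret."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical new password for a key such as
# IMMUTA_DATABASES_IMMUTA_CONNECTIONS_IMMUTADB_PASSWORD.
new_password = "example-password"
encoded = encode_secret_value(new_password)
print(encoded)

# Round-trip check: the encoded value decodes back to the original,
# with no stray newline added along the way.
assert decode_secret_value(encoded) == new_password
```

    Encoding in code rather than with an ad hoc shell pipeline makes it easier to avoid accidentally including a trailing newline in the stored credential.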

    Legacy query engine


    Considerations when using the legacy query engine

    The following section is only necessary if the legacy query engine and fingerprint services have been enabled.

    1. Validate that secret immuta-legacy-secret exists in the current namespace.

    2. Get the query engine replica count; this value will be referenced in subsequent steps.

    3. Scale the replica count down to 1.

    4. Get the query engine pod name; this value will be referenced in subsequent steps.

    5. Update the with a query engine superuser password.

    6. Update the with a query engine replication password.

    7. Update the with a query engine feature password.

    8. Scale the replica count back up to the previous value by updating the .

    Apply Helm values

    1. Update credentials in the immuta-values.yaml file.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml. Update the placeholder value with your own release name.

    .

    Databricks Trusted Libraries

    The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. An administrator can specify an installed library as "trusted," which will enable that library's code to bypass the Immuta security manager. Contact your Immuta support professional for custom security configurations for your libraries.

    This feature does not impact Immuta's ability to apply policies; trusting a library only allows code through what previously would have been blocked by the security manager.

    Security vulnerability

    Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether this risk applies to your environment or use case.


    Databricks Libraries API: Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.

    The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:

    • Library source is Upload, DBFS or DBFS/S3 and the Library Type is Jar.

    • Library source is Maven.

    Limitations

    • Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either

      • waiting for library installation to complete before running any third-party library commands or

      • executing a Spark query. This will force Immuta to wait for any trusted Immuta libraries to complete installation before proceeding.

    • When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or using the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code paths that include a class from a transitive dependency but do not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency of a file-util library, the user will not be able to directly use the file-util library to read a sensitive file that is normally protected by the Immuta security manager.

      In many cases, it is not a problem if dependent libraries aren't trusted because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if the dependent library needs to be trusted, there is a workaround:

    hashtag
    Troubleshooting

    In case of failure, check the driver logs for details. Some possible causes of failure include

    • One of the Immuta configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.

    • For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.

    • Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
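
    The Maven URI format listed above can be checked before a cluster is restarted. The following is a minimal shell sketch and is not part of Immuta or Databricks: the function name is_maven_trusted_uri is hypothetical, and the pattern only checks for three non-empty, colon-separated parts after maven:/, which is looser than real Maven coordinate rules.

```shell
# Hypothetical helper (not an Immuta or Databricks command).
# Returns 0 when the argument looks like maven:/group.id:artifact-id:version,
# and 1 otherwise. Each part must be non-empty and free of ':', '/', and whitespace.
is_maven_trusted_uri() {
  printf '%s\n' "$1" |
    grep -Eq '^maven:/[^:/[:space:]]+:[^:/[:space:]]+:[^:/[:space:]]+$'
}
```

    For example, is_maven_trusted_uri 'maven:/org.example:demo-lib:1.2.3' succeeds, while the same coordinates written with maven:// fail the check. The coordinates are made up for illustration.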

    hashtag
    Configuration

    For details about configuring trusted libraries, navigate to the installation guide.

    hashtag
    Notebook-Scoped Libraries on Machine Learning Clusters

    Users on Databricks runtimes 8+ can manage notebook-scoped libraries with %pip commands.

    However, this functionality differs from Immuta's trusted libraries feature, and Python libraries are still not supported as trusted libraries. The Immuta Security Manager will deny the code of libraries installed with %pip access to sensitive resources.

    hashtag
    Configuration

    No additional configuration is needed to enable this feature. Users only need to be running on clusters with DBR 8+.


    Enabling Legacy Query Engine and Fingerprint

    The query engine and fingerprint services are no longer installed by default. This guide demonstrates how to enable the query engine and fingerprint services using the Immuta Enterprise Helm chart (IEHC).

    If you are using any of the data platforms below, you must enable the query engine:

    • Amazon Redshift

    • Azure Synapse Analytics

    • Google BigQuery

    If you are using the legacy sensitive data discovery (SDD) feature, you must enable the query engine and fingerprint services.

    circle-info

    Kubernetes namespace

    The following section(s) presume the IEHC was deployed into namespace immuta, and that the current namespace is immuta.
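
    If the current namespace is not already immuta, it can be switched with kubectl. This is a sketch under assumptions: the wrapper function names and the KUBECTL override (handy for a dry run with echo) are illustrative and not part of the IEHC.

```shell
# Illustrative wrappers; KUBECTL defaults to kubectl and can be overridden
# (for example KUBECTL=echo) to print the command instead of running it.

# Makes immuta the default namespace of the current kubectl context.
use_immuta_namespace() {
  "${KUBECTL:-kubectl}" config set-context --current --namespace=immuta
}

# Prints the namespace recorded in the current kubectl context.
current_namespace() {
  "${KUBECTL:-kubectl}" config view --minify --output 'jsonpath={..namespace}'
}
```

    Calling use_immuta_namespace followed by current_namespace is one way to confirm the presumption above before continuing.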

    hashtag
    Prerequisites

    circle-info

    When migrating from the IHC to IEHC, query engine state is not retained. You must enable query engine rehydration to restore existing data source tables. If SQL credentials are used, they must be recreated by using LDAP sync or manually with the following command executed in the bometadata database:

    TRUNCATE bometadata."profile-sql";

    • The guide must be completed before proceeding.

    • Validate that secret immuta-secret exists in the current namespace.

    hashtag
    Create Kubernetes secret

    1. Create a file named secret-data.env with the following content.

    2. Create a secret named immuta-legacy-secret from the file secret-data.env.

    3. Delete the file secret-data.env.
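
    Steps 2 and 3 can be scripted with kubectl create secret generic and its --from-env-file option. The sketch below is an assumption about how one might wrap them, not the guide's own commands: the function name and the KUBECTL override are hypothetical, and the keys that belong in secret-data.env are not shown here.

```shell
# Hypothetical wrapper for steps 2 and 3. KUBECTL defaults to kubectl and can be
# overridden (for example KUBECTL=echo) to print the command instead of running it.
create_legacy_secret() {
  local env_file="${1:-secret-data.env}"

  # Refuse to continue if the env file from step 1 is missing.
  [ -f "${env_file}" ] || { echo "missing ${env_file}" >&2; return 1; }

  # Step 2: create the secret from the KEY=value pairs in the env file.
  "${KUBECTL:-kubectl}" create secret generic immuta-legacy-secret \
    --from-env-file="${env_file}" || return 1

  # Step 3: remove the plaintext file only after the secret was created.
  rm -f -- "${env_file}"
}
```

    Running create_legacy_secret with no arguments reads secret-data.env from the working directory. Because the file is deleted only when the create command succeeds, a failed attempt can be retried without recreating it.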

    hashtag
    Edit Helm values

    1. Edit the immuta-values.yaml file to include the following Helm values.

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '
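
    A generated password can be screened for these characters before it is written to the values file. This is a minimal sketch: is_safe_password is a hypothetical helper name, and it checks only the characters listed above, not length or strength.

```shell
# Hypothetical helper: returns 1 if the password contains whitespace or any of
# $ & : \ / ' and returns 0 otherwise.
is_safe_password() {
  case "$1" in
    *[[:space:]]* | *'$'* | *'&'* | *:* | *'\'* | */* | *"'"*) return 1 ;;
    *) return 0 ;;
  esac
}
```

    For example, is_safe_password 'Abc123-_x' returns 0, while a password containing a space or a dollar sign returns 1, which signals that a new password should be generated.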

    hashtag
    Apply Helm values

    Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Upgrade to Immuta 2024.2 LTS

    This guide demonstrates how to upgrade an existing Immuta deployment installed with the Immuta Helm chart (IHC) to the latest LTS release using the newer Immuta Enterprise Helm chart (IEHC).

    circle-exclamation

    Helm chart deprecation notice

    As of Immuta version 2024.2, the IHC has been deprecated in favor of the IEHC. The immuta-values.yaml Helm values files are not cross-compatible.

    hashtag
    Prerequisites

    hashtag
    Create a PostgreSQL database

    1. The PostgreSQL instance has been provisioned and is actively running.

    2. The PostgreSQL instance's hostname/FQDN is .

    3. The PostgreSQL instance is .

    For additional information, consult the Deployment requirements.

    hashtag
    Validate the Helm release

    1. Fetch the metadata for the Helm release associated with Immuta.

    2. Review the output from the previous step and verify the following:

      • The Immuta version (appVersion) is

    hashtag
    Metadata database

    circle-exclamation

    Azure prerequisites

    If using an Azure flexible server as an external database source, ensure that the pgcrypto server extension is enabled in the Azure server parameters UI before proceeding.

    The new IEHC no longer supports deploying a Metadata database (PostgreSQL) inside the Kubernetes cluster. Before transitioning to the new IEHC, it's first necessary to externalize the Metadata database.

    hashtag
    Built-in

    The following demonstrates how to take a database backup and import the data into each cloud provider's managed PostgreSQL service.

    hashtag
    Create backup of old database

    1. Get the metadata database pod name.

    2. Spawn a shell inside the running metadata database pod.

    3. Perform a database backup.

    4. Type

    hashtag
    Setup new database

    1. Create a pod named immuta-setup-db and spawn a shell.

    2. Press enter when the prompt appears and connect to the new PostgreSQL database as a superuser. Depending on the cloud provider, the default superuser name (postgres) might differ.

    hashtag
    Restore backup to new database

    1. Create a pod named immuta-restore-db and spawn a shell.

    2. Copy file bometadata.dump from the host's working directory to pod immuta-restore-db.

    3. Spawn a shell inside pod immuta-restore-db.

    hashtag
    External

    No additional work is required. The existing database can be reused with the new IEHC.

    hashtag
    Helm values

    circle-info

    Helm values file compatibility

    The immuta-values.yaml Helm values file used by the IHC is not compatible with the new IEHC.

    1. Rename the existing immuta-values.yaml Helm values file used by the IHC.

    2. Legacy audit records: If you want to be able to view audit records from before the 2024.2 upgrade, set FeatureFlag_auditLegacyViewHide to false in your Helm values file.

    Snowflake Lineage Tag Propagation

    circle-info

    Private preview: This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

    Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

    Snowflake Access History tracks user read and write operations. Snowflake column lineage extends this Access History to specify how data flows from source columns to the target columns in write operations, allowing data stewards to understand how sensitive data moves from ancestor tables to target tables so that they can

    • trace data back to its source to validate the integrity of dashboards and reports,

    • identify who performed write operations to meet compliance requirements,

    • evaluate data quality and pinpoint points of failure, and

    • tag sensitive data on source tables without having tag columns on their descendant tables.

    However, tagging sensitive data doesn’t innately protect that data in Snowflake; users need Immuta to disseminate these lineage tags automatically to descendant tables registered in Immuta so data stewards can build policies using the semantic and business context captured by those tags to restrict access to sensitive data. When Snowflake lineage tag propagation is enabled, Immuta propagates tags applied to a data source to its descendant data source columns in Immuta, which keeps your data inventory in Immuta up-to-date and allows you to protect your data with policies without having to manually tag every new Snowflake data source you register in Immuta.

    hashtag
    Data flow

    1. An application administrator enables the feature on the Immuta app settings page.

    2. Snowflake lineage metadata (column names and tags) for the Snowflake tables is stored in the metadata database.

    3. A data owner creates a new data source (or adds a new column to a Snowflake table), which initiates a job that applies to each column all of the tags from its ancestor columns.

    hashtag
    Snowflake access history view and Immuta lineage job

    The Snowflake Account Usage ACCESS_HISTORY view contains column lineage information.

    To appropriately propagate tags to descendant data sources, Immuta fetches Access History metadata to determine what column tags have been updated, stores this metadata in the Immuta metadata database, and then applies those tags to relevant descendant columns of tables registered in Immuta.

    Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

    • Customer: source table

    • Customer 2: descendant of Customer

    • Customer 3: descendant of Customer 2

    If the Discovered.Electronic Mail Address tag is added to the Customer data source in Immuta, that tag will propagate through lineage to the Customer 2 and Customer 3 data sources.

    hashtag
    Data source registration

    After an application administrator has enabled Snowflake lineage tag propagation, data owners can register data in Immuta and have tags in Snowflake propagated from ancestor tables to descendant data sources. Whenever new tags are added to those tables in Immuta, those upstream tags will propagate to descendant data sources.

    By default, all tags are propagated, but the propagated tags can be filtered on the app settings page or using the Immuta API.

    hashtag
    Managing tags

    Lineage tag propagation works with any tag added to the data dictionary. Tags can be manually added, synced from an external catalog, or discovered by SDD. Consider the following example using the Customer, Customer 2, and Customer 3 tables that were all registered in Immuta as data sources.

    • Customer: source table

    • Customer 2: descendant of Customer

    • Customer 3: descendant of Customer 2

    Immuta added the Discovered.Electronic Mail Address tag to the Customer data source, and that tag propagated through lineage to the Customer 2 and Customer 3 data sources.

    Removing the tag from the Customer 2 table soft deletes it from the Customer 2 data source. When a tag is deleted, downstream lineage tags are removed, unless another parent data source still has that tag. The tag remains visible, but it will not be re-added if a future propagation event specifies the same tag again. Immuta prevents you from removing Snowflake object tags from data sources. You can only remove Immuta-managed tags. To remove Snowflake object tags from tables, you must remove them in Snowflake.

    However, the Discovered.Electronic Mail Address tag still applies to the Customer 3 data source because Customer still has the tag applied. A tag is removed from a descendant data source only if no other ancestor of that descendant still prescribes the tag.

    If the Snowflake lineage tag propagation feature is disabled, tags will remain on Immuta data sources.

    hashtag
    Sensitive data discovery

    Sensitive data discovery will still run on data sources and can be manually triggered. Tags applied through sensitive data discovery will propagate as tags added through lineage to descendant Immuta data sources.

    hashtag
    Snowflake lineage audit

    Immuta audit records include Snowflake lineage tag events when a tag is added or removed.

    The example audit record below illustrates the SNOWFLAKE_TAGS.pii tag successfully propagating from the Customer table to Customer 2:

    hashtag
    Limitations

    • Without tableFilter set, Immuta will ingest lineage for every table on the Snowflake instance.

    • Tag propagation based on lineage is not retroactive. For example, if you add a table, add tags to that table, and then run the lineage ingestion job, tags will not get propagated. However, if you add a table, run the lineage ingestion job, and then add tags to the table, the tags will get propagated.

    • The lineage job needs to pull in lineage data before any tag is applied in Immuta. When Immuta gets new lineage information from Snowflake, Immuta does not update existing tags in Immuta.

    Sparklyr

    circle-info

    Single-user clusters recommended

    Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.

    Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).

    • Single-User Clusters (recommended): Credential Passthrough (required on Databricks) allows a single-user cluster to be created. This setting automatically configures the cluster to assume the role of the attached user when reading from storage. Because Immuta requires that raw data is readable by the cluster, the instance profile associated with the cluster should be used rather than a role assigned to the attached user.

    • Multi-User Clusters (discouraged): Because Immuta cannot guarantee user isolation in a multi-user sparklyr cluster, it is not recommended to deploy a multi-user cluster. To force all users to act under the same set of attributes, groups, and purposes with respect to their data access and eliminate the risk of a data leak, all sparklyr multi-user clusters must be equalized either by convention (all users able to attach to the cluster have the same level of data access in Immuta) or by configuration (detailed below).

    hashtag
    Single-User Cluster Configuration

    hashtag
    1 - Enable sparklyr

    In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:

    This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.

    hashtag
    2 - Set Up a sparklyr Connection in Databricks

    1. Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.

    2. Set up a sparklyr connection:

    hashtag
    3 - Configure a Single-User Cluster

    Add the following items to the Spark Config section of the cluster:

    The trustedFileSystems setting is required to allow Immuta’s wrapper FileSystem (used in conjunction with the ImmutaSecurityManager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster’s instance profile for data access, rather than a role associated with the attached user.

    hashtag
    Multi-User Cluster Configuration

    circle-exclamation

    Avoid deploying multi-user clusters with sparklyr configuration

    It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.

    The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta project workspaces.

    1. Add the following environment variables to the Environment Variables section of your cluster configuration:

    2. Add the following items to the Spark Config section:

    hashtag
    Limitations

    Immuta’s integration with sparklyr does not currently support

    • spark-submit jobs,

    • UDFs, or

    • Databricks Runtimes 5, 6, or 7.

    Getting Started

    The how-to guides linked on this page illustrate how to integrate Databricks Unity Catalog with Immuta to secure your data with governance policies, discover what data types and sensitive data should be secured, and observe your users' activity to ensure risky user access is caught and addressed.

    Requirements:

    • Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.

    • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore.

    hashtag
    Configure your Databricks Unity Catalog integration

    Configuring a Databricks Unity Catalog integration is required for Detect, Discover, and Secure. These guides provide information on the recommended features to enable with Databricks Unity Catalog, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

    1. Configure your Unity Catalog integration with the following feature enabled: Query audit (enabled by default).

    2. Select None as your default subscription policy.

    3. Integrate an IAM with Immuta.

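    hashtag
    Detect your user activity

    These guides provide step-by-step instructions for auditing and detecting your users' activity, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

    1. Set up audit export to S3 or ADLS Gen2 for your Databricks Unity Catalog audit logs.

    2. View the Detect dashboards to see the activity of your users on Databricks Unity Catalog tables.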

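    hashtag
    Discover your data

    These guides provide step-by-step instructions for discovering, classifying, and tagging your data.

    1. Enable sensitive data discovery (SDD).

    2. Register a subset of your tables to configure and validate SDD.

    3. Configure SDD to discover entities of interest for your policy needs.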

    hashtag
    Secure your data

    These guides provide step-by-step instructions for configuring and securing your data with governance policies, or see the Secure use cases for a comprehensive guide on creating policies to fit your organization's use case.

    1. Create a global subscription policy.

    2. Create a global data policy.

    3. Validate the policies. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.

    Databricks Metastore Magic

    Databricks metastore magic allows you to migrate your data from the Databricks legacy Hive metastore to the Unity Catalog metastore while protecting data and maintaining your current processes in a single Immuta instance.

    Databricks metastore magic is for customers who intend to use the Databricks Unity Catalog integration but would also like to protect tables in the Hive metastore.

    hashtag
    Requirement

    Unity Catalog support is enabled in Immuta.

    hashtag
    Databricks metastores and Immuta policy enforcement

    Databricks has two built-in metastores that contain metadata about your tables, views, and storage credentials:

    • Legacy Hive metastore: Created at the workspace level. This metastore contains metadata of the configured tables in that workspace available to query.

    • Unity Catalog metastore: Created at the account level and is attached to one or more Databricks workspaces. This metastore contains metadata of the configured tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those tables.

    Databricks allows you to use the legacy Hive metastore and the Unity Catalog metastore simultaneously. However, Unity Catalog does not support controls on the Hive metastore, so you must attach a Unity Catalog metastore to your workspace and move existing databases and tables to the attached Unity Catalog metastore to use the governance capabilities of Unity Catalog.

    Immuta's Databricks Spark integration and Unity Catalog integration enforce access controls on the Hive and Unity Catalog metastores, respectively. However, because these metastores have two distinct security models, users were discouraged from using both in a single Immuta instance before metastore magic; the Databricks Spark integration and Unity Catalog integration were unaware of each other, so using both concurrently caused undefined behavior.

    hashtag
    Databricks metastore magic solution

    Metastore magic reconciles the distinct security models of the legacy Hive metastore and the Unity Catalog metastore, allowing you to use multiple metastores (specifically, the Hive metastore or alongside Unity Catalog metastores) within a Databricks workspace and single Immuta instance and keep policies enforced on all your tables as you migrate them. The diagram below shows Immuta enforcing policies on registered tables across workspaces.

    In clusters A and D, Immuta enforces policies on data sources in each workspace's Hive metastore and in the Unity Catalog metastore shared by those workspaces. In clusters B, C, and E (which don't have Unity Catalog enabled in Databricks), Immuta enforces policies on data sources in the Hive metastores for each workspace.

    hashtag
    Enforce policies as you migrate

    With metastore magic, the Databricks Spark integration enforces policies only on data in the Hive metastore, while the Unity Catalog integration enforces policies on tables in the Unity Catalog metastore. The table below illustrates this policy enforcement.

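    Table location | Databricks Spark integration | Databricks Unity Catalog integration
    Hive metastore | Policies enforced | Policies not enforced
    Unity Catalog metastore | Policies not enforced | Policies enforced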

    To enforce plugin-based policies on Hive metastore tables and Unity Catalog controls on Unity Catalog metastore tables, enable the Databricks Spark integration and the Databricks Unity Catalog integration. Note that some Immuta policies are not supported in the Databricks Unity Catalog integration. See the for details.

    hashtag
    Enforcing policies on Databricks SQL

    Databricks SQL cannot run the Databricks Spark plugin to protect tables, so Hive metastore data sources will not be policy enforced in Databricks SQL.

    To enforce policies on data sources in Databricks SQL, use table access control to manually lock down Hive metastore data sources and the Databricks Unity Catalog integration to protect tables in the Unity Catalog metastore. Table access control is enabled by default on SQL warehouses, and any Databricks cluster without the Immuta plugin must have table access control enabled.

    hashtag
    Supported Databricks cluster configurations

    The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

    Example cluster
    Databricks Runtime
    Unity Catalog in Databricks
    Databricks Spark integration
    Databricks Unity Catalog integration

    Legend:

    • ✅ The feature or integration is enabled.

    • ⛔ The feature or integration is disabled.

    Run spark-submit Jobs on Databricks

    This guide illustrates how to run R and Scala spark-submit jobs on Databricks, including prerequisites and caveats.

    circle-info

    Language support: R and Scala are supported, but require advanced configuration; work with your Immuta support professional to use these languages. Python spark-submit jobs are not supported by the Databricks Spark integration.

    circle-info

    Using R in a notebook: Because of how some user properties are populated in Databricks, users should load the SparkR library in a separate cell before attempting to use any SparkR functions.

    hashtag
    R spark-submit

    hashtag
    Prerequisites

    Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

    1. Initialize the Spark session by entering these settings into the R submit script: immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".

      This will enable the R script to access Immuta data sources, scratch paths, and workspace tables.

    2. Once the script is written, upload it to a location in DBFS, S3, or ABFS that the Databricks cluster can access.

    hashtag
    Create the R spark-submit Job

    To create the R spark-submit job,

    1. Go to the Databricks jobs page.

    2. Create a new job, and select Configure spark-submit.

    3. Set up the parameters:

      Note: The path dbfs:/path/to/script.R can be in S3 or ABFS (on Azure Databricks), assuming the cluster is configured with access to that path.

    hashtag
    Scala spark-submit

    hashtag
    Prerequisites

    Before you can run spark-submit jobs on Databricks you must initialize the Spark session with the settings outlined below.

    1. Configure the Spark session with immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".

      Note: Stop your Spark session (spark.stop()) at the end of your job or the cluster will not terminate.

    2. The spark-submit job needs to be launched using a different classloader that points at the designated user JARs directory. The following Scala template can be used to launch your submit code using a separate classloader:

    hashtag
    Create the Scala spark-submit Job

    To create the Scala spark-submit job,

    1. Build your JAR and upload it to a location in DBFS, S3, or ABFS that the cluster can access.

    2. Select Configure spark-submit, and configure the parameters:

      Note: In the --class parameter, specify the fully qualified name of the class whose main function will be used as the entry point for your code.

    hashtag
    Caveats

    • The user mapping works differently from notebooks because spark-submit clusters are not configured with access to the Databricks SCIM API. The cluster tags are read to get the cluster creator and match that user to an Immuta user.

    • Privileged users (Databricks Admins and Whitelisted Users) must be tied to an Immuta user and given access through Immuta to access data through spark-submit jobs because the setting immuta.spark.acl.assume.not.privileged="true" is used.

    Getting Started

    The how-to guides linked on this page illustrate how to integrate Starburst (Trino) with Immuta.

    hashtag
    Configure your Starburst (Trino) integration

    Configuring a Starburst (Trino) integration is required for Secure. These guides provide information on the recommended features to enable with Starburst (Trino).

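    1. Configure your Starburst (Trino) integration.

    2. Select None as your default subscription policy.

    3. Integrate an IAM with Immuta.

    4. Map external user IDs from Starburst (Trino) to Immuta.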

    hashtag
    Detect your user activity

    These guides provide step-by-step instructions for auditing and detecting your users' activity, or see the Detect use case for a comprehensive guide on the benefits of these features and other recommendations.

    1. or for your .

    2. .

    hashtag

    circle-info

    Public preview: SDD for Starburst (Trino) is currently in public preview and available to all accounts.

    These guides provide step-by-step instructions for discovering, classifying, and tagging your data.

    1. .

    2. to configure and validate SDD.

    3. to discover entities of interest for your policy needs.

    hashtag

    These guides provide step-by-step instructions for configuring and securing your data with governance policies, or see the for a comprehensive guide on creating policies to fit your organization's use case.

    1. .

    2. .

    3. Validate the policies. You do not have to validate every policy you create in Immuta; instead, examine a few to validate the behavior you expect to see.

    Configure Redshift Spectrum

    Allow Immuta to create secure views of your external tables through one of these methods:

    • Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.

    • Configure the integration by creating a new immuta database and re-creating all of your external tables in that database.

    For an overview of the integration, see the documentation.

    hashtag
    Requirements

    • A Redshift cluster with an AWS row-level security patch applied. for guidance.

    • that is .

    • The must be set to false (default setting) for your Redshift cluster.

    hashtag
    Use an existing database

    1. Click the App Settings icon in the left sidebar.

    2. Click Integrations in the left panel.

    3. Click the +Add Integration button and select Redshift from the dropdown menu.

    hashtag
    Register data

    .

    hashtag
    Create a new Immuta database

    1. Click the App Settings icon in the left sidebar.

    2. Click Integrations in the left panel.

    3. Click the +Add Integration button and select Redshift from the dropdown menu.

    Then, add your external tables to the Immuta database.

    hashtag
    Register data

    .

    Databricks Spark Integration

    This page provides an overview of the Databricks integration. For installation instructions, see the .

    hashtag
    Overview

    Databricks is a plugin integration with Immuta. This integration allows you to protect access to tables and manage row-, column-, and cell-level controls without enabling table ACLs or credential passthrough. Policies are applied to the plan that Spark builds for a user's query and enforced live on-cluster.

    Configure Redshift Integration

    This page illustrates how to configure the on the Immuta app settings page. To configure this integration via the Immuta API, see the .

    For instructions on configuring Redshift Spectrum, see the guide.

    hashtag
    Requirements

    mkdir ./offline-kit
    helm pull oci://ocir.immuta.com/stable/immuta-enterprise --destination ./offline-kit --version 2024.2.20
    tar --extract --gzip --strip-components=1 --directory=./offline-kit --file=./offline-kit/immuta-enterprise-2024.2.20.tgz immuta-enterprise/DIGESTS.md
    skopeo copy docker-archive:offline-kit/<name>-<tag>.tar docker://<private-registry-fqdn>/immuta/<name>:<tag>
    export IMMUTA_VERSION=2024.2.20
    export IMMUTA_IMAGES="audit-service audit-export-cronjob cache classify-service immuta-service"
    export IMMUTA_LEGACY_IMAGES="immuta-db immuta-fingerprint"
    for image in ${IMMUTA_IMAGES} ${IMMUTA_LEGACY_IMAGES}; do
      skopeo copy docker://ocir.immuta.com/stable/${image}:${IMMUTA_VERSION} docker-archive://${PWD}/${image}-${IMMUTA_VERSION}.tar;
    done
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    helm pull oci://ocir.immuta.com/stable/immuta-enterprise --version 2024.2.20
    export PRIVATE_REGISTRY=your.private-registry.com
    export IMMUTA_VERSION=2024.2.20
    export IMMUTA_IMAGES="audit-service audit-export-cronjob cache classify-service immuta-service"
    export IMMUTA_LEGACY_IMAGES="immuta-db immuta-fingerprint"
    for image in ${IMMUTA_IMAGES} ${IMMUTA_LEGACY_IMAGES}; do
      skopeo copy docker-archive://${PWD}/${image}-${IMMUTA_VERSION}.tar docker://${PRIVATE_REGISTRY}/immuta/${image}:${IMMUTA_VERSION};
    done
    helm upgrade --install immuta ./immuta-enterprise-2024.2.20.tgz -f immuta-values.yaml
    kubectl get secret/immuta-secret
    kubectl edit secret/immuta-secret
    kubectl get secret/immuta-legacy-secret
    kubectl get statefulset --selector "app.kubernetes.io/component=query-engine" --output name
    kubectl scale statefulset --all --replicas 1 --selector "app.kubernetes.io/component=query-engine"
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    Snowflake query audit
    default subscription policy
    Integrate an IAM with Immuta
    Map external user IDs from Snowflake to Immuta
    Validate that the SDD tags are applied correctly
    schema level
    schema monitoring turned on
    Implement classification to categorize and tag sensitive data
    SECONDARY ROLES ALL
    Create a global data policy
    Map external user IDs from Unity Catalog to Immuta.
    Validate that the SDD tags are applied correctly.
  • Register your remaining tables at the schema level with schema monitoring turned on.

  • Implement classification to categorize and tag sensitive data.

  • Once all Immuta policies are in place, remove or alter old permissions and revoke access to the ungoverned tables.
    a Databricks Unity Catalog integration
    Detect use case
    Configure your Unity Catalog integration
    Query audit
    default subscription policy
    Integrate an IAM with Immuta
    Detect your user activity
    Detect use case
    Set up audit export to S3
    ADLS Gen2
    Databricks Unity Catalog audit logs
    View the Detect dashboards to see the activity of your users on Databricks Unity Catalog tables
    Discover your data
    Enable sensitive data discovery (SDD)
    Register a subset of your tables
    Configure SDD
    Secure your data
    Secure use cases
    Create a global subscription policy
    Create a global data policy
    Validate that the SDD tags are applied correctly.
  • Register your remaining tables at the schema level with schema monitoring turned on.

  • Implement classification to categorize and tag sensitive data.

  • Once all Immuta policies are in place, remove or alter old permissions and revoke access to the ungoverned tables.
    Configure your Starburst (Trino) integration
    default subscription policy
    Integrate an IAM with Immuta
    Map external user IDs from Starburst (Trino) to Immuta
    Detect your user activity
    Detect use case
    Set up audit export to S3
    ADLS Gen2
    Starburst (Trino) audit logs
    View the Detect dashboards to see the activity of your registered users on registered tables
    Discover your data
    Enable sensitive data discovery (SDD)
    Register a subset of your tables
    Configure SDD
    Secure your data
    Secure use cases
    Create a global subscription policy
    Create a global data policy
    A Redshift cluster with an RA3 node is required for the multi-database integration. You must use a Redshift RA3 instance type because Immuta requires cross-database views, which are only supported in Redshift RA3 instance types. For other instance types, you may configure a single-database integration using one of the Redshift Spectrum options.
  • For automated installations, the credentials provided must be a Superuser or have the ability to create databases and users and modify grants.

  • The enable_case_sensitive_identifier parameter must be set to false (the default setting) for your Redshift cluster.

    Add a Redshift integration

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab.

    3. Click the +Add Integration button and select Redshift from the dropdown menu.

    4. Complete the Host and Port fields.

    5. Enter an Immuta Database. This is a new database where all secure schemas and Immuta-created views will be stored.

    6. Optionally, select the Enable Impersonation checkbox and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.

    Select your configuration method

    You have two options for configuring your Redshift environment:

    • Automatic setup: Grant Immuta one-time use of credentials to automatically configure your Redshift environment and the integration.

    • Manual setup: Run the Immuta script in your Redshift environment yourself to configure your environment and the integration.

    Automatic setup

    Immuta requires temporary, one-time use of credentials with specific privileges

    When performing an automated installation, Immuta requires temporary, one-time use of credentials with the following privileges:

    • CREATE DATABASE

    • CREATE USER

    • REVOKE ALL PRIVILEGES ON DATABASE

    • GRANT TEMP ON DATABASE

    • MANAGE GRANTS ON ACCOUNT

    These privileges will be used to create and configure a new IMMUTA database within the specified Redshift instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

    You can create a new account for Immuta to use that has these privileges, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate privileges is a Superuser. If you create a new account, it can be deleted after initial setup is complete.

    Alternatively, you can create the IMMUTA database within the specified Redshift instance without giving Immuta user credentials for a Superuser using the manual setup option.

    1. Select Automatic.

    2. Enter an Initial Database from your Redshift integration for Immuta to use to connect.

    3. Use the dropdown menu to select your Authentication Method.

      1. Username and Password: Enter the Username and Password of the privileged user.

      2. AWS Access Key: Enter the Database User, Access Key ID, and Secret Key. Optionally, enter the Session Token.

    Manual setup

    Required privileges

    The role used to run the bootstrap script must have the following privileges:

    • CREATE DATABASE

    • CREATE USER

    • REVOKE ALL PRIVILEGES ON DATABASE

    • GRANT TEMP ON DATABASE

    • MANAGE GRANTS ON ACCOUNT

    1. Select Manual and download both of the bootstrap scripts from the Setup section.

    2. Run the bootstrap script (initial database) in the Redshift initial database.

    3. Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.

    4. Choose your authentication method, and enter the information of the newly created account.

    Save the configuration

    Click Save.

    Register data

    Register Redshift data in Immuta.

    Edit a Redshift integration

    Required privileges

    When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the following permissions:

    • Create databases

    • Create users

    • Modify grants

    Alternatively, you can download the Edit Script from your Redshift configuration on the Immuta app settings page and run it in Redshift.

    1. Click the App Settings icon in the left sidebar.

    2. Navigate to the Integrations tab and click the down arrow next to the Redshift Integration.

    3. Edit the field you want to change. Note that any shadowed field is not editable; to change it, you must disable and re-install the integration.

    4. Enter Username and Password.

    5. Click Save.

    Remove a Redshift integration

    Disabling Redshift Spectrum

    Disabling the Redshift integration is not supported when you set the fields nativeWorkspaceName, nativeViewName, and nativeSchemaName to create Redshift Spectrum data sources. Disabling the integration when these fields are used in metadata ingestion causes undefined behavior.

    1. Click the App Settings icon in the left sidebar.

    2. Navigate to the Integrations tab and click the down arrow next to the Redshift Integration.

    3. Click the checkbox to disable the integration.

    4. Enter the username and password that were used to initially configure the integration.

    5. Click Save.

    Redshift integration
    Integrations API getting started guide
    Redshift Spectrum

    Cluster 3 | 11.3 | ⛔ | ✅ / ⛔ | Unavailable
    Cluster 4 | 11.3 | ✅ | ⛔ | ⛔
    Cluster 5 | 11.3 | ✅ | ✅ | ✅

    Hive metastore | ✅ | ❌
    Unity Catalog metastore | ❌ | ✅

    Cluster 1 | 9.1 | Unavailable | ✅ | Unavailable
    Cluster 2 | 10.4 | Unavailable | ✅ | Unavailable

    AWS Glue Data Catalog
    Databricks Spark integration
    Databricks Unity Catalog integration reference guide
    Hive metastore table access controls

    skopeo
    placeholder value
    A data owner or governor adds a tag to a column in Immuta that has descendants, which initiates a job that propagates the tag to all descendants.
  • An audit record is created that includes which tags were applied and from which columns those tags originated.

  • There can be up to a 3-hour delay in Snowflake for a lineage event to make it into the ACCESS_HISTORY view.

  • Immuta does not ingest lineage information for views.

  • Snowflake only captures lineage events for CTAS, CLONE, MERGE, and INSERT write operations. Snowflake does not capture lineage events for DROP, RENAME, ADD, or SWAP. Instead of using these latter operations, you need to recreate a table with the same name if you need to make changes.

  • Immuta cannot enforce coherence of your Snowflake lineage. If a column, table, or schema in the middle of the lineage graph gets dropped, Immuta will not do anything unless a table with that same name gets recreated. This means a table that gets dropped but not recreated could live in Immuta’s system indefinitely.

  • Sensitive data discovery
  • Edit the cluster configuration, and change the Databricks Runtime to a supported version.

  • Configure the Environment Variables section as you normally would for an Immuta cluster.

  • Note: The path dbfs:/path/to/code.jar can be in S3 or ABFS (on Azure Databricks) assuming the cluster is configured with access to that path.
  • Edit the cluster configuration, and change the Databricks Runtime to a supported version.

  • Include IMMUTA_INIT_ADDITIONAL_JARS_URI=dbfs:/path/to/code.jar in the "Environment Variables" (where dbfs:/path/to/code.jar is the path to your jar) so that the jar is uploaded to all the cluster nodes.

  • Optionally, set immuta.api.key to an Immuta API key generated on the Immuta profile page.
  • Currently, generating an API key invalidates the previous key. This can cause issues if a user runs multiple clusters in parallel, since each cluster will generate a new API key for that Immuta user. To avoid these issues, manually generate the API key in Immuta and set immuta.api.key on all the clusters, or use a specified job user for the submit job.

  • Add the transitive dependency jar paths to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable. In the driver log4j logs, Databricks outputs the source jar locations when it installs transitive dependencies. In the cluster driver logs, look for a log message similar to the following:
  • In the above example, where slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster.
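
As a rough illustration of the step above, the following sketch pulls the source jar path out of a driver log line. It assumes the log format shown in this guide (`LibraryDownloadManager: Downloaded library <path> as ...`); the function name `extract_library_uri` is hypothetical and is not part of Immuta or Databricks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read driver log lines on stdin and print the dbfs:/ source
# path of every library that LibraryDownloadManager reports as downloaded.
extract_library_uri() {
  sed -n 's/.*LibraryDownloadManager: Downloaded library \(dbfs:[^ ]*\) as.*/\1/p'
}

# Sample line copied from the log message format in this guide.
log_line='INFO LibraryDownloadManager: Downloaded library dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as'

uri="$(printf '%s\n' "$log_line" | extract_library_uri)"
echo "IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=${uri}"
```

Lines that do not match the expected format produce no output, so the helper can be fed an entire driver log. Add each printed path to the value the environment variable already holds rather than overwriting it.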

  • , as it's no longer needed.
    Any legacy database
    Immuta in production
    placeholder values
    Helm upgrade
    Pass the connection object to execute queries:
    Single-User Clusters
    Multi-User Clusters
    read -r -p "Enter the container image to download (e.g., docker.io/hello-world:latest):" image && \
    skopeo copy docker://"$image" docker-archive:"offline-kit/$(sed 's#.*/##; s#:#-#g' <<< "$image").tar"
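
The archive file name in the command above comes from two sed substitutions. The sketch below isolates that transformation so it can be checked without pulling an image; the function name `archive_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Derive the archive base name from an image reference using the same sed
# expressions as the command above: drop everything up to the last "/",
# then replace every ":" with "-".
archive_name() {
  printf '%s\n' "$1" | sed 's#.*/##; s#:#-#g'
}

archive_name "docker.io/hello-world:latest"   # prints hello-world-latest
```

Because the first substitution is greedy, a registry host that includes a port (for example `registry:5000/app:1.0`) is stripped along with the rest of the path, leaving `app-1.0`.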
    kubectl edit secret/immuta-legacy-secret
    kubectl rollout restart deployment --all --selector "app.kubernetes.io/component in (audit,secure)"
    kubectl get pod --selector "app.kubernetes.io/component=query-engine"
    kubectl exec pod/<query-engine-pod-name> -- \
        psql -d immuta -c \
        "ALTER USER postgres WITH ENCRYPTED PASSWORD '<new-patroni-superuser-password>'"
    kubectl exec pod/<query-engine-pod-name> -- \
        psql -d immuta -c \
        "ALTER USER replicator WITH ENCRYPTED PASSWORD '<new-patroni-replication-password>'"
    kubectl exec pod/<query-engine-pod-name> -- \
        psql -d immuta -c \
        "ALTER USER feature_service WITH ENCRYPTED PASSWORD '<new-immuta-feature-password>'"
    kubectl scale statefulset --all --replicas <query-engine-previous-replica-count> --selector "app.kubernetes.io/component=query-engine"
    {
      "id": "c8e020cb-232c-4ba9-a0d8-f3a84ba6808d",
      "dateTime": "1670355170336",
      "month": 1475,
      "profileId": 1,
      "userId": "immuta_system_account",
      "dataSourceId": 2,
      "dataSourceName": "Customer 2",
      "count": 1,
      "recordType": "nativeLineageDataSourceTagUpdate",
      "success": true,
      "component": "dataSource",
      "extra": {
        "sourceColumn": {
          "nativeColumnName": "\"MY_DATABASE\".\"PUBLIC\".\"CUSTOMER\".\"C_FIRST_NAME\"",
          "dataSourceId": 1,
          "columnName": "c_first_name"
        },
        "dataSourceId": 2,
        "columnName": "c_first_name",
        "tagPropagationDirection": "downstream",
        "tags": [
          {
            "name": "SNOWFLAKE_TAGS.pii",
            "source": "immuta-us-east-1"
          }
        ]
      },
      "newAuditServiceFields": {
        "actorIp": null,
        "sessionId": null
      },
      "createdAt": "2022-12-06T19:32:50.372Z",
      "updatedAt": "2022-12-06T19:32:50.372Z"
    }
     [
     "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
     "dbfs:/path/to/script.R",
     "arg1", "arg2", "..."
     ]
    package com.example.job

    import java.net.URLClassLoader
    import java.io.File

    import org.apache.spark.sql.SparkSession

    object ImmutaSparkSubmitExample {
      def main(args: Array[String]): Unit = {
        val jarDir = new File("/databricks/immuta/jars/")
        val urls = jarDir.listFiles.map(_.toURI.toURL)

        // Configure a new ClassLoader which will load jars from the additional jars directory
        val cl = new URLClassLoader(urls)
        val jobClass = cl.loadClass(classOf[ImmutaSparkSubmitExample].getName)
        val job = jobClass.newInstance
        jobClass.getMethod("runJob").invoke(job)
      }
    }

    class ImmutaSparkSubmitExample {

      def getSparkSession(): SparkSession = {
        SparkSession.builder()
          .appName("Example Spark Submit")
          .enableHiveSupport()
          .config("immuta.spark.acl.assume.not.privileged", "true")
          .config("spark.hadoop.immuta.databricks.config.update.service.enabled", "false")
          .getOrCreate()
      }

      def runJob(): Unit = {
        val spark = getSparkSession()
        try {
          val df = spark.table("immuta.<YOUR DATASOURCE>")

          // Run Immuta Spark queries...

        } finally {
          spark.stop()
        }
      }
    }
     [
     "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
     "--class","org.youorg.package.MainClass",
     "dbfs:/path/to/code.jar",
     "arg1", "arg2", "..."
     ]
    INFO LibraryDownloadManager: Downloaded library dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as
    local file /local_disk0/tmp/addedFile8569165920223626894slf4j_api_1_7_25-784af.jar
    kubectl get secret/immuta-secret
    # query-engine
    IMMUTA_FEATURE_PASSWORD=<immuta-feature-password>
    PATRONI_SUPERUSER_PASSWORD=<patroni-superuser-password>
    PATRONI_REPLICATION_PASSWORD=<patroni-replication-password>
    PATRONI_RESTAPI_PASSWORD=<patroni-api-password>
    kubectl create secret generic immuta-legacy-secret --from-env-file=secret-data.env
    legacy:
      enabled: true
    
      queryEngine:
        statefulset:
          extraEnvVars:
          - name: IMMUTA_FEATURE_PASSWORD
            valueFrom:
              secretKeyRef:
                name: immuta-legacy-secret
                key: IMMUTA_FEATURE_PASSWORD
          - name: PATRONI_SUPERUSER_PASSWORD
            valueFrom:
              secretKeyRef:
                name: immuta-legacy-secret
                key: PATRONI_SUPERUSER_PASSWORD
          - name: PATRONI_REPLICATION_PASSWORD
            valueFrom:
              secretKeyRef:
                name: immuta-legacy-secret
                key: PATRONI_REPLICATION_PASSWORD
          - name: PATRONI_RESTAPI_PASSWORD
            valueFrom:
              secretKeyRef:
                name: immuta-legacy-secret
                key: PATRONI_RESTAPI_PASSWORD
    
        postgres:
          # Query Engine feature user
          # Instead use queryEngine.statefulset.extraEnvVars[].name[IMMUTA_FEATURE_PASSWORD]
          # password: <immuta-feature-password>
    
          # Query Engine superuser user
          # Instead use queryEngine.statefulset.extraEnvVars[].name[PATRONI_SUPERUSER_PASSWORD]
          # superuserPassword: <patroni-superuser-password>
    
          # Query Engine replication user
          # Instead use queryEngine.statefulset.extraEnvVars[].name[PATRONI_REPLICATION_PASSWORD]
          # replicationPassword: <patroni-replication-password>
    
          # Query Engine patroni api user
          # Instead use queryEngine.statefulset.extraEnvVars[].name[PATRONI_RESTAPI_PASSWORD]
          # patroniApiPassword: <patroni-api-password>
        immutaSecurity:
          # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
          # The anatomy of a domain name is as follows:
          #   <service>.<namespace>.svc.<cluster-domain>
          #
          # Where the default cluster domain is: cluster.local
          authEndpoint: "http://immuta-secure.immuta.svc.cluster.local:8823"
    
    secure:
      extraEnvVars:
      - name: IMMUTA_DATABASES_IMMUTA_CONNECTIONS_FEATURESTOREDB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: immuta-legacy-secret
            key: IMMUTA_FEATURE_PASSWORD
    
      extraConfig:
        queryEngineRehydration:
          enabled: true
        disableFeatureStore: false
        databases:
          immuta:
            connections:
              featureStoreDb:
                # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
                # The anatomy of a domain name is as follows:
                #   <service>.<namespace>.svc.<cluster-domain>
                #
                # Where the default cluster domain is: cluster.local
                host: "immuta-legacy-query-engine-service.immuta.svc.cluster.local"
                port: 5432
                ssl: false
                # Query Engine feature user
                # Instead use secure.extraEnvVars[].name[IMMUTA_DATABASES_IMMUTA_CONNECTIONS_FEATURESTOREDB_PASSWORD]
                # password: <immuta-feature-password>
        fingerprints:
          # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
          # The anatomy of a domain name is as follows:
          #   <service>.<namespace>.svc.<cluster-domain>
          #
          # Where the default cluster domain is: cluster.local
          uri: "http://immuta-legacy-fingerprint-service.immuta.svc.cluster.local:5001/"
          queryEngineHost: "immuta-legacy-query-engine-service.immuta.svc.cluster.local"
          queryEnginePort: 5432
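
The hostnames in the values above follow the `<service>.<namespace>.svc.<cluster-domain>` pattern described in the comments. A minimal sketch for composing such a name, assuming the default cluster domain `cluster.local`; the function name `service_fqdn` is hypothetical.

```shell
#!/usr/bin/env bash
# Compose <service>.<namespace>.svc.<cluster-domain>.
# The cluster domain defaults to cluster.local, as noted in the values file.
service_fqdn() {
  local service="$1" namespace="$2" cluster_domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$cluster_domain"
}

service_fqdn immuta-legacy-query-engine-service immuta
```

If the chart is deployed into a namespace other than immuta, or the cluster uses a custom domain, pass those values instead and update the corresponding hosts in immuta-values.yaml.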
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    rm -i secret-data.env
    IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED=true
    sc <- spark_connect(method = "databricks")
    spark.databricks.passthrough.enabled true
    
    spark.databricks.pyspark.trustedFilesystems com.databricks.s3a.S3AFileSystem,shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem,com.databricks.adl.AdlFileSystem,shaded.databricks.V2_1_4.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem,shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem,shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem,org.apache.hadoop.fs.ImmutaSecureFileSystemWrapper
    
    spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.InstanceProfileCredentialsProvider
    IMMUTA_DATABRICKS_SPARKLYR_SUPPORT_ENABLED=true
    
    IMMUTA_SPARK_REQUIRE_EQUALIZATION=true
    
    IMMUTA_SPARK_CURRENT_USER_SCIM_FALLBACK=false
    immuta.spark.acl.assume.not.privileged true
    
    immuta.api.key=<user’s API key>
    dbGetQuery(sc, "show tables in immuta")

    The last LTS (2022.5.x) or 2024.1 or newer

  • Less than 2024.2

  • The Immuta Helm chart (version) is greater than or equal to 4.13.5

  • The Immuta Helm chart name (chart) is immuta

  • If any of these criteria are not met, you must first perform a Helm upgrade using the IHC. Contact your Immuta representative for guidance.
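
The two chart-related criteria above can be checked with a short script. This is only a sketch: it assumes `sort -V` (version sort, from GNU coreutils) is available, the variable values are hypothetical stand-ins for what your Helm release reports, and the Immuta version criteria are not covered.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical values; in practice they would come from your Helm release.
chart_name="immuta"
chart_version="4.13.5"

if [ "$chart_name" = "immuta" ] && version_ge "$chart_version" "4.13.5"; then
  echo "chart criteria met"
else
  echo "upgrade with the IHC first"
fi
```

Version sort is used instead of a plain string comparison so that a version such as 4.9.0 is correctly treated as older than 4.13.5.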

  • Type exit, and then press Enter to exit the shell prompt.
  • Copy file bometadata.dump from the pod to the host's working directory.

  • Create an immuta role and database.
  • Type \q, and then press Enter to exit the psql prompt.

  • Authenticate as the immuta user and create the pgcrypto extension.

  • Type \q, and then press Enter to exit the psql prompt.

  • .
  • Perform a database restore while authenticated as role immuta. Refer to the value substituted for <postgres-password> when prompted to enter a password.

  • Type exit, and then press Enter to exit the shell prompt.

  • Delete pod immuta-restore-db that was previously created.

  • Follow a cloud provider-specific installation guide to complete the upgrade. If your distribution is not listed below (such as K3s or RKE2), follow the generic installation instructions:
    • Managed public cloud: This guide includes instructions for

      • Amazon Elastic Kubernetes Service (EKS)

      • Google Kubernetes Engine (GKE)

      • Microsoft Azure Kubernetes Service (AKS)

    resolvable from within the Kubernetes cluster
    accepting connections

    The Redshift role used to run the Immuta bootstrap script must have the following privileges when configuring the integration to

    • Use an existing database:

      • ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.

      • CREATE USER

      • GRANT TEMP ON DATABASE

    • Create a new database:

      • CREATE DATABASE

      • CREATE USER

  • A Redshift database that contains an external schema and external tables.

  • Complete the Host and Port fields.
  • Enter the name of the database you created the external schema in as the Immuta Database. This database will store all secure schemas and Immuta-created views.

  • Optionally, select the Enable Impersonation checkbox and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.

  • Select Manual and download both of the bootstrap scripts from the Setup section. The role used to run the bootstrap script must have the following privileges:

    • ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.

    • CREATE USER

    • GRANT TEMP ON DATABASE

  • Run the bootstrap script (Immuta database) in the Redshift database that contains the external schema.

  • Choose your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.

  • Click Test Redshift Connection.

  • Click Save.

  • Complete the Host and Port fields.
  • Enter an Immuta Database. This is a new database where all secure schemas and Immuta-created views will be stored.

  • Optionally, select the Enable Impersonation checkbox and customize the Impersonation Role name as needed. This allows users to natively impersonate another user.

  • Select Manual and download both of the bootstrap scripts from the Setup section. The role used to run the bootstrap script must have the following privileges:

    • ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.

    • CREATE DATABASE

    • CREATE USER

    • GRANT TEMP ON DATABASE

  • Run the bootstrap script (initial database) in the Redshift initial database.

  • Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.

  • Choose your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.

  • Click Test Redshift Connection.

  • Click Save.

  • Redshift overview
    Contact Immuta
    An AWS IAM role for Redshift
    associated with your Redshift cluster
    enable_case_sensitive_identifier parameter
    Register Redshift data in Immuta
    Architecture

    An Application Admin will configure Databricks with either the

    • Simplified Databricks Configuration on the Immuta App Settings page

    • Manual Databricks Configuration where Immuta artifacts must be downloaded and staged to your Databricks clusters

    In both configuration options, the Immuta init script adds the Immuta plugin in Databricks: the Immuta Security Manager, wrappers, and Immuta analysis hook plan rewrite. Once an administrator gives users Can Attach To entitlements on the cluster, they can query Immuta-registered data sources directly in their Databricks notebooks.

    Simplified Databricks configuration additional entitlements

    The credentials used to perform the Simplified Databricks configuration with automatic cluster policy push must have the following entitlement:

    • Allow cluster creation

    This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.

    Policy Enforcement

    Immuta best practices: Test user

    Test the integration on an Immuta-enabled cluster with a user that is not a Databricks administrator.

    Registering Data Sources

    You should register entire databases with Immuta and run Schema Monitoring jobs through the Python script provided during data source registration. Additionally, you should use a Databricks administrator account to register data sources with Immuta using the UI or API; however, you should not test Immuta policies using a Databricks administrator account, as they are able to bypass controls.

    Table Access

    A Databricks administrator can control who has access to specific tables in Databricks through Immuta Subscription Policies or by manually adding users to the data source. Data users will only see the immuta database with no tables until they are granted access to those tables as Immuta data sources.

    The immuta Database

    When a table is registered in Immuta as a data source, users can see that table in both the backing database and in the immuta database. This allows for an option to use the immuta database as a single database for all tables.

    Fine-grained Access Control

    After data users have subscribed to data sources, administrators can apply fine-grained access controls, such as restricting rows or masking columns with advanced anonymization techniques, to manage what the users can see in each table. More details on the types of data policies can be found on the Data Policies page, including an overview of masking struct and array columns in Databricks.

    Note: Immuta recommends building Global Policies rather than Local Policies, as they allow organizations to easily manage policies as a whole and capture system state in a more deterministic manner.

    Accessing Data

    All access controls must go through SQL.

    Note: With R, you must load the SparkR library in a cell before accessing the data.

    Mapping Users

    Usernames in Immuta must match usernames in Databricks. It is best practice to use the same identity manager for Immuta that you use for Databricks (Immuta supports these identity manager protocols and providers).
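
One way to spot mismatches before testing policies is to compare exported username lists. The sketch below is illustrative only: it assumes you have already exported one username per line from each system into plain-text files (the file names and the function `missing_usernames` are hypothetical), and it compares names exactly, including case.

```shell
#!/usr/bin/env bash
# Print usernames that appear in the first file but not in the second.
# Both files are hypothetical exports with one username per line.
# -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
missing_usernames() {
  grep -Fxv -f "$2" "$1"
}

workdir="$(mktemp -d)"
printf '%s\n' 'alice@example.com' 'bob@example.com' > "$workdir/databricks_users.txt"
printf '%s\n' 'alice@example.com' > "$workdir/immuta_users.txt"

# Lists Databricks usernames that have no exact match in Immuta.
missing_usernames "$workdir/databricks_users.txt" "$workdir/immuta_users.txt"
rm -r "$workdir"
```

Note that grep exits with a non-zero status when every username matches, which matters if the script runs under `set -e`.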

    Data Flow

    1. An Immuta Application Administrator configures the Databricks integration and registers available cluster policies Immuta generates.

    2. The Immuta init script adds the immuta plugin in Databricks: the Immuta SecurityManager, wrappers, and Immuta analysis hook plan rewrite.

    3. A Data Owner registers Databricks tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.

    4. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.

    5. A Databricks user who is subscribed to the data source in Immuta queries the corresponding table directly in their notebook or workspace.

    6. During Spark Analysis, Spark calls down to the Metastore to get table metadata.

    7. Immuta intercepts the call to retrieve table metadata from the Metastore.

    8. Immuta modifies the Logical Plan to enforce policies that apply to that user.

    9. Immuta wraps the Physical Plan with specific Java classes to signal to the SecurityManager that it is a trusted node and is allowed to scan raw data.

    10. The Physical Plan is applied and filters out and transforms raw data coming back to the user.

    11. The user sees policy-enforced data.

    Databricks Installation Introduction

    Ingress Configuration

    This guide demonstrates how to configure Ingress. Ingress can be configured in numerous ways. Configurations for the most popular controllers are outlined below.

    Kubernetes namespace

    The following section(s) presume the Immuta Enterprise Helm chart was deployed into namespace immuta and that the current namespace is immuta.

    The Immuta web service listens on the following ports:

    Port
    Protocol
    Description
    Optional


    Deprecation notice

    Kubernetes is ending support for Ingress NGINX. See the official for details.

    Ingress hostname

    This is the fully qualified domain name (FQDN) as defined by RFC 3986 used to access the Immuta UI. If an FQDN has yet to be determined, set Secure's ingress hostname to immuta.local.

    1. Edit the immuta-values.yaml file to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the Ingress-Nginx Controller documentation for further assistance.

    GKE Ingress Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Create a file named frontendconfig.yaml with the following content.

    3. Apply the FrontendConfig CRD.

    Refer to the Google Cloud documentation for further assistance.

    AWS Load Balancer Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the AWS Load Balancer Controller documentation for further assistance.

    AKS Application Gateway Ingress Controller

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Refer to the Application Gateway Ingress Controller documentation for further assistance.

    Traefik

    1. Edit immuta-values.yaml to include the following Helm values.

    2. Create a file named middleware.yaml with the following content.

    3. Apply the Middleware CRD.

    Refer to the Traefik documentation for further assistance.
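The Traefik Helm values in this guide reference the redirect middleware as `<namespace>-<middleware-name>@kubernetescrd`. As a minimal sketch of that naming rule, the hypothetical helper below (not part of Immuta or Traefik) builds the annotation value from a namespace and a Middleware name, which can help when the chart is deployed somewhere other than the immuta namespace.

```python
def traefik_middleware_ref(namespace: str, middleware_name: str) -> str:
    """Build the value for traefik.ingress.kubernetes.io/router.middlewares.

    Follows the <namespace>-<middleware-name>@kubernetescrd pattern shown
    in the Helm values of this guide. Hypothetical helper for illustration.
    """
    if not namespace or not middleware_name:
        raise ValueError("namespace and middleware_name must be non-empty")
    return f"{namespace}-{middleware_name}@kubernetescrd"


# The guide's own values: namespace "immuta", Middleware "https-redirectscheme".
print(traefik_middleware_ref("immuta", "https-redirectscheme"))
```

If the Middleware is created in a different namespace, only the first argument should need to change.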

    OpenShift Ingress Operator

    1. Edit immuta-values.yaml to include the following Helm values. Because the Ingress resource will be managed by the OpenShift route you create and not the Immuta Enterprise Helm chart, ingress is set to false below.

    2. Get the service name for Secure.

    Refer to the Red Hat OpenShift documentation for further assistance.

    Configuration

    This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

    Prerequisites

    • Databricks instance: Premium tier workspace and Cluster access control enabled

    • Databricks instance has network level access to Immuta tenant

    • Permissions and access to download (outside Internet access) or transfer files to the host machine

    Recommended Databricks Workspace Configurations:

    • Workspace access control enabled

    • Personal access tokens enabled

    Note: Azure Databricks authenticates users with Microsoft Entra ID. Be sure to configure your Immuta tenant with an IAM that uses the same user ID as Microsoft Entra ID. Immuta's Spark security plugin will look to match this user ID between the two systems. See the Microsoft Entra ID page for details.

    Supported Databricks Runtime Versions

    Use the table below to determine which version of Immuta supports your Databricks Runtime version:

    Databricks Runtime Version   Immuta Version
    11.3 LTS                     2023.1 and newer
    10.4 LTS                     2022.2.x and newer
    7.3 LTS, 9.1 LTS             2021.5.x and newer
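As a rough illustration of reading the support table, the sketch below encodes the runtime-to-version pairs listed in this guide (11.3 LTS with 2023.1 and newer, 10.4 LTS with 2022.2.x and newer, 7.3 LTS and 9.1 LTS with 2021.5.x and newer) as a lookup. The function name is hypothetical and the dictionary is a hand copy of the table, not data fetched from Immuta.

```python
# Oldest Immuta release listed for each Databricks Runtime in this guide.
MIN_IMMUTA_VERSION = {
    "11.3 LTS": "2023.1",
    "10.4 LTS": "2022.2.x",
    "9.1 LTS": "2021.5.x",
    "7.3 LTS": "2021.5.x",
}


def minimum_immuta_version(runtime: str) -> str:
    """Return the oldest Immuta release listed for a runtime (hypothetical helper)."""
    try:
        return MIN_IMMUTA_VERSION[runtime]
    except KeyError:
        raise ValueError(
            f"Databricks Runtime {runtime!r} is not listed in the support table"
        ) from None


print(minimum_immuta_version("10.4 LTS"))
```

A runtime that is missing from the table raises an error rather than guessing, since the guide makes no claim about unlisted runtimes.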

    Supported Databricks Cluster Configurations

    The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

    Example cluster
    Databricks Runtime
    Unity Catalog in Databricks
    Databricks Spark integration
    Databricks Unity Catalog integration

    Legend:

    • ✅ The feature or integration is enabled.

    • ⛔ The feature or integration is disabled.

    Supported Access Mode and Languages

    Immuta supports the Custom access mode.

    • Supported Languages:

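      • Python

      • SQL

      • R (requires advanced configuration; work with your Immuta support professional to use R)

      • Scala (requires advanced configuration; work with your Immuta support professional to use Scala)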

    Databricks Installation Overview

    Users who can read raw tables on-cluster

    • If a Databricks Admin is tied to an Immuta account, they will have the ability to read raw tables on-cluster.

    • If a Databricks user is listed as an "ignored" user, they will have the ability to read raw tables on-cluster. Users can be added to the immuta.spark.acl.whitelist configuration to become ignored users.

    The Immuta Databricks integration injects an Immuta plugin into the SparkSQL stack at cluster startup. The Immuta plugin creates an "immuta" database that is available for querying and intercepts all queries executed against it. For these queries, policy determinations will be obtained from the connected Immuta tenant and applied before returning the results to the user.

    The Databricks cluster init script provided by Immuta downloads the Immuta artifacts onto the target cluster and puts them in the appropriate locations on local disk for use by Spark. Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for policy enforcement.

    The cluster init script uses environment variables to:

    • Determine the location of the required artifacts for downloading.

    • Authenticate with the service/storage containing the artifacts.

    Note: Each target system/storage layer (HTTPS, for example) can only have one set of environment variables, so the cluster init script assumes that any artifact retrieved from that system uses the same environment variables.
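The one-credential-set-per-storage-layer rule described above can be pictured with a small sketch. Every environment variable name below (EXAMPLE_ARTIFACT_URI, EXAMPLE_HTTPS_USER, and so on) is a made-up placeholder, not a variable the Immuta init script actually reads. The code only illustrates the idea that the artifact location selects a storage layer and that layer selects a single, shared set of credentials.

```python
from urllib.parse import urlparse

# Hypothetical variable names: one credential pair per storage layer.
CREDENTIAL_VARS = {
    "https": ("EXAMPLE_HTTPS_USER", "EXAMPLE_HTTPS_PASSWORD"),
    "s3": ("EXAMPLE_S3_ACCESS_KEY", "EXAMPLE_S3_SECRET_KEY"),
}


def resolve_artifact(env: dict, uri_var: str) -> dict:
    """Look up an artifact's location and the credentials for its storage layer."""
    uri = env[uri_var]
    scheme = urlparse(uri).scheme
    if scheme not in CREDENTIAL_VARS:
        raise ValueError(f"no credential variables defined for {scheme!r}")
    user_var, secret_var = CREDENTIAL_VARS[scheme]
    return {"uri": uri, "scheme": scheme, "credentials": (env[user_var], env[secret_var])}


# Two artifacts on the same layer necessarily share the same credentials.
env = {
    "EXAMPLE_ARTIFACT_URI": "https://artifacts.example.com/plugin.jar",
    "EXAMPLE_INIT_URI": "https://artifacts.example.com/init.sh",
    "EXAMPLE_HTTPS_USER": "svc-user",
    "EXAMPLE_HTTPS_PASSWORD": "not-a-real-secret",
}
jar = resolve_artifact(env, "EXAMPLE_ARTIFACT_URI")
script = resolve_artifact(env, "EXAMPLE_INIT_URI")
print(jar["credentials"] == script["credentials"])
```

In a real cluster the dictionary would be `os.environ`. Consult the artifacts shipped with your release for the actual variable names.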

    Limitations

    See the Databricks Pre-Configuration Details page for known limitations.

    Installation Methods

    There are two installation options for Databricks. Click a link below to navigate to a tutorial for your chosen method:

    • Simplified Configuration: The steps to enable the integration with this method include

      1. Adding the integration on the App Settings page.

      2. Downloading or automatically pushing cluster policies to your Databricks workspace.

      3. Creating or restarting your cluster.

    Debugging Immuta Installation Issues

    For easier debugging of the Immuta Databricks installation, enable cluster init script logging. In the cluster page in Databricks for the target cluster, under Advanced Options -> Logging, change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be added onto the end of the provided path.
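Because the cluster ID is appended to the destination you enter, it can help to work out the resulting folder before looking for logs. The helper below is a hypothetical sketch of that rule only. The destination and cluster ID are example values, and any subfolders Databricks creates beneath the cluster folder are not modeled.

```python
def cluster_log_root(destination: str, cluster_id: str) -> str:
    """Return the log folder for a cluster: the destination plus the cluster ID.

    Hypothetical helper that mirrors the note above, which says the unique
    cluster ID is added onto the end of the provided path.
    """
    return f"{destination.rstrip('/')}/{cluster_id}"


# Example values only; substitute your own destination and cluster ID.
print(cluster_log_root("dbfs:/cluster-logs/", "0123-456789-abcde123"))
```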

    For debugging issues between the Immuta web service and Databricks, you can view the Spark UI on your target Databricks cluster. On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

    Using the Validation and Debugging Notebook

    The Validation and Debugging Notebook (immuta-validation.ipynb) is packaged with other Databricks release artifacts (for manual installations), or it can be downloaded from the App Settings page when configuring Databricks through the Immuta UI. This notebook is designed to be used by or under the guidance of an Immuta Support Professional.

    1. Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.

    2. Click the arrow next to your name and select Import.

    3. Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.

    Simplified Databricks Configuration

    This guide details the simplified installation method for enabling access to Databricks with Immuta policies enforced.

    Ensure your Databricks workspace, instance, and permissions meet the guidelines outlined in the Installation introduction before you begin.

    Databricks Unity Catalog: If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.

    1 - Add the Integration on the App Settings Page

    1. Log in to Immuta and click the App Settings icon in the left sidebar.

    2. Scroll to the System API Key subsection under HDFS and click Generate Key.

    3. Click Save and then Confirm.

    2 - Configure Cluster Policies

    Several cluster policies are available on the App Settings page when configuring this integration:

    • Python & SQL

    • Python & SQL & R

    • Python & SQL & R with Library Support

    • Scala

    • Sparklyr

    Click a link above to read more about each of these cluster policies before continuing with the tutorial.

    1. Click Configure Cluster Policies.

    2. Select one or more cluster policies in the matrix by clicking the Select button(s).

    3. Opt to check the Enable Unity Catalog checkbox to generate cluster policies that will enable Unity Catalog on your cluster. This option is only available when Databricks runtime 11.3 is selected.

    3 - Add Policies to Your Cluster

    1. Create a cluster in Databricks by following the Databricks documentation.

    2. In the Policy dropdown, select the Cluster Policies you pushed or manually added from Immuta.

    3. Select the Custom Access mode.

    4 - Register Data

    Register Databricks securables in Immuta.

    5 - Query Immuta Data

    When the Immuta-enabled Databricks cluster has been successfully started, Immuta will create an immuta database, which allows Immuta to track Immuta-managed data sources separately from remote Databricks tables so that policies and other security features can be applied. However, users can query sources with their original database or table name without referencing the immuta database. Additionally, when configuring a Databricks cluster you can hide immuta from any calls to SHOW DATABASES so that users aren't misled or confused by its presence. For more details, see the Hiding the immuta Database in Databricks page.

    1. Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster.

    2. See the Databricks Data Source Creation guide for a detailed walkthrough of creating Databricks data sources in Immuta.

    Example Queries

    Below are example queries that can be run to obtain data from an Immuta-configured data source. Because Immuta supports raw tables in Databricks, you do not have to use Immuta-qualified table names in your queries like the first example. Instead, you can run queries like the second example, which does not reference the immuta database.
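To make the two query styles concrete, here is a small sketch that builds both forms of the same query as strings. The database and table names (`default`, `my_table`) are placeholders, and the helper functions are hypothetical. In a Databricks notebook, either string could be passed to `spark.sql(...)`.

```python
def immuta_qualified_query(table: str) -> str:
    """First style: reference the table through the immuta database."""
    return f"SELECT * FROM immuta.{table}"


def native_query(database: str, table: str) -> str:
    """Second style: use the table's original database and name."""
    return f"SELECT * FROM {database}.{table}"


# Placeholder names; replace with a table registered as an Immuta data source.
print(immuta_qualified_query("my_table"))
print(native_query("default", "my_table"))

# In a notebook attached to an Immuta-enabled cluster (not run here):
#   df = spark.sql(native_query("default", "my_table"))
```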

    Customize Read and Write Access Policies for Starburst (Trino)

    Private preview: Write policies are only available to select accounts. Contact your Immuta representative to enable this feature.

    Requirements

    • Starburst (Trino) version 438 or newer

    • Write policies for Starburst (Trino) enabled. Contact your Immuta representative to get this feature enabled on your account.

    Configuration options

    In its default setting, the Starburst (Trino) integration's write access value controls the authorization of SQL operations that perform data modification (such as INSERT, UPDATE, DELETE, MERGE, and TRUNCATE). However, administrators can allow table modification operations (such as ALTER and DROP tables) to be authorized as write operations. Two locations allow administrators to specify how read and write access policies are applied to data in Starburst (Trino). Select one or both of the options below to customize these settings. If the access-control.properties file is used, it may override the policies configured in the Immuta web service.

    • Immuta web service: Configure write policies in the Immuta web service to allow all Starburst (Trino) clusters targeting that Immuta tenant to receive the same write policy configuration for data sources. This configuration will only affect tables or views registered as Immuta data sources.

    • Starburst (Trino) cluster: Configure write policies using the access-control.properties file in Starburst or Trino to broadly customize access for Immuta users on a specific cluster. This configuration file takes precedence over write policies passed from the Immuta web service. Use this option if all Immuta users should have the same level of access to tables regardless of the write policy setting in the Immuta web service.

    Immuta web service configuration

    Contact your Immuta representative to configure read and write access in the Immuta web service if all Starburst (Trino) data source operations should be affected identically across Starburst (Trino) clusters connected to your Immuta tenant. A configuration example is provided below.

    Configuration example

    The following example maps WRITE to READ, WRITE and OWN permissions and READ to just READ. Both READ and WRITE permissions should always include READ:

    Given the above configuration, when a user gets write access to a Starburst (Trino) data source, they will have both data and table modification permissions on that data source. See the Starburst (Trino) privileges section of the Subscription policy access types guide for details about these operations.
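One way to reason about the mapping is as a lookup from the access type a policy grants to the set of Starburst (Trino) permissions the user ends up with. The sketch below copies the mapping from this guide's example (WRITE maps to READ, WRITE, and OWN; READ maps to READ) and adds a hypothetical helper. How Immuta evaluates the mapping internally is an assumption here, not documented behavior.

```python
# Mapping copied from the configuration example in this guide.
ACCESS_GRANT_MAPPING = {
    "WRITE": ["READ", "WRITE", "OWN"],
    "READ": ["READ"],
}


def effective_permissions(access_type: str, mapping: dict = ACCESS_GRANT_MAPPING) -> set:
    """Return the permissions a user receives for a granted access type.

    Hypothetical helper. It also enforces the guide's rule that both READ
    and WRITE mappings should always include READ.
    """
    granted = set(mapping[access_type])
    if "READ" not in granted:
        raise ValueError(f"mapping for {access_type} must include READ")
    return granted


print(sorted(effective_permissions("WRITE")))
print(sorted(effective_permissions("READ")))
```

With this mapping, a user granted write access can also run table modification operations (OWN), which is the behavior the example is meant to enable.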

    Starburst cluster configuration

    Configure the integration to allow read and write policies to apply to any data source (registered or unregistered in Immuta) on a Starburst cluster.

    1. Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).

    2. Modify one or both properties below to customize the behavior of read or write access policies for all users:

    Trino cluster configuration

    1. Create the Immuta access control configuration file in the Trino configuration directory (/etc/trino/immuta-access-control.properties for Docker installations or <trino_install_directory>/etc/immuta-access-control.properties for standalone installations).

    2. Modify one or both properties below to customize the behavior of read or write access policies for all users:
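The two properties, immuta.allowed.immuta.datasource.operations and immuta.allowed.non.immuta.datasource.operations, can be read as two allow-lists: one for objects registered as Immuta data sources and one for everything else. The sketch below parses the example values from this guide and checks an operation against the matching list. The parser and helper are hypothetical, and the plugin's real defaults and the immuta.user.admin exemption are not modeled.

```python
REGISTERED_KEY = "immuta.allowed.immuta.datasource.operations"
UNREGISTERED_KEY = "immuta.allowed.non.immuta.datasource.operations"


def parse_properties(text: str) -> dict:
    """Parse key=value lines into {key: set of operations}, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = {op.strip() for op in value.split(",") if op.strip()}
    return props


def is_allowed(props: dict, operation: str, registered: bool) -> bool:
    """Check an operation against the allow-list for the object's category."""
    key = REGISTERED_KEY if registered else UNREGISTERED_KEY
    return operation in props.get(key, set())


# Example values taken from this guide's configuration snippet.
example = """
immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
"""
props = parse_properties(example)
print(is_allowed(props, "CREATE", registered=True))
print(is_allowed(props, "CREATE", registered=False))
```

The CREATE check differs between the two categories because, as described in this guide, only the property for unregistered objects can allow CREATE.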

    Generic Installation

    This is a generic guide that demonstrates how to deploy Immuta into any Kubernetes cluster without dependencies on any particular cloud provider.

    Considerations

    For the purposes of this guide, the following state stores are deployed in Kubernetes using third-party Helm charts maintained by :

    Databricks Spark Pre-Configuration Details

    This page describes the Databricks integration, configuration options, and features. See the for a tutorial on enabling Databricks and these features through the App Settings page.

    Feature Availability

    Starburst (Trino) Integration Reference Guide

    Starburst and Trino

    Starburst is based on open-source Trino. Consequently, this page occasionally refers to the Trino Execution Engine and Trino methods.

    The Starburst (Trino) integration allows you to access policy-enforced data directly in your Starburst catalogs without rewriting queries or changing workflows. Instead of generating policy-enforced views and adding them to an Immuta catalog that users have to query (like in the legacy Starburst (Trino) integration), Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.

    kubectl cp <metadata-database-pod-name>:/tmp/bometadata.dump bometadata.dump
    psql --host <postgres-fqdn> --username immuta --port 5432 --password
    CREATE EXTENSION pgcrypto;
    pg_restore --host=<postgres-fqdn> --port=5432 --username=immuta --password --dbname=immuta --no-owner --role=immuta < /tmp/bometadata.dump
    kubectl delete pod/immuta-restore-db
    helm get metadata --output yaml <helm-release-name>
    kubectl get pod --selector "app.kubernetes.io/component=database" --output name
    kubectl exec --stdin --tty <metadata-database-pod-name> -- sh
    pg_dump --dbname=bometadata --file=/tmp/bometadata.dump --format=custom --no-owner --no-privileges
    kubectl run immuta-setup-db --stdin --tty --rm --image docker.io/bitnami/postgresql:latest -- sh
    psql --host <postgres-fqdn> --username <postgres-admin> --dbname postgres --port 5432 --password
    kubectl run immuta-restore-db --image docker.io/bitnami/postgresql:latest -- sleep infinity
    kubectl cp bometadata.dump immuta-restore-db:/tmp
    mv immuta-values.yaml immuta-values.ihc.yaml
    CREATE ROLE immuta with login encrypted password '<postgres-password>';
    GRANT immuta TO CURRENT_USER;
    
    CREATE DATABASE immuta OWNER immuta;
    
    GRANT all ON DATABASE immuta TO immuta;
    ALTER ROLE immuta SET search_path TO bometadata,public;
    REVOKE immuta FROM CURRENT_USER;
    kubectl exec immuta-restore-db --stdin --tty -- sh
    df = spark.sql("select * from immuta.table")
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    val sqlDF = spark.sql("SELECT * FROM immuta.table")
    %sql
    select * from immuta.table
    library(SparkR)
    df <- SparkR::sql("SELECT * from immuta.table")

    GRANT TEMP ON DATABASE

  • REVOKE ALL PRIVILEGES ON DATABASE

  • Red Hat OpenShift
    Generic installation
    queries the corresponding table

    Cluster 3

    11.3

    ⛔

    ✅ / ⛔

    Unavailable

    Cluster 4

    11.3

    ✅

    ⛔

    ⛔

    Cluster 5

    11.3

    ✅

    ✅

    ✅

    R (requires advanced configuration; work with your Immuta support professional to use R)
  • Scala (requires advanced configuration; work with your Immuta support professional to use Scala)

  • immuta.spark.acl.whitelist
    configuration to become ignored users.
    Creating or restarting your cluster.
  • Manual Configuration: The steps to enable the integration with this method include

    1. Downloading and configuring Immuta artifacts.

    2. Staging Immuta artifacts somewhere the cluster can read from during its startup procedures.

    3. Protecting Immuta environment variables with Databricks Secrets.

    4. Creating and configuring the cluster to start with the init script and load Immuta into its SparkSQL environment.

  • 11.3 LTS

    2023.1 and newer

    10.4 LTS

    2022.2.x and newer

    7.3 LTS 9.1 LTS

    2021.5.x and newer

    Cluster 1

    9.1

    Unavailable

    ✅

    Unavailable

    Cluster 2

    10.4

    Unavailable

    ✅

    Cluster access control enabled
    Workspace access control enabled
    Personal access tokens enabled
    Microsoft Entra ID page
    Databricks Pre-Configuration Details page
    Simplified Configuration

    Unavailable

    Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

    Create a file named route.yaml with the following content. Update all placeholder values with your own values.
  • Apply the Route CRD.

  • Perform a Helm upgrade to apply the changes made to immuta-values.yaml.

  • 443

    TCP

    HTTPS

    False

    80

    TCP

    HTTP (redirects to HTTPS)

    True

    Ingress NGINX Controller
    Kubernetes announcement
    Helm upgrade
    Ingress-Nginx Controller documentation
    GKE Ingress Controller
    Google Cloud documentation
    AWS Load Balancer Controller
    Helm upgrade
    AWS Load Balancer Controller documentation
    AKS Application Gateway Ingress Controller
    Helm upgrade
    Application Gateway Ingress Controller documentation
    Traefik
    Traefik documentation
    OpenShift Ingress Operator
    Red Hat OpenShift documentation
    placeholder values
    immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
    • READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns

    • WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.

    • OWN: Grants ALTER and DROP on tables; grants SET on comments and properties

  • immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:

    • READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns

    • WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.

    • OWN: Grants ALTER and DROP on tables; grants SET on comments and properties

    • CREATE: Grants CREATE on catalogs, schemas, tables, and views. This is the only property that can allow CREATE permissions, since CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).

  • For example, the following configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta and all operations are permitted on data that is not registered in Immuta:

  • Enable the Immuta access control plugin in the Starburst cluster's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

  • immuta.allowed.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are registered as data sources in Immuta. These permissions apply to all querying users except for administrators defined in immuta.user.admin (who get all permissions).
    • READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns

    • WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.

    • OWN: Grants ALTER and DROP on tables; grants SET on comments and properties

  • immuta.allowed.non.immuta.datasource.operations: This property governs objects (catalogs, schemas, tables, etc.) that are not registered as data sources in Immuta. Use all or a combination of the following access values:

    • READ: Grants SELECT on tables or views; grants SHOW on tables, views, or columns

    • WRITE: Grants INSERT, UPDATE, DELETE, MERGE, or TRUNCATE on tables; grants REFRESH on materialized views.

    • OWN: Grants ALTER and DROP on tables; grants SET on comments and properties

    • CREATE: Grants CREATE on catalogs, schemas, tables, and views. This is the only property that can allow CREATE permissions, since CREATE is enforced on new objects that do not exist in Starburst or Immuta yet (such as a new table being created with CREATE TABLE).

  • For example, the following configuration allows READ, WRITE, and OWN operations to be authorized on data sources registered in Immuta and all operations are permitted on data that is not registered in Immuta:

  • Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,

  • read and write access policies
    Immuta web service
    Starburst (Trino) cluster
    Starburst
    Trino
    Starburst (Trino) privileges section of the Subscription policy access types guide
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    oc apply -f route.yaml
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        ingressClassName: nginx
        annotations:
          nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
          nginx.ingress.kubernetes.io/proxy-body-size: '64m'
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        annotations:
          # Determines which type of load balancer is provisioned
          #   gce-internal
          #   gce
          kubernetes.io/ingress.class: gce
          # Listen on both 80 and 443
          kubernetes.io/ingress.allow-http: 'true'
          # Redirect traffic from 80 to 443
          cloud.google.com/frontend-config: immuta
    apiVersion: networking.gke.io/v1beta1
    kind: FrontendConfig
    metadata:
      name: immuta
    spec:
      redirectToHttps:
        enabled: true
        responseCodeName: RESPONSE_CODE
    kubectl apply -f frontendconfig.yaml
    secure:
      ingress:
        hostname: <immuta-fqdn>
        ingressClassName: alb
        annotations:
          # Determines which type of load balancer is provisioned
          #   internal
          #   internet-facing
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: ip
          # Listen on both 80 and 443
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
          # Redirect traffic from 80 to 443
          alb.ingress.kubernetes.io/ssl-redirect: '443'
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        ingressClassName: webapprouting.kubernetes.azure.com
        # https://azure.github.io/application-gateway-kubernetes-ingress/annotations/
        annotations:
          appgw.ingress.kubernetes.io/ssl-redirect: 'true'
    helm upgrade <release-name> oci://ocir.immuta.com/stable/immuta-enterprise --values immuta-values.yaml --version 2024.2.20
    secure:
      ingress:
        hostname: <immuta-fqdn>
        ingressClassName: traefik
        annotations:
          # Listen on ports 80 and 443
          traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
          # Redirect HTTP to HTTPS
          # When referencing middleware you must prefix the name with its namespace
          # <namespace>-<middleware-name>@kubernetescrd
          traefik.ingress.kubernetes.io/router.middlewares: immuta-https-redirectscheme@kubernetescrd
    apiVersion: traefik.containo.us/v1alpha1
    kind: Middleware
    metadata:
      name: https-redirectscheme
    spec:
      redirectScheme:
        scheme: https
        permanent: true
    kubectl apply -f middleware.yaml
    secure:
      ingress:
        enabled: false
    oc get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: immuta
    spec:
      host: <immuta-fqdn>
      to:
        kind: Service
        name: immuta-secure
      port:
        targetPort: http
      tls:
        termination: edge
        insecureEdgeTerminationPolicy: Redirect
    access-control.config-files=/etc/starburst/immuta-access-control.properties
    access-control.config-files=/etc/trino/immuta-access-control.properties
    accessGrantMapping:
      WRITE: ['READ', 'WRITE', 'OWN']
      READ: ['READ']
    immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
    immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN
    immuta.allowed.immuta.datasource.operations=READ,WRITE,OWN
    immuta.allowed.non.immuta.datasource.operations=READ,WRITE,CREATE,OWN

    Scroll to the Integration Settings section, and click + Add Integration.

  • Select Databricks Integration from the dropdown menu.

  • Complete the Hostname field.

  • Enter a Unique ID for the integration. By default, your Immuta tenant URL populates this field. This ID is used to tie the set of cluster policies to your Immuta tenant and allows multiple Immuta tenants to access the same Databricks workspace without cluster policy conflicts.

  • Select your configured Immuta IAM from the dropdown menu.

  • Choose one of the following options for your data access model:

    • Protected until made available by policy: All tables are hidden until a user is permissioned through an Immuta policy. This is how most databases work and assumes least privileged access and also means you will have to register all tables with Immuta.

    • Available until protected by policy: All tables are open until explicitly registered and protected by Immuta. This makes a lot of sense if most of your tables are non-sensitive and you can pick and choose which to protect.

  • Select the Storage Access Type from the dropdown menu.

  • Opt to add any Additional Hadoop Configuration Files.

  • Click Add Integration.

  • Scala
  • Sparklyr

  • Opt to make changes to these cluster policies by clicking Additional Policy Changes and editing the text field.

  • Use one of the two Installation Types described below to apply the policies to your cluster:

    • Automatically Push Cluster Policies: This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.

      1. Select the Automatically Push Cluster Policies radio button.

      2. Enter your Admin Token. This token must be for a user who can create cluster policies in Databricks.

      3. Click Apply Policies.

    • Manually Push Cluster Policies: Enabling this option will allow you to manually push the cluster policies to the configured Databricks workspace. There will be various files to download and manually push to the configured Databricks workspace.

      1. Select the Manually Push Cluster Policies radio button.

      2. Click Download Init Script.

  • Opt to click the Download the Benchmarking Suite to compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries.

  • Click Close, and then click Save and Confirm.

  • Opt to adjust Autopilot Options and Worker Type settings: The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.
  • Opt to configure the Instances tab in the Advanced Options section:

    • IAM Role (AWS ONLY): Select the instance role you created for this cluster. (For access key authentication, you should instead use the environment variables listed in the AWS section.)

  • Click Create Cluster.

  • Python & SQL
    Python & SQL & R
    Python & SQL & R with Library Support
    Databricks documentation
    Register Databricks securables in Immuta
    Hiding the immuta Database in Databricks
    Databricks Data Source Creation guide
    immuta database
    Elasticsearch
  • PostgreSQL

    Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged for the following reasons:

    • Operational overhead: Managing PostgreSQL and Elasticsearch on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.

    • Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL and Elasticsearch have sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.

    • Data integrity and high availability: PostgreSQL and Elasticsearch deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.

    • Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.

    • Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.

    • Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    hashtag
    Setup

    1. Create a Kubernetes namespace named immuta for Immuta and its third-party dependencies.

    2. Switch to namespace immuta.

    3. Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

    hashtag
    Elasticsearch

    1. Create a Helm values file named es-values.yaml with the following content:

    2. Deploy Elasticsearch.

    hashtag
    PostgreSQL

    1. Create a Helm values file named pg-values.yaml with the following content:

    2. Update all placeholder values in the pg-values.yaml file.

    3. Deploy PostgreSQL.

    4. Wait for all pods in the namespace to become ready.

    5. Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.

    6. Exec into the PostgreSQL database pod using the psql command and immuta user to configure the PostgreSQL user used by Immuta.

    7. Alter the search_path for the immuta user.

    8. Enable the pgcrypto extension.

    9. Type \q then press Enter to exit.

    hashtag
    Install Immuta

    circle-exclamation

    Audit records

    Preserving legacy audit records

    Immuta does not migrate legacy audit records to the universal audit model (UAM), so when you upgrade Immuta those audit records will be lost unless you enable the following setting in your immuta-values.yaml file:

    Audit record retention

    Immuta defaults to keeping audit records for 7 days. To change this duration, set the following values in the immuta-values.yaml file. The example below configures audit records to be kept for 90 days:

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite local services are configured.

    1. Create a Helm values file named immuta-values.yaml with the following content:

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    3. Deploy Immuta.
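
The password guidance in the callout above can be checked before the values file is filled in. The following is a minimal sketch, not part of the Immuta tooling: the names `FORBIDDEN`, `ALPHABET`, `is_safe`, and `generate_password` are hypothetical, and the alphabet of letters, digits, and `-_.~` is an assumption chosen only because it avoids every character the callout lists.

```python
import secrets
import string

# Characters the callout says to avoid (whitespace is handled separately below).
FORBIDDEN = set("$&:\\/'")

# Assumed alphabet: letters, digits, and a few punctuation marks not in FORBIDDEN.
ALPHABET = string.ascii_letters + string.digits + "-_.~"


def is_safe(password: str) -> bool:
    """Return True if the password has no whitespace and no forbidden character."""
    return not any(ch.isspace() or ch in FORBIDDEN for ch in password)


def generate_password(length: int = 32) -> str:
    """Generate a random password drawn only from the assumed safe alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    candidate = generate_password()
    print(candidate, is_safe(candidate))
```

`is_safe` can also be used on passwords produced by another generator before they are pasted into immuta-values.yaml.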

    hashtag
    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.

    4. Navigate to http://localhost:8080 in a web browser.

    hashtag
    Next steps

    • Configure Ingress to complete your installation and access your Immuta application.

    • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • Learn more about best practices for Immuta in Production.


    hashtag
    Supported Databricks Cluster Configurations

    The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

    Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Unity Catalog integration
    Cluster 1 | 9.1 | Unavailable | ✅ | Unavailable
    Cluster 2 | 10.4 | Unavailable | ✅ |

    Legend:

    • ✅ The feature or integration is enabled.

    • ⛔ The feature or integration is disabled.

    hashtag
    Databricks-Specific Details

    hashtag
    Prerequisites

    • Databricks instance: Premium tier workspace and Cluster access control enabled

    • Databricks instance has network level access to Immuta tenant

    • Permissions and access to download (outside Internet access) or transfer files to the host machine

    Recommended Databricks Workspace Configurations:

    • Workspace access control enabled

    • Personal access tokens enabled

    Note: Azure Databricks authenticates users with Microsoft Entra ID. Be sure to configure your Immuta tenant with an IAM that uses the same user IDs as Microsoft Entra ID, because Immuta's Spark security plugin matches users between the two systems by this user ID. See this Microsoft Entra ID page for details.

    hashtag
    Supported Databricks Runtime Versions

    See this page for a list of Databricks Runtimes Immuta supports.

    hashtag
    Supported Databricks Cluster Types

    • All-purpose (interactive) clusters

    • Job clusters

    hashtag
    Supported Access Mode and Languages

    Immuta supports the Custom access mode.

    • Supported Languages:

      • Python

      • SQL

      • R (requires advanced configuration; work with your Immuta support professional to use R)

      • Scala (requires advanced configuration; work with your Immuta support professional to use Scala)

    hashtag
    Supported Features

    The Immuta Databricks integration supports the following Databricks features:

    • Change Data Feed: Databricks users can see the Databricks Change Data Feed on queried tables if they are allowed to read raw data and meet specific qualifications.

    • Databricks Libraries: Users can register their Databricks Libraries with Immuta as trusted libraries, allowing Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries.

    • External Metastores: Immuta supports the use of external metastores in local or remote mode.

    • Spark direct file reads: In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths.

    hashtag
    Workspaces

    Users can have additional write access in their integration using project workspaces. Users can integrate one or more workspaces with a single Immuta tenant. For more details, see the Databricks Project Workspaces page.

    hashtag
    Tag Ingestion

    The Immuta Databricks integration cannot ingest tags from Databricks, but you can connect any of these supported external catalogs to work with your integration.

    hashtag
    User Impersonation

    Impersonation allows users to query data as another Immuta user. To enable user impersonation, see the User Impersonation page.

    hashtag
    Query Audit

    circle-info

    Audit limitations

    Immuta will audit queries that come from interactive notebooks, notebook jobs, and JDBC connections, but will not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless immuta.spark.audit.all.queries is set to true; for more details about this configuration and auditing all queries in Databricks, see Limited Enforcement in Databricks.

    Capturing the code or query that triggers the Spark plan makes audit records more useful in assessing what users are doing.

    To audit the code or query that triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs. Examples of a saved cell/query and the resulting audit record are provided on the Databricks query audit logs page.

    hashtag
    Multiple Databricks Instances

    A user can configure multiple Databricks integrations with a single Immuta tenant and use them dynamically or with workspaces.

    hashtag
    Limitation

    Immuta does not support Databricks clusters with Photon acceleration enabled.


    hashtag
    Architecture

    Once an Immuta Application Admin configures the Starburst (Trino) integration, the ImmutaSystemAccessControl plugin is installed on the coordinator. This plugin provides policy decisions to the Trino Execution Engine whenever an Immuta user queries a Starburst (Trino) table registered in Immuta. Then, the Trino Execution Engine applies policies to the backing catalogs and retrieves the data with appropriate policy enforcement.

    By default, this integration is designed to be minimally invasive: if a catalog is not registered as an Immuta data source, users will still have access to it in Starburst (Trino). However, this limited enforcement can be changed in the configuration file provided by Immuta. Additionally, you can continue to use Trino's file-based access control provider or Starburst (Trino) built-in access control system on catalogs that are not protected or controlled by Immuta.

    hashtag
    Policy enforcement

    When a user queries a table in Starburst (Trino), the Trino Execution Engine reaches out to the Immuta plugin to determine what the user is allowed to see:

    • masking policies: For each column, Starburst (Trino) requests a view expression from the Immuta plugin. If there is a masking policy on the column, the Immuta plugin returns the corresponding view expression for that column. Otherwise, nothing is returned.

    • row-level policies: For each table, Starburst (Trino) requests the rows a user can see in a table from Immuta. If there is a WHERE clause policy on the data source, Immuta returns the corresponding view expression as a WHERE clause. Otherwise, nothing is returned.

    The Immuta plugin then requests policy information about the tables being queried from the Immuta Web Service and sends this information to the Trino Execution Engine. Finally, the Trino Execution Engine constructs the SQL statement, executes it on the backing tables to apply the policies, and returns the response to the user.

    See the integration support matrix on the Data policy types reference guide for a list of supported data policy types in Starburst (Trino).

    hashtag
    System access control providers

    circle-info

    Users cannot bypass Immuta controls by changing roles in their system access control provider.

    Multiple system access control providers can be configured in the Starburst (Trino) integration. This approach allows Immuta to work with existing Starburst (Trino) installations that already have an access control provider configured.

    Immuta does not manage all permissions in Starburst (Trino) and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

    If you have multiple access control providers configured, those providers interact in the following ways:

    • For a user to have access to a resource (catalog, schema, or a table), that user must have access in all of the configured access control providers.

    • In catalog, schema, or table filtering (such as show catalogs, show schemas, or show tables), the user will see the intersection of all access control providers. For example, if a Starburst (Trino) environment includes the catalogs public, demo, and restricted and one provider restricts a user from accessing the restricted catalog and another provider restricts the user from accessing the demo catalog, running show catalogs will only return the public catalog for that user.

    • Only one column masking policy can be applied per column across all system access control providers. If two or more access control providers return a mask for a column, Starburst (Trino) will throw an error at query time.

    • For row filtering policies, the expression for each system access control provider is applied one after the other.
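
The interaction rules above can be illustrated with a small model. This is a sketch under stated assumptions rather than how Starburst (Trino) or Immuta implement them: a provider is modeled as a plain predicate, the function names are hypothetical, and sequential row filters are approximated by AND-ing their expressions together.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical model: a provider answers "may this user see this catalog?"
Provider = Callable[[str, str], bool]


def visible_catalogs(user: str, catalogs: Iterable[str], providers: List[Provider]) -> List[str]:
    """A catalog is visible only if every configured provider allows it."""
    return [c for c in catalogs if all(allows(user, c) for allows in providers)]


def resolve_mask(masks: List[Optional[str]]) -> Optional[str]:
    """At most one provider may return a mask for a column; more than one is an error."""
    present = [m for m in masks if m is not None]
    if len(present) > 1:
        raise ValueError("more than one access control provider returned a column mask")
    return present[0] if present else None


def combine_row_filters(filters: List[Optional[str]]) -> Optional[str]:
    """Apply each provider's row filter in turn, modeled here as a conjunction."""
    present = [f"({f})" for f in filters if f]
    return " AND ".join(present) if present else None


# The example from the text: one provider hides "restricted", another hides "demo".
hides_restricted: Provider = lambda user, catalog: catalog != "restricted"
hides_demo: Provider = lambda user, catalog: catalog != "demo"

print(visible_catalogs("alice", ["public", "demo", "restricted"], [hides_restricted, hides_demo]))
```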

    See the Starburst (Trino) integration configuration page for instructions on configuring multiple access control providers.

    hashtag
    Starburst (Trino) query passthrough

    Starburst (Trino) query passthrough is available in most connectors using the query table function or raw_query in the Elasticsearch connector. Consequently, Immuta blocks functions named raw_query or query, as those table functions would completely bypass Immuta’s access controls.

    For example, without blocking those functions, this query would access the public.customer table directly:

    select * from table(postgres.system.query(query => 'select * from public.customer limit 10'));

    You can add or remove functions that are blocked by Immuta in the Starburst (Trino) integration configuration file. See the Starburst (Trino) integration configuration page for instructions.

    hashtag
    Data flow

    1. An Immuta Application Administrator configures the Starburst (Trino) integration, adding the ImmutaSystemAccessControl plugin on their Starburst (Trino) node.

    2. A data owner registers Starburst (Trino) tables in Immuta as data sources. A data owner, data governor, or administrator creates or changes a policy or user in Immuta.

    3. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.

    4. A Starburst (Trino) user who is subscribed to the data source in Immuta queries the corresponding table directly in their Starburst catalog.

    5. The Trino Execution Engine calls various methods on the interface to ask the ImmutaSystemAccessControl plugin where the policies should be applied. The masking and row-level security methods apply the actual policy expressions.

    6. The Immuta System Access Control plugin calls the Immuta Web Service to retrieve policy information for that data source for the querying user, using the querying user's project, purpose, and entitlements.

    7. The Immuta System Access Control plugin provides the SQL view expression (for masked columns) or WHERE clause SQL view expression (for row filtering) to the Trino Execution Engine.

    8. The Trino Execution Engine constructs and executes the SQL statement on the backing catalogs and retrieves the data with appropriate policy enforcement.

    9. User sees policy-enforced data.
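
Steps 5 through 8 can be pictured with a toy model that combines per-column view expressions and a row filter into one statement. This is only an illustration: the function `build_enforced_query`, the sample mask expression, and the table and column names are all hypothetical, and the real engine applies these expressions during query planning rather than by assembling SQL text.

```python
from typing import Dict, List, Optional


def build_enforced_query(
    table: str,
    columns: List[str],
    masks: Dict[str, str],
    row_filter: Optional[str],
) -> str:
    """Substitute a view expression for each masked column and append the row filter."""
    select_items = []
    for column in columns:
        if column in masks:
            # Masked column: use the returned view expression, keeping the column name.
            select_items.append(f"{masks[column]} AS {column}")
        else:
            # No masking policy: nothing was returned, so the column is read as-is.
            select_items.append(column)
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if row_filter:
        sql += f" WHERE {row_filter}"
    return sql


print(
    build_enforced_query(
        "public.customer",
        ["id", "email"],
        {"email": "to_hex(sha256(to_utf8(email)))"},
        "region = 'US'",
    )
)
```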

    hashtag
    Authentication methods

    The Starburst (Trino) integration supports the following authentication methods to create data sources in Immuta:

    • Username and password: You can authenticate with your Starburst (Trino) username and password.

    • OAuth 2.0: You can authenticate with OAuth 2.0. Immuta's OAuth authentication method uses the Client Credentials Flow; when you register a data source, Immuta reaches out to your OAuth server to generate a JSON web token (JWT) and then passes that token to the Starburst (Trino) cluster.

    hashtag
    OAuth Authentication for creating data sources

    circle-info

    Configure JWT authentication method in Starburst (Trino)

    When using OAuth authentication to create data sources in Immuta, configure your Starburst (Trino) cluster to use JWT authentication, not OpenID Connect or OAuth.

    When users query a Starburst (Trino) data source, Immuta sends a username with the view SQL so that policies apply in the right context. Since OAuth authentication does not require a username to be associated with a data source upon data source creation, Immuta does not send a username and Starburst (Trino) queries fail. To avoid this error, you must configure a global admin username.

    If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, see the Starburst (Trino) configuration guide to set the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

    hashtag
    Supported Starburst (Trino) feature

    hashtag
    Starburst (Trino)-created logical view support

    Immuta policies can be applied to Starburst (Trino)-created logical views.

    The descriptions below provide guidance for applying policies to Starburst (Trino)-created logical views in the DEFINER security mode and the INVOKER security mode. These are the simplest approaches; other approaches to applying policies to Starburst (Trino)-created logical views are also possible.

    hashtag
    Views created in the DEFINER security mode

    For views created using the DEFINER security mode,

    • ensure the user who created the view is configured as an admin user in the Immuta plugin so that policies are never applied to the underlying tables.

    • create Immuta data sources and apply policies to logical views exposing those tables.

    • lock down access to the underlying tables in Starburst (Trino) so that all end user access is provided through the views.

    hashtag
    Views created in the INVOKER security mode

    circle-info

    Applying policies to views or tables

    Avoid creating data policies for both a logical view and its underlying tables. Instead, apply policies to the logical view or the underlying tables.

    For views created using the INVOKER security mode, the querying user needs access to the logical view and underlying tables.

    • If non-Immuta table reads are disabled, provide access to the views and tables through Immuta. To do so, create Immuta data sources for the view and underlying tables, and grant access to the querying user in Immuta. If creating data policies, apply the policies to either the view or underlying tables, not both.

    • If non-Immuta table reads are enabled, the user already has access to the table and view. Create Immuta data sources and apply policies to the underlying table; this approach will enforce access controls for both the table and view in Starburst (Trino).

    hashtag
    Supported Immuta features

    • User impersonation: Impersonation allows users to query data as another Immuta user. To enable user impersonation, see the Integration user impersonation page.

    • Query audit: Immuta audits queries run in Starburst (Trino) against Starburst (Trino) data registered as Immuta data sources.

    • Multiple Starburst (Trino) instances

    hashtag
    Query audit

    The Immuta Trino Event Listener allows Immuta to translate events into comprehensive audit logs for users with the Immuta AUDIT permission to view. For more information about what is included in those audit logs, see the Starburst (Trino) audit logs page.

    Query audit is enabled by default on all Starburst (Trino) integrations, but you can disable it when configuring the integration with the following properties: immuta.audit.legacy.enabled and immuta.audit.uam.enabled.

    hashtag
    Multiple Starburst (Trino) integrations

    You can configure multiple Starburst (Trino) integrations with a single Immuta tenant and use them dynamically. Configure the integration once in Immuta to use it in multiple Starburst (Trino) clusters. However, consider the following limitations:

    • Names of catalogs cannot overlap because Immuta cannot distinguish among them.

    • A combination of cluster types on a single Immuta tenant is supported unless your Trino cluster is configured to use a proxy. In that case, you can only connect either Trino clusters or Starburst clusters to the same Immuta tenant.
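
Because catalog names must not overlap across the connected clusters, it can help to compare catalog lists before adding another cluster. The sketch below assumes you have already collected each cluster's catalog names yourself (for example from `show catalogs`); the function name and the sample cluster names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def overlapping_catalogs(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each catalog name used by more than one cluster to the clusters that use it."""
    owners = defaultdict(list)
    for cluster, catalogs in clusters.items():
        for catalog in set(catalogs):  # ignore duplicates within one cluster
            owners[catalog].append(cluster)
    return {name: sorted(used_by) for name, used_by in owners.items() if len(used_by) > 1}


conflicts = overlapping_catalogs(
    {
        "trino-a": ["hive", "postgres"],
        "starburst-b": ["postgres", "iceberg"],
    }
)
print(conflicts)
```

An empty result means no catalog name is shared between the listed clusters.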

    hashtag
    Policy caveat

    Limit your masked joins to columns with matching column types. Starburst truncates the result of the masking expression to conform to the native column type when performing the join, so joining two masked columns with different data types produces invalid results when one of the columns' lengths is less than the length of the masked value.

    For example, if the value of a hashed column is 64 characters, a join between a hashed varchar(50) column and a hashed varchar(255) column will not match rows correctly, because the varchar(50) value is truncated and no longer equals the varchar(255) value.
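
The truncation effect can be reproduced outside Starburst with a few lines of Python. This is a sketch that assumes a SHA-256 hex digest as the 64-character masked value; the helper name `masked` and the sample value are hypothetical, and the actual masking expression depends on the policy.

```python
import hashlib


def masked(value: str, column_length: int) -> str:
    """Hash a value, then truncate the digest to the column's declared length."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()  # 64 hex characters
    return digest[:column_length]


left = masked("alice@example.com", 50)    # hashed varchar(50): cut to 50 characters
right = masked("alice@example.com", 255)  # hashed varchar(255): all 64 characters kept

print(len(left), len(right))
print(left == right)  # the join condition compares these two values
```

Both columns hold the hash of the same input, yet the join condition compares a 50-character prefix with the full 64-character value, so the rows do not match.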


    Red Hat OpenShift

    This is an OpenShift-specific guide on how to deploy Immuta with the following managed services:

    • Cloud-managed PostgreSQL

    • Cloud-managed Redis

    • Cloud-managed Elasticsearch

    hashtag
    Prerequisites

    Review the following criteria before proceeding with deploying Immuta.

    hashtag
    PostgreSQL

    1. The PostgreSQL instance has been provisioned and is actively running.

    2. The PostgreSQL instance's hostname/FQDN is .

    3. The PostgreSQL instance is .

    hashtag
    Redis

    1. The Redis instance has been provisioned and is actively running.

    2. The Redis instance's hostname/FQDN is .

    3. The Redis instance is .

    hashtag
    Elasticsearch

    1. The Elasticsearch instance has been provisioned and is actively running.

    2. The Elasticsearch instance's hostname/FQDN is .

    3. The Elasticsearch instance is .

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    hashtag
    Setup

    1. Create a new OpenShift project named immuta for Immuta.

    2. Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.

    3. Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.
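
When the ranges are referenced later, each container's UID and GID has to fall inside them. The sketch below assumes the range is expressed as `<start>/<size>` (the form OpenShift commonly uses in the project annotations `openshift.io/sa.scc.uid-range` and `openshift.io/sa.scc.supplemental-groups`); the function names and the sample value are hypothetical, so confirm the format against your own project.

```python
from typing import Tuple


def parse_range(annotation: str) -> Tuple[int, int]:
    """Parse '<start>/<size>' into an inclusive (first, last) ID range."""
    start_text, size_text = annotation.strip().split("/")
    start, size = int(start_text), int(size_text)
    if size < 1:
        raise ValueError("range size must be at least 1")
    return start, start + size - 1


def in_range(value: int, annotation: str) -> bool:
    """Return True if a UID or GID falls inside the allocated range."""
    first, last = parse_range(annotation)
    return first <= value <= last


# Sample value only; substitute the range reported for your project.
print(parse_range("1000650000/10000"))
```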

    hashtag
    Cloud-managed PostgreSQL

    circle-info

    Connecting to the database

    There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

    1. Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the oc run command outlined below. Wait 5 seconds, and then proceed by entering a password.

    2. Create an immuta role and database.

    3. Revoke privileges from CURRENT_USER

    hashtag
    Install Immuta

    circle-exclamation

    Audit records

    Preserving legacy audit records

    Immuta does not migrate legacy audit records to the universal audit model (UAM), so when you upgrade Immuta those audit records will be lost unless you enable the following setting in your immuta-values.yaml file:

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

    1. Create a Helm values file named immuta-values.yaml with the content below. Because the Ingress resource will be managed by an OpenShift route that you create later, not by the Immuta Enterprise Helm chart, ingress is set to false below. TLS comes pre-configured with OpenShift, so tls is also set to false.

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    3. Deploy Immuta.

    hashtag
    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.

    hashtag
    Next steps

    • Configure Ingress to complete your installation and access your Immuta application.

    • .

    Configure a Snowflake Integration

    hashtag
    Permissions

    The permissions outlined in this section are the Snowflake privileges required for a basic configuration. See the Snowflake reference guide for a list of privileges necessary for additional features and settings.

    • APPLICATION_ADMIN Immuta permission

    • The Snowflake user running the installation script must have the following privileges:

      • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

    • The Snowflake user must have the following privileges on all securables:

      • USAGE on all databases and schemas with registered data sources

      • REFERENCES on all tables and views registered in Immuta

    circle-exclamation

    Different accounts

    The setup account used to enable the integration must be different from the account used to register data sources in Immuta.

    hashtag
    Configure the integration

    circle-exclamation

    Snowflake resource names: Use uppercase for the names of the Snowflake resources you create below.

    1. Click the App Settings icon in the navigation panel.

    2. Click the Integrations tab.

    3. Click the +Add Integration button and select Snowflake from the dropdown menu.

    hashtag
    Select your configuration method

    circle-exclamation

    in Snowflake at the account level may cause unexpected behavior of the Snowflake integration in Immuta

    The must be set to false (the default setting in Snowflake) at the account level. Changing this value to true causes unexpected behavior of the Snowflake integration.

    You have two options for configuring your Snowflake environment:

    • Automatic setup: Grant Immuta one-time use of credentials to automatically configure your Snowflake environment and the integration.

    • Manual setup: Run the Immuta script in your Snowflake environment yourself to configure your Snowflake environment and the integration.

    hashtag
    Automatic setup

    Required permissions: When performing an automatic setup, the credentials provided must have the .

    The setup will use the provided credentials to create a user called IMMUTA_SYSTEM_ACCOUNT and grant the following privileges to that user:

    • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

    • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

    Alternatively, you can use the manual setup method and edit the provided script to grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes.

    circle-info

    These credentials will be used to create and configure a new IMMUTA database within the specified Snowflake instance. The credentials are not stored or saved by Immuta, and Immuta doesn’t retain access to them after initial setup is complete.

    You can create a new account for Immuta to use that has these privileges, or you can grant temporary use of a pre-existing account. By default, the pre-existing account with appropriate privileges is ACCOUNTADMIN. If you create a new account, it can be deleted after initial setup is complete.

    From the Select Authentication Method dropdown, select one of the following authentication methods:

    • Username and Password: Complete the Username, Password, and Role fields.

    • :

      1. Complete the Username field. This user must be .

    hashtag
    Manual setup

    Required permissions: When performing a manual setup, the Snowflake user running the script must have the .

    The script will create a user called IMMUTA_SYSTEM_ACCOUNT and grant the following privileges to that user:

    • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

    • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

    Alternatively, you can grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes.

    hashtag
    Run the script

    1. Select Manual.

    2. Use the dropdown menu to select your authentication method:

      • Username and password: Enter the Username and Password and set them in the bootstrap script for the Immuta system account credentials.

    hashtag
    Select available warehouses (optional)

    If you enabled a Snowflake workspace, use the dropdown menu to select the warehouses that will be available to project owners when they create Snowflake workspaces. The menu lists all the warehouses available to the privileged account entered above. Note that any warehouse accessible by the PUBLIC role does not need to be explicitly added.

    hashtag
    Select excepted roles and users

    Enter the Excepted Roles/User List. Each role or username (both case-sensitive) in this list should be separated by a comma. Wildcards are unsupported.

    circle-exclamation

    Excepted roles/users will have no policies applied to queries

    Any user whose username or active role appears in this list will have no policies applied when querying Immuta-protected Snowflake tables in Snowflake. Therefore, use this list only for service or system accounts and for the default role of the account used to create the data sources in Immuta projects (if you have Snowflake workspaces enabled).
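
The format rules for the excepted list (comma-separated, case-sensitive, no wildcards) can be checked before the value is pasted into the field. This is a minimal sketch, not Immuta's own validation: the function names and sample role names are hypothetical, and treating `*` as the wildcard character is an assumption.

```python
from typing import List


def parse_excepted_list(raw: str) -> List[str]:
    """Split a comma-separated list of roles/usernames and reject wildcard entries."""
    entries = [entry.strip() for entry in raw.split(",")]
    entries = [entry for entry in entries if entry]  # drop empty items such as a trailing comma
    for entry in entries:
        if "*" in entry:  # assumption: '*' is what a wildcard would look like
            raise ValueError(f"wildcards are unsupported: {entry!r}")
    return entries


def is_excepted(name: str, entries: List[str]) -> bool:
    """Case-sensitive membership test, mirroring the case-sensitivity noted above."""
    return name in entries


entries = parse_excepted_list("SVC_ETL, IMMUTA_ADMIN ,")
print(entries)
print(is_excepted("svc_etl", entries))  # differs only by case
```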

    hashtag
    Save the configuration

    Click Save.

    hashtag
    Opt to enable Snowflake tag ingestion

    To allow Immuta to import table and column tags from Snowflake, enable Snowflake tag ingestion in the external catalog section of the Immuta app settings page.

    hashtag
    Register data

    .

    Managed Public Cloud

    This is a guide on how to deploy Immuta on Kubernetes in the following managed public cloud providers:

    • Amazon Web Services (AWS)

    • Microsoft Azure

    • Google Cloud Platform (GCP)

    hashtag
    Prerequisites

    The following cloud-managed services must be provisioned before proceeding:

    hashtag
    Validation

    hashtag
    PostgreSQL

    1. The PostgreSQL instance's hostname/FQDN is .

    2. The PostgreSQL instance is .

    3. The Helm chart only supports username/password authentication for PostgreSQL. At this time, other authentication mechanisms are not supported.

    hashtag
    Elasticsearch

    1. The Elasticsearch instance's hostname/FQDN is .

    2. The Elasticsearch instance is .

    3. The user must have the .

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    Setup

    1. Create a Kubernetes namespace named immuta for Immuta.

    2. Switch to namespace immuta.

    3. Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

    PostgreSQL

    Connecting to the database

    There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

    1. Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the kubectl run command outlined below. Wait 5 seconds, and then proceed by entering a password.

    2. Create an immuta role and database.

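    3. Revoke privileges from CURRENT_USER as they're no longer required.

    4. Enable the pgcrypto extension.

    5. Type \q, and then press Enter to exit.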

    Install Immuta

    Audit records

    Preserving legacy audit records

    Immuta does not migrate legacy audit records to the universal audit model (UAM), so when you upgrade Immuta, those audit records will be lost unless you enable the following setting in your immuta-values.yaml file:

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

    1. Create a Helm values file named immuta-values.yaml with the following content:

    2. Update all placeholder values in the immuta-values.yaml file.

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    3. Deploy Immuta.
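When generating passwords for the placeholder values, the special characters listed above can be screened out programmatically. The following Python sketch is one way to do so; the function names and the letters-and-digits alphabet are assumptions for illustration, not an Immuta requirement.

```python
import secrets
import string

# Characters the guide says to avoid; whitespace is checked separately.
FORBIDDEN = set("$&:\\/'")

# Assumption: letters and digits only, which avoids every forbidden character.
ALPHABET = string.ascii_letters + string.digits


def forbidden_characters(password: str) -> list:
    """Return the distinct forbidden characters found, in order of appearance."""
    found = []
    for ch in password:
        if (ch.isspace() or ch in FORBIDDEN) and ch not in found:
            found.append(ch)
    return found


def generate_password(length: int = 32) -> str:
    """Generate a random password drawn only from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    candidate = generate_password()
    print(candidate, forbidden_characters(candidate))
```

Checking an existing password with `forbidden_characters` before writing it into immuta-values.yaml makes it easy to see which characters need to be replaced.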

    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.
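Once the port-forward from the last step is running, one way to confirm that the Secure service answers on local port 8080 is a small HTTP probe. The Python sketch below is illustrative only: the function name is hypothetical, it requests the root path (an assumption about what the service serves), and it bypasses any configured HTTP proxy because the target is a local port-forward.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str = "http://localhost:8080", timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code from `url`, or None if nothing answers."""
    # An empty ProxyHandler disables proxies taken from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the service answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timed out, or unreachable


if __name__ == "__main__":
    print(probe())
```

Any integer result means the forwarded port reached a listening HTTP server; None suggests the port-forward is not running or the pods are not yet ready.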

    Next steps

    • Configure Ingress to complete your installation and access your Immuta application.

    • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • Learn more about the best practices for Immuta in production.

    Deployment Requirements

    Immuta comprises three core services (Secure, Discover, and Detect) that rely on PostgreSQL and Elasticsearch to store their states. The illustration below shows the relationships among these services.

    The Immuta Enterprise Helm chart (IEHC) (represented by the yellow box above) does not deploy PostgreSQL or Elasticsearch, so you must deploy and manage them separately.

    Although Immuta recommends using Elasticsearch because it supports several new Immuta features and services, you can deploy Immuta without Elasticsearch. The table below outlines the Immuta features supported with and without Elasticsearch and the dependencies you must deploy and manage yourself.

    Immuta with Elasticsearch
    Immuta without Elasticsearch

    Environment Variables

    This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).

    This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

    %sql
    select * from immuta.my_data_source limit 5;
    %sql
    select * from my_data_source limit 5;
    kubectl create namespace immuta
    kubectl config set-context --current --namespace=immuta
    kubectl create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        --docker-email="<email>"
    master:
        masterOnly: false
        replicaCount: 1
    
    data:
        replicaCount: 0
    
    coordinating:
        replicaCount: 0
    
    ingest:
        replicaCount: 0
    helm install es-db oci://registry-1.docker.io/bitnamicharts/elasticsearch \
        --values es-values.yaml
    auth:
        database: immuta
        username: immuta
        password: <postgres-password>
    helm install pg-db oci://registry-1.docker.io/bitnamicharts/postgresql \
        --values pg-values.yaml
    secure:
      extraEnvVars:
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      config:
        # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
        # The anatomy of a domain name is as follows:
        #   <service>.<namespace>.svc.<cluster-domain>
        #
        # Where the default cluster domain is: cluster.local
        databaseConnectionString: postgres://immuta:<postgres-password>@pg-db-postgresql.immuta.svc.cluster.local:5432/immuta?schema=audit
        elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
        elasticsearchUsername: <elasticsearch-username>
        elasticsearchPassword: <elasticsearch-password>
    
    secure:
      ingress:
        enabled: false
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "true"
        - name: FeatureFlag_detect
          value: "true"
        - name: FeatureFlag_auditLegacyViewHide
          value: "true"
    
      postgresql:
        # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
        # The anatomy of a domain name is as follows:
        #   <service>.<namespace>.svc.<cluster-domain>
        #
        # Where the default cluster domain is: cluster.local
        host: pg-db-postgresql.immuta.svc.cluster.local
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
    helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
        --values immuta-values.yaml \
        --version 2024.2.20
    kubectl wait --for=condition=Ready pods --all
    kubectl get service --selector "app.kubernetes.io/component=secure" --output name
    kubectl port-forward service/<name> 8080:http
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com

    Follow the steps in the Instructions to upload the init script to DBFS section.

  • Click Download Policies, and then manually add these Cluster Policies in Databricks.

  • Unavailable

    Cluster 3

    11.3

    ⛔

    ✅ / ⛔

    Unavailable

    Cluster 4

    11.3

    ✅

    ⛔

    ⛔

    Cluster 5

    11.3

    ✅

    ✅

    ✅

    Spark Direct File Reads
    Multiple Integrations
    queries the corresponding table
    The user must have the required permissions.
  • The Helm chart only supports username/password authentication for Elasticsearch. At this time, other authentication mechanisms are not supported.

  • Switch to project immuta.

  • Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

  • as they're no longer required.
  • Enable the pgcrypto extension.

  • Type \q, and then press Enter to exit.

  • Audit record retention

    Immuta defaults to keeping audit records for 7 days. To change this duration, set the following values in the immuta-values.yaml file. The example below configures audit records to be kept for 90 days:






    kubectl wait --for=condition=Ready pods --all
    kubectl get pod --selector "app.kubernetes.io/name=postgresql" --output name
    kubectl exec --stdin --tty pod/<database-pod-name> -- psql -U immuta
    ALTER ROLE immuta SET search_path TO bometadata,public;
    CREATE EXTENSION pgcrypto;
    audit:
      deployment:
          extraEnvVars:
            - name: AUDIT_RETENTION_POLICY_IN_DAYS
              value: "90"
    oc project immuta
    oc create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        --docker-email="<email>"
    \c immuta
    CREATE EXTENSION pgcrypto;
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    oc new-project immuta
    oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
    oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
    oc run pgclient \
        --stdin \
        --tty \
        --rm \
        --image docker.io/bitnami/postgresql -- \
        psql --host <postgres-fqdn> --username <postgres-admin> --dbname postgres --port 5432 --password
    CREATE ROLE immuta WITH LOGIN ENCRYPTED PASSWORD '<postgres-password>';
    
    GRANT immuta TO CURRENT_USER;
    
    CREATE DATABASE immuta OWNER immuta;
    
    GRANT ALL ON DATABASE immuta TO immuta;
    ALTER ROLE immuta SET search_path TO bometadata,public;
    secure:
      extraEnvVars:
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      config:
        databaseConnectionString: postgres://immuta:<postgres-password>@pg-db-postgresql.immuta.svc.cluster.local:5432/immuta?schema=audit
        elasticsearchEndpoint: http://es-db-elasticsearch.immuta.svc.cluster.local:9200
        elasticsearchUsername: <elasticsearch-username>
        elasticsearchPassword: <elasticsearch-password>
    
      deployment:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
          
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
    discover:
      deployment:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
          
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
    secure:
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "true"
        - name: FeatureFlag_detect
          value: "true"
        - name: FeatureFlag_auditLegacyViewHide
          value: "true"
    
      ingress:
        enabled: false
        tls: false
    
      postgresql:
        host: <postgres-fqdn>
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
        ssl: false
    
      web:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
          
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
      backgroundWorker:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
          
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
        --values immuta-values.yaml \
        --version 2024.2.20
    oc wait --for=condition=Ready pods --all
    oc get service --selector "app.kubernetes.io/component=secure" --output name
    oc port-forward service/<name> 8080:http
    REVOKE immuta FROM CURRENT_USER;
    audit:
      deployment:
          extraEnvVars:
            - name: AUDIT_RETENTION_POLICY_IN_DAYS
              value: "90"
    \c immuta
    CREATE EXTENSION pgcrypto;
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    kubectl create namespace immuta
    kubectl config set-context --current --namespace=immuta
    kubectl create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        --docker-email="<email>"
    kubectl run pgclient \
        --stdin \
        --tty \
        --rm \
        --image docker.io/bitnami/postgresql -- \
        psql --host <postgres-fqdn> --username <postgres-admin> --dbname postgres --port 5432 --password
    CREATE ROLE immuta WITH LOGIN ENCRYPTED PASSWORD '<postgres-password>';
    
    GRANT immuta TO CURRENT_USER;
    
    CREATE DATABASE immuta OWNER immuta;
    
    GRANT ALL ON DATABASE immuta TO immuta;
    ALTER ROLE immuta SET search_path TO bometadata,public;
    secure:
      extraEnvVars:
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      config:
        databaseConnectionString: postgres://immuta:<postgres-password>@<postgres-fqdn>:5432/immuta?schema=audit
        elasticsearchEndpoint: <elasticsearch-endpoint>
        elasticsearchUsername: <elasticsearch-username>
        elasticsearchPassword: <elasticsearch-password>
    
    secure:
      ingress:
        enabled: false
        tls: false
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "true"
        - name: FeatureFlag_detect
          value: "true"
        - name: FeatureFlag_auditLegacyViewHide
          value: "true"
    
      postgresql:
        host: <postgres-fqdn>
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
        ssl: true
    helm install immuta oci://ocir.immuta.com/stable/immuta-enterprise \
        --values immuta-values.yaml \
        --version 2024.2.20
    kubectl wait --for=condition=Ready pods --all
    kubectl get service --selector "app.kubernetes.io/component=secure" --output name
    kubectl port-forward service/<name> 8080:http
    REVOKE immuta FROM CURRENT_USER;
    audit:
      deployment:
          extraEnvVars:
            - name: AUDIT_RETENTION_POLICY_IN_DAYS
              value: "90"
    CREATE ROLE ON ACCOUNT WITH GRANT OPTION
  • CREATE USER ON ACCOUNT WITH GRANT OPTION

  • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

  • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

  • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

  • SELECT on all tables and views registered in Immuta

    Complete the Host, Port, and Default Warehouse fields.
  • Opt to check the Enable Project Workspace box. This will allow for managed write access within Snowflake. Note: Project workspaces still use Snowflake views, so the default role of the account used to create the data sources in the project must be added to the Excepted Roles List. This option is unavailable when table grants is enabled.

  • Opt to check the Enable Impersonation box and customize the Impersonation Role to allow users to natively impersonate another user. You cannot edit this choice after you configure the integration.

  • Snowflake query audit is enabled by default.

    1. Configure the audit frequency by scrolling to Integrations Settings and finding the Snowflake Audit Sync Schedule section.

    2. Enter how often, in hours, you want Immuta to ingest audit events from Snowflake as an integer between 1 and 24.

    3. Continue with your integration configuration.

  • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

    When using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>

  • Click Key Pair (Required), and upload a Snowflake private key pair file.

  • Complete the Role field.

  • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

    Key Pair Authentication: Upload the Key Pair file. When using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>

  • Snowflake External OAuth:

    1. Create a security integration for your Snowflake External OAuth. Note that if you have an existing security integration, then the Immuta system role must be added to the existing EXTERNAL_OAUTH_ALLOWED_ROLES_LIST. The Immuta system role will be the Immuta database provided above with _SYSTEM appended. If you used the default database name, it will be IMMUTA_SYSTEM.

    2. Fill out the Token Endpoint. This is where the generated token is sent.

    3. Fill out the Client ID. This is the subject of the generated token.

    4. Select the method Immuta will use to obtain an access token:

      • Certificate

        1. Keep the Use Certificate checkbox enabled.

  • In the Setup section, click bootstrap script to download the script. Then, fill out the appropriate fields and run the bootstrap script in Snowflake.


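    | Feature | Immuta with Elasticsearch | Immuta without Elasticsearch |
    | --- | --- | --- |
    | Immuta Detect | ✅ | ❌ |
    | Audit of Immuta and data platform events | ✅ | ❌ |
    | Legacy audit | ✅ () | ✅ (Until October 2024) |
    | Immuta Monitors | ✅ | ❌ |
    | Sensitive data discovery | ✅ | ✅ |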

    For guidance on how to configure the IEHC to deploy Immuta with or without Elasticsearch, see one of the guides below:

    • Deploy Immuta with Elasticsearch

    • Deploy Immuta without Elasticsearch

    For more information about legacy features and services no longer enabled in the recommended deployment of Immuta, see the Legacy features and services section.

    Version requirements

    Kubernetes versions

    • Kubernetes 1.29 to 1.32

    Metadata database (PostgreSQL)

    PostgreSQL incompatibilities

    Immuta is not compatible with PostgreSQL abstraction layers, such as Amazon Aurora.

    • PostgreSQL 15.0 or newer

    • The pgcrypto extension must be enabled

    Elasticsearch

    • Elasticsearch v7 API or newer

    • AWS OpenSearch Service compatible with Elasticsearch v7 API or newer

      • AWS OpenSearch Serverless is not supported

    OpenSearch user

    The user provided during the install must have the following permissions:

    • cluster:monitor/health

    • indices:data/write/bulk*

    • indices:data/write/bulk

    • indices:data/read/search

    • indices:admin/exists

    • indices:admin/create

    • indices:admin/delete

    • indices:admin/settings/update

    • indices:admin/get

    • indices:data/write/delete/byquery

    • indices:data/write/index

    • indices:admin/mapping/put

    Follow OpenSearch documentation to create the user and add permissions.
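To compare a role's granted permissions against the list above, a small Python helper can report what is missing. This is a sketch only: the function name is hypothetical, and treating granted entries as glob-style patterns (so that indices:admin/* covers indices:admin/get) is an assumption about how the grants are written, not a description of how OpenSearch evaluates them.

```python
from fnmatch import fnmatchcase

# The permissions listed above, with duplicates removed.
REQUIRED_PERMISSIONS = [
    "cluster:monitor/health",
    "indices:data/write/bulk*",
    "indices:data/write/bulk",
    "indices:data/read/search",
    "indices:admin/exists",
    "indices:admin/create",
    "indices:admin/delete",
    "indices:admin/settings/update",
    "indices:admin/get",
    "indices:data/write/delete/byquery",
    "indices:data/write/index",
    "indices:admin/mapping/put",
]


def missing_permissions(granted: list) -> list:
    """Return required permissions not covered by any granted pattern."""
    return [
        required
        for required in REQUIRED_PERMISSIONS
        if not any(fnmatchcase(required, pattern) for pattern in granted)
    ]


if __name__ == "__main__":
    # Example grant set that covers only cluster and index-admin actions.
    for permission in missing_permissions(["cluster:*", "indices:admin/*"]):
        print("missing:", permission)
```

An empty result means every permission in the list is covered by at least one granted entry.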

    Cache (Redis/Memcached)

    Built-in cache

    The IEHC manages its own Memcached deployment inside the cluster. The key-value cache can optionally be externalized post installation.

    • Redis 7.0 or newer

    • Memcached 1.6 or newer
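The numeric version requirements above can be checked with a short Python helper before installation. This is a sketch under assumptions: the function and dictionary names are hypothetical, only the major and minor components are compared, and Elasticsearch is left out because its requirement is stated in terms of API compatibility rather than a product version number.

```python
import re

# (minimum, maximum) as (major, minor); None means no upper bound is stated.
REQUIREMENTS = {
    "kubernetes": ((1, 29), (1, 32)),
    "postgresql": ((15, 0), None),
    "redis": ((7, 0), None),
    "memcached": ((1, 6), None),
}


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as 'v1.30.4' or '15.4'."""
    match = re.match(r"v?(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def is_supported(component: str, version: str) -> bool:
    """Check a version string against the requirement for `component`."""
    minimum, maximum = REQUIREMENTS[component]
    found = major_minor(version)
    if found < minimum:
        return False
    return maximum is None or found <= maximum


if __name__ == "__main__":
    print(is_supported("kubernetes", "v1.30.4"))
    print(is_supported("postgresql", "14.11"))
```

Comparing only major and minor components means a patch release such as Kubernetes 1.32.1 still counts as inside the 1.29 to 1.32 range.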

    Infrastructure recommendations

    Kubernetes distribution
    Ingress
    External metadata database
    External Elasticsearch

    Amazon Elastic Kubernetes Service (EKS)

    AWS Load Balancer Controller

    Azure Kubernetes Service (AKS)

    Azure Application Gateway Ingress Controller

    Google Kubernetes Engine (GKE)

    GKE Ingress Controller

    Legacy features and services

    Some legacy services and features are no longer enabled in the recommended configuration of the IEHC. The table below lists these features and provides links to documentation that outlines how to enable them in Immuta.

    Feature
    Immuta Enterprise Helm chart configuration

    Legacy audit

    Set each of the following secure.extraEnvVars in your immuta-values.yaml file to false:

    • FeatureFlag_AuditService

    • FeatureFlag_detect

    Legacy sensitive data discovery

    Data platforms

    • Amazon Redshift

    • Azure Synapse Analytics

    • Google BigQuery

    Policies

    • Masking with format preserving masking (unless using the Snowflake integration)

    • Masking with k-anonymization

    • Masking using randomized response (unless using the Snowflake integration)

    Next step

    Follow the Getting started guide to install Immuta.

    Dependencies

    Environment variable overrides

    Properties in the config file can be overridden during installation using environment variables. Each variable name is the config name in upper case with every period (.) replaced by an underscore (_). For example, to set the value of immuta.base.url via an environment variable, set the following in the Environment Variables section of the cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
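The naming rule above is mechanical, so it can be expressed as a small Python helper. The function names here are hypothetical and exist only to illustrate the conversion; they are not part of Immuta or Databricks.

```python
def to_env_var(config_name: str) -> str:
    """Convert a config name such as immuta.base.url to its variable name."""
    return config_name.upper().replace(".", "_")


def env_line(config_name: str, value: str) -> str:
    """Render a NAME=value line for the cluster's Environment Variables section."""
    return f"{to_env_var(config_name)}={value}"


if __name__ == "__main__":
    print(env_line("immuta.base.url", "https://immuta.mycompany.com"))
```

The same conversion applies to every property listed on this page; for instance, immuta.spark.acl.enabled becomes IMMUTA_SPARK_ACL_ENABLED.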

    • immuta.ephemeral.host.override

      • Default: true

      • Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.

    • immuta.ephemeral.host.override.httpPath

      • Description: This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.

    • immuta.ephemeral.table.path.check.enabled

      • Default: true

      • Description: When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time.

    • immuta.spark.acl.enabled

      • Default: true

      • Description: Immuta Access Control List (ACL). Controls whether Databricks users are blocked from accessing non-Immuta tables. Ignored if Databricks Table ACLs are enabled (i.e., spark.databricks.acl.dfAclsEnabled=true).

    • immuta.spark.acl.whitelist

      • Description: Comma-separated list of Databricks usernames who may access raw tables when the Immuta ACL is in use.

    • immuta.spark.acl.privileged.timeout.seconds

      • Default: 3600

      • Description: The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is whitelisted in immuta.spark.acl.whitelist

    • immuta.spark.acl.assume.not.privileged

      • Default: false

      • Description: Session property that overrides privileged user status when the Immuta ACL is in use. This should only be used in R scripts associated with spark-submit jobs.

    • immuta.spark.audit.all.queries

      • Default: false

      • Description: Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.

    • immuta.spark.databricks.allow.non.immuta.reads

      • Default: false

      • Description: Allows non-privileged users to SELECT

    • immuta.spark.databricks.allow.non.immuta.writes

      • Default: false

      • Description: Allows non-privileged users to run DDL commands and data-modifying commands against tables or spaces that are not protected by Immuta. See for details about this feature.

    • immuta.spark.databricks.allowed.impersonation.users

      • Description: This configuration is a comma-separated list of Databricks users who are allowed to impersonate Immuta users.

    • immuta.spark.databricks.dbfs.mount.enabled

      • Default: false

      • Description: Exposes the DBFS FUSE mount located at /dbfs

    • immuta.spark.databricks.disabled.udfs

      • Description: Block one or more Immuta user-defined functions (UDFs) from being used on an Immuta cluster. This should be a Java regular expression that matches the set of UDFs to block by name (excluding the immuta database). For example, to block all project UDFs, you may configure this to be ^.*_projects?$. For a list of functions, see the .

    • immuta.spark.databricks.filesystem.blacklist

      • Default: hdfs

      • Description: A list of filesystem protocols that this instance of Immuta will not support for workspaces. This is useful in cases where a filesystem is available to a cluster but should not be used on that cluster.

    • immuta.spark.databricks.jar.uri

      • Default: file:///databricks/jars/immuta-spark-hive.jar

      • Description: The location of immuta-spark-hive.jar

    • immuta.spark.databricks.local.scratch.dir.enabled

      • Default: true

      • Description: Creates a world-readable/writable scratch directory on local disk to facilitate the use of dbutils

    • immuta.spark.databricks.log.level

      • Default: INFO

      • Description: The SLF4J log level to apply to Immuta's Spark plugins.

    • immuta.spark.databricks.log.stdout.enabled

      • Default: false

      • Description: If true, writes logging output to stdout/the console as well as the log4j-active.txt

    • immuta.spark.databricks.py4j.strict.enabled

      • Default: true

      • Description: Disable to allow the use of the dbutils

    • immuta.spark.databricks.scratch.database

      • Description: This configuration is a comma-separated list of additional databases that will appear as scratch databases when running a SHOW DATABASE query. This configuration increases performance by circumventing the Metastore to get the metadata for all the databases to determine what to display for a SHOW DATABASE query; it won't affect access to the scratch databases. Instead, use immuta.spark.databricks.scratch.paths to control read and write access to the underlying database paths.

    • immuta.spark.databricks.scratch.paths

      • Description: Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).

        To create a scratch path to a location or a database stored at that location, configure

    • immuta.spark.databricks.scratch.paths.create.db.enabled

      • Default: false

      • Description: Enables non-privileged users to create or drop scratch databases.

    • immuta.spark.databricks.single.impersonation.user

      • Default: false

      • Description: When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.

    • immuta.spark.databricks.submit.tag.job

      • Default: true

      • Description: Denotes whether the Spark job will be run that "tags" a Databricks cluster as being associated with Immuta.

    • immuta.spark.databricks.trusted.lib.uris

      • Description:

    • immuta.spark.non.immuta.table.cache.seconds

      • Default: 3600

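      • Description: The number of seconds Immuta caches whether a table has been exposed as a source in Immuta. This setting only applies when immuta.spark.databricks.allow.non.immuta.writes or immuta.spark.databricks.allow.non.immuta.reads is enabled.

    • immuta.spark.require.equalization

      • Default: false

      • Description: Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.

    • immuta.spark.resolve.raw.tables.enabled

      • Default: true

      • Description: Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or whitelisted users can set immuta.spark.session.resolve.raw.tables.enabled to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.

    • immuta.spark.session.resolve.raw.tables.enabled

      • Default: true

      • Description: Same as above, but a session property that allows users to toggle this functionality. If users run set immuta.spark.session.resolve.raw.tables.enabled=false, they will see raw data only (not Immuta data policy-enforced data). Note: This property is not set in immuta_conf.xml.

    • immuta.spark.show.immuta.database

      • Default: true

      • Description: This shows the immuta database in the configured Databricks cluster. When set to false, Immuta will no longer show this database when a SHOW DATABASES query is performed. However, queries can still be performed against tables in the immuta database using the Immuta-qualified table name (e.g., immuta.my_schema_my_table) regardless of whether or not this feature is enabled.

    • immuta.spark.version.validate.enabled

      • Default: true

      • Description: Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to true, if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to false so that the versions don't get logged as incompatible and make the cluster unusable.

    • immuta.user.context.class

      • Default: com.immuta.spark.OSUserContext

      • Description: The class name of the UserContext that will be used to determine the current user in immuta-spark-hive. The default implementation gets the OS user running the JVM for the Spark application.

    • immuta.user.mapping.iamid

      • Default: bim

      • Description: Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (bim) but should be updated to reflect an actual production IAM.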

    Immuta Integrations

    Immuta does not require users to learn a new API or language to access protected data. Instead, Immuta integrates with existing tools and ongoing work while remaining invisible to downstream consumers.

    The following data platforms integrate with Immuta:

    • Snowflake integration: With this integration, policies administered in Immuta are pushed down into Snowflake as Snowflake governance features (row access policies and masking policies).

    • Databricks:

    Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
  • Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as `x5t` or is called `sub` (Subject).

  • Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.

  • Client secret

    1. Uncheck the Use Certificate checkbox.

    2. Enter the Scope (string). The scope limits the operations and roles allowed in Snowflake by the access token. See the OAuth 2.0 scopes documentation for details about scopes.

    3. Enter the Client Secret (string). Immuta uses this secret to authenticate with the authorization server when it requests a token.

  • Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
    ).
    .
    from tables that are not protected by Immuta. See
    for details about this feature.
    . Granular permissions are not possible, so all users will have read/write access to all objects therein.
    Note: Raw, unfiltered source data should never be stored in DBFS.
    on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.
    and 3rd party libraries that may write to local disk. Its location is non-configurable and is stored in the environment variable
    IMMUTA_LOCAL_SCRATCH_DIR
    . Note: Sensitive data should not be stored at this location.
    file (default in Databricks).
    API in Python. Note: This setting should only be disabled for customers who employ a homogeneous integration (i.e., all users have the same level of data access).
    Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
    To create a scratch path to a database created using the default location,
    or
    immuta.spark.databricks.allow.non.immuta.reads
    is enabled.
    .
    to
    false
    to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.
    , they will see raw data only (not Immuta data policy-enforced data).
    Note: This property is not set in
    immuta_conf.xml
    .
    false
    Immuta will no longer show this database when a
    SHOW DATABASES
    query is performed. However, queries can still be performed against tables in the
    immuta
    database using the Immuta-qualified table name (e.g.,
    immuta.my_schema_my_table
    ) regardless of whether or not this feature is enabled.
    , if versions are incompatible, that information will be logged to the Databricks driver logs and the cluster will not be usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), this check can be set to
    false
    so that the versions don't get logged as incompatible and make the cluster unusable.
    . The default implementation gets the OS user running the JVM for the Spark application.
    ) but should be updated to reflect an actual production IAM.
    Limited Enforcement in Databricks
    user-defined functions (UDFs)
    project UDFs page
    Databricks Trusted Libraries
    Limited Enforcement in Databricks
    <property>
        <name>immuta.spark.databricks.scratch.paths</name>
        <value>s3://path/to/the/dir</value>
    </property>
    <property>
        <name>immuta.spark.databricks.scratch.paths</name>
        <value>s3://path/to/the/dir, dbfs:/user/hive/warehouse/any_db_name.db</value>
    </property>
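
As a minimal sketch (not part of Immuta's tooling), property entries like the two above could be generated from a list of paths with Python's standard library. The helper name `scratch_paths_property` is hypothetical, and the sketch assumes the value is a plain comma-separated list, as the description of immuta.spark.databricks.scratch.paths states.

```python
import xml.etree.ElementTree as ET


def scratch_paths_property(paths):
    """Build a <property> element for immuta.spark.databricks.scratch.paths.

    Hypothetical helper for illustration; it only formats XML text.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "immuta.spark.databricks.scratch.paths"
    # The property value is a comma-separated list of remote paths.
    ET.SubElement(prop, "value").text = ",".join(p.strip() for p in paths)
    return ET.tostring(prop, encoding="unicode")


print(scratch_paths_property([
    "s3://path/to/the/dir",
    "dbfs:/user/hive/warehouse/any_db_name.db",
]))
```

The second path in the usage example follows the default-location pattern dbfs:/user/hive/warehouse/<db_name>.db described for scratch databases.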
  • FeatureFlag_auditLegacyViewHide

  • Legacy databases

    Google Cloud SQL for PostgreSQL

    Elastic Cloud on Google Cloud

    Red Hat OpenShift

    OpenShift Ingress Operator

    Cloud-managed PostgreSQL

    Cloud-managed Elasticsearch

    SUSE Rancher Government (RKE2)

    Ingress NGINX Controller

    Cloud-managed PostgreSQL

    Cloud-managed Elasticsearch

    SUSE K3s - For evaluation purposes only

    Traefik

    Cloud-managed PostgreSQL

    Cloud-managed Elasticsearch

    Externalized PostgreSQL
    Elasticsearch / OpenSearch
    Externalized PostgreSQL
    Disabled by default, but can be enabled
    Amazon RDS for PostgreSQL
    Amazon OpenSearch
    Azure Database for PostgreSQL
    Elastic Cloud on Azure
    Enable the query engine and fingerprint services
    Enable the query engine
    Enable the query engine and fingerprint services
  • Databricks Unity Catalog integration: This integration allows you to manage multiple Databricks workspaces through Unity Catalog while protecting your data with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and enforce Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouse.

  • Databricks Spark integration: This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for users' queries, all executed directly against Databricks tables.

  • Google BigQuery: In this integration, Immuta generates policy-enforced views in your configured Google BigQuery dataset for tables registered as Immuta data sources.

  • Starburst (Trino) integration: The Starburst (Trino) integration allows you to access policy-protected data directly in your Starburst (Trino) catalogs without rewriting queries or changing your workflows. Immuta policies are translated into Starburst (Trino) rules and permissions and applied directly to tables within users’ existing catalogs.

  • Redshift integration: With the Redshift integration, Immuta applies policies directly in Redshift. This allows data analysts to query their data directly in Redshift instead of going through a proxy.

  • Azure Synapse Analytics integration: The Azure Synapse Analytics integration allows Immuta to apply policies directly in Azure Synapse Analytics dedicated SQL pools without needing users to go through a proxy. Instead, users can work within their existing Synapse Studio and have per-user policies dynamically applied at query time.

  • Amazon S3 integration: The Amazon S3 integration allows users to apply subscription policies to data in S3 to restrict what prefixes, buckets, or objects users can access. To enforce access controls on this data, Immuta creates S3 grants that are administered by S3 Access Grants, an AWS feature that defines access permissions to data in S3.

    Feature support

    The table below outlines the features supported by each of Immuta's integrations.

    | | Project workspaces | Tag ingestion | User impersonation | Query audit | Multiple integrations |
    | --- | --- | --- | --- | --- | --- |
    | Snowflake | ✅ | ✅ | ✅ | ✅ | ✅ |
    | Databricks Unity Catalog | ❌ | ✅ | ❌ | ✅ | ✅ |

    Policy support

    Certain policies are unsupported or supported with caveats*, depending on the integration:

    *Supported with Caveats:

    • On Databricks data sources, joins will not be allowed on data protected with replace with NULL or constant policies.

    • On Starburst data sources, the @iam interpolation function can block the creation of a view.

    For details about each of these policies, see the Policies in Immuta page.

    Audit support for platform queries

    The table below outlines what information is included in the query audit logs for each integration where query audit is supported.

    | | Snowflake | Databricks Spark | Databricks Unity Catalog | Starburst (Trino) |
    | --- | --- | --- | --- | --- |
    | Table and user coverage | Registered data sources and users | Registered data sources and users | All tables and users | Registered data sources and users |
    | Object queried | ✅ | ✅ | Limited support | ✅ |

    Legend:

    • ✅ This is available and the information is included in audit logs.

    • ❌ This is not available and the information is not included in audit logs.

    Snowflake integration
    Snowflake governance features

    Manual Databricks Configuration

    This guide details the manual installation method for enabling access to Databricks with Immuta policies enforced. Before proceeding, ensure your Databricks workspace, instance, and permissions meet the guidelines outlined in the Installation Introduction.

    Databricks Unity Catalog: If Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the integration to create an Immuta-enabled cluster.


    The immuta_conf.xml file is no longer required

    The immuta_conf.xml file that was previously used to configure the Databricks integration is no longer required to install Immuta, so it is no longer staged as a deployment artifact. However, you can use these snippets if you wish to deploy an immuta_conf.xml file to set properties.

    The required Immuta base URL and Immuta system API key properties, along with any other valid properties, can still be specified as Spark environment variables or in the optional immuta_conf.xml file. As before, if the same property is specified in both locations, the Spark environment variable takes precedence.

    If you have an existing immuta_conf.xml file, you can continue using it. However, it's recommended that you delete any default properties from the file that you have not explicitly overridden, or remove the file completely and rely on Spark environment variables. Either method will ensure that any property defaults changed in upcoming Immuta releases are propagated to your environment.

    1 - Download and Configure Immuta Artifacts

    1. Navigate to the Immuta GitHub repository.

    2. Scroll to the release that corresponds to your Immuta version.

    3. Download the .jar file (Immuta plugin) as well as the other scripts listed below, which will load the plugin at cluster startup.

      The immuta-benchmark-suite.dbc is a collection of notebooks packaged as a .dbc file. After you have added cluster policies to your cluster, you can import this file into Databricks to run performance tests and compare a regular Databricks cluster to one protected by Immuta. Detailed instructions are available in the first notebook, which will require an Immuta and non-Immuta cluster to generate test data and perform queries.


    Environment variables with Google Cloud Platform

    Do not use environment variables to set sensitive properties when using Google Cloud Platform. Set them directly in immuta_conf.xml.

    2 - Stage Immuta Artifacts

    When configuring the Databricks cluster, a path will need to be provided to each of the artifacts downloaded/created in the previous step. To do this, those artifacts must be hosted somewhere that your Databricks instance can access. The following methods can be used for this step:

    • Host files in AWS/S3 and provide access by the cluster

    • Host files in Azure ADL Gen 1 or Gen 2 and provide access by the cluster

    • Host files on an HTTPS server accessible by the cluster

    These artifacts will be downloaded to the required location within the cluster's file system by the init script downloaded in the previous step. For the init script to find these files, a URI must be provided through environment variables configured on the cluster. Each method's URI structure and setup is explained below.

    AWS/S3

    URI Structure: s3://[bucket]/[path]

    1. Create an instance profile for clusters by following the Databricks documentation.

    2. Upload the configuration file, JSON file, and JAR file to an S3 bucket that the role from step 1 has access to.

    Authenticating with Access Keys or Session Tokens (Optional)

    If you wish to authenticate using access keys, add the following items to the cluster's environment variables:

    If you've assumed a role and received a session token, that can be added here as well:

    Azure

    ADL Gen 2

    URI Structure: abfs(s)://[container]@[account].dfs.core.windows.net/[path]

    Upload the configuration file, JSON file, and JAR file to an ADL gen 2 blob container.

    Environment Variables:

    If you want to authenticate using an account key, add the following to your cluster's environment variables:

    If you want to authenticate using an Azure SAS token, add the following to your cluster's environment variables:

    ADL Gen 1

    URI Structure: adl://[account].azuredatalakestore.net/[path]

    Upload the configuration file, JSON file, and JAR file to ADL gen 1.

    Environment Variables:

    If authenticating as a Microsoft Entra ID user,

    If authenticating using a service principal,

    HTTPS

    URI Structure: http(s)://[host](:port)/[path]

    Artifacts are available for download from Immuta using basic authentication. Your basic authentication credentials can be obtained from your Immuta support professional.

    Environment Variables (Optional)

    DBFS

    DBFS does not support access control

    Any Databricks user can access DBFS via the Databricks command line utility. Files containing sensitive materials (such as Immuta API keys) should not be stored there in plain text. Use other methods described herein to properly secure such materials.

    URI Structure: dbfs:/[path]

    Upload the artifacts directly to DBFS using the Databricks CLI.

    Since any user has access to everything in DBFS:

    1. The artifacts can be stored anywhere in DBFS.

    2. It's best to have a cluster-specific place for your artifacts in DBFS if you are testing to avoid overwriting or reusing someone else's artifacts accidentally.
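
The URI structures listed for each staging method can be checked mechanically before they are placed in cluster environment variables. The following is a minimal sketch, assuming the patterns given in this section are the only accepted shapes; the function name `staging_method` and the sample URIs are hypothetical and not part of Immuta's tooling.

```python
import re

# Patterns mirror the "URI Structure" lines in this section.
STAGING_PATTERNS = {
    "AWS/S3": re.compile(r"^s3://[^/]+/.+"),
    "ADL Gen 2": re.compile(r"^abfss?://[^@/]+@[^/]+\.dfs\.core\.windows\.net/.+"),
    "ADL Gen 1": re.compile(r"^adl://[^/]+\.azuredatalakestore\.net/.+"),
    "HTTPS": re.compile(r"^https?://[^/]+/.+"),
    "DBFS": re.compile(r"^dbfs:/.+"),
}


def staging_method(uri):
    """Return the staging method whose URI structure matches, or None."""
    for method, pattern in STAGING_PATTERNS.items():
        if pattern.match(uri):
            return method
    return None


for uri in (
    "s3://my-bucket/immuta/immuta-spark-hive.jar",
    "abfss://artifacts@myaccount.dfs.core.windows.net/immuta/immuta-spark-hive.jar",
    "dbfs:/immuta/immuta-spark-hive.jar",
):
    print(uri, "->", staging_method(uri))
```

A check like this only validates the shape of the URI; it does not confirm that the cluster can actually read the location.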

    3 - Protect Immuta Environment Variables with Databricks Secrets

    It is important that non-administrator users on an Immuta-enabled Databricks cluster do not have access to view or modify Immuta configuration or the immuta-spark-hive.jar file, as this could create a loophole around Immuta policy enforcement. Therefore, use Databricks secrets to apply environment variables to an Immuta-enabled cluster in a secure way.

    Databricks secrets can be used in the Environment Variables configuration section for a cluster by referencing the secret path rather than the actual value of the environment variable. For example, if a user wanted to make the following value secret

    they could instead create a Databricks secret and reference it as the value of that variable. For instance, if the secret scope my_secrets was created, and the user added a secret with the key my_secret_env_var containing the desired sensitive environment variable, they would reference it in the Environment Variables section:

    Then, at runtime, {{secrets/my_secrets/my_secret_env_var}} would be replaced with the actual value of the secret if the owner of the cluster has access to that secret.
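
As an illustration of the reference format described above, the following sketch builds the `{{secrets/<scope>/<key>}}` placeholder for an environment variable entry. The helper names and the variable name MY_SECRET_ENV_VAR are hypothetical; only the placeholder syntax comes from this guide.

```python
def secret_reference(scope: str, key: str) -> str:
    """Return the placeholder that Databricks replaces with the secret value."""
    return "{{secrets/" + scope + "/" + key + "}}"


def env_var_entry(name: str, scope: str, key: str) -> str:
    """Format one line for the cluster's Environment Variables section."""
    return f"{name}={secret_reference(scope, key)}"


# MY_SECRET_ENV_VAR is an illustrative name, not an Immuta variable.
print(env_var_entry("MY_SECRET_ENV_VAR", "my_secrets", "my_secret_env_var"))
```

The printed line is what would be entered in place of the plain-text value, so the sensitive value itself never appears in the cluster configuration.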


    Best practice: Replace sensitive variables with secrets

    Immuta recommends that any sensitive environment variables listed below in the various artifact deployment instructions be replaced with secrets.

    4 - Create and Configure the Cluster

    Cluster creation in an Immuta-enabled organization or Databricks workspace should be limited to administrative users so that other users cannot create clusters that are not Immuta-enabled.

    1. Create a cluster in Databricks by following the Databricks documentation.

    2. Select the Custom Access mode.

    3. Opt to adjust the Autopilot Options and Worker Type settings. The default values provided here may be more than what is necessary for non-production or smaller use-cases. To reduce resource usage you can enable/disable autoscaling, limit the size and number of workers, and set the inactivity timeout to a lower value.

    Additional Hadoop Configuration File (Optional)

    As mentioned in the "Environment Variables" section of the cluster configuration, there may be some cases where it is necessary to add sensitive configuration to SparkSession.sparkContext.hadoopConfiguration in order to read the data composing Immuta data sources.

    As an example, when accessing external tables stored in Azure Data Lake Gen 2, Spark must have credentials to access the target containers/filesystems in ADLg2, but users must not have access to those credentials. In this case, an additional configuration file may be provided with a storage account key that the cluster may use to access ADLg2.

    To use an additional Hadoop configuration file, set the IMMUTA_INIT_ADDITIONAL_CONF_URI environment variable referenced in the Create and configure the cluster section to the full URI of this file.

    The additional configuration file looks very similar to the Immuta Configuration file referenced above. Some example configuration files for accessing different storage layers are below.

    Amazon S3

    IAM role for S3 access

    S3 can also be accessed using an IAM role attached to the cluster. See the Databricks documentation for more details.

    Azure Data Lake Gen 2

    Azure Data Lake Gen 1

    ADL prefix: Prior to Databricks Runtime version 6, the following configuration items should have a prefix of dfs.adls rather than fs.adl.
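
The prefix rule in the note above can be expressed as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the key fs.adl.oauth2.client.id is used only as an illustrative Hadoop ADL setting, not as a key taken from this guide.

```python
def adl_config_key(key: str, runtime_major: int) -> str:
    """Return the ADL Gen 1 key name to use for a Databricks Runtime major version.

    Before Databricks Runtime 6, keys use the dfs.adls prefix instead of fs.adl.
    """
    old_prefix, new_prefix = "dfs.adls.", "fs.adl."
    if runtime_major < 6 and key.startswith(new_prefix):
        return old_prefix + key[len(new_prefix):]
    return key


print(adl_config_key("fs.adl.oauth2.client.id", 5))
print(adl_config_key("fs.adl.oauth2.client.id", 7))
```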

    Azure Blob Storage

    5 - Register Data

    Register Databricks securables in Immuta.

    6 - Query Immuta Data

    When the Immuta-enabled Databricks cluster has been successfully started, users will see a new database labeled "immuta". This database is the virtual layer provided to access data sources configured within the connected Immuta instance.

    Before users can query an Immuta data source, an administrator must give the user Can Attach To permissions on the cluster and GRANT the user access to the immuta database.

    The following SQL query can be run as an administrator within a notebook to give the user access to "Immuta":

    Below are example queries that can be run to obtain data from an Immuta-configured data source. Because Immuta supports raw tables in Databricks, you do not have to use Immuta-qualified table names in your queries like the first example. Instead, you can run queries like the second example, which does not reference the immuta database.

    Creating a Databricks Data Source

    See the Databricks Data Source Creation guide for a detailed walkthrough.

    Databricks to Immuta User Mapping

    By default, the IAM used to map users between Databricks and Immuta is the BIM (Immuta's internal IAM). The Immuta Spark plugin will check the Databricks username against the username within the BIM to determine access. For a basic integration, this means the user's email address in Databricks and the connected Immuta tenant must match.

    It is possible within Immuta to have multiple users share the same username if they exist within different IAMs. In this case, the cluster can be configured to look up users from a specified IAM. To do this, the value of immuta.user.mapping.iamid created and hosted in the previous steps must be updated to be the targeted IAM ID configured within the Immuta tenant. The IAM ID can be found on the App Settings page. Each Databricks cluster can only be mapped to one IAM.
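
The lookup described above can be pictured with a short sketch. It is illustrative only: the function name, the dictionary fields, and the exact-match comparison are assumptions, and the real matching is performed by the Immuta Spark plugin.

```python
def find_immuta_user(databricks_username, immuta_users, iam_id="bim"):
    """Return the Immuta user in the given IAM whose username matches, or None.

    iam_id plays the role of immuta.user.mapping.iamid (default: bim).
    """
    for user in immuta_users:
        if user["iamid"] == iam_id and user["username"] == databricks_username:
            return user
    return None


# Two users share a username but live in different IAMs (hypothetical data).
users = [
    {"iamid": "bim", "username": "ana@example.com", "id": 1},
    {"iamid": "okta", "username": "ana@example.com", "id": 2},
]
print(find_immuta_user("ana@example.com", users))
print(find_immuta_user("ana@example.com", users, iam_id="okta"))
```

Because each cluster maps to a single IAM, the `iam_id` argument is fixed per cluster in this picture rather than chosen per query.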

    Configure a Databricks Unity Catalog Integration

    Databricks Unity Catalog allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

    Permissions

    The following permissions and personas are used in the registration process.

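    | | Project workspaces | Tag ingestion | User impersonation | Query audit | Multiple integrations |
    | --- | --- | --- | --- | --- | --- |
    | Databricks Unity Catalog | ❌ | ✅ | ❌ | ✅ | ✅ |
    | Databricks Spark | ✅ | ❌ | ✅ | ✅ | ✅ |
    | Google BigQuery | ❌ | ❌ | ❌ | ❌ | ❌ |
    | Starburst | ❌ | ❌ | ✅ | ✅ | ✅ |
    | Redshift | ❌ | ❌ | ✅ | ❌ | ✅ |
    | Azure Synapse Analytics | ❌ | ❌ | ✅ | ❌ | ✅ |
    | Amazon S3 | ❌ | ❌ | ❌ | ❌ | ✅ |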

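    | | Snowflake | Databricks Spark | Databricks Unity Catalog | Starburst (Trino) |
    | --- | --- | --- | --- | --- |
    | Columns returned | ✅ | ❌ | ❌ | ✅ |
    | Query text | ✅ | ✅ | Limited support | ✅ |
    | Unauthorized information | Limited support | ✅ | Limited support | ❌ |
    | Policy details | ❌ | ✅ | ❌ | ❌ |
    | User's entitlements | ❌ | ✅ | ❌ | ❌ |
    | Column tags | ✅ | ❌ | ❌ | ✅ |
    | Table tags | ✅ | ❌ | ❌ | ❌ |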
    Note: Use Spark 2 with Databricks Runtime prior to 7.x. Use Spark 3 with Databricks Runtime 7.x or later. Attempting to use an incompatible jar and Databricks Runtime will fail.
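
The compatibility rule in the note above reduces to a comparison on the runtime's major version. A minimal sketch, assuming runtime versions are written like "6.4" or "7.3 LTS"; the function name is hypothetical.

```python
def spark_major_for_runtime(runtime_version: str) -> int:
    """Return the Spark major version of the jar to use for a Databricks Runtime.

    Runtimes before 7.x use Spark 2; 7.x and later use Spark 3.
    """
    runtime_major = int(runtime_version.split(".")[0])
    return 2 if runtime_major < 7 else 3


for version in ("6.4", "7.3 LTS", "10.4 LTS"):
    print(version, "-> Spark", spark_major_for_runtime(version))
```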
  • Specify the following properties as Spark environment variables or in the optional immuta_conf.xml file. If the same property is specified in both locations, the Spark environment variable takes precedence. The variable names are the config names in all upper case with _ instead of .. For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com

    • immuta.system.api.key: Obtain this value from the under HDFS > System API Key. You will need to be a user with the APPLICATION_ADMIN role to complete this action. Generating a key will destroy any previously generated HDFS keys. This will cause previously integrated HDFS systems to lose access to your Immuta console. The key will only be shown once when generated.

    • immuta.base.url: The full URL for the target Immuta tenant (for example, https://immuta.mycompany.com).

    • immuta.user.mapping.iamid: If users authenticate to Immuta using an IAM different from Immuta's built-in IAM, you need to update the configuration file to reflect the ID of that IAM. The IAM ID is shown within the Immuta App Settings page within the Identity Management section. See for more details.
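
The naming and precedence rules stated above (upper-case the config name, replace each period with an underscore, and let the Spark environment variable win over immuta_conf.xml) can be sketched as follows. The function names are hypothetical, and the dictionaries stand in for the real environment and the parsed XML file.

```python
import os


def env_var_name(property_name: str) -> str:
    """Convert a config name such as immuta.base.url to IMMUTA_BASE_URL."""
    return property_name.upper().replace(".", "_")


def resolve_property(property_name, xml_properties, environ=None):
    """Return the effective value, letting the environment variable win."""
    environ = os.environ if environ is None else environ
    name = env_var_name(property_name)
    if name in environ:
        return environ[name]
    # Fall back to the value from the optional immuta_conf.xml, if any.
    return xml_properties.get(property_name)


xml_properties = {"immuta.base.url": "https://old.mycompany.com"}
environ = {"IMMUTA_BASE_URL": "https://immuta.mycompany.com"}
print(env_var_name("immuta.system.api.key"))
print(resolve_property("immuta.base.url", xml_properties, environ))
```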

  • Host files in DBFS (Not recommended for production)

    In the Advanced Options section, click the Instances tab.

    • IAM Role (AWS ONLY): Select the instance role you created for this cluster. (For access key authentication, you should instead use the environment variables listed in the AWS section.)

  • Click the Spark tab. In the Spark Config field, add your configuration.

    • Cluster Configuration Requirements:

  • In the Environment Variables section, add the environment variables necessary for your configuration. Remember that these variables should be protected with Databricks secrets as mentioned above.

  • Click the Init Scripts tab and set the following configurations:

    • Destination: Specify the service you used to host the Immuta artifacts.

    • File Path: Specify the full URI to the immuta_cluster_init_script.sh.

    • Add the new key/value to the configuration.

  • Click the Permissions tab and configure the following setting:

    • Who has access: Users or groups will need to have the permission Can Attach To to execute queries against Immuta configured data sources.

  • (Re)start the cluster.

  • Immuta GitHub repository
    AWS/S3
    Azure ADL
    HTTPS
    Databricks documentation
    ADL gen 2 blob container
    ADL gen 1
    DBFS
    Databricks CLI
    Databricks secrets
    Databricks documentation
    Create and configure the cluster
    Databricks documentation
    Register Databricks securables in Immuta
    immuta database
    Databricks Data Source Creation guide
    App Settings page
    Immuta user: An Immuta user with the APPLICATION_ADMIN Immuta permission must configure the Databricks Unity Catalog integration.
  • Databricks user: The Databricks user running the installation script must have the following privileges.

    • Account admin

    • CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables

    • Metastore admin (only required if enabling query audit)

  • Databricks service principal:

    • USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources and USE SCHEMA on all schemas containing securables registered as Immuta data sources.

    • MODIFY and SELECT on all securables registered as Immuta data sources. MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the

    • Optionally, to include audit, the service principal needs the following additional privileges:

      • USE CATALOG on system catalog

        • USE SCHEMA

  • See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
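
To make the service principal privileges listed above concrete, the following sketch formats GRANT statements for one catalog, schema, and table. It is an assumption-laden illustration: the function name and object names are hypothetical, the statements follow the general Databricks SQL form GRANT <privileges> ON <securable> TO <principal>, and they should be checked against the Databricks documentation before use.

```python
def grant_statements(principal: str, catalog: str, schema: str, table: str):
    """Return GRANT statements covering the privileges listed in this section."""
    quoted = f"`{principal}`"
    return [
        # USE CATALOG and MANAGE on the catalog containing registered securables
        f"GRANT USE CATALOG, MANAGE ON CATALOG {catalog} TO {quoted}",
        # USE SCHEMA on the schema containing registered securables
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted}",
        # MODIFY and SELECT on the securable registered as an Immuta data source
        f"GRANT MODIFY, SELECT ON TABLE {catalog}.{schema}.{table} TO {quoted}",
    ]


for statement in grant_statements("immuta-service-principal", "main", "sales", "orders"):
    print(statement)
```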

    Requirements

    Before you configure the Databricks Unity Catalog integration, ensure that you have fulfilled the following requirements:

    • Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.

    • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.

    • A service principal with the Databricks permissions outlined above has been created for Immuta to use to manage policies in Unity Catalog.

    • for query audit.

    • If you select single user access mode for your cluster, you must

      • use Databricks Runtime 15.4 LTS and above. Unity Catalog row- and column-level security controls are unsupported for single user access mode on Databricks Runtime 15.3 and below. See the for details.

      • enable serverless compute for your workspace.
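
The runtime requirement for single user access mode can be checked with a simple version comparison. A minimal sketch, assuming versions are written like "15.4 LTS" or "15.3"; the function name is hypothetical.

```python
def meets_single_user_requirement(runtime_version: str) -> bool:
    """Return True if the Databricks Runtime is 15.4 or above."""
    numeric = runtime_version.split()[0]  # drop a trailing label such as "LTS"
    parts = numeric.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (15, 4)


for version in ("15.3", "15.4 LTS", "16.0"):
    print(version, meets_single_user_requirement(version))
```

This covers only the runtime version; serverless compute must still be enabled for the workspace as listed above.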


    Unity Catalog best practices

    Ensure your integration with Unity Catalog goes smoothly by following these guidelines:

    • Use a Databricks SQL warehouse to configure the integration. Databricks SQL warehouses are faster to start than traditional clusters, require less management, and can run all the SQL that Immuta requires for policy administration. A serverless warehouse provides nearly instant startup time and is the preferred option for connecting to Immuta.

    • Move all data into Unity Catalog before configuring Immuta with Unity Catalog. The default catalog used once Unity Catalog support is enabled in Immuta is the hive_metastore, which is not supported by the Unity Catalog integration. Data sources in the Hive Metastore must be managed by the Databricks Spark integration. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

    Migrate data to Unity Catalog

    1. Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

    2. Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.

    Configure the Databricks Unity Catalog integration

    Existing data source migration: If you have existing Databricks data sources, complete these migration steps before proceeding.

    You have two options for configuring your Databricks Unity Catalog integration:

    • Automatic setup: Immuta creates the catalogs, schemas, tables, and functions using the integration's configured personal access token.

    • Manual setup: Run the Immuta script in Databricks yourself to create the catalog. You can also modify the script to customize your storage location for tables, schemas, or catalogs.

    Automatic setup

    Required permissions: When performing an automatic setup, the credentials provided must have the permissions listed above.

    1. Click the App Settings icon in the left sidebar.

    2. Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.

    3. Click the Integrations tab.

    4. Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.

    5. Complete the following fields:

      • Server Hostname is the hostname of your Databricks workspace.

      • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.


    Create a separate Immuta catalog for each Immuta tenant

    If multiple Immuta tenants are connected to your Databricks environment, create a separate Immuta catalog for each of those tenants. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.

    6. If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.

    7. Optionally, fill out the Exemption Group field with the name of an account-level group in Databricks that must be exempt from having data policies applied. This group is created and managed in Databricks and should only include privileged users and service accounts that require an unmasked view of data. Create this group in Databricks before configuring the integration in Immuta.


    Exemption group cannot be changed after configuration is saved

    The exemption group field cannot be edited after you save the integration configuration. If you need to change this group name, you can choose one of the following options:

    • Update the group name in Databricks to match what you have configured here.

    • Delete the integration in Immuta and create a new Databricks Unity Catalog integration with the new exemption group name.

    For details about policy exemption groups, see the Databricks Unity Catalog reference guide.

    8. Unity Catalog query audit is enabled by default. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.

    9. Optionally, scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.

    10. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.

    11. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.

    12. Continue with your integration configuration.

    13. Select your authentication method from the dropdown:

      • Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

      • OAuth machine-to-machine (M2M):

    14. Click Save.
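
The audit-related fields in the steps above accept a comma-separated list of workspace IDs and a whole number of hours between 1 and 24. The following is a minimal sketch of how those values could be checked before they are entered. The function names are hypothetical, and the assumption that workspace IDs are numeric strings is not stated in this guide.

```python
def parse_workspace_ids(raw: str) -> list:
    """Split a comma-separated string of workspace IDs, dropping blanks.

    An empty result means no scoping: all workspaces are audited.
    """
    ids = [part.strip() for part in raw.split(",")]
    ids = [part for part in ids if part]
    for workspace_id in ids:
        # Assumption: Databricks workspace IDs are numeric strings.
        if not workspace_id.isdigit():
            raise ValueError(f"not a numeric workspace ID: {workspace_id!r}")
    return ids


def validate_audit_frequency(hours: int) -> int:
    """Accept only whole hours between 1 and 24, inclusive."""
    if isinstance(hours, bool) or not isinstance(hours, int):
        raise TypeError("audit frequency must be an integer number of hours")
    if not 1 <= hours <= 24:
        raise ValueError("audit frequency must be between 1 and 24 hours")
    return hours
```

For example, `parse_workspace_ids("123, 456")` yields two IDs, while an empty string yields an empty list, which corresponds to leaving the field blank.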

    Manual setup

    Required permissions: When performing a manual setup, the Databricks user running the script must have the permissions listed above.

    1. Click the App Settings icon in the left sidebar.

    2. Scroll to the Global Integrations Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox.

    3. Click the Integrations tab.

    4. Click + Add Integration and select Databricks Unity Catalog from the dropdown menu.

    5. Complete the following fields:

      • Server Hostname is the hostname of your Databricks workspace.

      • HTTP Path is the HTTP path of your Databricks cluster or SQL warehouse.


    Create a separate Immuta catalog for each Immuta tenant

    If multiple Immuta tenants are connected to your Databricks environment, create a separate Immuta catalog for each of those tenants. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.

    6. If using a proxy server with Databricks Unity Catalog, click the Enable Proxy Support checkbox and complete the Proxy Host and Proxy Port fields. The username and password fields are optional.

    7. Optionally, fill out the Exemption Group field with the name of an account-level group in Databricks that must be exempt from having data policies applied. This group is created and managed in Databricks and should only include privileged users and service accounts that require an unmasked view of data. Create this group in Databricks before configuring the integration in Immuta.


    Exemption group cannot be changed after configuration is saved

    The exemption group field cannot be edited after you save the integration configuration. If you need to change this group name, you can choose one of the following options:

    • Update the group name in Databricks to match what you have configured here.

    • Delete the integration in Immuta and create a new Databricks Unity Catalog integration with the new exemption group name.

    For details about policy exemption groups, see the Databricks Unity Catalog reference guide.

    8. Unity Catalog query audit is enabled by default. Ensure you have enabled system tables in Unity Catalog and provided the required access to the Immuta service principal.

    9. Optionally, scope the query audit ingestion by entering Unity Catalog Workspace IDs. Enter a comma-separated list of the workspace IDs that you want Immuta to ingest audit records for. If left empty, Immuta will audit all tables and users in Unity Catalog.

    10. Configure the audit frequency by scrolling to Integrations Settings and finding the Unity Catalog Audit Sync Schedule section.

    11. Enter how often, in hours, you want Immuta to ingest audit events from Unity Catalog as an integer between 1 and 24.

    12. Continue with your integration configuration.

    13. Select your authentication method from the dropdown:

      • Access Token: Enter a Databricks Personal Access Token. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

      • OAuth machine-to-machine (M2M):

    14. Select the Manual toggle and copy or download the script. You can modify the script to customize your storage location for tables, schemas, or catalogs.

    15. Run the script in Databricks.

    16. Click Save.

    Enable query audit for Unity Catalog

    To enable query audit for Unity Catalog, complete the following steps before configuring the integration:

    1. Enable a system schema, where the <SCHEMA_NAME> is access.

    2. Grant the Immuta service principal access to the Databricks Unity Catalog system tables. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access:

      • USE CATALOG on the system catalog

      • USE SCHEMA on the system.access schema

      • SELECT on the following system tables:

        • system.access.audit

        • system.access.table_lineage

    3. Enable verbose audit logs in Unity Catalog.

    4. In the configuration above, use the Databricks Personal Access Token of the account you just granted system table access. This account will be the Immuta service principal.
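
The minimum access listed in step 2 can be expressed as Unity Catalog GRANT statements. The sketch below only builds the statement text, which a metastore and account admin could review and then run in Databricks. The helper name and the example principal are hypothetical.

```python
# Minimum system tables named in the steps above.
SYSTEM_TABLES = ("system.access.audit", "system.access.table_lineage")


def audit_grant_statements(principal: str) -> list:
    """Return the GRANT statements for the minimum audit access."""
    # Backtick-quote the principal, escaping any embedded backticks.
    quoted = "`" + principal.replace("`", "``") + "`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO {quoted}",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {table} TO {quoted}" for table in SYSTEM_TABLES
    ]
    return statements


if __name__ == "__main__":
    # "immuta-service-principal" is a placeholder, not a real identity.
    for statement in audit_grant_statements("immuta-service-principal"):
        print(statement + ";")
```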

    Map Databricks users to Immuta

    If the usernames in Immuta do not match usernames in Databricks, map each Databricks username to the corresponding Immuta user account using one of the methods below so that Immuta enforces policies properly:

    • Map the external IDs from an external identity manager

    • Manually map the external IDs on the user's profile page

    If the Databricks user doesn't exist in Databricks when you configure the integration, manually link their Immuta username to Databricks after they are created in Databricks. Otherwise, policies will not be enforced correctly for them in Databricks. Databricks user identities for Immuta users are automatically marked as invalid when the user is not found during policy application, preventing them from being affected by Databricks policy until their Immuta user identity is manually mapped to their Databricks identity.

    Register data

    Register Databricks securables in Immuta.

    Databricks Unity Catalog

    Databricks Unity Catalog Integration Reference Guide

    Immuta’s integration with Unity Catalog allows you to enforce fine-grained access controls on Unity Catalog securable objects with Immuta policies. Instead of manually creating UDFs or granting access to each table in Databricks, you can author your policies in Immuta and have Immuta manage and orchestrate Unity Catalog access-control policies on your data in Databricks clusters or SQL warehouses:

    • Subscription policies: Immuta subscription policies automatically grant and revoke access to specific Databricks securable objects.

    • Data policies: Immuta data policies enforce row- and column-level security.

    spark.executor.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
        -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
        -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
    spark.driver.extraJavaOptions -Djava.security.manager=com.immuta.security.ImmutaSecurityManager /
        -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json /
        -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service
    spark.databricks.repl.allowedLanguages python,sql
    spark.databricks.pyspark.enableProcessIsolation true
    spark.databricks.isv.product Immuta
    # Specify the URI to the artifacts that were hosted in the previous steps
    # The URI must adhere to the supported types for each service mentioned above
    IMMUTA_INIT_JAR_URI=<Full URI to immuta-spark-hive.jar>
    IMMUTA_INIT_CONF_URI=<Full URI to Immuta configuration file>
    IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI=<full URI to allowedCallingClasses.json>
    IMMUTA_INIT_OBSCURED_COMMANDS_URI=<full URI to obscuredCommands.yaml>
    
    # (OPTIONAL)
    # Specify an additional configuration file to be added to the spark.sparkContext.hadoopConfiguration.
    # This file allows administrators to add sensitive configuration needed by the SparkSession that
    # should not be viewable by users.
    # Further explanation of this variable as well as examples are provided below.
    IMMUTA_INIT_ADDITIONAL_CONF_URI=<full URI to additional configuration file>
    allowedCallingClasses.json
    immuta-benchmark-suite.dbc
    immuta-spark-hive-X.X.X_YYYYMMDD-hadoop-Z.Z.Z-public.jar
    immuta_cluster_init_script.sh
    obscuredCommands.yaml
    IMMUTA_INIT_AWS_SECRET_ACCESS_KEY=<aws secret key>
    IMMUTA_INIT_AWS_ACCESS_KEY_ID=<aws access key id>
    IMMUTA_INIT_AWS_SESSION_TOKEN=<aws session token>
    IMMUTA_INIT_AZCOPY_CRED_TYPE=SharedKey
    IMMUTA_INIT_ACCOUNT_NAME=<ADLg2 account name>
    IMMUTA_INIT_ACCOUNT_KEY=<ADLg2 account key>
    IMMUTA_INIT_AZURE_SAS_TOKEN=<SAS token>
    IMMUTA_INIT_AZURE_AD_USER=<Microsoft Entra ID username>
    IMMUTA_INIT_AZURE_PASSWORD=<Microsoft Entra ID password>
    IMMUTA_INIT_AZURE_SERVICE_PRINCIPAL=<azure service principal>
    IMMUTA_INIT_AZURE_PASSWORD=<azure service principal password>
    IMMUTA_INIT_AZURE_TENANT=<tenant ID where principal was created>
    IMMUTA_INIT_HTTPS_USER=<basic auth username>
    IMMUTA_INIT_HTTPS_PASSWORD=<basic auth password>
    MY_SECRET_ENV_VAR=super_secret_stuff
    MY_SECRET_ENV_VAR={{secrets/my_secrets/my_secret_env_var}}
    <configuration>
        <property>
            <name>fs.s3n.awsAccessKeyId</name>
            <value>[AWS access key ID]</value>
        </property>
        <property>
            <name>fs.s3n.awsSecretAccessKey</name>
            <value>[AWS secret key]</value>
        </property>
    </configuration>
    <configuration>
        <property>
            <name>fs.azure.account.key.[storage account name].dfs.core.windows.net</name>
            <value>[storage account key]</value>
        </property>
    </configuration>
    <configuration>
        <property>
            <name>fs.adl.oauth2.refresh.url</name>
            <value>https://login.microsoftonline.com/[directory ID]/oauth2/token</value>
        </property>
        <property>
            <name>fs.adl.oauth2.access.token.provider.type</name>
            <value>ClientCredential</value>
        </property>
        <property>
            <name>fs.adl.oauth2.credential</name>
            <value>[client secret from Azure]</value>
        </property>
        <property>
            <name>fs.adl.oauth2.client.id</name>
            <value>[client ID from Azure]</value>
        </property>
    </configuration>
    <configuration>
        <property>
            <name>fs.azure.account.key.[storage account name].blob.core.windows.net</name>
            <value>[storage account key]</value>
        </property>
    </configuration>
    %sql
    GRANT SELECT,READ_METADATA ON DATABASE immuta TO `<user email address>`
    %sql
    select * from immuta.my_data_source limit 5;
    %sql
    select * from my_data_source limit 5;
    Immuta Configuration UI
    Databricks to Immuta User Mapping
    MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.
    on system.access schema
  • SELECT on system.access.audit table

  • SELECT on system.access.table_lineage table

  • SELECT on system.access.column_lineage table

  • Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal. See Manage privileges in Unity Catalog. The system.access schema must also be enabled on the metastore before it can be used.

    Immuta Catalog is the name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
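
The naming rule above (letters, numbers, and underscores only, and no leading number) can be checked with a short pattern. This is a minimal sketch that assumes ASCII letters and digits. The function name is hypothetical and is not part of Immuta.

```python
import re

# First character: letter or underscore. Remaining: letters, digits, underscores.
# Assumption: "letters" and "numbers" mean ASCII characters.
_CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_catalog_name(name: str) -> bool:
    """Return True if the name satisfies the rule described above."""
    return _CATALOG_NAME.fullmatch(name) is not None
```

With this check, a name such as `immuta_tenant_a` passes, while `2immuta` (leading number) and `immuta-catalog` (hyphen) do not.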

    AWS Databricks:

    • Follow Databricks documentation to create a client secret for the Immuta service principal and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.

    • Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.

    • Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the .

    • Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the for details about scopes.

    • Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.

  • Azure Databricks:

    • Follow Databricks documentation to create a service principal within Azure and then add it to your Databricks account and workspace.

    • Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.

    • Within Databricks, . This completes your Databricks-based service principal setup.

    • Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.

    • Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the (note that Azure Databricks uses the Azure SP Client ID; it will be identical).

    • Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the for details about scopes.

    • Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
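
To illustrate how the Token Endpoint, Client ID, Scope, and Client Secret fields fit together, here is a minimal sketch of the token request that a client credentials flow sends. It only builds the request and does not send it. The function name is hypothetical, and the default scope `all-apis` is an assumption taken from Databricks' OAuth M2M examples rather than from this guide.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_endpoint: str, client_id: str,
                        client_secret: str,
                        scope: str = "all-apis") -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # The client ID and secret travel as HTTP Basic credentials.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. The JSON response from the authorization server would then carry the temporary access token.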

  • system.access.column_lineage

    Unity Catalog system tables enabled
    Databricks documentation
    Databricks Unity Catalog reference guide
    metastore privileges listed above
    Enable verbose audit logs in Unity Catalog
    Unity Catalog object model

    Unity Catalog uses the following hierarchy of data objects:

    • Metastore: Created at the account level and is attached to one or more Databricks workspaces. The metastore contains metadata of all the catalogs, schemas, and tables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those objects.

    • Catalog: Sits on top of schemas (also called databases) and tables to manage permissions across a set of schemas.

    • Schema: Organizes tables and views.

    • Table-level objects: Tables (managed or external), views, volumes, models, and functions.

    For details about the Unity Catalog object model, see the Databricks Unity Catalog documentation.

    Feature support

    The Databricks Unity Catalog integration supports

    • managing and accessing data across multiple Databricks workspaces

    • enforcing Unity Catalog row-, column-, and table-level access controls on Databricks clusters and SQL warehouses:

      • applying column masks and row filters on specific securable objects

      • applying subscription policies on tables and views

    • enforcing Unity Catalog access controls, even if Immuta becomes disconnected

    • allowing non-Immuta reads and writes

    • using Photon

    • using a proxy server

    Architecture

    Unity Catalog supports managing permissions account-wide in Databricks through controls applied directly to objects in the metastore. To establish a connection with Databricks and apply controls to securable objects within the metastore, Immuta requires a service principal with privileges to manage all data protected by Immuta. Databricks OAuth for service principals (OAuth M2M) or a personal access token (PAT) can be provided for Immuta to authenticate as the service principal. See the permissions requirements section for a list of specific Databricks privileges.

    Immuta uses this service principal to run queries that set up user-defined functions (UDFs) and other data necessary for policy enforcement. Upon enabling the integration, Immuta will create a catalog that contains these schemas:

    • immuta_system: Contains internal Immuta data.

    • immuta_policies_n: Contains policy UDFs.

    When policies require changes to be pushed to Unity Catalog, Immuta updates the internal tables in the immuta_system schema with the updated policy information. If necessary, new UDFs are pushed to replace any out-of-date policies in the immuta_policies_n schemas and any row filters or column masks are updated to point at the new policies. Many of these operations require compute on the configured Databricks cluster or SQL warehouse, so compute must be available for these policies to succeed.

    Policy enforcement

    Immuta’s Unity Catalog integration applies Databricks table-, row-, and column-level security controls that are enforced natively within Databricks. Immuta's management of these Databricks security controls is automated and ensures that they synchronize with Immuta policy or user entitlement changes.

    • Table-level security: Immuta manages REVOKE and GRANT privileges on Databricks securable objects that have been registered as Immuta data sources. When you register a data source in Immuta, Immuta uses the Unity Catalog API to issue GRANTs or REVOKEs against the catalog, schema, or table in Databricks for every user registered in Immuta.

    • Row-level security: Immuta applies SQL UDFs to restrict access to rows for querying users.

    • Column-level security: Immuta applies column-mask SQL UDFs to tables for querying users. These column-mask UDFs run for any column that requires masking.


    Policy behavior

    If you enable a Databricks Unity Catalog object in Immuta and it has no subscription policy set on it, Immuta will REVOKE access to that object in Databricks for all Immuta users, even if they had been directly granted access to that table outside of Immuta.

    If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users. All existing grants and policies will be removed, regardless of whether they were set in Immuta or in Unity Catalog directly.

    If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

    Supported policies

    The Unity Catalog integration supports the following policy types:

    • Subscription policies

    • Select masking policies

      • Conditional masking

      • Constant

      • Custom masking

      • Hashing

      • Null

      • Regex: You must use the global regex flag (g) when creating a regex masking policy in this integration. You cannot use the case insensitive regex flag (i) when creating a regex masking policy in this integration. See the Unity Catalog caveats section for examples.

      • Rounding (date and numeric rounding)

      • Matching (only show rows where)

        • Custom WHERE

    Project-scoped purpose exceptions for Databricks Unity Catalog


    Public preview: This feature is available to select accounts. Reach out to your Immuta representative to enable this feature.

    Project-scoped purpose exceptions for Databricks Unity Catalog integrations allow you to apply purpose-based policies to Databricks data sources in a project. As a result, users can only access that data when they are working within that specific project.

    Databricks Unity Catalog views

    If you are using views in Databricks Unity Catalog, one of the following must be true for project-scoped purpose exceptions to apply to the views in Databricks:

    • The view and underlying table are registered as Immuta data sources and added to a project: If a view and its underlying table are both added as Immuta data sources, both of these assets must be added to the project for the project-scoped purpose exception to apply. If a view and underlying table are both added as data sources but the table is not added to an Immuta project, the purpose exception will not apply to the view because Databricks does not support fine-grained access controls on views.

    • Only the underlying table is registered as an Immuta data source and added to a project: If only the underlying table is registered as an Immuta data source but the view is not registered, the purpose exception will apply to both the table and corresponding view in Databricks. Views are the only Databricks object that will have Immuta policies applied to them even if they're not registered as Immuta data sources (as long as their underlying tables are registered).

    Policy exemption group

    The Databricks group configured as the policy exemption group in Immuta will be exempt from Immuta data policy enforcement. This account-level group is created and managed in Databricks, not in Immuta.

    If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to an account-level group in Databricks and include this group name in the Databricks Unity Catalog configuration in Immuta. Then, group members will be excluded from having data policies applied to them when they query Immuta-protected tables in Databricks.

    Typically, service or system accounts that perform the following actions are added to an exemption group in Databricks:

    • Automated queries

    • ETL

    • Report generation

    If you have multiple groups that must be exempt from data policies, add each group to a single group in Databricks that you then set as the policy exemption group in Immuta.

    The service principal used to register data sources in Immuta will be automatically added to the exemption group for the Databricks securables it registers. Consequently, accounts added to the exemption group and used to register data sources in Immuta should be limited to service accounts.

    For guidance on configuring a policy exemption group on the Immuta app settings page, see the Configure a Databricks Unity Catalog integration guide. Alternatively, this group can be configured via the integrations API using the groupPattern object.

    Policy support with hive_metastore

    When enabling Unity Catalog support in Immuta, the catalog for all Databricks data sources will be updated to point at the default hive_metastore catalog. Internally, Databricks exposes this catalog as a proxy to the workspace-level Hive metastore that schemas and tables were kept in before Unity Catalog. Since this catalog is not a real Unity Catalog catalog, it does not support any Unity Catalog policies. Therefore, Immuta will ignore any data sources in the hive_metastore in any Databricks Unity Catalog integration, and policies will not be applied to tables there.

    However, with Databricks metastore magic you can use hive_metastore and enforce subscription and data policies with the Databricks Spark integration.

    Authentication methods

    The Databricks Unity Catalog integration supports the following authentication methods to configure the integration and create data sources:

    • Personal access token (PAT): This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the permissions section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function.

    • OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.
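
The request-then-refresh behavior described for OAuth M2M can be pictured as a small token cache. This is an illustrative sketch and not Immuta's implementation. The class name, the injected `fetch_token` callable, and the 60-second refresh margin are all assumptions.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache a temporary token and fetch a new one shortly before it expires."""

    def __init__(self,
                 fetch_token: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 refresh_margin: float = 60.0) -> None:
        # fetch_token returns (token, lifetime_in_seconds), e.g. 3600 for one hour.
        self._fetch_token = fetch_token
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, requesting a new one if needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Injecting the clock and the fetch function keeps the sketch testable without contacting an authorization server.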

    Immuta data sources in Unity Catalog

    The Unity Catalog data object model introduces a 3-tiered namespace, as outlined above. Consequently, your Databricks tables registered as data sources in Immuta will reference the catalog, schema (also called a database), and table.
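
As a small illustration of the three-tiered namespace, the sketch below splits a fully qualified name into catalog, schema, and table, and flags names in hive_metastore, which this integration does not protect. It assumes simple unquoted identifiers that contain no dots. The type and function names are hypothetical.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts."""
    # Assumption: identifiers are unquoted and contain no dots themselves.
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


def is_unity_catalog_table(name: TableName) -> bool:
    """hive_metastore is not a Unity Catalog catalog, so it is excluded."""
    return name.catalog.lower() != "hive_metastore"
```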

    External data connectors and query-federated tables

    External data connectors and query-federated tables are preview features in Databricks. See the Databricks documentation for details about the support and limitations of these features before registering them as data sources in the Unity Catalog integration.

    Query audit


    Access requirements

    For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

    • USE CATALOG on the system catalog

    • USE SCHEMA on the system.access schema

    • SELECT on the following system tables:

      • system.access.audit

      • system.access.table_lineage

    The Databricks Unity Catalog integration audits all user queries run in the integration's clusters or SQL warehouses. See the Databricks Unity Catalog audit page for details about the contents of the logs.

    The audit ingest is set when configuring the integration and can be scoped to only ingest specific workspaces if needed. The default ingest frequency is every hour, but this can be configured to a different frequency on the Immuta app settings page. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. When manually requested, it will only search for new queries that were created since the last query that had been audited. The job is run in the background, so the new queries will not be immediately available.
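
One way to picture the incremental, workspace-scoped ingestion described above is as a filter over audit events. This is a conceptual sketch only and does not show how Immuta ingests audit records. The event field names `event_time` and `workspace_id` and the function name are assumptions.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional, Set


def select_events_to_ingest(events: Iterable[Dict[str, Any]],
                            last_audited: datetime,
                            workspace_ids: Optional[Set[str]] = None
                            ) -> List[Dict[str, Any]]:
    """Keep events newer than the last audited one, optionally scoped by workspace."""
    selected = [
        event for event in events
        if event["event_time"] > last_audited
        # An empty or missing scope means every workspace is included.
        and (not workspace_ids or event["workspace_id"] in workspace_ids)
    ]
    # Oldest first, so the last element marks the next starting point.
    return sorted(selected, key=lambda event: event["event_time"])
```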

    hashtag
    Configuration requirements

    See the Enable Unity Catalog guide for a list of requirements.

    hashtag
    Supported Databricks cluster configurations

    The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

    Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Unity Catalog integration
    Cluster 1 | 9.1 | Unavailable | ✅ | Unavailable
    Cluster 2 | 10.4 | Unavailable | ✅ | Unavailable

    Legend:

    • ✅ The feature or integration is enabled.

    • ⛔ The feature or integration is disabled.

    hashtag
    Unity Catalog caveats

    • Row access policies with more than 1023 columns are unsupported. This is an underlying limitation of UDFs in Databricks. Immuta will only create row access policies with the minimum number of referenced columns. This limit will therefore apply to the number of columns referenced in the policy and not the total number in the table.

    • If you disable table grants, Immuta revokes the grants. Therefore, if users had access to a table before enabling Immuta, they’ll lose access.

    • If multiple Immuta tenants are connected to your Databricks environment, you must create a separate Immuta catalog for each of those tenants during configuration. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.

    • When creating a regex masking policy in this integration, you must use the global regex flag (g) and you cannot use the case-insensitive regex flag (i). See the examples below for guidance:

      • regex with a global flag (supported): /^ssn|social ?security$/g

      • regex without a global flag (unsupported): /^ssn|social ?security$/
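The flag rule above can be checked mechanically before a policy is created. The helper below is a hypothetical sketch that assumes the regex is written as a `/pattern/flags` literal, as in the examples; it is not part of Immuta.

```python
def check_regex_flags(regex_literal: str) -> bool:
    """Return True if a /pattern/flags literal uses 'g' and does not use 'i'."""
    if not regex_literal.startswith("/"):
        raise ValueError("expected a literal of the form /pattern/flags")
    last_slash = regex_literal.rfind("/")
    if last_slash == 0:
        raise ValueError("missing closing '/'")
    # Everything after the final slash is treated as the flag string.
    flags = regex_literal[last_slash + 1:]
    return "g" in flags and "i" not in flags
```

With this check, `/^ssn|social ?security$/g` passes, while the same pattern with no flags or with `gi` is rejected.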

    hashtag
    Azure Databricks Unity Catalog limitation

    If a registered data source is owned by a Databricks group at the table level, then the Unity Catalog integration cannot apply data masking policies to that table in Unity Catalog.

    Therefore, set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group. Catalogs and schemas can still be owned by a Databricks group, as ownership at that level doesn't interfere with the integration.

    hashtag
    Feature limitations

    The following features are currently unsupported:

    • Immuta projects (Enable the project-scoped purpose exceptions feature to allow you to apply purpose-based policies to Databricks data sources in a project.)

    • Multiple IAMs on a single cluster

    • Column masking policies on views

    • Mixing masking policies on the same column

    • Row-redaction policies on views

    • R and Scala cluster support

    • Scratch paths

    • User impersonation

    • Policy enforcement on raw Spark reads

    • Python UDFs for advanced masking functions

    • Direct file-to-SQL reads

    • Data policies on ARRAY, MAP, or STRUCT type columns

    • Shallow clones

    hashtag
    Known issue

    Snippets for Databricks data sources may be empty in the Immuta UI.

    hashtag
    Next

    Configure the Databricks Unity Catalog integration.

    Data policies
    client ID displayed in Databricks when creating the client secret for the service principalarrow-up-right
    OAuth 2.0 documentationarrow-up-right
    create an OAuth client secret for the service principalarrow-up-right
    Never
  • Where user

  • Where value in column

  • Minimization

  • Time-based restrictions

  • system.access.column_lineage

  • regex with a case insensitive flag (unsupported): /^ssn|social ?security$/gi

  • regex without a case insensitive flag (supported): /^ssn|social ?security$/g

  • Unavailable

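    Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Unity Catalog integration
    Cluster 3 | 11.3 | ⛔ | ✅ / ⛔ | Unavailable
    Cluster 4 | 11.3 | ✅ | ⛔ | ⛔
    Cluster 5 | 11.3 | ✅ | ✅ | ✅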

    auditing activity of both Immuta users and non-Immuta users
    limitations section
    Row-level policies

    Deploy Immuta without Elasticsearch

    circle-exclamation

    Feature availability

    If you deploy Immuta without Elasticsearch, several core services and features will be unavailable. See the Deployment requirements page for details.

    The guides below outline how to deploy Immuta without Elasticsearch.

    This is a guide on how to deploy Immuta on Kubernetes in the following managed public cloud providers:

    • Amazon Web Services (AWS)

    • Microsoft Azure

    • Google Cloud Platform (GCP)

    hashtag
    Prerequisites

    The following cloud-managed services must be provisioned before proceeding:

    • Amazon Web Services (AWS): Amazon RDS for PostgreSQL

    • Microsoft Azure: Azure Database for PostgreSQL

    • Google Cloud Platform (GCP): Google Cloud SQL for PostgreSQL

    Validation

    1. The PostgreSQL instance's hostname/FQDN is resolvable from within the Kubernetes cluster.

    2. The PostgreSQL instance is accepting connections.

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    hashtag
    Setup

    1. Create a Kubernetes namespace named immuta for Immuta.

    2. Switch to namespace immuta.

    3. Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

    PostgreSQL

    circle-info

    Connecting to the database

    There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

    1. Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the kubectl run command outlined below. Wait 5 seconds, and then proceed by entering a password.

    2. Create an immuta role and database.

    3. Revoke privileges from CURRENT_USER as they're no longer required.

    hashtag
    Install Immuta

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

    1. Create a Helm values file named immuta-values.yaml with the following content:

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    1. Deploy Immuta.
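The special characters to avoid in generated passwords (whitespace, $, &, :, \, /, ') can be screened for before a value is placed in the Helm values file. The helpers below are a hypothetical sketch, not part of the Immuta tooling; the generator simply restricts itself to letters and digits so none of the listed characters can appear.

```python
import secrets
import string
from typing import List

# Characters the guide says to avoid; whitespace is checked separately.
FORBIDDEN = set("$&:\\/'")
ALLOWED = string.ascii_letters + string.digits


def find_forbidden_characters(password: str) -> List[str]:
    """Return the sorted, de-duplicated forbidden characters found in password."""
    return sorted({ch for ch in password if ch in FORBIDDEN or ch.isspace()})


def generate_password(length: int = 32) -> str:
    """Generate a random password using only letters and digits."""
    return "".join(secrets.choice(ALLOWED) for _ in range(length))
```

An empty list from `find_forbidden_characters` means the value is safe to use with respect to this rule.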

    hashtag
    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.

    hashtag
    Next steps

    • Amazon Web Services (AWS)

      • Configure Ingress to complete your installation and access your Immuta application.

      • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    This is an OpenShift-specific guide on how to deploy Immuta with the following managed services:

    • Cloud-managed PostgreSQL

    • Cloud-managed Redis

    hashtag

    This is a generic guide that demonstrates how to deploy Immuta into any Kubernetes cluster without dependencies on any particular cloud provider.

    hashtag
    Considerations

    For the purposes of this guide, the state stores are deployed in Kubernetes using third-party Helm charts maintained by Bitnami.

    circle-exclamation

    Snowflake Integration

    circle-info

    Snowflake Enterprise Edition required

    In this integration, Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

    As with all Immuta integrations, Immuta can inject its ABAC model into policy building and administration to remove policy management burden and significantly reduce role explosion.

    Configure Starburst (Trino) Integration

    The plugin comes pre-installed with Starburst Enterprise, so this page provides separate sets of guidelines for configuration:

    • : These instructions are specific to Starburst Enterprise clusters.

    • : These instructions are specific to open-source Trino clusters.

    as they're no longer required.
  • Enable the pgcrypto extension.

  • Type \q, and then press Enter to exit.

  • Learn more about the best practices for Immuta in production.

  • Microsoft Azure

    • Configure Ingress to complete your installation and access your Immuta application.

    • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • .

  • Google Cloud Platform (GCP)

    • Configure Ingress to complete your installation and access your Immuta application.

    • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • .

  • Prerequisites

    Review the following criteria before proceeding with deploying Immuta.

    PostgreSQL

    1. The PostgreSQL instance has been provisioned and is actively running.

    2. The PostgreSQL instance's hostname/FQDN is resolvable from within the Kubernetes cluster.

    3. The PostgreSQL instance is accepting connections.

    Redis

    1. The Redis instance has been provisioned and is actively running.

    2. The Redis instance's hostname/FQDN is resolvable from within the Kubernetes cluster.

    3. The Redis instance is accepting connections.

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    hashtag
    Setup

    1. Create a new OpenShift project named immuta for Immuta.

    2. Get the UID range allocated to the project. Each running container's UID must fall within this range. This value will be referenced later on.

    3. Get the GID range allocated to the project. Each running container's GID must fall within this range. This value will be referenced later on.

    4. Switch to project immuta.

    5. Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

    Cloud-managed PostgreSQL

    circle-info

    Connecting to the database

    There are numerous ways to connect to a PostgreSQL database. This step demonstrates how to connect by creating an ephemeral Kubernetes pod.

    1. Connect to the database as superuser (postgres) by creating an ephemeral container inside the Kubernetes cluster. A shell prompt will not be displayed after executing the oc run command outlined below. Wait 5 seconds, and then proceed by entering a password.

    2. Create an immuta role and database.

    3. Revoke privileges from CURRENT_USER as they're no longer required.

    4. Enable the pgcrypto extension.

    5. Type \q, and then press Enter to exit.

    hashtag
    Install Immuta

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite cloud-managed services are configured.

    1. Create a Helm values file named immuta-values.yaml with the content below. Because the Ingress resource will be managed by an OpenShift route you will create when configuring Ingress and not the Immuta Enterprise Helm chart, ingress is set to false below. TLS comes pre-configured with OpenShift, so tls is also set to false.

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    1. Deploy Immuta.

    hashtag
    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.

    hashtag
    Next steps

    • Configure Ingress to complete your installation and access your Immuta application.

    • Learn more about best practices for Immuta in Production.

    Running production-grade stateful workloads (e.g., databases) in Kubernetes is difficult and heavily discouraged for the following reasons.
    • Operational overhead: Managing PostgreSQL on Kubernetes requires expertise in deploying, maintaining, and scaling these databases and search engines effectively. This involves tasks like setting up monitoring, configuring backups, managing updates, and ensuring high availability. Cloud-managed services abstract much of this operational burden away, allowing teams to focus on application development rather than infrastructure management.

    • Resource allocation and scaling: Kubernetes requires careful resource allocation and scaling decisions to ensure that PostgreSQL has sufficient CPU, memory, and storage. Properly sizing these resources can be challenging and may require continuous adjustments as workload patterns change. Managed services typically handle this scaling transparently and can automatically adjust based on demand.

    • Data integrity and high availability: PostgreSQL deployments need robust strategies for data integrity and high availability. Kubernetes can facilitate high availability through pod replicas and distributed deployments, but ensuring data consistency and durability across database instances and search indexes requires careful consideration and often additional tooling.

    • Performance: Kubernetes networking and storage configurations can introduce performance overhead compared to native cloud services. For latency-sensitive applications or high-throughput workloads, these factors become critical in maintaining optimal performance.

    • Observability: Troubleshooting issues in a Kubernetes environment, especially related to database and search engine performance, can be complex. Managed services typically come with built-in monitoring, logging, and alerting capabilities tailored to the specific service, making it easier to identify and resolve issues.

    • Security and compliance: Kubernetes environments require careful attention to security best practices, including network policies, access controls, and encryption. Managed services often come pre-configured with security features and compliance certifications, reducing the burden on teams to implement and maintain these measures.

    hashtag
    Authenticate with OCI registry

    circle-exclamation

    Helm chart availability

    The deprecated Immuta Helm chart (IHC) is not available from ocir.immuta.com.

    Copy the snippet below and replace the placeholder text with the credentials provided to you by your customer success manager:

    hashtag
    Setup

    1. Create a Kubernetes namespace named immuta for Immuta and its third-party dependencies.

    2. Switch to namespace immuta.

    3. Create a container registry pull secret. Your credentials to authenticate with ocir.immuta.com can be viewed in your user profile at support.immuta.com.

    PostgreSQL

    1. Create a Helm values file named pg-values.yaml with the following content:

    2. Update all placeholder values in the pg-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    1. Deploy PostgreSQL.

    2. Wait for all pods in the namespace to become ready.

    3. Determine the name of the PostgreSQL database pod. This will be referenced in a subsequent step.

    4. Exec into the PostgreSQL database pod using the psql command and immuta user to configure the PostgreSQL user used by Immuta.

    5. Alter the search_path for the immuta user.

    6. Enable the pgcrypto extension.

    7. Type \q then press Enter to exit.

    hashtag
    Install Immuta

    This section demonstrates how to deploy Immuta using the Immuta Enterprise Helm chart once the prerequisite local services are configured.

    1. Create a Helm values file named immuta-values.yaml with the following content:

    2. Update all placeholder values in the immuta-values.yaml file.

    circle-exclamation

    Avoid these special characters in generated passwords

    whitespace, $, &, :, \, /, '

    1. Deploy Immuta.

    hashtag
    Validation

    1. Wait for all pods in the namespace to become ready.

    2. Determine the name of the Secure service.

    3. Listen on local port 8080, forwarding TCP traffic to the Secure service's port named http.

    4. Navigate to http://localhost:8080 in a web browser.

    hashtag
    Next steps

    • Configure Ingress to complete your installation and access your Immuta application.

    • Configure TLS to secure your Ingress by specifying a Secret that contains a TLS private key and certificate.

    • Learn more about best practices for Immuta in Production.

    Amazon RDS for PostgreSQLarrow-up-right
    Azure Database for PostgreSQLarrow-up-right
    Google Cloud SQL for PostgreSQLarrow-up-right
    resolvable from within the Kubernetes cluster
    accepting connections
    support.immuta.comarrow-up-right
    placeholder values
    Configure Ingress
    Configure TLS
    PostgreSQLarrow-up-right
    Bitnamiarrow-up-right
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    hashtag
    How the integration works

    When an administrator configures the Snowflake integration with Immuta, Immuta creates an IMMUTA database and schemas (immuta_procedures, immuta_policies, and immuta_functions) within Snowflake to contain policy definitions and user entitlements. Immuta then creates a system role and gives that system account the privileges required to orchestrate policies in Snowflake and maintain state between Snowflake and Immuta. See the Snowflake privileges section for a list of privileges, the user they must be granted to, and an explanation of why they must be granted.

    hashtag
    Data flow

    1. An Immuta application administrator configures the Snowflake integration and registers Snowflake warehouse and databases with Immuta.

    2. Immuta creates a database inside the configured Snowflake warehouse that contains Immuta policy definitions and user entitlements.

    3. A data owner registers Snowflake tables in Immuta as data sources.

    4. If Snowflake tag ingestion was enabled during the configuration, Immuta uses the host provided in the configuration and ingests internal tags on Snowflake tables registered as Immuta data sources.

    5. A data owner, data governor, or administrator creates or changes a policy or a in Immuta.

    6. The Immuta web service calls a stored procedure that modifies the user entitlements or policies.

    7. Immuta manages and applies Snowflake row access policies and column masking policies to Snowflake tables that are registered as Immuta data sources.

    8. If Snowflake table grants is not enabled, a Snowflake object owner or a user with the global MANAGE GRANTS privilege grants SELECT access on relevant Snowflake tables to users. Note: Although these users are granted access, they will not see data unless they are subscribed to the table via Immuta-authored policies.

    9. A Snowflake user who is subscribed to the data source in Immuta queries the corresponding table directly in Snowflake and sees policy-enforced data.

    hashtag
    Policy enforcement

    When Immuta users create policies, they are then pushed into the Immuta database within Snowflake; there, the Immuta system account orchestrates Snowflake row access policies and column masking policies directly onto Snowflake tables. Changes in Immuta policies, user attributes, or data sources trigger webhooks that keep the Snowflake policies up-to-date.

    For a user to query Immuta-protected data, they must meet two qualifications:

    1. They must be subscribed to the Immuta data source.

    2. They must be granted SELECT access on the table by the Snowflake object owner or automatically via the Snowflake table grants feature.

    After a user has met these qualifications they can query Snowflake tables directly.

    See the integration support matrix on the Data policy types reference guide for a list of supported data policy types in Snowflake.

    hashtag
    Comply with column length and precision requirements in a Snowflake masking policy

    When a user applies a masking policy to a Snowflake data source, Immuta truncates masked values to align with Snowflake column length (VARCHAR(X) types) and precision (NUMBER(X,Y) types) requirements.

    Consider these columns in a data source that have the following masking policies applied:

    • Column A (VARCHAR(6)): Mask using hashing for everyone

    • Column B (VARCHAR(5)): Mask using a constant REDACTED for everyone

    • Column C (VARCHAR(6)): Mask by making null for everyone

    • Column D (NUMBER(3, 0)): Mask by rounding to the nearest 10 for everyone

    Querying this data source in Snowflake would return the following values:

    A | B | C | D
    5w4502 | REDAC | null | 990
    6e3611 | REDAC | null | 750
    9s7934 | REDAC | null |

    circle-info

    Hashing collisions

    Hashing collisions are more likely to occur across or within Snowflake columns restricted to short lengths, since Immuta truncates the hashed value to the limit of the column. (Hashed values truncated to 5 characters have a higher risk of collision than hashed values truncated to 20 characters.) Therefore, avoid applying hashing policies to Snowflake columns with such restrictions.

    For more details about Snowflake column length and precision requirements, see the Snowflake behavior change release documentation.
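The truncation behavior and the collision risk described above can be illustrated with a short sketch. It rests on assumptions: SHA-256 stands in for whatever hashing Immuta actually applies, and both function names are hypothetical.

```python
import hashlib


def truncate_to_length(masked_value: str, max_length: int) -> str:
    """Cut a masked value down to the column's maximum length."""
    return masked_value[:max_length]


def hash_and_truncate(value: str, max_length: int) -> str:
    """Hash a value (SHA-256 is an assumption) and truncate the hex digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return truncate_to_length(digest, max_length)


# A constant mask longer than the column is cut to fit, as in column B above.
print(truncate_to_length("REDACTED", 5))

# Shorter truncation leaves fewer possible outputs, so distinct inputs are
# more likely to share a masked value.
values = [str(i) for i in range(1000)]
for length in (2, 4, 8):
    distinct = len({hash_and_truncate(v, length) for v in values})
    print(length, distinct)
```

Truncating to 2 hexadecimal characters allows at most 256 distinct outputs, so 1000 inputs must collide; longer limits make collisions progressively less likely.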

    hashtag
    Query performance

    When a policy is applied to a column, Immuta uses Snowflake memoizable functions to cache the result of the called function. Then, when a user queries a column that has that policy applied to it, Immuta uses that cached result to dramatically improve query performance.

    hashtag
    Snowflake privileges

    The privileges that the Snowflake integration requires align with the principle of least privilege. The table below describes each privilege required in Snowflake for the setup user, the IMMUTA_SYSTEM_ACCOUNT user, or the metadata registration user. The references to IMMUTA_DB, IMMUTA_WH, and IMMUTA_IMPERSONATOR_ROLE in the table can be replaced with the names you chose for your Immuta database, warehouse, and impersonation role, respectively, when setting up the integration.

    Snowflake privilege | User requiring privilege | Features | Explanation
    CREATE DATABASE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates an Immuta database in your organization's Snowflake account where all Immuta managed objects (UDFs, masking policies, row access policies, and user entitlements) will be written and stored.
    CREATE ROLE ON ACCOUNT WITH GRANT OPTION | Setup user | All | The setup script this user runs creates a ROLE for Immuta that will be used to manage the integration once it has been initialized.
    CREATE USER ON ACCOUNT WITH GRANT OPTION | Setup user | |

    hashtag
    Integration health status

    hashtag
    Registering data sources

    Register Snowflake data sources using a dedicated Snowflake role. Avoid using individual user accounts for data source onboarding. Instead, create a service account (Snowflake user account TYPE=SERVICE) with SELECT access for onboarding data sources. No policies will apply to that role, ensuring that your integration works with the following use cases:

    • Snowflake project workspaces: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.

    • Using views and tables within Immuta: Because this integration uses Snowflake governance policies, users can register tables and views as Immuta data sources. However, if you want to register views and apply different policies to them than their backing tables, the owner of the view must be an excepted role; otherwise, the backing table’s policies will be applied to that view.

    hashtag
    Snowflake bulk data source creation

    circle-info

    Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

    Bulk data source creation is the more efficient process when loading more than 5000 data sources from Snowflake and allows for data sources to be registered in Immuta before running sensitive data discovery or applying policies.

    To use this feature, see the Bulk create Snowflake data sources guide.

    hashtag
    Resource allocations

    Based on performance tests that create 100,000 data sources, the following minimum resource allocations need to be applied to the appropriate pods in your Kubernetes environment for successful bulk data source creation.

    Resource | Web | Database
    Memory | 4Gi | 16Gi
    CPU | 2 | 4
    Storage | 8Gi | 24Gi

    hashtag
    Limitations

    • Performance gains are limited when enabling sensitive data discovery at the time of data source creation.

    • External catalog integrations are not recognized during bulk data source creation. Users must manually trigger a catalog sync for tags to appear on the data source through the data source's health check.

    hashtag
    Excepted roles/users

    Excepted roles and users are assigned when the integration is installed, and no policies will apply to these users' queries, despite any Immuta policies enforced on the tables they are querying. Credentials used to register a data source in Immuta will be automatically added to this excepted list for that Snowflake table. Consequently, roles and users added to this list and used to register data sources in Immuta should be limited to service accounts.

    Immuta excludes the listed roles and users from policies by wrapping all policies in a CASE statement that checks whether a user is acting under one of the listed usernames or roles. If so, the policy is not applied to the queried table. If not, the policy is executed as usual. Immuta does not distinguish between role and username, so if you have a role and user with the exact same name, both the user and any user acting under that role will have full access to the data sources and no policies will be enforced for them.
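The exception check described above can be modeled in a few lines. This is a conceptual sketch of the decision only, not the CASE statement Immuta generates; `policy_applies` and the sample names are hypothetical.

```python
from typing import Set


def policy_applies(current_user: str, current_role: str, excepted_names: Set[str]) -> bool:
    """Return False when the user or the active role is on the excepted list."""
    # A single shared list: user names and role names are not distinguished,
    # so a role and a user with the same name are both exempt.
    return current_user not in excepted_names and current_role not in excepted_names


excepted = {"IMMUTA_SVC"}
print(policy_applies("ALICE", "ANALYST", excepted))       # policy enforced
print(policy_applies("IMMUTA_SVC", "ANALYST", excepted))  # exempt by user name
print(policy_applies("BOB", "IMMUTA_SVC", excepted))      # exempt by role name
```

The last call shows why name collisions matter: any user acting under a role named like an excepted user bypasses the policy.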

    hashtag
    Authentication methods

    The Snowflake integration supports the following authentication methods to configure the integration and create data sources:

    • Username and password: Users can authenticate with their Snowflake username and password.

    • Key pair: Users can authenticate with Snowflake key pair authentication.

    • Snowflake External OAuth: Users can authenticate with Snowflake External OAuth.

    hashtag
    Snowflake External OAuth

    Immuta's OAuth authentication method uses the Client Credentials Flow to integrate with Snowflake External OAuth. When a user configures the Snowflake integration or connects a Snowflake data source, Immuta uses the token credentials (obtained using a certificate or passing a client secret) to craft an authenticated access token to connect with Snowflake. This allows organizations that already use Snowflake External OAuth to use that secure authentication with Immuta.

    hashtag
    Workflow

    1. An Immuta application administrator configures the Snowflake integration or creates a data source.

    2. Immuta creates a custom token and sends it to the authorization server.

    3. The authorization server confirms the information sent from Immuta and issues an access token to Immuta.

    4. Immuta sends the access token it received from the authorization server to Snowflake.

    5. Snowflake authenticates the token and grants access to the requested resources from Immuta.

    6. The integration is connected and users can query data.
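For orientation, the request that a client sends to the authorization server in a generic OAuth 2.0 client credentials grant can be sketched as follows. This shows only the client-secret variant with the credentials in the form body; the function name and sample values are hypothetical, and it does not describe the exact request Immuta sends.

```python
from typing import Optional
from urllib.parse import urlencode


def client_credentials_body(
    client_id: str, client_secret: str, scope: Optional[str] = None
) -> str:
    """Build the form-encoded body for an OAuth 2.0 client credentials request."""
    params = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        params["scope"] = scope  # optional; its format depends on the server
    return urlencode(params)


# The body is POSTed to the authorization server's token endpoint with
# Content-Type: application/x-www-form-urlencoded.
print(client_credentials_body("my-client-id", "my-client-secret"))
```

Many authorization servers also accept the client ID and secret through HTTP Basic authentication instead of the request body.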

    hashtag
    Supported Snowflake feature

    The Immuta Snowflake integration supports Snowflake external tables. However, you cannot add a masking policy to an external table column while creating the external table in Snowflake because masking policies cannot be attached to virtual columns.

    hashtag
    Supported Immuta features

    The Snowflake integration supports the Immuta features outlined below. Click the links provided for more details.

    • Immuta project workspaces: Users can have additional write access in their integration using project workspaces.

    • Tag ingestion: Immuta automatically ingests Snowflake object tags from your Snowflake instance and adds them to the appropriate data sources.

    • User impersonation: Impersonation allows users to query data as another Immuta user. To enable user impersonation, see the Integration user impersonation page.

    • Query audit: Immuta audits queries run in Snowflake against Snowflake data registered as Immuta data sources.

    • Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration by decreasing the number of Snowflake row access policies Immuta creates.

    • Snowflake table grants: This feature allows Immuta to manage privileges on your Snowflake tables and views according to the subscription policies on the corresponding Immuta data sources.

    hashtag
    Immuta project workspaces

    circle-info

    Immuta system account required Snowflake privileges

    • CREATE [OR REPLACE] PROCEDURE

    • DROP ROLE

    • REVOKE ROLE

    Users can have additional write access in their integration using project workspaces. For more details, see the Snowflake project workspaces page.

    hashtag
    Caveat

    To use project workspaces with the Snowflake integration, the default role of the account used to create data sources in the project must be added to the "Excepted Roles/Users List." If the role is not added, you will not be able to query the equalized view using the project role in Snowflake.

    hashtag
    Tag ingestion

    You can enable Snowflake tag ingestion so that Immuta will ingest Snowflake object tags from your Snowflake instance into Immuta and add them to the appropriate data sources.

    The Snowflake tags' key and value pairs will be reflected in Immuta as two levels: the key will be the top level and the value the second. As Snowflake tags are hierarchical, Snowflake tags applied to a database will also be applied to all of the schemas in that database, all of the tables within those schemas, and all of the columns within those tables. For example: If a database is tagged PII, all of the tables and columns in that database will also be tagged PII.
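The hierarchical behavior described above (a tag on a database also applies to its schemas, tables, and columns) can be sketched as a walk over an object's ancestors. The function name, the dotted object paths, and the `(key, value)` tuple representation are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Tag = Tuple[str, str]  # (key, value): key is the top level, value the second


def inherited_tags(object_path: str, tags_by_object: Dict[str, Set[Tag]]) -> Set[Tag]:
    """Collect tags set on an object and on every ancestor above it."""
    parts = object_path.split(".")
    collected: Set[Tag] = set()
    for depth in range(1, len(parts) + 1):
        ancestor = ".".join(parts[:depth])  # database, then schema, then table...
        collected |= tags_by_object.get(ancestor, set())
    return collected


tags = {
    "SALES_DB": {("PII", "true")},
    "SALES_DB.PUBLIC.ORDERS": {("OWNER", "finance")},
}
print(inherited_tags("SALES_DB.PUBLIC.ORDERS.EMAIL", tags))
```

Here the column picks up both the database-level tag and the table-level tag, while objects in other databases pick up neither.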

    To enable Snowflake tag ingestion, see the enable Snowflake tag ingestion documentation.

    hashtag
    Caveats

    Snowflake has some natural data latency. If you manually refresh the governance page to see all tags created globally, users can experience a delay of up to two hours. However, if you run schema detection or a health check to find where those tags are applied, the delay will not occur because Immuta will only refresh tags for those specific tables.

    Query audit

    Once this feature has been enabled with the Snowflake integration, Immuta will query Snowflake to retrieve user query histories. These histories provide audit records for queries against Snowflake data sources that are queried natively in Snowflake.

    The audit ingestion frequency is set when configuring the integration. The default frequency is every hour, but this can be changed on the Immuta app settings page. Additionally, audit ingestion can be manually requested at any time from the Immuta audit page. A manual request only searches for queries created since the last query that was audited. The job runs in the background, so the new queries will not be immediately available.

    Multiple Snowflake instances

    A user can configure multiple Snowflake integrations with a single Immuta tenant and use them dynamically or with workspaces.

    Caveats

    • There can only be one integration connection with Immuta per host.

    • The host of the data source must match the host of the integration for the view to be created.

    • Projects can only be configured to use one Snowflake host.

    Limitations

    • If there are errors in generating or applying policies natively in Snowflake, the data source will be locked and only users on the excepted roles/users list and the credentials used to create the data source will be able to access the data.

    • Once a Snowflake integration is disabled in Immuta, the user must remove the access that was granted in Snowflake. If that access is not revoked, users will be able to access the raw table in Snowflake.

    • Migration must be done using the credentials and credential method (automatic or bootstrap) used to configure the integration.

    • When configuring one Snowflake instance with multiple Immuta tenants, the user or system account that enables the integration on the app settings page must be unique for each Immuta tenant.

    • You cannot add a masking policy to an external table column while creating the external table because a masking policy cannot be attached to a virtual column.

    • If you create an Immuta data source from a Snowflake view created using a select * from query, Immuta column detection will not work as expected because Snowflake views are not automatically updated when the backing table changes. To remedy this, you can create views that have the specific columns you want, or you can CREATE OR REPLACE the view in Snowflake whenever the backing table is updated and then rerun column detection on the data source page.

    • If a user is created in Snowflake after that user is already registered in Immuta, Immuta does not grant usage on the per-user role automatically, meaning Immuta does not govern this user's access without manual intervention. In that case, the user account must be disabled and re-enabled to trigger a sync of Immuta policies to govern that user. Whenever possible, Snowflake users should be created before registering those users in Immuta.

    • Snowflake tables from imported databases are not supported. Instead, create a view of the table and register that view as a data source.
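
For the select * limitation listed above, one remedy the text mentions is a view that names its columns explicitly. A minimal sketch that assembles such a statement follows; the helper name `build_view_sql`, the view, the table, and the columns are hypothetical, and the generated statement would still need to be run in Snowflake by a role allowed to create the view.

```shell
# Build a CREATE OR REPLACE VIEW statement that lists columns explicitly
# instead of using SELECT *.
# Usage: build_view_sql <view> <backing table> <column>...
build_view_sql() {
  view="$1"
  table="$2"
  shift 2
  cols=$(printf '%s, ' "$@")   # each column followed by ", "
  cols="${cols%, }"            # drop the trailing separator
  printf 'CREATE OR REPLACE VIEW %s AS SELECT %s FROM %s;\n' "$view" "$cols" "$table"
}

build_view_sql analytics.public.orders_v analytics.public.orders id region
```

Regenerating the statement with the current column list whenever the backing table changes keeps the view definition in step with the table.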

    Custom WHERE clause limitations

    The Immuta Snowflake integration uses Snowflake governance features to let users query data natively in Snowflake. This means that Immuta also inherits some Snowflake limitations on using correlated subqueries with row access policies and column-level security. These limitations appear when writing custom WHERE policies, but do not remove the utility of row-level policies.

    Requirements for a custom WHERE policy

    1. All column names must be fully qualified: Any column names that are unqualified (i.e., just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).

    2. The Immuta system account must have SELECT privileges on all tables/views referenced in a subquery: The Immuta system role name is specified by the user, and the role is created when the Snowflake instance is integrated.

    Subquery limitations

    Any subqueries that error in Snowflake will also error in Immuta.

    1. Including one or more subqueries in the Immuta policy condition may cause errors in Snowflake. These errors can occur during policy creation or at query time. To avoid them, limit the number of subqueries, limit the number of JOIN operations, and simplify WHERE clause conditions.

    2. For more information on the Snowflake subquery limitations, see

      • Understanding column-level security

    Starburst Cluster Configuration

    Requirement

    A valid Starburst Enterprise license.

    Starburst does not support using Starburst built-in access control (BIAC) concurrently with any other access control providers such as Immuta. If Starburst BIAC is in use, it must be disabled to allow Immuta to enforce policies on the cluster.

    1 - Enable the Integration

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab.

    3. Click Add Integration and select Trino from the Integration Type dropdown menu.

    4. Click Save.

    OAuth Authentication

    If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, configure the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

    1. Click the App Settings page icon.

    2. Click Advanced Settings and scroll to Advanced Configuration.

    3. Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:

    2 - Configure the Immuta System Access Control Plugin in Starburst

    Default Configuration Property Values

    If you use the default property values in the configuration file described in this section,

    • you will give users read and write access to tables that are not registered in Immuta and

    • results for SHOW queries will not be filtered on table metadata.

    These default settings help ensure that a new Starburst integration installation is minimally disruptive for existing Starburst deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

    However, the access-control.config-files property can be configured to allow Immuta to work with existing Starburst installations that have already configured an access control provider. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.
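
As a rough sketch of the multi-provider setup described above, the value of access-control.config-files is a comma-separated list of provider configuration files. The helper below only assembles that line; its name is made up, and the second file path is a hypothetical placeholder for whatever additional provider you already run.

```shell
# Join provider configuration file paths into one access-control.config-files line.
config_files_line() {
  # "$*" joins the arguments with the first character of IFS (a comma here);
  # the subshell keeps the IFS change local.
  (IFS=,; printf 'access-control.config-files=%s\n' "$*")
}

# The second path is a hypothetical additional access control provider.
config_files_line \
  /etc/starburst/immuta-access-control.properties \
  /etc/starburst/other-access-control.properties
```

The resulting line belongs in Starburst's config.properties in place of a single-file entry.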

    TLS Certificate Generation

    If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

    If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

    • : Immuta generates a local certificate authority (CA) that signs certificates for each service by default. Ensure that the externalHostname you specified in the Immuta Enterprise Helm chart matches the Immuta hostname specified in the Starburst (Trino) configuration.

    If the hostnames in your certificate don't match the hostname specified in your Starburst (Trino) integration, you can set immuta.disable-hostname-verification to true in the Immuta access control config file to get the integration working in the interim.

    The Starburst (Trino) integration uses the immuta.ca-file property to communicate with Immuta. When configuring the plugin in Starburst (outlined below), specify a path to your CA file using the immuta.ca-file property in the Immuta access control configuration file.
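
One way to sanity-check the hostname requirement above is to pull the hostname out of the immuta.endpoint value and compare it with the name your certificate was issued for. The sketch below covers only that comparison; the function names and sample hostnames are illustrative assumptions, it does not inspect the certificate itself, and it does not handle IPv6 literals.

```shell
# Extract the hostname from a URL such as the immuta.endpoint value.
endpoint_host() {
  host="${1#*://}"    # drop the scheme
  host="${host%%/*}"  # drop any path
  host="${host%%:*}"  # drop any port
  printf '%s\n' "$host"
}

# Succeed when the endpoint hostname equals the expected certificate hostname.
hostname_matches() {
  [ "$(endpoint_host "$1")" = "$2" ]
}

endpoint_host "http://service.immuta.com:3000"
```

If the comparison fails, the text above describes immuta.disable-hostname-verification as an interim workaround rather than a fix.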

    1. Create the Immuta access control configuration file in the Starburst configuration directory (/etc/starburst/immuta-access-control.properties for Docker installations or <starburst_install_directory>/etc/immuta-access-control.properties for standalone installations).

      The table below describes the properties that can be set during configuration.

      Property
      Starburst version
      Required or optional
      Description
    2. Enable the Immuta access control plugin in Starburst's configuration file (/etc/starburst/config.properties for Docker installations or <starburst_install_directory>/etc/config.properties for standalone installations). For example,

    Example Immuta System Access Control Configuration

    The example configuration snippet below uses the default configuration settings for immuta.allowed.immuta.datasource.operations and immuta.allowed.non.immuta.datasource.operations, which allow read access for data registered as Immuta data sources and read and write access on data that is not registered in Immuta. See the Granting Starburst (Trino) privileges section for details about customizing and enforcing read and write access controls in Starburst.
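
To make the default behavior concrete, the sketch below reads one of these properties from a configuration file and falls back to the default shown in the example when the key is absent. An explicitly empty value is kept as empty, which the example's comments describe as allowing no operations. The function name and the sample file are hypothetical; this is not part of the plugin.

```shell
# Print the value of a property, or the given default when the key is absent.
# An empty value ("key=") is reported as empty, not replaced by the default.
# Usage: prop_or_default <file> <key> <default>
prop_or_default() {
  awk -F= -v k="$2" -v d="$3" '
    $1 == k { sub(/^[^=]*=/, ""); v = $0; found = 1 }
    END { print (found ? v : d) }
  ' "$1"
}

# Example against a throwaway file that does not set the non-Immuta property.
conf=$(mktemp)
printf '%s\n' 'access-control.name=immuta' \
  'immuta.allowed.immuta.datasource.operations=READ' > "$conf"
prop_or_default "$conf" immuta.allowed.non.immuta.datasource.operations READ,WRITE
rm -f "$conf"
```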

    3 - Add Starburst Users to Immuta

    1. Configure your external IAM to add users to Immuta.

    2. Map their Starburst usernames to Immuta when configuring your IAM (or map usernames manually).

      • All Starburst users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Starburst username must be mapped to Immuta so they can query policy-enforced data.

      • A user impersonating a different user in Starburst requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.
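
The admin check in the bullets above can be pictured as a whole-string regex match on the Starburst username. The sketch below uses grep's extended regular expressions and assumes the pattern must match the entire username; the plugin's actual regex dialect and matching rules may differ, and the usernames are made up.

```shell
# Succeed when the username matches the admin pattern in full.
# Usage: is_admin_user <username> <immuta.user.admin pattern>
is_admin_user() {
  printf '%s\n' "$1" | grep -Eqx -- "$2"
}

if is_admin_user immuta_system_account 'immuta_system_account|svc_.*'; then
  echo "not subject to Immuta policies"
fi
```

A username that fails this check would need to be mapped to an Immuta user to query policy-enforced data.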

    4 - Register data

    Register Starburst (Trino) data in Immuta.

    Trino Cluster Configuration

    1 - Enable the Integration

    1. Click the App Settings icon in the left sidebar.

    2. Click the Integrations tab.

    3. Click Add Integration and select Trino from the dropdown menu.

    4. Click Save.

    OAuth Authentication

    If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, configure the globalAdminUsername property in the advanced configuration section of the Immuta app settings page.

    1. Click the App Settings page icon.

    2. Click Advanced Settings and scroll to Advanced Configuration.

    3. Paste the following YAML configuration snippet in the text box, replacing the email address below with your admin username:

    2 - Configure the Immuta System Access Control Plugin in Trino

    Default Configuration Property Values

    If you use the default property values in the configuration file described in this section,

    • you will give users read and write access to tables that are not registered in Immuta and

    • results for SHOW queries will not be filtered on table metadata.

    These default settings help ensure that a new Starburst (Trino) integration installation is minimally disruptive for existing Trino deployments, allowing you to then add Immuta data sources and update configuration to enforce more controls as you see fit.

    However, the access-control.config-files property can be configured to allow Immuta to work with existing Trino installations that have already configured an access control provider. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

    TLS Certificate Generation

    If you provided your own TLS certificates during Immuta installation, you must ensure that the hostname in your certificate matches the hostname specified in the Starburst (Trino) configuration.

    If you did not provide your own TLS certificates, Immuta generated these certificates for you during installation. See notes about your specific deployment method below for details.

    • : Immuta generates a local certificate authority (CA) that signs certificates for each service by default. Ensure that the externalHostname you specified in the Immuta Helm Chart matches the Immuta hostname specified in the Starburst (Trino) configuration.

    If the hostnames in your certificate don't match the hostname specified in your Starburst (Trino) integration, you can set immuta.disable-hostname-verification to true in the Immuta access control config file to get the integration working in the interim.

    The Starburst (Trino) integration uses the immuta.ca-file property to communicate with Immuta. When configuring the plugin in Trino (outlined below), specify a path to your CA file using the immuta.ca-file property in the Immuta access control configuration file.

    1. The Immuta Trino plugin version matches the version of the corresponding Trino release. For example, the Immuta plugin version supporting Trino version 403 is simply version 403. Navigate to the Immuta GitHub repository for a list of supported Trino versions. Immuta follows Starburst's release cycle, but you can contact your Immuta representative for a specific Trino OSS release.

    2. Download the assets for the release that corresponds to your Trino version.

    3. Enable Immuta on your cluster. Select the tab below that corresponds to your installation method for instructions:

    Docker installations

    1. Follow Trino's documentation to install the plugin archive on all nodes in your cluster.

    2. Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

    immuta-trino Docker image

    For Trino versions 414 and newer, an immuta-trino Docker image that includes the Trino plugin jars is available from ocir.immuta.com. Before using this image, consider the following factors:

    • This image was designed to provide a method for organizations to quickly set up and validate the integration, so it should be used in a development environment. Use the Docker installation method above for production environments.

    • Immuta only supports the Immuta Trino plugin on the Docker image, not any other software packaged on the image.

    Standalone installations

    1. Follow Trino's documentation to install the plugin archive on all nodes in your cluster.

    2. Create the Immuta access control configuration file in the Trino configuration directory: <trino_install_directory>/etc/immuta-access-control.properties.

    4. Configure the properties described in the table below.

    | Property | Trino version | Required or optional | Description |
    | --- | --- | --- | --- |
    | access-control.name | 392 and newer | Required | This property enables the integration. |
    | access-control.config-files | 392 and newer | Optional | Trino allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. This approach allows Immuta to work with existing Trino installations that have already configured an access control provider. Immuta does not manage all permissions in Trino and will default to allowing access to anything Immuta does not manage so that the Starburst (Trino) integration complements existing controls. For example, if the Starburst (Trino) integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider. |
    | immuta.allowed.immuta.datasource.operations | 413 and newer | Optional | |

    5. Enable the Immuta access control plugin in Trino's configuration file (/etc/trino/config.properties for Docker installations or <trino_install_directory>/etc/config.properties for standalone installations). For example,

    Example Immuta System Access Control Configuration

    The example configuration snippet below uses the default configuration settings for immuta.allowed.immuta.datasource.operations and immuta.allowed.non.immuta.datasource.operations, which allow read access for data registered as Immuta data sources and read and write access on data that is not registered in Immuta. See the Granting Starburst (Trino) privileges section for details about customizing and enforcing read and write access controls in Starburst.

    3 - Add Trino Users to Immuta

    1. Configure your external IAM to add users to Immuta.

    2. Map their Trino usernames to Immuta when configuring your IAM (or map usernames manually).

      • All Trino users must map to Immuta users or match the immuta.user.admin regex configured on the cluster, and their Trino username must be mapped to Immuta so they can query policy-enforced data.

      • A user impersonating a different user in Trino requires the IMPERSONATE_USER permission in Immuta. Both users must be mapped to an Immuta user, or the querying user must match the configured immuta.user.admin regex.

    4 - Register data

    Register Starburst (Trino) data in Immuta.

    \c immuta
    CREATE EXTENSION pgcrypto;
    oc new-project immuta
    oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
    oc get project immuta --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
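
The two commands above print the project's allowed ID ranges. Assuming the annotation uses the usual OpenShift `<start>/<size>` form (for example 1000680000/10000), a small helper like the hypothetical one below can confirm that a candidate runAsUser or runAsGroup value falls inside the range before it goes into a values file.

```shell
# Succeed when an ID lies inside a range written as "<start>/<size>".
# Usage: id_in_range <id> <start>/<size>
id_in_range() {
  id="$1"
  start="${2%%/*}"   # text before the slash
  size="${2##*/}"    # text after the slash
  [ "$id" -ge "$start" ] && [ "$id" -lt "$((start + size))" ]
}

if id_in_range 1000680001 1000680000/10000; then
  echo "ID is within the project range"
fi
```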
    oc run pgclient \
        --stdin \
        --tty \
        --rm \
        --image docker.io/bitnami/postgresql -- \
        psql --host <postgres-fqdn> --username postgres --port 5432 --password
    CREATE ROLE immuta with login encrypted password '<postgres-password>';
    
    GRANT immuta TO CURRENT_USER;
    
    CREATE DATABASE immuta OWNER immuta;
    
    GRANT all ON DATABASE immuta TO immuta;
    ALTER ROLE immuta SET search_path TO bometadata,public;
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      enabled: false
    
      deployment:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
    
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
    discover:
      deployment:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
    
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
    secure:
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "false"
        - name: FeatureFlag_detect
          value: "false"
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    
      ingress:
        enabled: false
        tls: false
    
      postgresql:
        host: <postgres-fqdn>
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
        ssl: true
    
      web:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
    
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    
      backgroundWorker:
        podSecurityContext:
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.uid-range"}}{{"\n"}}'
          runAsUser: <user-id>
          # A number that is within the project range:
          #   oc get project <project-name> --output template='{{index .metadata.annotations "openshift.io/sa.scc.supplemental-groups"}}{{"\n"}}'
          runAsGroup: <group-id>
          seccompProfile:
            type: RuntimeDefault
    
        containerSecurityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
              - ALL
    helm install immuta immuta/immuta-enterprise \
        --values immuta-values.yaml
    oc wait --for=condition=Ready pods --all
    oc get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
    oc port-forward service/<name> 8080:http
    kubectl create namespace immuta
    kubectl config set-context --current --namespace=immuta
    oc create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        --docker-email="<email>"
    auth:
        database: immuta
        username: immuta
        password: <postgres-password>
    helm install pg-db oci://registry-1.docker.io/bitnamicharts/postgresql \
        --values pg-values.yaml
    kubectl wait --for=condition=Ready pods --all
    kubectl get pod --selector "app.kubernetes.io/name=postgresql" --output template='{{ .metadata.name }}'
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      enabled: false
    
    secure:
      ingress:
        enabled: false
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "false"
        - name: FeatureFlag_detect
          value: "false"
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    
      postgresql:
        # Each Kubernetes Service has a DNS record associated with it. See: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
        # The anatomy of a domain name is as follows:
        #   <service>.<namespace>.svc.<cluster-domain>
        #
        # Where the default cluster domain is: cluster.local
        host: pg-db-postgresql.immuta.svc.cluster.local
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
        ssl: true
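
The host value above follows the Service DNS pattern spelled out in the comments. As a small illustration, the hypothetical helper below assembles that name from its parts, defaulting the cluster domain to cluster.local.

```shell
# Compose a Kubernetes Service DNS name: <service>.<namespace>.svc.<cluster-domain>
# Usage: service_fqdn <service> <namespace> [cluster-domain]
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

service_fqdn pg-db-postgresql immuta
# → pg-db-postgresql.immuta.svc.cluster.local
```

Pass a third argument only if your cluster uses a non-default domain.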
    helm install immuta immuta/immuta-enterprise \
        --values immuta-values.yaml
    kubectl wait --for=condition=Ready pods --all
    kubectl get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
    kubectl port-forward service/<name> 8080:http
    kubectl create namespace immuta
    kubectl config set-context --current --namespace=immuta
    kubectl create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        --docker-email="<email>"
    kubectl run pgclient \
        --stdin \
        --tty \
        --rm \
        --image docker.io/bitnami/postgresql -- \
        psql --host <postgres-fqdn> --username postgres --port 5432 --password
    CREATE ROLE immuta with login encrypted password '<postgres-password>';
    
    GRANT immuta TO CURRENT_USER;
    
    CREATE DATABASE immuta OWNER immuta;
    
    GRANT all ON DATABASE immuta TO immuta;
    ALTER ROLE immuta SET search_path TO bometadata,public;
    global:
      imageRegistry: ocir.immuta.com
      imagePullSecrets:
        - name: immuta-oci-registry
      imageRepositoryMap:
        immuta/immuta-service: stable/immuta-service
        immuta/immuta-db: stable/immuta-db
        immuta/immuta-fingerprint: stable/immuta-fingerprint
        immuta/audit-service: stable/audit-service
        immuta/audit-export-cronjob: stable/audit-export-cronjob
        immuta/classify-service: stable/classify-service
        immuta/cache: stable/cache
    
    audit:
      enabled: false
    
    secure:
      ingress:
        enabled: false
        tls: false
      extraEnvVars:
        - name: FeatureFlag_AuditService
          value: "false"
        - name: FeatureFlag_detect
          value: "false"
        - name: FeatureFlag_auditLegacyViewHide
          value: "false"
    
      postgresql:
        host: <postgres-fqdn>
        port: 5432
        database: immuta
        username: immuta
        password: <postgres-password>
        ssl: true
    helm install immuta immuta/immuta-enterprise \
        --values immuta-values.yaml
    kubectl wait --for=condition=Ready pods --all
    kubectl get service --selector "app.kubernetes.io/component=secure" --output template='{{ .metadata.name }}'
    kubectl port-forward service/<name> 8080:http
    REVOKE immuta FROM CURRENT_USER;
    echo <token> | helm registry login --password-stdin --username <username> ocir.immuta.com
    trino:
      globalAdminUsername: "<admin-email-address>"
    access-control.config-files=/etc/trino/immuta-access-control.properties
    access-control.config-files=/etc/starburst/immuta-access-control.properties
    # Enable the Immuta System Access Control (v2) implementation.
    access-control.name=immuta
    
    # The Immuta endpoint that was displayed when enabling the Starburst integration in Immuta.
    immuta.endpoint=http://service.immuta.com:3000
    
    # The Immuta API key that was displayed when enabling the Starburst integration in Immuta.
    immuta.apikey=45jdljfkoe82b13eccfb9c
    
    # The administrator user regex. Starburst usernames matching this regex will not be subject to
    # Immuta policies. This regex should match the user name provided at Immuta data source
    # registration.
    immuta.user.admin=immuta_system_account
    
    # Optional argument (default is shown).
    # A CSV list of operations allowed on schemas/tables registered as Immuta data sources.
    immuta.allowed.immuta.datasource.operations=READ
    
    # Optional argument (default is shown).
    # A CSV list of operations allowed on schemas/tables not registered as Immuta data sources.
    # Set to empty to allow no operations on non-Immuta data sources.
    immuta.allowed.non.immuta.datasource.operations=READ,WRITE
    
    # Optional argument (default is shown).
    # Controls table metadata filtering for inaccessible tables.
    #   - When this property is enabled and non-Immuta reads are also enabled, a user performing
    #     'show catalogs/schemas/tables' will not see metadata for a table that is registered as
    #     an Immuta data source but the user does not have access to through Immuta.
    #   - When this property is enabled and non-Immuta reads and writes are disabled, a user
    #     performing 'show catalogs/schemas/tables' will only see metadata for tables that the
    #     user has access to through Immuta.
    #   - When this property is disabled, a user performing 'show catalogs/schemas/tables' can see
    #     all metadata.
    immuta.filter.unallowed.table.metadata=false

    380

    All

    The setup script this user runs creates the IMMUTA_SYSTEM_ACCOUNT user that Immuta will use to manage the integration.

    MANAGE GRANTS ON ACCOUNT

    Setup user

    All

    The user configuring the integration must be able to GRANT global privileges and access to objects within the Snowflake account. All privileges that are documented here are granted to the IMMUTA_SYSTEM_ACCOUNT user by this setup user.

    OWNERSHIP ON ROLE IMMUTA_IMPERSONATOR_ROLE

    IMMUTA_SYSTEM_ACCOUNT user

    Impersonation

    If impersonation is enabled, Immuta must be able to manage the Snowflake role used for impersonation, which is created when the setup script runs.

    • ALL PRIVILEGES ON DATABASE IMMUTA_DB

    • ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB

    • USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

    IMMUTA_SYSTEM_ACCOUNT user

    All

    The setup script grants the Immuta system account user these privileges because Immuta must have full ownership of the Immuta database where Immuta objects are managed.

    USAGE ON WAREHOUSE IMMUTA_WH

    IMMUTA_SYSTEM_ACCOUNT user

    All

    To make changes to state in the Immuta database, Immuta requires access to compute (a Snowflake warehouse). Some state changes are DDL operations, and others are DML and require compute.

    IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE

    IMMUTA_SYSTEM_ACCOUNT user

    Audit

    To ingest audit information from Snowflake, Immuta must have access to the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view. See the Snowflake documentation for details.

    • APPLY MASKING POLICY ON ACCOUNT

    • APPLY ROW ACCESS POLICY ON ACCOUNT

    IMMUTA_SYSTEM_ACCOUNT user

    Snowflake integration with governance features enabled

    Immuta must be able to apply policies to objects throughout your organization's Snowflake account and query for existing policies on objects using the POLICY_REFERENCES table function.

    MANAGE GRANTS ON ACCOUNT

    IMMUTA_SYSTEM_ACCOUNT user

    Table grants

    Immuta must be able to MANAGE GRANTS on objects throughout your organization's Snowflake account.

    CREATE ROLE ON ACCOUNT

    IMMUTA_SYSTEM_ACCOUNT user

    Table grants

    When using the table grants feature, Immuta must be able to create roles as targets for Immuta subscription policy permissions in your organization’s Snowflake account.

    • USAGE on all databases and schemas with registered data sources

    • REFERENCES on all tables and views registered in Immuta

    Metadata registration user

    Data source registration

    Immuta must be able to see metadata on securables to register them as data sources and populate the data dictionary.

    SELECT on all tables and views registered in Immuta

    Metadata registration user

    Sensitive data discovery and specialized masking policies that require fingerprinting

    Immuta must have this privilege to run the necessary queries for sensitive data discovery on your data sources.

    APPLY TAG ON ACCOUNT

    Metadata registration user

    Tag ingestion

    To ingest table, view, and column tag information from Snowflake, Immuta must have this permission. Immuta reads from the TAG_REFERENCES table function.

    IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE

    Metadata registration user

    Tag ingestion

    To ingest table, view, and column tag information from Snowflake, Immuta must have access to the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view. See the Snowflake documentation for details.

    • USAGE ON DATABASE IMMUTA_DB

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

    • USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM

    • SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.USER_PROFILE

    PUBLIC role

    All

    Immuta has stored procedures and functions that are used for policy enforcement and do not expose or contain any sensitive information. These objects must be accessible by all users to facilitate the use and creation of policies or views to enforce Immuta policies in Snowflake.

    SELECT ON IMMUTA_DB.IMMUTA_SYSTEM.ALLOW_LIST

    PUBLIC role

    All

    Immuta retains a list of excepted roles and users when using the Snowflake integration. The roles and users in this list will be exempt from policies applied to tables in Snowflake to give organizations flexibility in case there are entities that should not be bound to Immuta policies in Snowflake (for example, a system or application role or user).


    Required

    This should be set to the Immuta API key displayed when enabling the integration on the app settings page.

    immuta.audit.legacy.enabled

    435 and newer

    Optional

    This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit.

    immuta.audit.uam.enabled

    435 and newer

    Optional

    This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit.

    immuta.ca-file

    392 and newer

    Optional

    This property allows you to specify a path to your CA file.

    immuta.cache.views.seconds

    392 and newer

    Optional

    The amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Starburst. By default, the cache expires after 30 seconds.

    immuta.cache.datasource.seconds

    392 and newer

    Optional

    The amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after project or subscription changes. By default, the cache expires after 30 seconds.

    immuta.endpoint

    392 and newer

    Required

    The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Starburst (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.

    immuta.filter.unallowed.table.metadata

    392 and newer

    Optional

    When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.

    immuta.group.admin

    420 and newer

    Required if immuta.user.admin is not set

    This property identifies the Starburst group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).

    immuta.user.admin

    392 and newer

    Required if immuta.group.admin is not set

    This property identifies the Starburst user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).
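Because the admin properties are regular expressions, it can help to check a candidate value before deploying it. The sketch below is a rough approximation in Python rather than the plugin's own matching logic. It assumes that the whole username must match the pattern, that doubled backslashes in the properties file collapse to single ones when the file is loaded, and that Python's re module behaves like the plugin's regex engine for these simple patterns. The usernames are hypothetical.

```python
import re


def unescape_properties_value(raw):
    """Collapse doubled backslashes, as a .properties loader would (simplified)."""
    return raw.replace("\\\\", "\\")


def is_admin(username, pattern):
    """Assumption: the whole username must match the pattern."""
    return re.fullmatch(pattern, username) is not None


# Value as it would be typed in the properties file (hypothetical users).
RAW_VALUE = r"immuta_system_account|john\\.doe\\+svcacct@immuta\\.com"
PATTERN = unescape_properties_value(RAW_VALUE)

# The same address with "+" left unescaped, for comparison.
UNESCAPED_PLUS = unescape_properties_value(r"john\\.doe+svcacct@immuta\\.com")

if __name__ == "__main__":
    for name in ("immuta_system_account", "john.doe+svcacct@immuta.com"):
        print(name, is_admin(name, PATTERN), is_admin(name, UNESCAPED_PLUS))
```

With the + left unescaped, the pattern treats it as a quantifier on the preceding character, so an address containing a literal + would not match under these assumptions.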

    If you experience an issue with the image outside of the scope of the Immuta plugin, you must rebuild your own version of the image using the Docker installation method above.

    To use this image,

    1. Pull the image and start the container. The example below uses the 414 tag, which specifies version 414 of the Immuta Trino plugin, but any supported Trino version newer than 414 can be used:

        docker run ocir.immuta.com/immuta/immuta-trino:414

    2. Create the Immuta access control configuration file in the Trino configuration directory: /etc/trino/immuta-access-control.properties.

    access-control.name

    392 and newer

    Required

    This property enables the integration.

    access-control.config-files

    392 and newer

    Optional

    Starburst allows you to enable multiple system access control providers at the same time. To do so, add providers to this property as comma-separated values. Immuta has tested the Immuta system access control provider alongside the Starburst built-in access control system. This approach allows Immuta to work with existing Starburst installations that have already configured an access control provider. Immuta does not manage all permissions in Starburst and will default to allowing access to anything Immuta does not manage so that the Starburst integration complements existing controls. For example, if the Starburst integration is configured to allow users write access to tables that are not protected by Immuta, you can still lock down write access for specific non-Immuta tables using an additional access control provider.

    immuta.allowed.immuta.datasource.operations

    413 and newer

    Optional

    This property defines a comma-separated list of allowed operations for Starburst (Trino) users on tables registered as Immuta data sources: READ, WRITE, and OWN. (See the Customize read and write access policies for Starburst (Trino) guide for details about the OWN operation.) When set to WRITE, all querying users are allowed read and write operations on data source schemas and tables. By default, this property is set to READ, which blocks write operations on data source tables and schemas. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE by default, so users are allowed read and write operations on data source schemas and tables.

    immuta.allowed.non.immuta.datasource.operations

    392 and newer

    Optional

    This property defines a comma-separated list of allowed operations users will have on tables not registered as Immuta data sources: READ, WRITE, CREATE, and OWN. (See the Customize read and write access policies for Starburst (Trino) guide for details about CREATE and OWN operations.) When set to READ, users are allowed read operations on tables not registered as Immuta data sources. When set to WRITE, users are allowed read and write operations on tables not registered as Immuta data sources. If this property is left empty, users will not get access to any tables outside Immuta. By default, this property is set to READ,WRITE. If write policies are enabled for your Immuta tenant, this property is set to READ,WRITE,OWN,CREATE by default.

    immuta.apikey

    392 and newer

    Required

    This should be set to the Immuta API key displayed when enabling the integration on the app settings page.

    immuta.audit.legacy.enabled

    435 and newer

    Optional

    This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit.

    immuta.audit.uam.enabled

    435 and newer

    Optional

    This property allows you to turn off Starburst (Trino) audit. You must set both immuta.audit.legacy.enabled and immuta.audit.uam.enabled to false to fully disable query audit.

    immuta.ca-file

    392 and newer

    Optional

    This property allows you to specify a path to your CA file.

    immuta.cache.views.seconds

    392 and newer

    Optional

    The amount of time, in seconds, for which a user's specific representation of an Immuta data source is cached. Changing this value affects how quickly policy changes are reflected for users actively querying Trino. By default, the cache expires after 30 seconds.

    immuta.cache.datasource.seconds

    392 and newer

    Optional

    The amount of time, in seconds, for which a user's available Immuta data sources are cached. Changing this value affects how quickly data sources become available after project or subscription changes. By default, the cache expires after 30 seconds.

    immuta.endpoint

    392 and newer

    Required

    The protocol and fully qualified domain name (FQDN) for the Immuta instance used by Trino (for example, https://my.immuta.instance.io). This should be set to the endpoint displayed when enabling the integration on the app settings page.

    immuta.filter.unallowed.table.metadata

    392 and newer

    Optional

    When set to false, Immuta won't filter unallowed table metadata, which helps ensure Immuta remains noninvasive and performant. If this property is set to true, running show catalogs, for example, will reflect what that user has access to instead of returning all catalogs. By default, this property is set to false.

    immuta.group.admin

    420 and newer

    Required if immuta.user.admin is not set

    This property identifies the Trino group that is the Immuta administrator. The users in this group will not have Immuta policies applied to them. Therefore, data sources should be created by users in this group so that they have access to everything. This property can be used in conjunction with the immuta.user.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple groups as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).

    immuta.user.admin

    392 and newer

    Required if immuta.group.admin is not set

    This property identifies the Trino user who is an Immuta administrator (for example, immuta.user.admin=immuta_system_account). This user will not have Immuta policies applied to them because this account will run the subqueries. Therefore, data sources should be created by this user so that they have access to everything. This property can be used in conjunction with the immuta.group.admin property, and regex filtering can be used (with a | delimiter at the end of each expression) to assign multiple users as the Immuta administrator. Note that you must escape regex special characters (for example, john\\.doe\\+svcacct@immuta\\.com).
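Before restarting the cluster, the access control file can be checked against the Required column of the table above. The following is a minimal sketch, not an Immuta-provided tool. The parser handles only plain key=value lines (no line continuations or : separators), the value access-control.name=immuta is an assumption, and the API key and endpoint are placeholders.

```python
REQUIRED = ("access-control.name", "immuta.endpoint", "immuta.apikey")

# Placeholder contents for /etc/trino/immuta-access-control.properties.
SAMPLE = """\
# Assumed provider name; check the value shown on the app settings page.
access-control.name=immuta
immuta.endpoint=https://my.immuta.instance.io
immuta.apikey=<api-key>
immuta.user.admin=immuta_system_account
"""


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def validate(props):
    """Return a list of problems; an empty list means the required keys are set."""
    problems = []
    for key in REQUIRED:
        if not props.get(key):
            problems.append(f"missing required property: {key}")
    # At least one of the two admin properties must be present.
    if not props.get("immuta.user.admin") and not props.get("immuta.group.admin"):
        problems.append("set immuta.user.admin or immuta.group.admin")
    return problems


if __name__ == "__main__":
    print(validate(load_properties(SAMPLE)))
```

The check only confirms that the required keys are present. It does not verify that the endpoint is reachable or that the API key is valid.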

    docker run ocir.immuta.com/immuta/immuta-trino:414
    oc project immuta
    oc create secret docker-registry immuta-oci-registry \
        --docker-server=https://ocir.immuta.com \
        --docker-username="<username>" \
        --docker-password="<token>" \
        [email protected]
    REVOKE immuta FROM CURRENT_USER;
    \c immuta
    CREATE EXTENSION pgcrypto;
    kubectl exec --stdin --tty pod/<database-pod-name> -- psql -U immuta
    ALTER ROLE immuta SET search_path TO bometadata,public;
    CREATE EXTENSION pgcrypto;