Getting Started with the Amazon Redshift Integration


Public preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

The how-to guides linked on this page illustrate how to use Amazon Redshift with Immuta. See the reference guide for information about the Amazon Redshift integration.

1. Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

  1. Register your Amazon Redshift connection: Using a single setup process, connect Amazon Redshift to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.

  2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, and audit.

2. Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

  1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

  2. Map external user IDs to Immuta: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.

3. Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

  1. Publish a data product: Once you register your data and users, you can immediately start publishing data products in Marketplace.

  2. Request access to a data product: Users must then request access to your data products in Marketplace.

  3. Respond to an access request: To grant access to a data product and its data sources, respond to the access request.

4. Add data metadata

This guide provides instructions for getting your data metadata set up in Immuta for the Governance app.

  • Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

5. Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

  1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that use your tags and apply to your data sources. Subscription policies can be created to dictate access to data sources.

  2. Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. By using catalog tags, you can create proactive policies that will apply to data sources as they are added to Immuta.

  3. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

Accessing Data

Once data is registered through the Amazon Redshift connection, you will access your data through Amazon Redshift as you normally would. If you are subscribed to the data source, Immuta grants you access to the data in Amazon Redshift.

When you submit a query, the SQL client submits the query to Amazon Redshift, which then processes the query and determines what data your role is allowed to see. Then, Amazon Redshift queries the database and returns the query results to the SQL client, which then returns policy-enforced data to you.

The diagram below illustrates how Immuta, Amazon Redshift, and the SQL client interact when a user queries data registered in Immuta.

Querying data

Because subscription policies are managed through roles, you must be acting under the role Immuta creates for you (immuta_<username>) to get access to the data sources you are subscribed to.
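To confirm that the Immuta-created role has been granted to your session user, you can check Redshift's role-grant catalog. A minimal sketch, assuming Redshift's svv_user_grants system view (verify the view and column names against your cluster's Redshift version):

    -- List the roles granted to the current session user; expect to see the
    -- Immuta-generated immuta_<username> role among them
    SELECT role_name
    FROM svv_user_grants
    WHERE user_name = current_user;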

Amazon Redshift Spectrum Integration

In this integration, Immuta generates policy-enforced views in your configured Amazon Redshift schema for tables registered as Immuta data sources.

Getting started

This guide outlines how to integrate Amazon Redshift Spectrum with Immuta.

How-to guide

Configure an Amazon Redshift Spectrum integration

Reference guide

Amazon Redshift Spectrum integration overview: This guide describes the design of the integration and policy enforcement.

Amazon Redshift

Immuta offers two integrations for Amazon Redshift:

  • Amazon Redshift integration: In this integration, Immuta uses connections to configure the integration and register data objects in a single step. Once data is registered, Immuta can enforce access controls on that data.

  • Amazon Redshift Spectrum integration: In this integration, Immuta generates policy-enforced views in your configured Redshift schema for tables registered as Immuta data sources. The integration is configured separately from data source registration.

Amazon Redshift Integration


Public preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

The Amazon Redshift integration allows you to configure your integration and register data from Amazon Redshift in Immuta in a single step. Once data is registered, Immuta can enforce access controls on that data.

Register an Amazon Redshift Connection


Public preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

Permissions

The user registering the connection must have the permissions below.

Protecting Data

In the Amazon Redshift integration, Immuta administers Amazon Redshift privileges on data registered in Immuta. Then, Immuta users who have been granted access to the data sources can query them.

The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in Amazon Redshift.

Registering a connection

The Amazon Redshift integration is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

Once the Amazon Redshift connection is registered, you can author subscription and data policies in Immuta to enforce access controls.

AWS Lake Formation


Public preview: This feature is available to all accounts.

In the Lake Formation integration, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:

  • Amazon Athena

  • Amazon EMR Spark

  • Amazon Redshift Spectrum

Security and Compliance

Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

Security

Data processing and encryption

See the Data processing and the Encryption and masking practices guides for information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

Getting started with AWS Lake Formation

This getting started guide outlines how to integrate AWS Lake Formation with Immuta.

How-to guide

Register an AWS Lake Formation connection

Reference guides

  • AWS Lake Formation: This guide describes the design and components of the integration.

  • Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.

  • Protecting data: This guide provides an overview of how to protect AWS securables with Immuta policies.

  • Accessing data: This guide provides an overview of how AWS users access data registered in Immuta.

Getting started with the Amazon Redshift integration

This getting started guide outlines how to connect Amazon Redshift to Immuta.

How-to guide

Register an Amazon Redshift connection

Reference guides

  • Amazon Redshift integration reference guide: This guide describes the design and components of the integration.

  • Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.

  • Protecting data: This guide provides an overview of how to protect securables with Immuta policies.

  • Accessing data: This guide provides an overview of how Amazon Redshift users access data registered in Immuta.


    Authentication

    Registering the connection

    The Amazon Redshift connection supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the Register an Amazon Redshift connection guide.

    Identity providers for user authentication

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

    Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

    See the Identity managers guide for a list of supported providers and details.

    See the Amazon Redshift integration reference guide for details about mapping user accounts to Immuta.

    Auditing and compliance

    Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

    Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

    See the Governance report types page for a list of report types and guidance.


  • APPLICATION_ADMIN Immuta permission

  • The Amazon Redshift user registering the connection must be a superuser or have the following Amazon Redshift privileges:

    • CREATEDB

    • CREATE USER

    • sys:secadmin role

    • USAGE on all databases and schemas that contain data you want to register

    • The following privileges WITH GRANT OPTION on objects registered in Immuta:

      • DELETE

      • INSERT

      • SELECT

      • TRUNCATE

      • UPDATE

For descriptions and explanations of privileges Immuta needs to enforce policies and maintain state in Amazon Redshift, see the Amazon Redshift integration reference guide.

Prerequisites

    Enable Amazon Redshift masking on data objects Immuta will protect using the ALTER TABLE command with the MASKING ON clause.

See the Amazon Redshift documentation for details.
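A minimal sketch of that prerequisite, using a hypothetical schema and table name; the MASKING ON clause is taken from the instruction above, so confirm the exact syntax in the Redshift documentation:

    -- Enable masking on each table Immuta will protect (names are hypothetical)
    ALTER TABLE analytics.customers MASKING ON;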

Create the database user

1. Create a new database user in Redshift to serve as the Immuta system account. Immuta will use this system account continuously to crawl the connection.

2. Grant this account the following Redshift privileges (a SQL sketch follows the list):

  • USAGE on all databases and schemas that contain data you want to register

  • CREATE ROLE

  • sys:secadmin role

  • The following privileges WITH GRANT OPTION on objects registered in Immuta:

    • DELETE

    • INSERT

    • SELECT

    • TRUNCATE

    • UPDATE
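A minimal sketch of this setup, with placeholder account, password, and schema names; the CREATE ROLE capability from the list above is omitted here because its grant mechanism depends on your Redshift version, so confirm the exact GRANT syntax against the Redshift documentation:

    -- Create the Immuta system account (name and password are placeholders)
    CREATE USER immuta_system PASSWORD '<strong-password>';

    -- Let the account see the schemas that contain data to register
    GRANT USAGE ON SCHEMA analytics TO immuta_system;

    -- System role Immuta uses to manage masking and row-level policies
    GRANT ROLE sys:secadmin TO immuta_system;

    -- DML privileges Immuta passes through to subscribers; repeat per schema
    GRANT DELETE, INSERT, SELECT, TRUNCATE, UPDATE
      ON ALL TABLES IN SCHEMA analytics
      TO immuta_system WITH GRANT OPTION;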

Create the exemption role

1. Create a new role in Amazon Redshift called immuta_exemption.

2. Grant this role to any users who should be exempt from Immuta data policies.
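For example (the username is hypothetical):

    -- Create the exemption role and grant it to exempt users
    CREATE ROLE immuta_exemption;
    GRANT ROLE immuta_exemption TO etl_admin;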

Register the connection

1. In your Amazon Redshift environment, create an Immuta database that Immuta can use to connect to your Amazon Redshift instance to register the connection and maintain state with Amazon Redshift.

  Having this separate database for Immuta prevents custom ETL processes or jobs from deleting the database you use to register the connection, which would break the connection.
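For example (the database name is up to you):

    -- A dedicated database for Immuta's connection state
    CREATE DATABASE immuta;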

    2. In Immuta, click Data and select Connections in the navigation menu.

    3. Click the + Add Connection button.

    4. Select the Amazon Redshift tile.

5. Enter the host connection information:

  1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.

  2. Hostname: URL of your Amazon Redshift instance.

  3. Port: Port configured for Amazon Redshift.

  4. Database: The Redshift database you created for Immuta. All databases in the host will be registered.

6. Enter the username and password of the Amazon Redshift database user you created above.

7. Click Save connection.

    Map users

    Requirement: USER_ADMIN Immuta permission

    Map Amazon Redshift usernames to each Immuta user account to ensure Immuta properly enforces policies.

    The instructions below illustrate how to do this for individual users, but you can also configure user mapping in your IAM connection on the app settings page.

    1. Click People and select Users in the navigation menu.

    2. Click the user's name to navigate to their page and scroll to the External User Mapping section.

    3. Click Edit in the Redshift User row.

    4. Enter the user's Redshift username.

    5. Click Save.

    See the Amazon Redshift integration reference guide for more details about registering a connection.

    Protecting data

    After data is registered in Immuta, you can author subscription and data policies in Immuta to enforce access controls.

    When a subscription policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Immuta creates roles for those users (if an Immuta-generated role for them does not already exist) and grants Amazon Redshift privileges to that role. Once a data policy is applied to a data source, Immuta generates a masking or row-level policy in Amazon Redshift and attaches that policy to the data object it applies to.
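Immuta creates and manages these native policy objects for you. Purely to make the mechanism concrete, a native Redshift dynamic data masking policy looks roughly like the sketch below; the policy, column, table, and role names are hypothetical, and the exact DDL Immuta generates differs:

    -- A native Redshift masking policy of the kind Immuta generates
    CREATE MASKING POLICY mask_ssn
      WITH (ssn VARCHAR(11))
      USING ('XXX-XX-XXXX');

    -- Attach it to a column, scoped to an Immuta-managed role
    ATTACH MASKING POLICY mask_ssn
      ON analytics.customers(ssn)
      TO ROLE immuta_alice;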

    Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in Amazon Redshift that grants the SELECT privilege on yellow-table to users registered in Immuta that are part of the analysts group.
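As a sketch, the statement Immuta issues resembles the following; the role name follows the immuta_<username> convention described above, and the hyphenated table name needs quoting:

    -- Grant issued per subscribed user's Immuta-managed role
    GRANT SELECT ON "yellow-table" TO ROLE immuta_alice;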

In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access. See the Subscription policies page or the Data policies page for guidance on applying policies to a data source. See the Amazon Redshift integration page for details about the supported policy types.

    Connect Integrations

    Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data.

    This section includes guidance for connecting your data platform and keeping it synced with Immuta.

    Integrations overview

    This reference guide outlines the features, policies, and audit capabilities of each data platform Immuta supports.

Integrations

The guides in these sections include information about how to connect your data platform to Immuta:

  • Amazon Redshift

  • Amazon S3

  • AWS Lake Formation

  • Azure Synapse Analytics

  • Databricks: This section includes guides for Databricks Lakebase, Databricks Spark, and Databricks Unity Catalog integrations.

  • Google BigQuery

  • MariaDB

  • MySQL

  • Oracle

  • PostgreSQL

  • Snowflake

  • SQL Server

  • Starburst (Trino)

  • Teradata

Queries Immuta runs in remote platforms
    This reference guide outlines the actions and features that trigger Immuta queries in your remote platform that may incur cost.

Connect your data
    Immuta integrates with your data platforms so you can register your data and effectively manage access controls on that data. This section includes concept, reference, and how-to guides for registering and managing data sources.

    Getting Started with the Amazon Redshift Spectrum Integration

    The how-to guides linked on this page illustrate how to integrate Amazon Redshift Spectrum with Immuta. See the reference guide for information about the Amazon Redshift Spectrum integration.

1. Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

  1. Configure your Amazon Redshift Spectrum integration: Configure an Amazon Redshift Spectrum integration with Immuta so that Immuta can create policy-protected views for your users to query.

  2. Register Amazon Redshift Spectrum data sources: This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.

  3. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, and identification.

2. Register your users

These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

  1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

  2. Map external user IDs from Redshift to Immuta: Ensure the user IDs in Immuta, Redshift, and your IAM are aligned so that the right policies impact the right users.

3. Start using Marketplace

These guides provide instructions on using Marketplace for the first time.

  1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

  2. Request access to a data product: Users must then request access to your data products in Marketplace.

  3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

4. Add data metadata

These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

  1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

  2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

5. Start using the Governance app

These guides provide instructions on using the Governance app for the first time.

  1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that use your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

  2. Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags, you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.

  3. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

    Security and Compliance

    Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

    Security

    Data processing and encryption

See the Data processing and the Encryption and masking practices guides for more information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

    Authentication

    Registering the connection

    The Lake Formation integration supports the following authentication methods to register a connection:

    • Access using AWS IAM role (recommended): Immuta will assume this role when interacting with the AWS API. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account.

    • Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection.

    Identity providers for user authentication

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

    Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

See the Identity managers guide for a list of supported providers and details.

See the AWS Lake Formation reference guide for details about user provisioning and mapping AWS user accounts to Immuta.

    Auditing and compliance

    Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

    Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

See the Governance report types page for a list of report types and guidance.

    Protecting Data

    In the AWS Lake Formation integration, Immuta orchestrates Lake Formation access controls on data registered in the Glue Data Catalog. Then, Immuta users who have been granted access to the Glue Data Catalog table or view can query it using one of these analytic engines:

    • Amazon Athena

    • Amazon EMR Spark

    • Amazon Redshift Spectrum

    The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source submits a query in their AWS analytic engine.

See the AWS Lake Formation documentation for more details about Lake Formation access controls.

    Registering a connection

AWS Lake Formation is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

    Once the Lake Formation connection is registered, you can author policies in Immuta to orchestrate Lake Formation access controls.

See the AWS Lake Formation reference guide for more details about registering a connection.

    Protecting data

    After Glue Data Catalog views and tables are registered in Immuta, you can author subscription policies in Immuta to orchestrate Lake Formation access controls. Once a subscription policy is applied, users can be subscribed to data sources in the following ways:

  • Manually subscribed: If a data owner manually adds a user to the data source, Immuta issues a grant directly to the data object in AWS.

  • Automatically subscribed through policy logic: When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Then, Immuta generates a Lake Formation tag and applies it to the corresponding data object in AWS and grants subscribers access to that tag, which in turn grants them access to the data. See the AWS Lake Formation reference guide for details about this process.

If a user is automatically subscribed to the data source by a policy, Immuta creates a Lake Formation tag for that user and the data source they are subscribed to. If the user is manually added to a data source by the data owner, Immuta grants direct access to the table in Lake Formation.

    Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta generates a Lake Formation (LF) tag that is applied to the Glue Data Catalog yellow-table and permissions on that tag are granted to all AWS users (registered in Immuta) that are part of the analysts group.

In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access.

See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and permissions Immuta grants on securables registered as Immuta data sources.

    Getting Started with AWS Lake Formation


    Public preview: This feature is available to all accounts.

The how-to guides linked on this page illustrate how to use AWS Lake Formation with Immuta. See the reference guide for information about the AWS Lake Formation integration.

1. Connect your technology

These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.



  1. Register your AWS Lake Formation connection: Using a single setup process, connect AWS Lake Formation to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.

  2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

2. Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs from AWS to Immuta: Ensure the user IDs in Immuta, AWS, and your IAM are aligned so that the right policies impact the right users.

3. Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

4. Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

5. Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.


    Azure Synapse Analytics

    In this integration, Immuta generates policy-enforced views in a schema in your configured Azure Synapse Analytics Dedicated SQL pool for tables registered as Immuta data sources.

    Getting started

    This guide outlines how to integrate Azure Synapse Analytics with Immuta.

    How-to guide

Azure Synapse Analytics configuration: Configure the integration in Immuta.

    Reference guide

Azure Synapse Analytics integration reference guide: This guide describes the design and components of the integration.

    Databricks Unity Catalog

    This integration allows you to manage and access data in your Databricks account across all of your workspaces. With Immuta’s Databricks Unity Catalog integration, you can write your policies in Immuta and have them enforced automatically by Databricks across data in your Unity Catalog metastore.

    Getting started

    This getting started guide outlines how to integrate Databricks Unity Catalog with Immuta.

    How-to guides

  • Databricks Unity Catalog configuration: Configure the Databricks Unity Catalog integration.

  • Migrate to Databricks Unity Catalog: Migrate from the legacy Databricks Spark integrations to the Databricks Unity Catalog integration.

    Reference guide

Databricks Unity Catalog integration reference guide: This guide describes the design and components of the integration.

    Migrating to Unity Catalog

    When you enable Unity Catalog, Immuta automatically migrates your existing Databricks data sources in Immuta to reference the legacy hive_metastore catalog to account for Unity Catalog's three-level hierarchy. New data sources will reference the Unity Catalog metastore you create and attach to your Databricks workspace.
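For context, Unity Catalog addresses securables with a three-level namespace, which is why legacy objects surface under the hive_metastore catalog. The catalog, schema, and table names below are hypothetical:

    -- Three-level namespace: <catalog>.<schema>.<table>
    SELECT * FROM hive_metastore.sales.orders;  -- legacy Hive metastore table
    SELECT * FROM main.sales.orders;            -- same table after moving to Unity Catalog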

    Because the hive_metastore catalog is not managed by Unity Catalog, existing data sources in the hive_metastore cannot have Unity Catalog access controls applied to them. Data sources in the Hive Metastore must be managed by the Databricks Spark integration.

    To allow Immuta to administer Unity Catalog access controls on that data, move the data to Unity Catalog and re-register those tables in Immuta by completing the steps below. If you don't move all data before configuring the integration, metastore magic will protect your existing data sources throughout the migration process.

    1. Ensure that all Databricks clusters that have Immuta installed are stopped and the Immuta configuration is removed from the cluster. Immuta-specific cluster configuration is no longer needed with the Databricks Unity Catalog integration.

    2. Move all data into Unity Catalog before configuring Immuta with Unity Catalog. Existing data sources will need to be re-created after they are moved to Unity Catalog and the Unity Catalog integration is configured.

3. Enable Unity Catalog.

    Security and Compliance

    Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

    Security

    Data processing and encryption

See the Data processing and the Encryption and masking practices guides for information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

    Authentication

    Registering the connection

    The Databricks Lakebase connection supports OAuth machine-to-machine (M2M) authentication to register a connection.

The Databricks Lakebase connection authenticates as a Databricks identity and generates an OAuth token. Immuta then uses that token as a password when connecting to PostgreSQL. To enable secure, automated machine-to-machine access to the database instance, the connection must obtain an OAuth token using a Databricks service principal. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details.

    Identity providers for user authentication

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

    Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

See the Identity managers guide for a list of supported providers and details.

See the Databricks Lakebase integration reference guide for details about user provisioning and mapping user accounts to Immuta.

    Auditing and compliance

    Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

    Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

See the Governance report types page for a list of report types and guidance.

    Accessing Data

    Once data is registered through the AWS Lake Formation connection, you will access your data in one of these AWS analytic engines as you normally would:

    • Amazon Athena

    • Amazon EMR Spark

    • Amazon Redshift Spectrum

    If you are subscribed to the data source, Immuta either directly grants you access to the resource through Lake Formation or generates and assigns a Lake Formation tag to that resource to grant you access. See the Protecting data page for details about how policies are enforced.

When you submit a query, the analytic engine requests metadata from the Glue Data Catalog, which then queries Lake Formation to determine what data you are allowed to see. Then, the analytic engine requests temporary access from Lake Formation, retrieves the data from S3, and filters the data to return policy-enforced data to you.

    The diagram below illustrates how the analytic engine interacts with Glue Data Catalog and Lake Formation to access data.

    Accessing Data

    Once data is registered through the Databricks Lakebase connection, you will access your data as you normally would. If you are subscribed to the data source, Immuta grants you access to the data through a PostgreSQL role.

    When you submit a query (through PostgreSQL or through the Databricks Lakebase instance), the PostgreSQL client submits the SQL query to the PostgreSQL server, which then processes the query and determines what data your role is allowed to see. Then, the PostgreSQL server queries the database and returns the query results to the PostgreSQL client, which then returns policy-enforced data to you.

    The diagram below illustrates how Immuta, the PostgreSQL server, and PostgreSQL client interact to access data.

    Enable Snowflake Low Row Access Policy Mode


    If you have Snowflake low row access policy mode enabled in private preview and have impersonation enabled, see these upgrade instructions. Otherwise, query performance will be negatively affected.

    1. Click the App Settings icon in the navigation menu and scroll to the Global Integration Settings section.

    2. Click the Enable Snowflake Low Row Access Policy Mode checkbox to enable the feature.

    3. Confirm to allow Immuta to automatically disable impersonation for the Snowflake integration. If you do not confirm, you will not be able to enable Snowflake low row access policy mode.

    4. Click Save.

    Configure your Snowflake integration

If you already have a Snowflake integration configured, you don't need to reconfigure your integration. Your Snowflake policies automatically refresh when you enable Snowflake low row access policy mode.

1. Configure your Snowflake integration. Note that you will not be able to enable project workspaces or user impersonation with Snowflake low row access policy mode enabled.

    2. Click Save and Confirm your changes.

    Accessing Data

    Once data is registered through the PostgreSQL connection, you will access your data through your PostgreSQL client as you normally would. If you are subscribed to the data source, Immuta grants you access to the data in PostgreSQL.

    When you submit a query, the PostgreSQL client submits the SQL query to the PostgreSQL server, which then processes the query and determines what data your role is allowed to see. Then, the PostgreSQL server queries the database and returns the query results to the PostgreSQL client, which then returns policy-enforced data to you.

    The diagram below illustrates how Immuta, the PostgreSQL server, and PostgreSQL client interact to access data.

    MySQL


    Immuta policies will not be automatically enforced in MySQL

    While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MySQL using your own process.

    To use this integration, contact your Immuta representative.

    The MySQL integration uses connections to register data from MySQL in Immuta.

How-to guide

Register a MySQL connection

Reference guide

MySQL integration reference guide: This guide describes the design and components of the integration.

    Amazon Redshift Spectrum Integration Overview

This page provides an overview of the Amazon Redshift Spectrum integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide.

    How the integration works

    The Amazon Redshift Spectrum integration is a policy push integration that allows Immuta to apply policies directly on Immuta-created views in Redshift. This allows data analysts to query Redshift views directly instead of going through a proxy and have per-user policies dynamically applied at query time.

The Amazon Redshift Spectrum integration creates views from the tables within the database specified when configured. Then, the user can choose the name for the schema where all the Immuta-generated views will reside. Immuta will also create the schemas immuta_system, immuta_functions, and immuta_procedures to contain the tables, views, UDFs, and stored procedures that support the integration.

    Amazon Redshift Integration Reference Guide


    Public preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

    The Amazon Redshift integration allows you to configure your integration and register data from Amazon Redshift in Immuta in a single step. Once data is registered, Immuta can enforce policies on that data.

    What does Immuta do in my environment?

    Manually Update Your Databricks Cluster

    If a Databricks cluster needs to be manually updated to reflect changes in the Immuta init script or cluster policies, you can remove and set up your integration again to get the updated policies and init script.

    1. Log in to Immuta as an Application Admin.

    2. Click the App Settings icon in the navigation menu and scroll to the Integration Settings section.

    3. Your existing Databricks Spark integration should be listed here; expand it and note the configuration values. Now select Remove to remove your integration.

    Oracle


    Immuta policies will not be automatically enforced in Oracle

While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in Oracle using your own process.

    To use this integration, contact your Immuta representative.

The Oracle integration allows you to register data from Oracle in Immuta.

How-to guide

Register an Oracle connection

Reference guide

Oracle integration reference guide: This guide describes the design and components of the integration.

Project UDFs Cache Settings

This page outlines the configuration for setting up project UDFs, which allow users to set their current project in Immuta through Spark. For details about the specific functions available and how to use them, see the Use Project UDFs (Databricks) page.

Use project UDFs in Databricks Spark

Currently, caches are not all invalidated outside of Databricks because Immuta caches information pertaining to a user's current project. Consequently, this feature should only be used in Databricks.

1. Lower the web service cache timeout in Immuta:

  1. Click the App Settings icon and scroll to the HDFS Cache Settings section.

  2. Lower the Cache TTL of HDFS user names (ms) to 0.

  3. Click Save.

2. Raise the cache timeout on your Databricks cluster: In the Spark environment variables section, set the IMMUTA_CURRENT_PROJECT_CACHE_TIMEOUT_SECONDS and IMMUTA_PROJECT_CACHE_TIMEOUT_SECONDS to high values (like 10000).

Note: These caches will be invalidated on cluster when a user calls immuta.set_current_project, so they can effectively be cached permanently on cluster to avoid periodically reaching out to the web service.

    MariaDB


    Immuta policies will not be automatically enforced in MariaDB

While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.

    To use this integration, contact your Immuta representative.

The MariaDB integration allows you to register data from MariaDB in Immuta.

How-to guide

Register a MariaDB connection

Reference guide

MariaDB integration reference guide: This guide describes the design and components of the integration.

    Databricks Lakebase


    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    The Databricks Lakebase integration registers data from Databricks Lakebase in Immuta and enforces subscription policies on that data.

    Ephemeral Overrides

    In the context of the Databricks Spark integration, Immuta uses the term ephemeral to describe data sources where the associated compute resources can vary over time. This means that the compute bound to these data sources is not fixed and can change. All Databricks data sources in Immuta are ephemeral.

    Ephemeral overrides are specific to each data source and user. They effectively bind cluster compute resources to a data source for a given user. Immuta uses these overrides to determine which cluster compute to use when connecting to Databricks for various maintenance operations.

The operations that use the ephemeral overrides include:

  • Visibility checks on the data source for a particular user. These checks assess how to apply row-level policies for specific users.

  • Stats collection triggered by a specific user.

  • Validating a custom WHERE clause policy against a data source. When owners or governors create custom WHERE clause policies, Immuta uses compute resources to validate the SQL in the policy. In this case, the ephemeral overrides for the user writing the policy are used to contact a cluster for SQL validation.

  • High cardinality column detection. Certain advanced policy types (e.g., minimization) in Immuta require a high cardinality column, and that column is computed on data source creation. It can be recomputed on demand and, if so, will use the ephemeral overrides for the user requesting computation.

    Enable Snowflake Table Grants

    1. Navigate to the App Settings page.

    2. Scroll to the Global Integrations Settings section.

    3. Ensure the Snowflake Table Grants checkbox is checked. It is enabled by default.

    PostgreSQL


    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    In the PostgreSQL integration, Immuta registers data from PostgreSQL and enforces subscription policies on that data.

Getting started

    This getting started guide outlines how to integrate PostgreSQL with Immuta.

    Upgrade Snowflake Low Row Access Policy Mode

Prerequisites

This upgrade step is necessary if you meet both of the following criteria:

  • You have the Snowflake low row access policy mode enabled in private preview.

  • You have user impersonation enabled.

If you do not meet these criteria, follow the instructions on the configuration guide.

Upgrade to Snowflake low row access policy mode

To upgrade to the generally available version of the feature, disable your Snowflake integration on the app settings page and then re-enable it.

    Snowflake Table Grants Private Preview Migration

To migrate from the private preview version of table grants (available before September 2022) to the GA version, complete the steps below.

1. Navigate to the App Settings page.

2. Scroll to the Global Integrations Settings section.

3. Uncheck the Snowflake Table Grants checkbox to disable the feature.

4. Click Save. Wait for about 1 minute per 1000 users. This gives time for Immuta to drop all the previously created user roles.

5. Use the Enable Snowflake table grants tutorial to re-enable the feature.

    Install a Trusted Library

    1. In the Databricks Clusters UI, install your third-party library .jar or Maven artifact with Library Source Upload, DBFS, DBFS/S3, or Maven. Alternatively, use the Databricks libraries API.

2. In the Databricks Clusters UI, add the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS property as a Spark environment variable and set it to your artifact's URI.

Triggering an ephemeral override request

    An ephemeral override request can be triggered when a user queries the securable corresponding to a data source in a Databricks cluster with the Spark plug-in configured. The actual triggering of this request depends on the configuration settings.

    Ephemeral overrides can also be set for a data source in the Immuta UI by navigating to a data source page, clicking on the data source actions button, and selecting Ephemeral overrides from the dropdown menu.

    Ephemeral override requests made from a cluster for data sources and users where ephemeral overrides were set in the UI will not be successful.

    If ephemeral overrides are never set (either through the user interface or the cluster configuration), the system will continue to use the connection details directly associated with the data source, which are set during data source registration.

    Configuring overrides in Immuta-enabled clusters

    Ephemeral overrides can be problematic in environments that have a dedicated cluster to handle maintenance activities, since ephemeral overrides can cause these operations to execute on a different cluster than the dedicated one.

    To reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up, complete one of the following actions:

    • Direct all clusters' HTTP paths for overrides to a cluster dedicated for metadata queries using the IMMUTA_EPHEMERAL_HOST_OVERRIDE_HTTPPATH Spark environment variable.

  • Disable ephemeral overrides completely by setting the IMMUTA_EPHEMERAL_HOST_OVERRIDE Spark environment variable to false.


    Ephemeral overrides best practices

    1. Disable ephemeral overrides for clusters when using multiple workspaces and dedicate a single cluster to serve queries from Immuta in a single workspace.

    2. If you use multiple E2 workspaces without disabling ephemeral overrides, avoid applying the where user row-level policy to data sources.

Opt to change the Role Prefix. Snowflake table grants creates a new Snowflake role for each Immuta user. To ensure these Snowflake role names do not collide with existing Snowflake roles, each Snowflake role created for Snowflake table grants requires a common prefix. When using multiple Immuta accounts within a single Snowflake account, the Snowflake table grants role prefix should be unique for each Immuta account. The prefix must adhere to Snowflake identifier requirements and be less than 50 characters. Once the configuration is saved, the prefix cannot be modified; however, the Snowflake table grants feature can be disabled and re-enabled to change the prefix.
  • Finish configuring your integration by following one of these guidelines:

    • New Snowflake integration: Set up a new Snowflake integration by following the configuration tutorial.

    • Existing Snowflake integration (automatic setup): You will be prompted to enter connection information for a Snowflake user. Immuta will execute the migration to Snowflake table grants using a connection established with this Snowflake user. The Snowflake user you provide here must have Snowflake privileges to run these privilege grants.

    • Existing Snowflake integration (manual setup): Immuta will display a link to a migration script you must run in Snowflake and a link to a rollback script for use in the event of a failed migration. Important: Execute the migration script in Snowflake before clicking Save on the app settings page.


    Snowflake table grants private preview migration

    To migrate from the private preview version of Snowflake table grants (available before September 2022) to the generally available version of Snowflake table grants, follow the steps in the migration guide.


Getting Started with Databricks Lakebase

This getting started guide outlines how to connect Databricks Lakebase to Immuta.

How-to guide

Register a Databricks Lakebase connection

Reference guides

  • Databricks Lakebase integration: This guide describes the design and components of the integration.

  • Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.

  • Protecting data: This guide provides an overview of how to protect securables with Immuta policies.

  • Accessing data: This guide provides an overview of how Databricks Lakebase users access data registered in Immuta.

Getting Started with PostgreSQL

How-to guide

Register a PostgreSQL connection

Reference guides

  • PostgreSQL integration: This guide describes the design and components of the integration.

  • Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and that allow you to prove compliance and monitor for anomalies.

  • Protecting data: This guide provides an overview of how to protect securables with Immuta policies.

  • Accessing data: This guide provides an overview of how PostgreSQL users access data registered in Immuta.
Immuta then creates a system role and gives that system account the following privileges:
    • ALL PRIVILEGES ON DATABASE IMMUTA_DB

    • ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB

    • USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

    • USAGE ON LANGUAGE PLPYTHONU

Additionally, the PUBLIC role will be granted the following privileges:

    • USAGE ON DATABASE IMMUTA_DB

    • TEMP ON DATABASE IMMUTA_DB

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

    • USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

    • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM

    • SELECT ON TABLES TO public

    Once the integration is configured, data owners must register Redshift Spectrum data sources using the Immuta CLI or V2 API.

    Data flow

1. An Immuta application administrator creates an immuta database in Amazon Redshift (that will contain Immuta policy definitions and user entitlements), configures the Redshift Spectrum integration, and registers Redshift warehouses and databases with Immuta.

    2. A data owner registers Redshift tables in Immuta as data sources.

    3. A data owner, data governor, or administrator creates or changes a policy or user in Immuta.

    4. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.

    5. The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.

6. A Redshift user who is subscribed to the data source in Immuta queries it directly in Redshift through the immuta database and sees policy-enforced data.

    Policy enforcement

    SQL statements are used to create all views, including a join to the secure view: immuta_system.user_profile. This secure view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. The immuta_system.user_profile view is readable by all users, but will only display the data that corresponds to the user executing the query.
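A simplified sketch of what such a view could look like; the view, schema, and table names are hypothetical, and the real Immuta-generated SQL is more involved:

    -- Hypothetical shape of an Immuta-generated view
    CREATE VIEW immuta_views.yellow_table AS
    SELECT t.*
    FROM spectrum_schema.yellow_table AS t
    JOIN immuta_system.user_profile AS p
      ON p.immuta__userid = current_user();
    -- The generated views add WHERE/CASE logic that encodes row- and
    -- column-level policies using the groups, attributes, and valid-table
    -- list carried on the current user's profile row.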

    The Amazon Redshift Spectrum integration uses webhooks to keep views up-to-date with Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the dynamic view. The immuta_system.profile table is updated through webhooks when a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.

    Integration health status

    The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error.

    The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API. However, the UI consolidates these error statuses and provides detail in the error messages.

    Supported cluster types

    All Redshift cluster types are supported for the Amazon Redshift Spectrum integration, and Immuta's views must exist in the same database as the raw tables. See the Configure an Amazon Redshift Spectrum guide for details about setting up this database for Immuta-managed resources.

    Immuta supports a single integration with secure views in a single database per cluster.

    Authentication method

    The Amazon Redshift Spectrum integration supports username and password authentication to configure the integration and create data sources.

    Tag ingestion

    Immuta cannot ingest tags from Amazon Redshift Spectrum, but you can connect a supported external catalog to work with your integration.

    User impersonation


    Required Redshift privileges

    Setup user

    • OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE

    • CREATE GROUP

    Immuta system account

    • GRANT EXECUTE ON PROCEDURE grant_impersonation

    • GRANT EXECUTE ON PROCEDURE revoke_impersonation

    Impersonation allows users to query data as another Immuta user in Amazon Redshift. To enable user impersonation, see the User impersonation page.

    Multiple integrations

    Users can enable multiple Amazon Redshift Spectrum integrations with a single Immuta tenant.

    Limitations

    • The host of the data source must match the host of the integration for the view to be created.

    • When using multiple Amazon Redshift Spectrum integrations, a user has to have the same user account across all hosts.

    • Case sensitivity of database, table, and column identifiers is not supported. The enable_case_sensitive_identifier parameter must be set to false (the default setting) for your Redshift cluster to configure the integration and register data sources.
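    You can verify the parameter from a SQL client before configuring the integration; a quick session-level check looks like this (the cluster-wide default is managed through the cluster parameter group):

    -- Inspect the current setting (the default is off/false)
    SHOW enable_case_sensitive_identifier;

    -- The value can also be changed per session for testing
    SET enable_case_sensitive_identifier TO false;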

    Registering a connection

    The Amazon Redshift integration is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.

    When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains the marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

    Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in Amazon Redshift after registration is complete:

    • Host

    • Database

    • Schema

    • Table or view

    Beyond making the registration of your data more intuitive, connections provide more control: instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

    See the Connections reference guide for details about connections and how to manage them. To configure your Amazon Redshift integration and register data, see the Register an Amazon Redshift connection guide.

Applying policies

Subscription policies

    Immuta enforces read and write subscription policies on Amazon Redshift tables by issuing SQL statements in Amazon Redshift that grant and revoke access to tables according to the policy.

    When a user is subscribed to a data object registered in Immuta,

    1. Immuta creates a role for that user in Amazon Redshift, if one doesn't already exist.

    2. Amazon Redshift stores that role in its internal system catalog.

    3. Immuta issues grants to that user's role in Amazon Redshift to enforce policy. The Protecting data page provides an example of this policy enforcement.

    4. Users then query data in Amazon Redshift using the immuta_<username> role, which allows them to use the privileges granted to that role by Immuta.
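    As a hedged sketch of this sequence (the user, schema, and table names are hypothetical, and the statements Immuta issues may differ), the grants resemble:

    -- Create a role for the subscribed user if one doesn't already exist
    CREATE ROLE "immuta_alice";
    GRANT ROLE "immuta_alice" TO "alice";

    -- Grant the access dictated by the subscription policy
    GRANT USAGE ON SCHEMA analytics TO ROLE "immuta_alice";
    GRANT SELECT ON TABLE analytics.yellow_table TO ROLE "immuta_alice";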

    Data policies

    You can author data policies in Immuta to enforce fine-grained access controls on Amazon Redshift data objects registered as Immuta data sources.

    Once a data policy is applied to an Amazon Redshift data source in Immuta,

    1. Immuta generates a masking or row-level policy in Amazon Redshift and attaches the policy to the data object it applies to.

    2. When users query that data source in Amazon Redshift, the policy will dynamically apply to that data object so that users see policy-enforced data.
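    To give a flavor of the mechanism, the sketch below uses Amazon Redshift's native dynamic data masking DDL. The policy, table, and column names are hypothetical, and the policies Immuta generates will differ:

    -- Mask an email column with a constant
    CREATE MASKING POLICY mask_email
    WITH (email VARCHAR(256))
    USING ('REDACTED'::VARCHAR(256));

    -- Attach the policy so it applies when the column is queried
    ATTACH MASKING POLICY mask_email
    ON analytics.yellow_table (email)
    TO PUBLIC;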

    See the supported policies section for a list of data policies supported for this integration.

    Amazon Redshift privileges granted by Immuta

    See the Subscription policy access types page for details about the Amazon Redshift privileges granted to users when they are subscribed to a data source protected by a subscription policy.

    Required Amazon Redshift privileges

    The privileges that the Amazon Redshift integration requires align to the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.

    Amazon Redshift privilege: Database superuser, or the following privileges:

    • CREATEDB

    • CREATE USER

    • sys:secadmin role

    User requiring the privilege: Setup user

    Explanation: These privileges allow the user registering the connection to

    • assign the required roles and privileges to the Immuta system account so that it can register the connection and manage the integration.

    • create an Immuta database that Immuta will use to connect to the Amazon Redshift instance and maintain state with the registered databases.

    • create a policy exemption role.

    Amazon Redshift privilege: USAGE on all the databases and schemas that will be registered

    User requiring the privilege: Immuta system account

    Explanation: This privilege allows Immuta to crawl the database and discover database objects so it can register the Amazon Redshift data objects.

    Amazon Redshift privilege: CREATE ROLE

    User requiring the privilege: Immuta system account

    Explanation: This privilege is required so that Immuta can create Redshift roles to enforce access controls.

    Amazon Redshift privilege: Database superuser or the sys:secadmin role

    User requiring the privilege: Immuta system account

    Explanation: This role allows Immuta to apply masking and row-level policies to Redshift securables.

    Amazon Redshift privilege: The following privileges WITH GRANT OPTION on objects registered in Immuta: DELETE, INSERT, SELECT, TRUNCATE, and UPDATE

    User requiring the privilege: Immuta system account

    Explanation: These privileges allow Immuta to apply read and write subscription policies on tables registered in Immuta.

    Maintaining state with Amazon Redshift

    The following user actions initiate processes that keep Immuta data synchronous with data in Amazon Redshift:

    • Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

    • Data source deleted: Immuta deletes the data source metadata from the metadata database.

    • User account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.

    • User subscribed to a data source: When a user is added to a data source by a data owner or through a subscription policy, Immuta creates a role for that user (if a role for them does not already exist) and grants Amazon Redshift privileges to that role.

    • Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and grants or revokes users' privileges on the data. See the Protecting data page for details about this process.

    • Subscription policy deleted: Immuta revokes privileges from the affected roles.

    • Data policy created or updated: Immuta calculates the users and data sources affected by the data policy change and attaches the policy to the data object in Amazon Redshift.

    • Data policy deleted: Immuta removes the policy from the data object in Amazon Redshift.

    • User removed from a data source: Immuta revokes privileges from the user's role.

Supported object types

    Immuta supports subscription policies, data policies, and Marketplace for the following Amazon Redshift object types:

    • Tables

    • Views

    • Datashares


    Datashares privilege requirement

    To allow Immuta to enforce access controls on datashares, you must include the WITH PERMISSIONS clause when creating the database from the datashare. You cannot add the WITH PERMISSIONS clause after the database has been created. See the Amazon Redshift documentation for details.
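    For example, a consumer database created from a datashare with the required clause might look like this (the database, share, and namespace values are placeholders):

    CREATE DATABASE marketing_share_db
    FROM DATASHARE marketing_share OF NAMESPACE 'producer-namespace-guid'
    WITH PERMISSIONS;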

    Supported policies

    The Amazon Redshift integration allows users to author subscription policies and data policies to enforce access controls.

    The following data policies are supported:

    • Masking with a constant

    • Masking with NULL

    • Only show rows (matching)

    See the applying policies section for details about policy enforcement.

    Policy exemption role

    The Amazon Redshift role configured as the policy exemption role in Immuta will be exempt from Immuta data policy enforcement. This role is created and managed in Amazon Redshift, not in Immuta.

    If you have service or system accounts that need to be exempt from masking and row-level policy enforcement, add them to this role in Amazon Redshift. Then, role members will be exempt from having data policies applied to them when they query Immuta-protected tables in Amazon Redshift.

    Typically, service or system accounts that perform the following actions are added to an exemption role in Amazon Redshift:

    • Automated queries

    • ETL

    • Report generation

    The system account used to register data sources in Immuta will be automatically added to the exemption role for the Amazon Redshift securables it registers.
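    As a minimal sketch (the role and account names are hypothetical), adding a service account to an existing exemption role in Amazon Redshift looks like:

    -- Assumes immuta_exemption is the role configured as the policy exemption role in Immuta
    GRANT ROLE immuta_exemption TO "svc_etl";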

    Security and compliance

    Authentication method

    The Amazon Redshift integration supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the Register an Amazon Redshift connection guide.

    User registration and ID mapping

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead. Each of the supported IAM protocols includes a set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

    For policies to impact the right users, the user account in Immuta must be mapped to the user account in Amazon Redshift. You can ensure these accounts are mapped correctly in the following ways:

    • Automatically: If usernames in Amazon Redshift align with usernames in the external IAM and those accounts align with an IAM attribute, you can enter that IAM attribute on the app settings page to automatically map user IDs in Immuta to Amazon Redshift.

    • Manually: You can manually map user IDs for individual users.

    For guidance on connecting your IAM to Immuta, see the how-to guide for your protocol.

    Limitations and known issues

    The following Immuta features are unsupported:

    • Amazon Redshift Spectrum: See the AWS Lake Formation reference guide for details about registering Amazon Redshift Spectrum data sources in Immuta. However, if you are using data policies on your Redshift Spectrum data sources, you cannot use the AWS Lake Formation integration. Instead, use the Amazon Redshift Spectrum integration.

    • Several data policy types are unsupported. See the Supported policies section for a list of supported data policies.

    • Impersonation

    • Query audit

    • Tag ingestion

  • Click Add Integration and select Databricks Integration to add a new integration.

  • Enter your Databricks Spark integration settings again as configured previously.

  • Click Add Integration to add the integration, and then select Configure Cluster Policies to set up the updated cluster policies and init script.

  • Select the cluster policies you wish to use for your Immuta-enabled Databricks clusters.

  • Automatically push cluster policies and the init script (recommended) or manually update your cluster policies.

    • Automatically push cluster policies

      1. Select Automatically Push Cluster Policies and enter your privileged Databricks access token. This token must have privileges to write to cluster policies.

      2. Select Apply Policies to push the cluster policies and init script again.

      3. Click Save and Confirm to deploy your changes.

    • Manually update cluster policies

      1. Download the init script and the new cluster policies to your local computer.

      2. Click Save and Confirm to save your changes in Immuta.

      3. Log in to your Databricks workspace with your administrator account to set up cluster policies.

      4. Get the path you will upload the init script (immuta_cluster_init_script_proxy.sh) to by opening one of the cluster policy .json files and looking for the defaultValue of the field init_scripts.0.dbfs.destination. This should be a DBFS path in the form of dbfs:/immuta-plugin/hostname/immuta_cluster_init_script_proxy.sh.

      5. Click Data in the left pane to upload your init script to DBFS to the path you found above.

      6. To find your existing cluster policies you need to update, click Compute in the left pane and select the Cluster policies tab.

      7. Edit each of these cluster policies that were configured before and overwrite the contents of the JSON with the new cluster policy JSON you downloaded.

  • Restart any Databricks clusters using these updated policies for the changes to take effect.

  • For Maven artifacts, the URI is maven:/<maven_coordinates>, where <maven_coordinates> is the Coordinates field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. Here's an example of an installed artifact:

    In this example, you would add the following Spark environment variable:

    IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=maven:/com.github.immuta.hadoop.immuta-spark-third-party-maven-lib-test:2020-11-17-144644

    For jar artifacts, the URI is the Source field found when clicking on the installed artifact on the Libraries tab in the Databricks Clusters UI. For artifacts installed from DBFS or S3, this ends up being the original URI to your artifact. For uploaded artifacts, Databricks will rename your .jar and put it in a directory in DBFS. Here's an example of an installed artifact:

    In this example, you would add the following Spark environment variable:

    IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/immuta/bstabile/jars/immuta-spark-third-party-lib-test.jar

    1. Once you've finished making your changes, restart the cluster.

    2. Once the cluster is up, execute a command in a notebook. If the trusted library installation is successful, you should see driver log messages like this:

    TrustedLibraryUtils: Successfully found all configured Immuta configured trusted libraries in Databricks.
    TrustedLibraryUtils: Wrote trusted libs file to [/databricks/immuta/immutaTrustedLibs.json]: true.
    TrustedLibraryUtils: Added trusted libs file with 1 entries to spark context.
    TrustedLibraryUtils: Trusted library installation complete.

    Getting Started with Azure Synapse Analytics

    The how-to guides linked on this page illustrate how to integrate Azure Synapse Analytics with Immuta. See the reference guide for information about the Azure Synapse Analytics integration.

    Requirement: A running Dedicated SQL pool

    1

    Connect your technology

    These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

    1. Configure your Azure Synapse Analytics integration: Configure an Azure Synapse Analytics integration with Immuta so that Immuta can create policy protected views for your users to query.

    2. Register Azure Synapse Analytics data sources: This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.

    3. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace and policies.

    2

    Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs from Azure Synapse Analytics to Immuta: Ensure the user IDs in Immuta, Azure Synapse Analytics, and your IAM are aligned so that the right policies impact the right users.

    3

    Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

    3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

    4

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    5

    Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.

    3. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

    Databricks

    Immuta offers three integrations for Databricks:

    • Databricks Unity Catalog integration: This integration supports working with database objects registered in Unity Catalog.

    • Databricks Lakebase integration: This integration supports working with Lakebase Postgres database objects within Databricks Lakebase.

    • Databricks Spark integration: This integration supports working with database objects registered in the legacy Hive metastore.

    Which integration should you use?

    To determine which integration you should use, consider which metastore you use:

    • Legacy Hive metastore: Databricks recommends that you migrate all data from the legacy Hive metastore to Unity Catalog. However, when this migration is not possible, use the Databricks Spark integration to protect securables registered in the Hive metastore.

    • Unity Catalog: To protect securables registered in the Unity Catalog metastore, use the Databricks Unity Catalog integration.

      • Databricks Lakebase: To register and protect fully managed PostgreSQL-compatible data objects, use the Databricks Lakebase integration.

    • Legacy Hive metastore and Unity Catalog: If you need to work with database objects registered in both the legacy Hive metastore and in Unity Catalog, metastore magic allows you to use both integrations.

    Metastore magic

    Databricks metastore magic allows you to migrate your data from the Databricks legacy Hive metastore to the Unity Catalog metastore while protecting data and maintaining your current processes in a single Immuta instance.

    Databricks metastore magic is for organizations who intend to use the Databricks Unity Catalog integration, but must still protect tables in the Hive metastore until they can migrate all of their data to Unity Catalog.

    Requirement

    Unity Catalog support is enabled in Immuta.

    Databricks metastores and Immuta policy enforcement

    Databricks has two built-in metastores that contain metadata about your tables, views, and storage credentials:

    • Legacy Hive metastore: Created at the workspace level. This metastore contains metadata of the registered securables in that workspace available to query.

    • Unity Catalog metastore: Created at the account level and is attached to one or more Databricks workspaces. This metastore contains metadata of the registered securables available to query. All clusters on that workspace use the configured metastore and all workspaces that are configured to use a single metastore share those securables.

    Databricks allows you to use the legacy Hive metastore and the Unity Catalog metastore simultaneously. However, Unity Catalog does not support controls on the Hive metastore, so you must attach a Unity Catalog metastore to your workspace and move existing databases and tables to the attached Unity Catalog metastore to use the governance capabilities of Unity Catalog.

    Immuta's Databricks Spark integration and Unity Catalog integration enforce access controls on the Hive and Unity Catalog metastores, respectively. However, because these metastores have two distinct security models, users were discouraged from using both in a single Immuta instance before metastore magic; the Databricks Spark integration and Unity Catalog integration were unaware of each other, so using both concurrently caused undefined behavior.

    Databricks metastore magic solution

    Metastore magic reconciles the distinct security models of the legacy Hive metastore and the Unity Catalog metastore, allowing you to use multiple metastores (specifically, the Hive metastore or AWS Glue Data Catalog alongside Unity Catalog metastores) within a Databricks workspace and a single Immuta instance and keep policies enforced on all your tables as you migrate them. The diagram below shows Immuta enforcing policies on registered tables across workspaces.

    In clusters A and D, Immuta enforces policies on data sources in each workspace's Hive metastore and in the Unity Catalog metastore shared by those workspaces. In clusters B, C, and E (which don't have Unity Catalog enabled in Databricks), Immuta enforces policies on data sources in the Hive metastores for each workspace.

    Enforce policies as you migrate

    With metastore magic, the Databricks Spark integration enforces policies only on data in the Hive metastore, while the Unity Catalog integration enforces policies on tables in the Unity Catalog metastore. The table below illustrates this policy enforcement.

    To enforce plugin-based policies on Hive metastore tables and Unity Catalog native controls on Unity Catalog metastore tables, enable the Databricks Spark integration and the Databricks Unity Catalog integration. Note that some Immuta policies are not supported in the Databricks Unity Catalog integration. See the Databricks Unity Catalog integration reference guide for details.

    Enforcing policies on Databricks SQL

    Databricks SQL cannot run the Databricks Spark plugin to protect tables, so Hive metastore data sources will not be policy enforced in Databricks SQL.

    To enforce policies on data sources in Databricks SQL, use Hive metastore table access controls to manually lock down Hive metastore data sources and the Databricks Unity Catalog integration to protect tables in the Unity Catalog metastore. Table access control is enabled by default on SQL warehouses, and any Databricks cluster without the Immuta plugin must have table access control enabled.

    Getting Started with Databricks Spark

    The how-to guides linked on this page illustrate how to integrate Databricks Spark with Immuta.

    Requirements

    • If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster.

    • If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:

      1. Navigate to the App Settings page and click Integration Settings.

      2. Uncheck the Enable Unity Catalog checkbox.

      3. Click Save.

    1

    Connect your technology

    These guides provide instructions for getting your data set up in Immuta.

    1. .

    2. .

    Databricks Spark

    This integration enforces policies on Databricks securables registered in the legacy Hive metastore. Once these securables are registered as Immuta data sources, users can query policy-enforced data on Databricks clusters.

    The guides in this section outline how to integrate Databricks Spark with Immuta.

    Getting started

    This getting started guide outlines how to integrate Databricks with Immuta.

    How-to guides

    • Configure a Databricks Spark integration: Configure the Databricks Spark integration.

    • Manually update your Databricks cluster: Manually update your cluster to reflect changes in the Immuta init script or cluster policies.

    • Install a trusted library: Register a Databricks library with Immuta as a trusted library to avoid Immuta security manager errors when using third-party libraries.

    • Project UDFs cache settings: Raise the caching on-cluster and lower the cache timeouts for the Immuta web service to allow use of project UDFs in Spark jobs.

    • Run R and Scala spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.

    • DBFS access: Access DBFS in Databricks for non-sensitive data.

    • Troubleshooting: Resolve errors in the Databricks Spark configuration.

    Reference guides

    • Databricks Spark integration configuration: This guide describes the design and components of the integration.

    • Security and compliance: This guide provides an overview of the Immuta features that provide security for your users and Databricks clusters and that allow you to prove compliance and monitor for anomalies.

    • Registering and protecting data: This guide provides an overview of registering Databricks securables and protecting them with Immuta policies.

    • Accessing data: This guide provides an overview of how Databricks users access data registered in Immuta.

    Security and Compliance

    Immuta offers several features to provide security for your users and to prove compliance and monitor for anomalies.

Security

Data processing and encryption

    See the Data processing and the Encryption and masking practices guides for information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

Authentication

Registering the connection

    The PostgreSQL integration supports the following authentication methods to register a connection:

    • Amazon Aurora and Amazon RDS deployments

      • Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account when interacting with the AWS API to perform any operations in your AWS account. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.

      • Access using access key and secret access key: These credentials are used temporarily by Immuta to register the connection. The access key ID and secret access key provided must be for an AWS account with the permissions listed in the Register a PostgreSQL connection guide.

    • Neon and PostgreSQL deployments

      • Username and password: These credentials are used temporarily by Immuta to register the connection. The credentials provided must be for an account with the permissions listed in the Register a PostgreSQL connection guide.

    Identity providers for user authentication

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead.

    Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

    See the Identity managers guide for a list of supported providers and details.

    See the PostgreSQL integration reference guide for details about user provisioning and mapping user accounts to Immuta.

    Auditing and compliance

    Immuta provides governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

    Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

    See the Governance report types page for a list of report types and guidance.

    Protecting Data

    In the PostgreSQL integration, Immuta administers PostgreSQL privileges on data registered in Immuta. Then, Immuta users who have been granted access to the tables can query them with policies enforced.

    The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in PostgreSQL.

    Registering a connection

    PostgreSQL is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

    Once the PostgreSQL connection is registered, you can author subscription policies in Immuta to enforce access controls.

    See the PostgreSQL integration reference guide for more details about registering a connection.

    Protecting data

    After tables are registered in Immuta, you can author subscription policies in Immuta to enforce access controls.

    When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Then, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege to users on those tables.

    Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access the yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege on yellow-table to users (registered in Immuta) that are part of the analysts group.

    In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access. See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and PostgreSQL privileges Immuta grants on tables registered as Immuta data sources.
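    In PostgreSQL terms, the grant Immuta issues for each subscribed user resembles this minimal sketch (the user name alice is hypothetical):

    -- Issued for each Immuta user subscribed through the policy
    GRANT SELECT ON TABLE "yellow-table" TO alice;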

    Delta Lake API

    When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.

    Spark SQL can be used instead to give the same functionality with all of Immuta's data protections.

    Requests

    Below is a table of the Delta Lake API with the Spark SQL that may be used instead.

    • DeltaTable.convertToDelta: CONVERT TO DELTA parquet.`/path/to/parquet/`

    • DeltaTable.delete: DELETE FROM [table_identifier delta.`/path/to/delta/`] WHERE condition

    • DeltaTable.generate: GENERATE symlink_format_manifest FOR TABLE [table_identifier delta.`/path/to/delta`]

    • DeltaTable.history: DESCRIBE HISTORY [table_identifier delta.`/path/to/delta`] (LIMIT x)

    • DeltaTable.merge: MERGE INTO

    • DeltaTable.update: UPDATE [table_identifier delta.`/path/to/delta/`] SET column = value WHERE (condition)

    • DeltaTable.vacuum: VACUUM [table_identifier delta.`/path/to/delta`]

    See the Delta SQL commands documentation for a complete list.

    Merging tables in workspaces

    When a table is created in a project workspace, you can merge a different Immuta data source from that workspace into the table you created.

    1. Create a table in the project workspace.

    2. Create a temporary view of the Immuta data source you want to merge into that table.

    3. Use that temporary view as the data source you add to the project workspace.

    4. Run the following command:

    MERGE INTO delta_native.target_native as target
    USING immuta_temp_view_data_source as source
    ON target.dr_number = source.dr_number
    WHEN MATCHED THEN
    UPDATE SET target.date_reported = source.date_reported

    Protecting Data

    In the Databricks Lakebase integration, Immuta administers PostgreSQL privileges on data registered in Immuta. Then, Immuta users who have been granted access to the tables can query them with policies enforced.

    The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries it in PostgreSQL.

    Registering a connection

    Databricks Lakebase is configured and data is registered through connections, an Immuta feature that allows administrators to register data objects in a technology through a single connection to make data registration more scalable for your organization.

    Once the Databricks Lakebase connection is registered, you can author subscription policies in Immuta to enforce access controls.

    See the Databricks Lakebase connection reference guide for more details about registering a connection.

    Protecting data

    After tables are registered in Immuta, you can author subscription policies in Immuta to enforce access controls.

    When a policy is applied to a data source, users who meet the conditions of the policy will be automatically subscribed to the data source. Then, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege to users on those tables.

    Consider the following example that illustrates how Immuta enforces a subscription policy that only allows users in the analysts group to access yellow-table. When this policy is authored and applied to the data source, Immuta issues a SQL statement in PostgreSQL that grants the SELECT privilege on yellow-table to users (registered in Immuta) that are part of the analysts group.

    In the image above, the user in the analysts group accesses yellow-table, while the user who is a part of the research group is denied access. See the Author a subscription policy page for guidance on applying a subscription policy to a data source. See the Subscription policy access types page for details about the subscription policy types supported and PostgreSQL privileges Immuta grants on tables registered as Immuta data sources.

    Getting Started with PostgreSQL


    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    The how-to guides linked on this page illustrate how to use PostgreSQL with Immuta. See the reference guide for information about the PostgreSQL integration.

    1

    Connect your technology

    These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

    1. Register your PostgreSQL connection: Using a single setup process, connect PostgreSQL to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.

    2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

    2

    Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs to Immuta: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.

    3

    Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

    3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

    4

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

    5

    Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

    Configure Snowflake Lineage Tag Propagation


    Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.


    Configure the Snowflake integration

    1. Navigate to the App Settings page and click the Integrations tab.

    2. Click +Add Integration and select Snowflake from the dropdown menu.

    3. Complete the Host, Port, and Default Warehouse fields.

    4. Enable Query Audit.

    5. Enable Lineage and complete the following fields:

      • Ingest Batch Sizes: This setting configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance.

      • Table Filter: This filter determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.

      • Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression that excludes / from the beginning and end to filter tags. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.

    6. Select Manual or Automatic Setup and follow the steps in this guide to configure the Snowflake integration.

Trigger Snowflake lineage sync job

Prerequisite

    Authenticate with the Immuta API.

    Trigger the lineage job

    The Snowflake lineage sync endpoint triggers the lineage ingestion job that allows Immuta to propagate Snowflake tags added through lineage to Immuta data sources.

    1. Copy the example and replace the Immuta URL and API key with your own.

    2. Change the payload attribute values to your own, where

      • tableFilter (string): This regular expression determines which tables Immuta will ingest lineage for. Enter a regular expression that excludes / from the beginning and end to filter tables. Without this filter, Immuta will attempt to ingest lineage for every table on your Snowflake instance.

      • batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.

      • lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.

    curl -X 'POST' \
        'https://www.organization.immuta.com/lineage/ingest/snowflake' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -H 'Authorization: 846e9e43c86a4ct1be14290d95127d13f' \
        -d '{
        "tableFilter": "MY_DATABASE\\MY_SCHEMA\\..*",
        "batchSize": 1,
        "lastTimestamp": "2022-06-29T09:47:06.012-07:00"
        }'

    Next steps

    Once the sync job is complete, you can complete the following steps:

    • Register Snowflake data sources

    • Build policies

    Databricks Spark Integration Configuration

    The Databricks Spark integration is one of the integrations Immuta offers for Databricks.

    In this integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.

    The reference guides in this section are written for Databricks administrators who are responsible for setting up the integration, securing Databricks clusters, and setting up users:

    • Installation and compliance: This guide includes information about what Immuta creates in your Databricks environment and securing your Databricks clusters.

    • Customizing the integration: Consult this guide for information about customizing the Databricks Spark integration settings.

    • Setting up users: Consult this guide for information about connecting data users and setting up user impersonation.

    • Spark environment variables: This guide provides a list of Spark environment variables used to configure the integration.

    • Ephemeral overrides: This guide describes ephemeral overrides and how to configure them to reduce the risk that a user has overrides set to a cluster (or multiple clusters) that aren't currently up.

    Configure an Amazon Redshift Spectrum Integration

    This page illustrates how to configure the Amazon Redshift Spectrum integration on the Immuta app settings page. To configure this integration via the Immuta API, see the Immuta API documentation.

    Requirements

    • A Redshift cluster with an AWS row-level security patch applied. See the installation guide for guidance.

    Azure Synapse Analytics Pre-Configuration Details

    This page describes the Azure Synapse Analytics integration, configuration options, and features. See the configuration guide for a tutorial on enabling the integration and these features through the app settings page.

    Feature availability

    Getting Started with Databricks Unity Catalog

    The how-to guides linked on this page illustrate how to integrate Databricks Unity Catalog with Immuta. See the reference guide for information about the Databricks Unity Catalog integration.

    Requirements:

    • A Unity Catalog metastore created and attached to a Databricks workspace. Immuta supports configuring a single metastore for each configured integration, and that metastore may be attached to multiple Databricks workspaces.

    • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore.


    DBFS Access

    This page outlines how to enable access to DBFS in Databricks for non-sensitive data. Databricks administrators should place the desired configuration in the Spark environment variables.

    DBFS FUSE mount

    This Databricks feature mounts DBFS to the local cluster filesystem at /dbfs. Although disabled when using process isolation, this feature can safely be enabled if raw, unfiltered data is not stored in DBFS and all users on the cluster are authorized to see each other's files. When enabled, the entirety of DBFS essentially becomes a scratch path where users can read and write files in /dbfs/path/to/my/file as though they were local files.

    Setting Up Users

    When the Databricks Spark plugin is running on a Databricks cluster, all Databricks users running jobs or queries are either a privileged user or a non-privileged user:

    • Privileged users: Privileged users can effectively read from and write to any table or view in the cluster metastore, or any file path accessible by the cluster, without restriction. Privileged users are either Databricks workspace admins or users specified in IMMUTA_SPARK_ACL_ALLOWLIST. Any user writing queries or jobs impersonating another user is a non-privileged user, even if they are impersonating a privileged user.

      Privileged users have effective authority to read from and write to any securable in the cluster metastore or file path, because in almost all cases Databricks clusters running with the Immuta Spark plugin installed have disabled Hive metastore table access control. However, if Hive metastore table access control is enabled on the cluster, privileged users will have the authority granted to them that is specified by table access control.

    Getting Started with Snowflake

    The how-to guides linked on this page illustrate how to integrate Snowflake with Immuta. See the reference guide for information about the Snowflake integration.

    Requirements

    • Snowflake enterprise edition

    • Access to a Snowflake account that can create a Snowflake user


    Accessing Data

    Once a Databricks securable is registered in Immuta as a data source and you are subscribed to that data source, you must access that data through SQL:

    With R, you must load the SparkR library in a cell before accessing the data.

    See the sections below for more guidance on accessing data using Python, R, and Scala.


    Troubleshooting

    This page provides guidelines for troubleshooting issues with the Databricks Spark integration and resolving Py4J security and Databricks trusted library errors.

    Debugging the integration

    For easier debugging of the Databricks Spark integration, follow the recommendations below.

    • Enable cluster init script logging:

      1. In the cluster page in Databricks for the target cluster, navigate to Advanced Options -> Logging.

      2. Change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be added onto the end of the provided path.

    • View the Spark UI on your target Databricks cluster: On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

    Edit or Remove Your Snowflake Integration


    Deprecation notice

    Support for editing or deleting the Snowflake integration using this legacy workflow has been deprecated. Instead, manage or delete your Snowflake connection.

    To edit or remove a Snowflake integration, you have two options:

    Using Snowflake Data Sharing with Immuta

    Immuta is compatible with Snowflake Secure Data Sharing. Using both Immuta and Snowflake, organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time.

    Prerequisites:

    • Snowflake integration enabled

    • Snowflake tables registered in Immuta as data sources



Using the validation and debugging notebook

    The validation and debugging notebook is designed to be used by or under the guidance of an Immuta support professional. Reach out to your Immuta representative for assistance.

    1. Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.

    2. Click the arrow next to your name and select Import.

    3. Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.

    Py4J security error

    • Error Message: py4j.security.Py4JSecurityException: Constructor <> is not allowlisted

    • Explanation: This error indicates you are being blocked by Py4J security rather than the Immuta Security Manager. Py4J security is strict and generally ends up blocking many ML libraries.

    • Solution: Turn off Py4J security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the environment variables section. Additionally, because there are limitations to the security mechanisms Immuta employs on-cluster when Py4J security is disabled, ensure that all users on the cluster have the same level of access to data, as users could theoretically see (policy-enforced) data that other users have queried.

    Databricks trusted library errors

    Check the driver logs for details. Some possible causes of failure include:

    • One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.

    • For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.

    • Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.



    • Non-privileged users: Non-privileged users are any users who are not privileged users, and all authorization for non-privileged users is determined by Immuta policies.

    Whether a user is a privileged user or a non-privileged user for a given query or job is cached once first determined, based on the IMMUTA_SPARK_ACL_PRIVILEGED_TIMEOUT_SECONDS environment variable. This caching can be disabled entirely by setting the value of that environment variable to 0.

    Mapping Databricks users to Immuta

    Usernames in Databricks must match the usernames in the connected Immuta tenant. By default, the Immuta Spark plugin checks the Databricks username against the username within Immuta's internal IAM to determine access. However, you can integrate your existing IAM with Immuta and use that instead of the default internal IAM. Ideally, you should use the same identity manager for Immuta that you use for Databricks. See the Immuta support matrix page for a list of supported identity providers and protocols.

    It is possible within Immuta to have multiple users share the same username if they exist within different IAMs. In this case, the cluster can be configured to look up users from a specified IAM. To do this, the value of the IMMUTA_USER_MAPPING_IAMID Spark environment variable must be updated to be the targeted IAM ID configured within the Immuta tenant. The targeted IAM ID can be found on the App settings page. Each Databricks cluster can only be mapped to one IAM.

    User impersonation

    Databricks user impersonation allows a Databricks user to impersonate an Immuta user. With this feature,

    • the Immuta user who is being impersonated does not have to have a Databricks account, but they must have an Immuta account.

    • the Databricks user who is impersonating an Immuta user does not have to be associated with Immuta. For example, this could be a service account.

    When acting under impersonation, the Databricks user loses their privileged access, so they can only access the tables the Immuta user has access to and only perform DDL commands when that user is acting under an allowed circumstance (such as workspaces, scratch paths, or non-Immuta reads/writes).

    Use the IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS Spark environment variable to enable user impersonation.


    Scala clusters

    Immuta discourages use of this feature with Scala clusters, as the proper security mechanisms were not built to account for user isolation limitations in Scala clusters. Instead, this feature was developed for the BI tool use case in which service accounts connecting to the Databricks cluster need to impersonate Immuta users so that policies can be enforced.



    Create Immuta Policies to Protect the Data

    Required Permission: Immuta: GOVERNANCE

    Build Immuta data policies to fit your organization's compliance requirements.

    It's important to understand that subscription policies are not relevant to Snowflake data shares, because the act of sharing the data is the subscription policy. Data policies can be enforced on the consuming account from the producer account on a share following these instructions.

    Register the Snowflake Data Consumer with Immuta

    Required Permission: Immuta: USER_ADMIN

    To register the Snowflake data consumer in Immuta,

    1. Create a new Immuta user.

    2. Update the Immuta user's Snowflake username to match the account ID for the data consumer. This value is the output on the data consumer side when SELECT CURRENT_ACCOUNT() is run in Snowflake.

    3. Give the Immuta user the appropriate attributes and groups for your organization's policies.

    4. Subscribe the Immuta user to the data sources.

    Create the Snowflake Data Share

    Required Permission: Snowflake ACCOUNTADMIN

    To share the policy-protected data source,

    1. Create a Snowflake Data Share of the Snowflake table that has been registered in Immuta.

    2. Grant reference usage on the Immuta database to the share you created:

    GRANT REFERENCE_USAGE ON DATABASE "<Immuta database of the provider account>" TO SHARE "<DATA_SHARE>";

      Replace the content in angle brackets above with the name of your Immuta database and Snowflake data share.


    Tag Filter: This filter determines which tags to propagate using lineage. Enter a regular expression that excludes / from the beginning and end to filter tags. Without this filter, Immuta will ingest lineage for every tag on your Snowflake instance.

    batchSize (integer): This parameter configures the number of rows Immuta ingests per batch when streaming Access History data from your Snowflake instance. Minimum 1.

  • lastTimestamp (string): Setting this parameter will only return lineage events later than the value provided. Use a format like 2022-06-29T09:47:06.012-07:00.

    circle-info

    DBFS FUSE mount limitation: This feature cannot be used in environments with E2 Private Link enabled.

    For example,

    %sh echo "I'm creating a new file in DBFS" > /dbfs/my/newfile.txt

    In Python,

    %python
    with open("/dbfs/my/newfile.txt", "w") as f:
      f.write("I'm creating a new file in DBFS")

    Note: This solution also works in R and Scala.

    hashtag
    Enable DBFS FUSE mount

    To enable the DBFS FUSE mount, set this configuration in the Spark environment variables: IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED=true.

    circle-info

    Mounting a bucket

    • Users can mount additional buckets to DBFS that can also be accessed using the FUSE mount (see the sketch after this list).

    • Mounting a bucket is a one-time action, and the mount will be available to all clusters in the workspace from that point on.

    • Mounting must be performed from a non-Immuta cluster.
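
    As a rough sketch of mounting an additional bucket (run from a non-Immuta cluster, per the notes above); the bucket name and mount point are hypothetical:

    %python
    # Hypothetical bucket and mount point; the mount persists for all
    # clusters in the workspace once created.
    dbutils.fs.mount(
      source = "s3a://my-extra-bucket",
      mount_point = "/mnt/my-extra-bucket"
    )
    display(dbutils.fs.ls("/mnt/my-extra-bucket"))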

    hashtag
    Scala DBUtils (and %fs magic) with scratch paths

    Scratch paths will work when performing arbitrary remote filesystem operations with %fs magic or Scala dbutils.fs functions. For example,

    %fs put -f s3://my-bucket/my/scratch/path/mynewfile.txt "I'm creating a new file in S3"

    %scala dbutils.fs.put("s3://my-bucket/my/scratch/path/mynewfile.txt", "I'm creating a new file in S3")

    hashtag
    Configure Scala DBUtils (and %fs magic) with scratch paths

    To support %fs magic and Scala DBUtils with scratch paths, configure

           <property>
               <name>immuta.spark.databricks.scratch.paths</name>
               <value>s3://my-bucket/my/scratch/path</value>
           </property>

    hashtag
    Configure DBUtils in Python

    To use dbutils in Python, set this configuration: immuta.spark.databricks.py4j.strict.enabled=false.
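
    Once that flag is set, dbutils can be called from Python as usual; a minimal sketch, assuming the scratch path configured above:

    %python
    # List the configured scratch path; any dbutils.fs operation works here.
    files = dbutils.fs.ls("s3://my-bucket/my/scratch/path")
    for f in files:
        print(f.path)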

    hashtag
    Example workflow

    This section illustrates the workflow for getting a file from a remote scratch path, editing it locally with Python, and writing it back to a remote scratch path.

    1. Get the file from remote storage:

      %python
      import os
      import shutil

      s3ScratchFile = "s3://some-bucket/path/to/scratch/file"
      localScratchDir = os.environ.get("IMMUTA_LOCAL_SCRATCH_DIR")
      localScratchFile = "{}/myfile.txt".format(localScratchDir)
      localScratchFileCopy = "{}/myfile_copy.txt".format(localScratchDir)

      dbutils.fs.cp(s3ScratchFile, "file://{}".format(localScratchFile))

    2. Make a copy if you want to explicitly edit localScratchFile, as it will be read-only and owned by root:

      shutil.copy(localScratchFile, localScratchFileCopy)
      with open(localScratchFileCopy, "a") as f:
          f.write("Some appended file content")

    3. Write the new file back to remote storage:

      dbutils.fs.cp("file://{}".format(localScratchFileCopy), s3ScratchFile)

    Organize your data sources into domains and assign domain permissions to accountable teams (recommended): Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in policies, audit, and identification.
    2

    Register your users

    These guides provide instructions on setting up your users in Immuta.

    1. Integrate an IAM with Immuta: Connect the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs from Databricks to Immuta: Ensure the user IDs in Immuta, Databricks, and your IAM are aligned so that the right policies impact the right users.

    3

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for use in policies.

    1. Connect an external catalog: Connect the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

    4

    Protect and monitor data access

    These guides provide instructions on authoring policies and auditing data access.

    • Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    • Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.

    • Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

  • An AWS IAM role for Redshift that is associated with your Redshift cluster.

  • The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.

  • A Redshift database that contains an external schema and external tables. You have two options for configuring this database:

    • Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.

    • Configure the integration by creating a new immuta database: Create a new database for Immuta that manages all schemas and views created when Redshift data is registered in Immuta, and re-create all of your external tables in that database.

    hashtag
    Permissions

    The user configuring the integration must have the permissions below.

    • APPLICATION_ADMIN Immuta permission

    • The Redshift role used to run the Immuta bootstrap script must have the following privileges when configuring the integration:

      • If using an existing database

        • ALL PRIVILEGES ON DATABASE for the database you configure the integration with, as you must manage grants on that database.

        • CREATE USER

        • GRANT TEMP ON DATABASE

      • If creating a new database

        • CREATE DATABASE

        • CREATE USER

        • GRANT TEMP ON DATABASE

      • If enabling user impersonation:

        • OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE

        • CREATE GROUP

    hashtag
    Add a Redshift integration

    Allow Immuta to create secure views of your external tables through one of these methods:

    • Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.

    • Configure the integration by creating a new immuta database: Create a new database for Immuta that manages all schemas and views created when Redshift data is registered in Immuta, and re-create all of your external tables in that database.

    Select a tab below for instructions for either method.

    hashtag
    Configure the integration with an existing database

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab.

    3. Click the +Add Integration button and select Redshift from the dropdown menu.

    4. Complete the Host and Port fields.

    5. Enter the name of the database you created the external schema in as the Immuta Database. This database will store all secure schemas and Immuta-created views.

    6. Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user.

    7. Select Manual and download the second bootstrap script (bootstrap script (Immuta database)) from the Setup section. The specified role used to run the bootstrap needs to have the permissions listed above for an existing database.

    8. Run the bootstrap script (Immuta database) in the Redshift database that contains the external schema.

    9. Choose username and password as your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.

    10. Click Save.

    hashtag
    Configure the integration by creating a new database

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab.

    3. Click the +Add Integration button and select Redshift from the dropdown menu.

    4. Complete the Host and Port fields.

    5. Enter an Immuta Database. This is a new database where all secure schemas and Immuta-created views will be stored.

    6. Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user.

    7. Select Manual and download both of the bootstrap scripts from the Setup section. The specified role used to run the bootstrap needs to have the permissions listed above for a new database.

    8. Run the bootstrap script (initial database) in the Redshift initial database.

    9. Run the bootstrap script (Immuta database) in the new Immuta Database in Redshift.

    10. Choose username and password as your authentication method, and enter the credentials from the bootstrap script for the Immuta_System_Account.

    11. Click Save.

    12. Add your external tables to the Immuta Database.

    hashtag
    Edit a Redshift Spectrum integration

    1. Click the App Settings icon in the navigation menu.

    2. Navigate to the Integrations tab and click the down arrow next to the Redshift Spectrum integration.

    3. Edit the field you want to change. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.

    4. Download the Edit Script and run it in the Immuta Database in Amazon Redshift.

    5. In Immuta, enter the credentials used to initially configure the integration.

    6. Click Save.

    hashtag
    Remove a Redshift Spectrum integration

    circle-exclamation

    Disabling Amazon Redshift Spectrum

    Disabling the Amazon Redshift Spectrum integration is not supported when you set the fields nativeWorkspaceName, nativeViewName, and nativeSchemaName to create Redshift Spectrum data sources. Disabling the integration when these fields are used in metadata ingestion causes undefined behavior.

    1. Click the App Settings icon in the navigation menu.

    2. Navigate to the Integrations tab and click the down arrow next to the Amazon Redshift Spectrum integration.

    3. Click the checkbox to disable the integration.

    4. Enter the credentials that were used to initially configure the integration.

    5. Click cleanup script to download the script.

    6. Click Save.

    7. Run the cleanup script in Amazon Redshift.



    hashtag
    Prerequisite

    • A running dedicated SQL pool

    hashtag
    Authentication methods

    The Azure Synapse Analytics integration supports the following authentication methods to configure the integration and create data sources:

    • Username and password: Immuta supports SQL authentication with username and password for Azure Synapse Analytics. See the SQL Authentication in Azure Synapse Analytics documentation for details (a connection sketch follows this list).

    • OAuth authentication with Microsoft Entra ID: You can use this authentication method to register data sources or configure the Azure Synapse Analytics integration using the manual setup method. To use this authentication method, OAuth must be set up via Microsoft Entra ID app registration with a client secret. See the Microsoft Entra documentation for details about using OAuth authentication with Microsoft Entra ID.
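
    For illustration only, a minimal SQL-authentication connection sketch using pyodbc; the server name, database, and credentials are hypothetical placeholders:

    # Hypothetical dedicated SQL pool endpoint and credentials
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myworkspace.sql.azuresynapse.net;"
        "DATABASE=dedicated_pool;"
        "UID=immuta_system_account;PWD=..."
    )
    cursor = conn.cursor()
    cursor.execute("SELECT 1")  # Verify connectivity to the dedicated SQL pool
    print(cursor.fetchone())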

    hashtag
    Tag ingestion

    Immuta cannot ingest tags from Synapse, but you can connect any of these supported external catalogs to work with your integration.

    hashtag
    User impersonation

    Impersonation allows users to query data as another Immuta user in Synapse. To enable user impersonation, see the User Impersonation page.

    hashtag
    Multiple integrations

    A user can configure multiple integrations of Synapse to a single Immuta tenant.

    hashtag
    Limitations

    • Immuta does not support the following masking types in this integration because of limitations with dedicated SQL pools (linked below). Any column assigned one of these masking types will be masked to NULL:

      • Reversible Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.

      • Format Preserving Masking: Synapse UDFs currently only support SQL, but Immuta needs to execute code (such as JavaScript or Python) to support this masking feature. See the Synapse Documentation for details.

      • Regex: The built-in string replace function does not support full regex. See the Synapse Documentation for details.

    • The delimiters configured when enabling the integration cannot be changed once they are set. To change the delimiters, the integration has to be disabled and re-enabled.

    • If the generated view name is more than 128 characters, then the view name is shortened to 128 characters. This could cause collisions between view names if the shortened version is the same for two different data sources (see the sketch after this list).

    • For proper updates, the dedicated SQL pools have to be running when changes are made to users or data sources in Immuta.
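
    A small illustration of the truncation rule described above; the view names are hypothetical:

    # Two hypothetical generated view names that differ only after 128 characters
    name_a = "schema_" + "x" * 130 + "_v1"
    name_b = "schema_" + "x" * 130 + "_v2"

    # Both shorten to the same 128-character prefix, so they would collide
    print(name_a[:128] == name_b[:128])  # True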

    Feature support for this integration:

    Project Workspaces: ❌

    Tag Ingestion: ❌

    User Impersonation: ✅

    Query Audit: ❌

    Multiple Integrations: ✅

    Connect your technology

    These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

    1. Register your Databricks Unity Catalog connection: Using a single setup process, connect Databricks Unity Catalog to Immuta. This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.

    2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

    circle-info

    Connections are available on all tenants created after February 26, 2025. If you do not have connections enabled on your tenant, configure Databricks Unity Catalog and register data sources using the legacy workflow.

    2

    Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs from Databricks to Immuta: Ensure the user IDs in Immuta, Databricks, and your IAM are aligned so that the right policies impact the right users.

    3

    Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

    3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

    4

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

    5

    Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.

    3. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

    Connect your technology

    These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

    1. Register your Snowflake connection: Using a single setup process, connect Snowflake to Immuta. This will register your data objects into Immuta and allow you to start dictating access through Marketplace or global policies.

    2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

    circle-info

    Connections are available on all tenants created after February 26, 2025. If you do not have connections enabled on your tenant, configure Snowflake and register data sources using the legacy workflow.

    2

    Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs from Snowflake to Immuta: Ensure the user IDs in Immuta, Snowflake, and your IAM are aligned so that the right policies impact the right users.

    3

    Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

    3. Respond to an access request: To grant access to a data product and its tables, respond to the access request.

    4

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

    5

    Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Author a global data policy: Data metadata can also be used to create data policies that apply to data sources as they are registered in Immuta. Data policies dictate what data a user can see once they are granted access to a data source. Using catalog and identification tags you can create proactive policies, knowing that they will apply to data sources as they are added to Immuta with the automated tagging.

    3. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from user queries, policy changes, and tagging updates.

    Delta Lake

    When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.

    Spark SQL can be used instead to give the same functionality with all of Immuta's data protections. See the Delta API reference guide for a list of corresponding Spark SQL calls to use. For example, in Python:

    df = spark.sql("select * from immuta.table")

    In Scala:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    val sqlDF = spark.sql("SELECT * FROM immuta.table")

    In SQL:

    %sql
    select * from immuta.table

    In R:

    library(SparkR)
    df <- SparkR::sql("SELECT * from immuta.table")

    hashtag
    Spark direct file reads

    In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.

    When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.

    Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a where predicate). Expand the blocks below to view examples of reading data using these methods.

    Read data from an individual parquet file

    To read from an individual file, load a partition file from a sub-directory:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01/my_file.parquet")
    Read partitioned data from a sub-directory

    To read partitioned data from a sub-directory, load a parquet partition from a sub-directory:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")

    Alternatively, load a parquet partition using a where predicate:

    spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")

    hashtag
    Limitations

    • Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.

    • If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.

    • In Databricks, multiple input paths are supported as long as they belong to the same data source.

    • CSV-backed tables are not currently supported.

    • Loading a delta partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a where predicate:

      # Not recommended by Spark and not supported in Immuta
      spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table/partition_column=01")

      # Recommended by Spark and supported in Immuta
      spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")

    hashtag
    User impersonation

    User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the User impersonation page.

  • Automatic: Grant Immuta one-time use of credentials with the following privileges to automatically edit or remove the integration:
    • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

    • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    • CREATE USER ON ACCOUNT WITH GRANT OPTION

    • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

  • Manual: Run the Immuta script in your Snowflake environment as a user with the following privileges to edit or remove the integration:

    • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

    • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    • CREATE USER ON ACCOUNT WITH GRANT OPTION

    • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

    • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

    • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

    hashtag
    Edit a Snowflake integration

    Select one of the following options for editing your integration:

    • Automatic: Grant Immuta one-time use of credentials to automatically edit the integration.

    • Manual: Run the Immuta script in your Snowflake environment yourself to edit the integration.

    hashtag
    Automatic edit

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Edit the field you want to change or check a checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.

    4. From the Select Authentication Method Dropdown, select either Username and Password or Key Pair Authentication:

      • Username and Password option: Complete the Username, Password, and Role fields.

      • Key Pair Authentication option:

        1. Complete the Username field.

        2. Click Key Pair (Required), and upload a Snowflake key pair file.

        3. Complete the Role field.

    5. Click Save.

    hashtag
    Manual edit

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Edit the field you want to change or check a checkbox of a feature you would like to enable. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.

    4. Click edit script to download the script, and then run it in Snowflake.

    5. Click Save.

    hashtag
    Remove a Snowflake integration

    Select one of the following options for deleting your integration:

    • Automatic: Grant Immuta one-time use of credentials to automatically remove the integration and Immuta-managed resources from your Snowflake environment.

    • Manual: Run the Immuta script in your Snowflake environment yourself to remove Immuta-managed resources and policies from Snowflake.

    hashtag
    Automatic removal

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Click the checkbox to disable the integration.

    4. Enter the Username, Password, and Role that were entered when the integration was configured.

    5. Click Save.

    hashtag
    Manual removal

    circle-exclamation

    Cleaning up your Snowflake environment

    Until you manually run the cleanup script in your Snowflake environment, Immuta-managed roles and Immuta policies will still exist in Snowflake.

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab and click the down arrow next to the Snowflake integration.

    3. Click the checkbox to disable the integration.

    4. Click cleanup script to download the script.

    5. Click Save.

    6. Run the cleanup script in Snowflake.


    Configure Azure Synapse Analytics Integration

    This page provides a tutorial for enabling the Azure Synapse Analytics integration on the Immuta app settings page. To configure this integration via the Immuta API, see the Configure an Azure Synapse Analytics integration API guide.

    For an overview of the integration, see the Azure Synapse Analytics overview documentation.

    hashtag
    Requirement

    • A running dedicated SQL pool is required.

    hashtag
    Prerequisites

    If you are using the OAuth authentication method,

    • Ensure that Microsoft Entra ID is on the same account as the Azure Synapse Analytics workspace and dedicated SQL pool.

    • Set up OAuth via Microsoft Entra ID app registration with a client secret.

    • Select Accounts in this organizational directory only as the account type.

    hashtag
    Add an Azure Synapse Analytics integration

    1. Click the App Settings icon in the navigation menu.

    2. Click the Integrations tab.

    3. Click the +Add Integration button and select Azure Synapse Analytics from the dropdown menu.

    4. Complete the Host, Port, Immuta Database, and Immuta Schema fields.

    5. Opt to check the Enable Impersonation box and customize the Impersonation Role name as needed. This will allow users to natively impersonate another user.

    6. Opt to update the User Profile Delimiters. This will be necessary if any of the provided symbols are used in user profile information.

    hashtag
    Select your configuration method

    You have two options for configuring your Azure Synapse Analytics environment:

    • Automatic setup: Grant Immuta one-time use of credentials to automatically configure your environment and the integration.

    • Manual setup: Run the Immuta script in your Azure Synapse Analytics environment yourself to configure the integration.

    hashtag
    Automatic setup

    Enter the username and password in the Privileged User Credentials section.

    hashtag
    Manual setup

    1. Select Manual.

    2. Download, fill out the appropriate fields, and run the bootstrap master script and bootstrap script linked in the Setup section. Note: The master script is not required if you're using the OAuth authentication method.

    3. Select the authentication method:

      • Username and Password: Enter the username and password in the Immuta System Account Credentials section. The username and password provided must be the credentials that were set in the bootstrap master script when you created the user.

      • Entra ID OAuth Client Secret: The values below can be found on the overview page of the application you created in Microsoft Entra ID. Before you enter this information, ensure you have completed the prerequisites for OAuth authentication listed above.

        1. Display Name: This must match the name of the OAuth application you registered.

        2. Tenant Id

        3. Client Id

        4. Client Secret: Enter the Value of the secret, not the secret ID.

    hashtag
    Save the configuration

    Click Save.

    hashtag
    Register data

    Register Azure Synapse Analytics data in Immuta.

    hashtag
    Edit an Azure Synapse Analytics integration

    1. Click the App Settings icon in the navigation menu.

    2. Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.

    3. Edit the field you want to change. Note that any shadowed field is not editable; to change it, the integration must be disabled and re-installed.

    4. Use the authentication method and credentials you provided when initially configuring the integration.

    circle-info

    Immuta requires temporary, one-time use of credentials with specific permissions

    When performing edits to an integration, Immuta requires temporary, one-time use of credentials of a Superuser or a user with the Manage GRANTS permission.

    Alternatively, you can download the Edit Script from your Azure Synapse Analytics configuration on the Immuta app settings page and run it in Azure Synapse Analytics.

    hashtag
    Remove an Azure Synapse Analytics integration

    1. Click the App Settings icon in the navigation menu.

    2. Navigate to the Integrations tab and click the down arrow next to the Azure Synapse Analytics Integration.

    3. Click the checkbox to disable the integration.

    4. Enter the credentials that were used to initially configure the integration.

    Register an Oracle Connection

    circle-info

    Immuta policies will not be automatically enforced in Oracle

    While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in Oracle using your own process.

    To use this integration, contact your Immuta representative.

    hashtag
    Requirement

    • Amazon RDS for Oracle

    hashtag
    Permissions

    The user registering the connection must have the permissions below.

    • APPLICATION_ADMIN Immuta permission

    • Either of the following Oracle system privileges:

      • GRANT ANY ROLE

      • GRANT ANY PRIVILEGE

    hashtag
    Create the database user

    1. Create a new database user in Oracle to serve as the Immuta system account. Immuta will use this system account continuously to crawl the connection.

    2. Grant this account the SELECT Oracle privilege on the system views listed below:

      • V$DATABASE

      • CDB_PDBS

      • SYS.DBA_USERS

      • SYS.DBA_TABLES

      • SYS.DBA_VIEWS

      • SYS.DBA_MVIEWS

      • SYS.DBA_TAB_COLUMNS

      • SYS.DBA_OBJECTS

      • SYS.DBA_CONSTRAINTS

      • SYS.DBA_CONS_COLUMNS

    hashtag
    Register an Oracle connection

    1. In Immuta, click Data and select Connections in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the Oracle tile.

    4. Select RDS as the deployment method.

    5. Enter the host connection information:

      1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.

      2. Hostname: URL of your Oracle instance.

      3. Port: Port configured for Oracle.

      4. Database: The Oracle database you want to connect to. All databases in the host will be registered.

      5. Region: The region of the AWS account with your Oracle instance.

    6. Enter the username and password of the Oracle database user you created above.

    7. Click Save connection.

    Getting Started with Databricks Lakebase

    circle-info

    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    The how-to guides linked on this page illustrate how to use Databricks Lakebase with Immuta. See the reference guide for information about the Databricks Lakebase integration.

    1

    Connect your technology

    These guides provide instructions on getting your data set up in Immuta for the Marketplace and Governance apps.

    1. Register your Databricks Lakebase connection: Using a single setup process, connect Databricks Lakebase to Immuta. This will register your data objects in Immuta and allow you to start dictating access through Marketplace or global policies.

    2. Organize your data sources into domains and assign domain permissions to accountable teams: Use domains to segment your data and assign responsibilities to the appropriate team members. These domains will then be used in Marketplace, policies, audit, and identification.

    2

    Register your users

    These guides provide instructions on getting your users set up in Immuta for the Marketplace and Governance apps.

    1. Connect an IAM: Bring the IAM your organization already uses and allow Immuta to register your users for you.

    2. Map external user IDs to Immuta: Ensure the user IDs in Immuta and your data platform are aligned so that the right policies impact the right users. This step can be completed during initial configuration of your IAM or after it has been connected to Immuta.

    3

    Start using Marketplace

    These guides provide instructions on using Marketplace for the first time.

    1. Publish a data product: Once you register your tables and users, you can immediately start publishing data products in Marketplace.

    2. Request access to a data product: Users must then request access to your data products in Marketplace.

    4

    Add data metadata

    These guides provide instructions on getting your data metadata set up in Immuta for the Governance app.

    1. Connect an external catalog: Bring the external catalog your organization already uses and allow Immuta to continually sync your tags with your data sources for you.

    2. Run identification: Identification allows you to automate data tagging using identifiers that detect certain data patterns.

    5

    Start using the Governance app

    These guides provide instructions on using the Governance app for the first time.

    1. Author a global subscription policy: Once you add your data metadata to Immuta, you can immediately create policies that utilize your tags and apply to your tables. Subscription policies can be created to dictate access to data sources.

    2. Configure audit: Once you have your data sources and users, and policies granting them access, you can set up audit export. This will export the audit logs from policy changes and tagging updates.

    Snowflake

    Immuta manages access to Snowflake tables by administering Snowflake row access policies and column masking policies on those tables, allowing users to query tables directly in Snowflake while dynamic policies are enforced.

    hashtag
    Getting started

    This getting started guide outlines how to integrate your Snowflake account with Immuta.

    hashtag
    How-to guides

    • Integration settings:

      • Enable Snowflake table grants: Enable Snowflake table grants and configure the Snowflake role prefix.

      • Use Snowflake data sharing with Immuta: Use Snowflake data sharing with table grants or project workspaces.

      • Snowflake low row access policy mode: Enable Snowflake low row access policy mode.

      • Snowflake lineage tag propagation: Configure your Snowflake integration to automatically apply tags added to a Snowflake table to its descendant data source columns in Immuta.

      • Warehouse sizing recommendations: Adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.

    hashtag
    Reference guides

    • Snowflake integration reference guide: This reference guide describes the design and features of the Snowflake integration.

    • Snowflake table grants: Snowflake table grants simplify the management of privileges in Snowflake when using Immuta. Instead of manually granting users access to tables registered in Immuta, you allow Immuta to manage privileges on your Snowflake tables and views according to subscription policies. This guide describes the components of Snowflake table grants and how they are used in Immuta's Snowflake integration.

    • Snowflake data sharing with Immuta: Organizations can share the policy-protected data of their Snowflake database with other Snowflake accounts with Immuta policies enforced in real time. This guide describes the components of using Immuta with Snowflake data shares.

    • Snowflake low row access policy mode: The Snowflake low row access policy mode improves query performance in Immuta's Snowflake integration. To do so, this mode decreases the number of Snowflake row access policies Immuta creates and uses table grants to manage user access. This guide describes the design and requirements of this mode.

    • Snowflake lineage tag propagation: Snowflake column lineage specifies how data flows from source tables or columns to the target tables in write operations. When Snowflake lineage tag propagation is enabled in Immuta, Immuta automatically applies tags added to a Snowflake table to its descendant data source columns in Immuta so you can build policies using those tags to restrict access to sensitive data.

    hashtag
    Explanatory guide

    Phased Snowflake onboarding: A phased onboarding approach to configuring the Snowflake integration ensures that your users will not be immediately affected by changes as you add data sources and policies. This guide describes the settings and requirements for implementing this phased approach.

    Azure Synapse Analytics Overview

    This page describes the Azure Synapse Analytics integration, through which Immuta applies policies directly in Azure Synapse Analytics. For a tutorial on configuring Azure Synapse Analytics, see the Azure Synapse Integration page.

    hashtag
    Overview

    The Azure Synapse Analytics integration is a policy push integration that allows Immuta to apply policies directly in Azure Synapse Analytics Dedicated SQL pools without the need for users to go through a proxy. Instead, users can work within their existing Synapse Studio and have per-user policies dynamically applied at query time.

    hashtag
    Architecture

    This integration works on a per-Dedicated-SQL-pool basis: all of Immuta's policy definitions and user entitlements data need to be in the same pool as the target data sources because Dedicated SQL pools do not support cross-database joins. Immuta creates schemas inside the configured Dedicated SQL pool that contain policy-enforced views that users query.

    When the integration is configured, the Application Admin specifies the

    • Immuta Database: This is the pre-existing database Immuta uses. Immuta will create views from the tables contained in this database, and all schemas and views created by Immuta will exist in this database, such as the immuta_system, immuta_functions, and immuta_procedures schemas that contain the tables, views, UDFs, and stored procedures that support the integration.

    • Immuta Schema: The schema that Immuta manages. All views generated by Immuta for tables registered as data sources will be created in this schema.

    • User Profile Delimiters: Since Azure Synapse Analytics dedicated SQL pools do not support array or hash objects, certain user access information is stored as delimited strings; the Application Admin can modify those delimiters to ensure they do not conflict with possible characters in strings.

    For a tutorial on configuring the integration, see the Azure Synapse Integration page.

    hashtag
    Data source naming convention

    Synapse data sources are represented as views and are under one schema instead of a database, so their view names are a combination of their schema and table name, separated by an underscore.

    For example, with a configuration that uses IMMUTA as the schema in the database dedicated_pool, the view name for the data source dedicated_pool.tpc.case would be dedicated_pool.IMMUTA.tpc_case.

    You can see the view information on the data source details page under Connection Information.
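
    As a small illustration of this naming convention, a hypothetical helper that derives the Immuta view name for a table (the names come from the example above):

    def immuta_view_name(database: str, immuta_schema: str, schema: str, table: str) -> str:
        # Synapse data sources live under one Immuta schema, so the view name
        # combines the source schema and table name with an underscore.
        return f"{database}.{immuta_schema}.{schema}_{table}"

    print(immuta_view_name("dedicated_pool", "IMMUTA", "tpc", "case"))
    # dedicated_pool.IMMUTA.tpc_case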

    hashtag
    Policy enforcement

    This integration uses webhooks to keep views up-to-date with the corresponding Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook is called that creates, modifies, or deletes the dynamic view in the Immuta schema. Note that only standard views are available because Azure Synapse Analytics Dedicated SQL pools do not support secure views.

    hashtag
    Integration health status

    The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error.

    The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API. However, the UI consolidates these error statuses and provides detail in the error messages.

    hashtag
    Data flow

    1. An Immuta Application Administrator configures the Synapse integration, registering their initial Synapse Dedicated SQL pool with Immuta.

    2. Immuta creates Immuta schemas inside the configured Synapse Dedicated SQL pool.

    3. A Data Owner registers Synapse tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy in Immuta.

    4. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.

    5. The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies and updates data source view definitions as necessary.

    6. A Synapse user who is subscribed to the data source in Immuta queries the corresponding data source view in Synapse and sees policy-enforced data.

    Run R and Scala spark-submit Jobs on Databricks

    This guide illustrates how to run R and Scala spark-submit jobs on Databricks, including prerequisites and caveats.

    hashtag
    R spark-submit

    hashtag

    Register a Databricks Lakebase Connection

    circle-info

    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    hashtag
    Requirements

    Register a MariaDB Connection

    circle-info

    Immuta policies will not be automatically enforced in MariaDB

    While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.

    To use this integration, contact your Immuta representative.

    Security and Compliance

    Immuta offers several features to provide security for your users and Databricks clusters and to prove compliance and monitor for anomalies.

    hashtag
    Authentication

    hashtag
    Configuring the integration and registering data

    Register a Snowflake Connection

    hashtag
    Requirements

    • APPLICATION_ADMIN Immuta permission

    • The Snowflake user registering the connection and running the script must have the following privileges:

    Register a MySQL Connection

    circle-info

    Immuta policies will not be automatically enforced in MySQL

    While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MySQL using your own process.

    To use this integration, contact your Immuta representative.


    Immuta supports the following authentication methods to configure the Databricks Spark integration and register data sources:
    • OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal's identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details, and the sketch after this list.

    • Personal access token (PAT): This token gives Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace when configuring the integration or to register securables as Immuta data sources.
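
    For illustration of the client credentials exchange described above (not Immuta's internal implementation), a minimal sketch against the Databricks workspace token endpoint; the workspace URL, client ID, and secret are hypothetical:

    import requests

    # Hypothetical service principal credentials and workspace host
    WORKSPACE = "https://my-workspace.cloud.databricks.com"
    CLIENT_ID = "my-service-principal-id"
    CLIENT_SECRET = "my-client-secret"

    resp = requests.post(
        f"{WORKSPACE}/oidc/v1/token",
        auth=(CLIENT_ID, CLIENT_SECRET),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    resp.raise_for_status()
    # Expires after one hour; request a new token as needed
    token = resp.json()["access_token"]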

    hashtag
    User authentication

    The built-in Immuta IAM can be used as a complete solution for authentication and fine-grained user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and fine-grained user entitlement instead.

    Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

    See the Identity managers guide for a list of supported providers and details.

    See the Setting up users guide for details and instructions on mapping Databricks user accounts to Immuta.

    hashtag
    Cluster security

    hashtag
    Data processing and encryption

    See the Data processing and the Encryption and masking practices guides for more information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

    hashtag
    Protecting the Immuta configuration

    Non-administrator users on an Immuta-enabled Databricks cluster must not have access to view or modify Immuta configuration, as this poses a security loophole around Immuta policy enforcement. Databricks secrets allow you to securely apply environment variables to Immuta-enabled clusters.

    Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable.

    See the Installation and compliance guide for details and instructions on using Databricks secrets.

    hashtag
    Scala cluster security

    There are limitations to isolation among users in Scala jobs on a Databricks cluster. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to distinguish which user's data is contained in each file or block. To address this vulnerability, Immuta suggests that you

    • limit Scala clusters to Scala jobs only and

    • require equalized projects, which will force all users to act under the same set of attributes, groups, and purposes with respect to their data access. This requirement guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.

    See the Installation and compliance guide for more details and configuration instructions.

    hashtag
    Auditing and compliance

    Immuta provides auditing features and governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

    You can view the information in these audit logs on dashboards or export the full audit logs to S3 and ADLS for long-term backup and processing with log data processors and tools. This capability fosters convenient integrations with log monitoring services and data pipelines.

    See the Audit documentation for details about these capabilities and how they work with the Databricks Spark integration.

    hashtag
    Databricks query audit

    Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing.

    To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.

    Immuta will audit queries that come from interactive notebooks, notebook jobs, and JDBC connections, but will not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless IMMUTA_SPARK_AUDIT_ALL_QUERIES is set to true.

    See the Databricks Spark query audit logs page for examples of saved queries and the resulting audit records. To exclude query text from audit events, see the App settings page.

    hashtag
    Auditing all queries

    Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.

    See the Installation and compliance guide for details and instructions.

    hashtag
    Auditing queries run while impersonating another user

    When a query is run by a user impersonating another user, the extra.impersonationUser field in the audit log payload is populated with the Databricks username of the user impersonating another user. The userId field will return the Immuta username of the user being impersonated:

    {
      "id": "query-a20e-493e-id-c1ada0a23a26",
      [...]
      "userId": "<immuta_username>",
      [...]
      "extra": {
        [...]
        "impersonationUser": "<databricks_username>"
      }
      [...]
    }

    See the Setting up users guide for details about user impersonation.

    hashtag
    Governance reports

    Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

    See the Governance report types page for a list of report types and guidance.

    Prerequisites

    Before you can run spark-submit jobs on Databricks, complete the following steps.

    1. Initialize the Spark session:

      1. Enter these settings into the R submit script to allow the R script to access Immuta data sources, scratch paths, and workspace tables: immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".

      2. Once the script is written, upload the script to a location in dbfs/S3/ABFS to give the Databricks cluster access to it.

    2. Because of how some user properties are populated in Databricks, load the SparkR library in a separate cell before attempting to use any SparkR functions.

    hashtag
    Create the R spark-submit Job

    To create the R spark-submit job,

    1. Go to the Databricks jobs page.

    2. Create a new job, and select Configure spark-submit.

    3. Set up the parameters:

      Note: The path dbfs:/path/to/script.R can be in S3 or ABFS (on Azure Databricks), assuming the cluster is configured with access to that path.
     [
     "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
     "dbfs:/path/to/script.R",
     "arg1", "arg2", "..."
     ]

    4. Edit the cluster configuration, and change the Databricks Runtime to a supported version.

    5. Configure the Spark environment variables as you normally would for an Immuta cluster.

    hashtag
    Scala spark-submit

    hashtag
    Prerequisites

    Before you can run spark-submit jobs on Databricks, you must initialize the Spark session with the settings outlined below.

    1. Configure the Spark session with immuta.spark.acl.assume.not.privileged="true" and spark.hadoop.immuta.databricks.config.update.service.enabled="false".

      Note: Stop your Spark session (spark.stop()) at the end of your job or the cluster will not terminate.

    2. The spark-submit job must be launched using a separate classloader that points at the designated user JARs directory. The following Scala template can be used to handle launching your submit code using a separate classloader:
    package com.example.job

    import java.io.File
    import java.net.URLClassLoader

    import org.apache.spark.sql.SparkSession

    object ImmutaSparkSubmitExample {
      def main(args: Array[String]): Unit = {
        val jarDir = new File("/databricks/immuta/jars/")
        val urls = jarDir.listFiles.map(_.toURI.toURL)

        // Configure a new ClassLoader which will load jars from the additional jars directory
        val cl = new URLClassLoader(urls)
        val jobClass = cl.loadClass(classOf[ImmutaSparkSubmitExample].getName)
        val job = jobClass.newInstance
        jobClass.getMethod("runJob").invoke(job)
      }
    }

    class ImmutaSparkSubmitExample {

      def getSparkSession(): SparkSession = {
        SparkSession.builder()
          .appName("Example Spark Submit")
          .enableHiveSupport()
          .config("immuta.spark.acl.assume.not.privileged", "true")
          .config("spark.hadoop.immuta.databricks.config.update.service.enabled", "false")
          .getOrCreate()
      }

      def runJob(): Unit = {
        val spark = getSparkSession
        try {
          val df = spark.table("immuta.<YOUR DATASOURCE>")

          // Run Immuta Spark queries...

        } finally {
          spark.stop()
        }
      }
    }

    hashtag
    Create the Scala spark-submit Job

    To create the Scala spark-submit job,

    1. Build and upload your JAR to dbfs/S3/ABFS where the cluster has access to it.

    2. Select Configure spark-submit, and configure the parameters:

      Note: Pass the fully qualified name of the class whose main function will be used as the entry point for your code in the --class parameter.

      Note: The path dbfs:/path/to/code.jar can be in S3 or ABFS (on Azure Databricks) assuming the cluster is configured with access to that path.
     [
     "--conf","spark.driver.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.executor.extraJavaOptions=-Djava.security.manager=com.immuta.security.ImmutaSecurityManager -Dimmuta.security.manager.classes.config=file:///databricks/immuta/allowedCallingClasses.json -Dimmuta.spark.encryption.fpe.class=com.immuta.spark.encryption.ff1.ImmutaFF1Service",
     "--conf","spark.databricks.repl.allowedLanguages=python,sql,scala,r",
     "--class","org.youorg.package.MainClass",
     "dbfs:/path/to/code.jar",
     "arg1", "arg2", "..."
     ]

    3. Edit the cluster configuration, and change the Databricks Runtime to a supported version.

    4. Include IMMUTA_INIT_ADDITIONAL_JARS_URI=dbfs:/path/to/code.jar in the "Environment Variables" (where dbfs:/path/to/code.jar is the path to your jar) so that the jar is uploaded to all the cluster nodes.

    hashtag
    Caveats

    • The user mapping works differently from notebooks because spark-submit clusters are not configured with access to the Databricks SCIM API. The cluster tags are read to get the cluster creator and match that user to an Immuta user.

    • Privileged users (Databricks admins and allowlisted users) must be tied to an Immuta user and given access through Immuta to access data through spark-submit jobs because the setting immuta.spark.acl.assume.not.privileged="true" is used.

    • Alternatively, you can use the immuta.api.key setting with an Immuta API key generated on the Immuta profile page.

    • Currently, generating a new API key invalidates the previous key. This can cause issues if a user is using multiple clusters in parallel, since each cluster will generate a new API key for that Immuta user. To avoid these issues, manually generate the API key in Immuta and set immuta.api.key on all the clusters, or use a designated job user for the submit job.


    hashtag
    Permissions

    The user registering the connection must have the permissions below.

    • APPLICATION_ADMIN Immuta permission

    • The account credentials you provide to register the connection should be a Databricks service principalarrow-up-right and it must have these Databricks Lakebase privileges:

      • databricks_superuser

      • CREATEROLE

      For descriptions and explanations of privileges Immuta needs to enforce policies and maintain state in Databricks Lakebase, see the Databricks Lakebase connection reference guide.

    hashtag
    Register a Databricks Lakebase connection

    1. In Immuta, click Data and select Connections in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the Databricks Lakebase tile.

    4. Enter the host connection information:

      1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.

      2. Hostname

      3. Port

      4. Database: This should be the PostgreSQL dbname in the Databricks Lakebase connection details.

    5. Enter privileged credentials to register the connection using OAuth M2M:

      1. Follow the Databricks documentation to create an OAuth token for machine-to-machine authenticationarrow-up-right for the Immuta service principal, and assign this service principal the privileges listed above for the Databricks Lakebase.

      2. Fill out the Workspace URL (e.g., https://<your workspace name>.cloud.databricks.com).

      3. Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principalarrow-up-right.

      4. Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.

    6. Click Save Connection.

    hashtag
    Map users

    Requirement: USER_ADMIN Immuta permission

    Map PostgreSQL usernames to each Immuta user account to ensure Immuta properly enforces policies when the user queries the Databricks Lakebase objects in PostgreSQL.

    The instructions below illustrate how to do this for individual users, but you can also configure user mapping in your IAM connection on the app settings page.

    1. Click People and select Users in the navigation menu.

    2. Click the user's name to navigate to their page and scroll to the External User Mapping section.

    3. Click Edit in the PostgreSQL row.

    4. Select one of the following options from the dropdown:

      1. Select PostgreSQL Username to map the PostgreSQL username to the Immuta user and enter the PostgreSQL username in the field. Username mapping is case insensitive.

      2. Select Unset (fallback to Immuta username) to use the Immuta username as the assumed PostgreSQL username. Use this option if the user's PostgreSQL username exactly matches the user's Immuta username. Username mapping is case insensitive.

      3. Select None (user does not exist in PostgreSQL) if this is an Immuta-only user. This option will improve performance for Immuta users who do not have a mapping to PostgreSQL users and will be automatically selected by Immuta if an Immuta user is not found in PostgreSQL. To ensure your PostgreSQL users have policies correctly applied, manually map their usernames using the first option above.

    5. Click Save.

    hashtag
    Requirement
    • Amazon RDS for MariaDB

    hashtag
    Permissions

    The user registering the connection must have the permissions below.

    • APPLICATION_ADMIN Immuta permission

    • The MariaDB user setting up the connection must be the root user or have the GRANT OPTION MariaDB privilege.

    hashtag
    Create a database user account

    1. Create a new database user in MariaDB to serve as the Immuta system account. Immuta will use this system account continuously to crawl the database you register. How you create this user depends on your database authentication methodarrow-up-right. Follow the instructions linked below to create this user:

      1. Password authenticationarrow-up-right: Follow the MariaDB documentationarrow-up-right to create the database user in MariaDB and assign that user a password.

      2. IAM database authenticationarrow-up-right:

        1. Create an IAM policy for IAM database authenticationarrow-up-right.

        2. Create the database accountarrow-up-right.

    2. Grant this account the following MariaDB privilegesarrow-up-right. A sample command that provides all these privileges to all databases and views is provided below:

      1. SHOW DATABASES on all databases in the server

      2. SELECT on all databases, tables, and views in the server

      3. SHOW VIEW on all views in the server
    GRANT SELECT, SHOW DATABASES, SHOW VIEW ON *.* TO '<user>'@'%';

    hashtag
    Register a MariaDB connection

    1. In Immuta, click Data and select Connections in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the MariaDB tile.

    4. Select RDS as the deployment method.

    5. Enter the host connection information:

      1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.

      2. Hostname: URL of your MariaDB instance.

      3. Port: Port configured with MariaDB.

      4. Region: The region of the AWS account with your MariaDB instance.

    6. Select an authentication method from the dropdown menu.

      1. AWS Access Key: Provide the access key ID and secret access key for the database account you created above.

      2. AWS Assumed Role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MariaDB database. Before proceeding, contact your Immuta representative and provide your service principal's IAM role. Immuta will allowlist the service principal so that Immuta can successfully assume that role. Your Immuta representative will provide the account to add to your trust relationship. Then, complete the steps below.

        1. Enter the Role ARN of the database account you created above.

        2. Set the external ID provided in a condition on the trust relationship for the role specified above. See the AWS documentationarrow-up-right for guidance.

      3. Username and Password: Enter the credentials for the MariaDB database user account you created above.

    7. Click Save connection.

    Register a Snowflake Connection

    hashtag
    Requirements

    The user registering the connection must have the following Snowflake privileges:
    • CREATE DATABASE ON ACCOUNT WITH GRANT OPTION

    • CREATE ROLE ON ACCOUNT WITH GRANT OPTION

    • CREATE USER ON ACCOUNT WITH GRANT OPTION

    • MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

    • APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

    • APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

    hashtag
    Prerequisites

    No Snowflake integration configured in Immuta. If your Snowflake integration is already configured on the app settings page, follow the Use the connection upgrade manager guide.

    hashtag
    Set up the Immuta system account

    Complete the following actions in Snowflake:

    1. Create a new user in Snowflake to be the Immuta system accountarrow-up-right. Immuta will use this system account continuously to orchestrate Snowflake policies and maintain state between Immuta and Snowflake.

    2. Create a Snowflake rolearrow-up-right with a minimum of the following privileges:

      • USAGE on all databases and schemas with registered data sources.

      • REFERENCES on all tables and views registered in Immuta.

      • SELECT on all tables and views registered in Immuta.

    3. Grant the new Snowflake rolearrow-up-right to the system account you just created, as sketched below.
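    A minimal sketch of these three steps, assuming a hypothetical IMMUTA_SYSTEM_ACCOUNT user, IMMUTA_ROLE role, and a single ANALYTICS database; adjust the names and scope to your environment:

    -- Create the Immuta system account and a role holding the minimum privileges
    CREATE USER IMMUTA_SYSTEM_ACCOUNT PASSWORD = '<password>';
    CREATE ROLE IMMUTA_ROLE;
    GRANT USAGE ON DATABASE ANALYTICS TO ROLE IMMUTA_ROLE;
    GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS TO ROLE IMMUTA_ROLE;
    GRANT REFERENCES, SELECT ON ALL TABLES IN DATABASE ANALYTICS TO ROLE IMMUTA_ROLE;
    -- Grant the new role to the system account
    GRANT ROLE IMMUTA_ROLE TO USER IMMUTA_SYSTEM_ACCOUNT;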

    hashtag
    Register a connection

    To register a Snowflake connection, follow the instructions below.

    1. Click Data and select the Connections tab in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the Snowflake data platform tile.

    4. Enter the connection information:

      • Host: The URL of your Snowflake account.

      • Port: Your Snowflake port.

      • Warehouse: The warehouse the Immuta system account user will use to run queries and perform Snowflake operations.

      • Immuta Database: The new, empty database for Immuta to manage. This is where system views, user entitlements, row access policies, column-level policies, procedures, and functions managed by Immuta will be created and stored.

      • Display Name: The display name represents the unique name of your connection and will be used as prefix in the name for all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection.

    5. Click Next.

    6. Select an authentication method from the dropdown menu and enter the authentication information for the Immuta system account you created. Enter the Role with the listed privileges, then continue to enter the authentication information:

      1. Username and password (Not recommendedarrow-up-right): Choose one of the following options.

        1. Select Immuta Generated to have Immuta populate the system account name and password.

        2. Select User Provided to enter your own name and password for the Immuta system account.

      2. Snowflake External OAuth:

        1. Fill out the Token Endpoint, which is where the generated token is sent. It is also known as aud (audience) and iss (issuer).

        2. Fill out the Client ID, which is the subject of the generated token. It is also known as sub (subject).

        3. Opt to fill out the Resource field with a URI of the resource where the requested token will be used.

        4. Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called kid (key identifier).

        5. Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.

      3. Key Pair Authenticationarrow-up-right:

        1. Complete the Username field. This user must be assigned the public key in Snowflakearrow-up-right.

        2. If using an encrypted private key, enter the Private Key Password.

        3. Click Select a File, and upload the Snowflake private key pair file.

    7. Copy the provided script and run it in Snowflake as a user with the privileges listed in the requirements section. Running this script grants the following privileges to the Immuta system account:

      1. CREATE ROLE ON ACCOUNT WITH GRANT OPTION

      2. APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION

      3. APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION

      4. MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION

      Alternatively, you can grant the Immuta system account OWNERSHIP on the objects that Immuta will secure, instead of granting MANAGE GRANTS ON ACCOUNT. The current role that has OWNERSHIP on the securables will need to be granted to the Immuta system role. However, if granting OWNERSHIP instead of MANAGE GRANTS ON ACCOUNT, Immuta will not be able to manage the role that is granted to the account, so it is recommended to run the script as-is, without changes. A hedged example of the OWNERSHIP alternative appears below.
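    If you choose the OWNERSHIP alternative, the transfer might look like the following sketch (the IMMUTA_ROLE role and ANALYTICS.PUBLIC schema are hypothetical):

    -- Transfer ownership of existing securables to the Immuta system role,
    -- keeping the grants that are already in place
    GRANT OWNERSHIP ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE IMMUTA_ROLE COPY CURRENT GRANTS;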

    8. Click Test Connection.

    9. If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.

    10. Ensure all the details are correct in the summary and click Complete Setup.

    hashtag
    Requirements
    • Amazon RDS or Amazon Aurora for MySQL

    hashtag
    Permissions

    The user registering the connection must have the permissions below.

    • APPLICATION_ADMIN Immuta permission

    • The MySQL user registering the connection must be the root user or have the GRANT OPTION MySQL privilege.

    hashtag
    Create a database user account

    1. Create a new database user in MySQL to serve as the Immuta system account. Immuta will use this system account continuously to crawl the database you register. How you create this user depends on your database authentication methodarrow-up-right. Follow the instructions linked below to create this user:

      1. Password authenticationarrow-up-right: Follow the MySQL documentationarrow-up-right to create the database user in MySQL and assign that user a password.

      2. IAM database authenticationarrow-up-right:

        1. Create an IAM policy for IAM database authenticationarrow-up-right.

        2. Create the database accountarrow-up-right.

    2. Grant this account the following MySQL privilegesarrow-up-right. A sample command that provides all these privileges to all databases and views is provided below:

      1. SHOW DATABASES on all databases in the server

      2. SELECT on all databases, tables, and views in the server

      3. SHOW VIEW on all views in the server
    GRANT SELECT, SHOW DATABASES, SHOW VIEW ON *.* TO '<user>'@'%';

    hashtag
    Register a MySQL connection

    1. In Immuta, click Data and select Connections in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the MySQL tile.

    4. Select your deployment type:

      1. Aurora

      2. RDS

    5. Enter the host connection information:

      1. Display Name: This is the name of your new connection. This name will be used in the API (connectionKey), in data source names from the host, and on the connections page.

      2. Hostname: The URL of your MySQL instance.

      3. Port: The port configured for MySQL.

      4. Region: The region of the AWS account with your MySQL instance.

    6. Select an authentication method from the dropdown menu.

      1. AWS Access Key: Provide the access key ID and secret access key for the database user account you created above.

      2. AWS Assumed Role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MySQL database. Before proceeding, contact your Immuta representative and provide your service principal's IAM role. Immuta will allowlist the service principal so that Immuta can successfully assume that role. Your Immuta representative will provide the account to add to your trust relationship. Then, complete the steps below.

        1. Enter the Username of the database user account you created above.

        2. Enter the Role ARN of the database user account you created above.

        3. Set the External ID provided in a condition on the trust relationship for the role specified above. See the AWS documentationarrow-up-right for guidance.

      3. Username and Password: Enter the credentials for the MySQL database user account you created above.

    7. Click Save connection.


    MariaDB Integration Reference Guide

    circle-info

    Immuta policies will not be automatically enforced in MariaDB

    While you can author and apply subscription and data policies on MariaDB data sources within Immuta, these policies will not be enforced natively in the MariaDB platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MariaDB using your own process.

    To use this integration, contact your Immuta representative.

    The MariaDB integration allows you to register data from MariaDB in Immuta. Immuta supports MariaDB on Amazon RDS.

    hashtag
    What does Immuta do in my environment?

    hashtag
    Registering a connection

    MariaDB is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schema and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.

    When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

    Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in MariaDB after registration is complete:

    • Host

    • Database

    • Data object

    Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

    See the Connections reference guide for details about connections and how to manage them. To configure your MariaDB integration and register data, see the Register a MariaDB connection guide.

    hashtag
    Required MariaDB privileges

    The privileges that the MariaDB integration requires align to the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.

    | MariaDB privilege | User requiring the privilege | Explanation |
    | --- | --- | --- |
    | Root user or GRANT OPTION privilege | Setup user | This privilege is required so that the setup user can grant privileges to the Immuta system account. |
    | SHOW DATABASES on all databases in the server | Immuta system account | This privilege allows the Immuta system account to discover new databases to keep data in MariaDB and Immuta in sync. |
    | SHOW VIEW on all views in the server | Immuta system account | This privilege allows the Immuta system account to access view definitions. |
    | SELECT on all databases, tables, and views in the server | Immuta system account | This privilege allows the Immuta system account to connect to MariaDB and register the databases and their objects. |

    hashtag
    Maintaining state with MariaDB

    The following user actions spur various processes in the MariaDB integration so that Immuta data remains synchronous with data in MariaDB:

    • Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

    • Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.

    hashtag
    Supported object types

    | Object type | Subscription policy support | Data policy support | Marketplace support |
    | --- | --- | --- | --- |
    | Base tables | ❌ | ❌ | ✅ |
    | Views | ❌ | ❌ | ✅ |

    hashtag
    Immuta policies

    Immuta will not apply policies in this integration.

    hashtag
    Security and compliance

    hashtag
    Authentication methods

    The MariaDB integration supports the following authentication methods to register a connection:

    • Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MariaDB database. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.

    • Access using access key and secret access key: These credentials are used by Immuta to register the connection and maintain state between Immuta and MariaDB. The access key ID and secret access key provided must be for an AWS account with the privileges listed in the Register a MariaDB connection guide.

    • Username and password: These credentials are used by Immuta to register the connection and maintain state between Immuta and MariaDB. The credentials provided must be for a MariaDB user account with the privileges listed in the Register a MariaDB connection guide.

    hashtag
    Limitations and known issues

    The following Immuta features are unsupported:

    • Subscription and data policies

    • Identification

    • Tag ingestion

    • Query audit

    Oracle Integration Reference Guide

    circle-info

    Immuta policies will not be automatically enforced in Oracle

    While you can author and apply subscription and data policies on Oracle data sources within Immuta, these policies will not be enforced natively in the Oracle platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in Oracle using your own process.

    To use this integration, contact your Immuta representative.

    The Oracle integration allows you to register data from Oracle in Immuta. Immuta supports Oracle on Amazon RDS.

    hashtag
    What does Immuta do in my environment?

    hashtag
    Registering a connection

    Oracle is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schema and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.

    When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

    Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in Oracle after registration is complete:

    • Host

    • Database

    • Schema

    • Table

    Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

    See the Connections reference guide for details about connections and how to manage them. To configure your Oracle integration and register data, see the Register an Oracle connection guide.

    hashtag
    Required Oracle privileges

    The privileges that the Oracle integration requires align to the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.

    | Oracle privilege | User requiring the privilege | Explanation |
    | --- | --- | --- |
    | GRANT ANY ROLE or GRANT ANY PRIVILEGE system privilege | Setup user | This privilege allows the user registering the connection to assign the SELECT_CATALOG_ROLE or SELECT privileges to the Immuta system account so that it can register and manage the connection. |
    | SELECT on the system views V$DATABASE, CDB_PDBS, SYS.DBA_USERS, SYS.DBA_TABLES, SYS.DBA_VIEWS, SYS.DBA_MVIEWS, SYS.DBA_TAB_COLUMNS, SYS.DBA_OBJECTS, SYS.DBA_CONSTRAINTS, and SYS.DBA_CONS_COLUMNS | Immuta system account | This privilege provides access to all the Oracle system views necessary to register the connection and maintain state between the Oracle database and Immuta. |
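    For example, one hedged way for the setup user to satisfy the system-view requirement is to grant the catalog role rather than individual SELECT grants (the account name IMMUTA_SYSTEM_ACCOUNT is hypothetical):

    -- SELECT_CATALOG_ROLE covers the DBA_* and V$ views listed above
    GRANT SELECT_CATALOG_ROLE TO IMMUTA_SYSTEM_ACCOUNT;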

    hashtag
    Maintaining state with Oracle

    The following user actions spur various processes in the Oracle integration so that Immuta data remains synchronous with data in Oracle:

    • Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

    • Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.

    hashtag
    Supported object types

    While you can author and apply subscription and data policies on Oracle data sources in Immuta, these policies will not be enforced natively in the Oracle platform.

    | Object type | Subscription policy support | Data policy support | Marketplace support |
    | --- | --- | --- | --- |
    | Tables | ❌ | ❌ | ✅ |
    | Views | ❌ | ❌ | ✅ |
    | Materialized views | ❌ | ❌ | ✅ |

    hashtag
    Immuta policies

    Immuta will not apply policies in this integration.

    hashtag
    Security and compliance

    hashtag
    Authentication method

    The Oracle integration supports username and password authentication to register a connection. The credentials provided must be for an account with the permissions listed in the Register an Oracle connection guide.

    hashtag
    Limitations and known issues

    The following Immuta features are unsupported:

    • Subscription and data policies

    • Tag ingestion

    • Query audit

    Databricks Lakebase Integration

    circle-info

    Public preview: This feature is available to all accounts. Contact your Immuta representative for details.

    The Databricks Lakebase integration registers data from Databricks Lakebase in Immuta and enforces subscription policies on that data when queried in PostgreSQL. The sequence diagram below outlines the events that occur when an Immuta user who is subscribed to a data source queries that data.

    hashtag
    What does Immuta do in my environment?

    hashtag
    Registering a connection

    Databricks Lakebase is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schema and catalogs individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.

    During connection registration, you provide Immuta credentials with the privileges outlined on the Register a Databricks Lakebase connection page. When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database.

    In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

    Immuta presents a hierarchical view of your data that reflects the objects in PostgreSQL hosted on Databricks Lakebase after registration is complete:

    • Lakebase database

    • Database

    • Schema

    • Table

    Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

    See the Connections reference guide for details about connections and how to manage them. To configure your Databricks Lakebase connection, see the Register a Databricks Lakebase connection guide.

    hashtag
    Applying policies

    Immuta enforces read and write subscription policies on Databricks Lakebase tables by issuing SQL statements in PostgreSQL that grant and revoke access to tables according to the policy.

    When a user is subscribed to a table registered in Immuta,

    1. Immuta creates a role for that user in PostgreSQL, if one doesn't already exist.

    2. PostgreSQL stores that role in its internal system catalog.

    3. Immuta issues grants to that user's role in PostgreSQL to enforce policy. The Protecting data page provides an example of this policy enforcement.

    See the Subscription policy access types page for details about the privileges granted to users when they are subscribed to a data source protected by a subscription policy. A sketch of these statements appears below.
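    As an illustrative sketch only (the role, schema, and table names are hypothetical, and the exact statements Immuta issues may differ), the enforcement flow in PostgreSQL looks like this:

    -- On first subscription, create a role for the user if one does not exist
    CREATE ROLE "alice@example.com";
    -- Grant access according to the subscription policy
    GRANT USAGE ON SCHEMA marketing TO "alice@example.com";
    GRANT SELECT ON marketing.campaigns TO "alice@example.com";
    -- When the user is removed from the data source, the grant is revoked
    REVOKE SELECT ON marketing.campaigns FROM "alice@example.com";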

    hashtag
    Databricks Lakebase privileges granted by Immuta

    Immuta grants access to Databricks Lakebase through PostgreSQL privileges. See the granting PostgreSQL privileges section on the Subscription policy access types page for details about the privileges granted to users when they are subscribed to a data source protected by a subscription policy.

    hashtag
    Required Databricks Lakebase privileges

    The privileges that the Databricks Lakebase integration requires align to the least privilege security principle. The table below describes each privilege required by the IMMUTA_SYSTEM_ACCOUNT user.

    | Databricks Lakebase privilege | Explanation |
    | --- | --- |
    | databricks_superuser | This privilege is required so that Immuta can create and grant permissions to PostgreSQL roles. |
    | CREATEROLE | Because privileges are granted to roles, this privilege is required so that Immuta can create PostgreSQL roles and manage role membership to enforce access controls for Databricks Lakebase objects. |
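    To make these two privileges concrete, the following sketch shows how they might be assigned to a hypothetical immuta_system_account role in PostgreSQL (Lakebase environments may manage this differently):

    -- CREATEROLE lets the system account create and manage roles
    ALTER ROLE immuta_system_account WITH CREATEROLE;
    -- databricks_superuser is a Lakebase-provided role, granted like any PostgreSQL role
    GRANT databricks_superuser TO immuta_system_account;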

    hashtag
    Maintaining state with Databricks Lakebase

    The following user actions spur various processes in the Databricks Lakebase integration so that Immuta data remains synchronous with data in Databricks Lakebase:

    • Data source created: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

    • Data source deleted: Immuta deletes the data source metadata from the metadata database and removes subscription policies from that table.

    • User account is mapped to Immuta: When a user account is mapped to Immuta, their metadata is stored in the metadata database.

    • User subscribed to a data source: When a user is added to a data source by a data owner or through a subscription policy, Immuta creates a role for that user (if a role for them does not already exist) and grants PostgreSQL privileges to their role.

    • Automatic subscription policy applied to or updated on a data source: Immuta calculates the users and data sources affected by the policy change and grants or revokes users' privileges on the PostgreSQL table. See the Protecting data page for details about this process.

    • Subscription policy deleted: Immuta revokes privileges from the affected roles.

    • User removed from a data source: Immuta revokes privileges from the user's role.

    The database instance must be up and running for state to be maintained and object sync to successfully complete. If the database instance is stopped, object sync will fail.

    hashtag
    Supported object types

    Databricks Lakebase holds PostgreSQL objects. See the PostgreSQL supported object types and policies section for details about the PostgreSQL objects and policies that Immuta supports.

    hashtag
    Supported policies

    Immuta supports Databricks Lakebase policies through PostgreSQL. See the PostgreSQL supported policies section for details about the policies that Immuta supports.

    hashtag
    Security and compliance

    hashtag
    Authentication method

    The Databricks Lakebase integration supports OAuth machine-to-machine (M2M) authentication to register a connection.

    The Databricks Lakebase connection authenticates as a Databricks identity and generates an OAuth token. Immuta then uses that token as a password when connecting to PostgreSQL. To enable secure, automated machine-to-machine access to the database instance, the connection must obtain an OAuth token using a Databricks service principal. See the Databricks OAuth machine-to-machine (M2M) authentication pagearrow-up-right for more details.

    hashtag
    User registration and ID mapping

    The built-in Immuta IAM can be used as a complete solution for authentication and user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and user entitlement instead. Each of the supported IAM protocols includes a set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

    For policies to impact the right users, the user account in Immuta must be mapped to the user account in PostgreSQL. You can ensure these accounts are mapped correctly in the following ways:

    • Automatically: If usernames in PostgreSQL align with usernames in the external IAM and those accounts align with an IAM attribute, you can enter that IAM attribute on the app settings page to automatically map user IDs in Immuta to PostgreSQL.

    • Manually: You can manually map user IDs for individual users.

    For guidance on connecting your IAM to Immuta, see the how-to guide for your protocol.

    hashtag
    Limitations and known issues

    The following Immuta features are unsupported:

    • Data policies

    • Subscription policies on partitioned tables

    • Impersonation

    • Tag ingestion

    • Query audit

    Register a Databricks Unity Catalog Connection

    hashtag
    Requirements

    • APPLICATION_ADMIN Immuta permission

    • The Databricks user registering the connection and running the script must have the following privileges:
      • Metastore admin and account admin

      • CREATE CATALOG privilege on the Unity Catalog metastore to create an Immuta-owned catalog and tables

    See the Databricks documentationarrow-up-right for more details about Unity Catalog privileges and securable objects.

    Configure a Databricks Spark Integration

    hashtag
    Permissions

    • APPLICATION_ADMIN Immuta permission

    • CAN MANAGE Databricks privilege on the cluster

    MySQL Integration Reference Guide

    circle-info

    Immuta policies will not be automatically enforced in MySQL

    While you can author and apply subscription and data policies on MySQL data sources within Immuta, these policies will not be enforced natively in the MySQL platform. You can use Immuta webhooks to be notified about changes to user access and make appropriate access updates in MySQL using your own process.

    To use this integration, contact your Immuta representative.

    The MySQL integration uses connections to register data from MySQL in Immuta. Immuta supports the following deployment methods:
    • Amazon Aurora with MySQL

    • Amazon RDS with MySQL








    hashtag
    Prerequisites

    • Unity Catalog metastore createdarrow-up-right and attached to a Databricks workspace.

    • Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.

    hashtag
    Create the Databricks service principal

    In Databricks, create a service principalarrow-up-right with the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.

    • USE CATALOG and MANAGE on all catalogs containing securables you want registered as Immuta data sources.

    • USE SCHEMA on all schemas containing securables you want registered as Immuta data sources.

    • MODIFY and SELECT on all securables you want registered as Immuta data sources. The MODIFY privilege is not required for materialized views registered as Immuta data sources, since MODIFY is not a supported privilege on that object type in Unity Catalog.

    circle-info

    MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.

    See the Databricks documentationarrow-up-right for more details about Unity Catalog privileges and securable objects.
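    A hedged sketch of these grants, assuming a hypothetical service principal named immuta-sp and a catalog named sales:

    -- MANAGE and USE CATALOG set directly on the catalog
    GRANT USE CATALOG, MANAGE ON CATALOG sales TO `immuta-sp`;
    -- SELECT and MODIFY granted at the catalog level are inherited by all
    -- current and future schemas and securables in the catalog
    GRANT SELECT, MODIFY ON CATALOG sales TO `immuta-sp`;
    -- USE SCHEMA can be granted per schema or inherited from the catalog
    GRANT USE SCHEMA ON SCHEMA sales.transactions TO `immuta-sp`;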

    hashtag
    Set up query audit

    circle-info

    Audit is enabled by default on all Databricks Unity Catalog connections. If you need to turn audit off, create the connection with the connections API and set audit to false in the payload.

    Grant the service principal access to the Databricks Unity Catalog system tablesarrow-up-right. For Databricks Unity Catalog audit to work, Immuta must have, at minimum, the following access.

    • USE CATALOG on the system catalog

    • USE SCHEMA on the system.access and system.query schemas

    • SELECT on the following system tables:

      • system.access.table_lineage

      • system.access.column_lineage

      • system.access.audit

      Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE_SCHEMA and SELECT privileges on the system schemas to the service principal, as in the example below.
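    A hedged example of these grants, run by a user who is both a metastore admin and an account admin (immuta-sp is a hypothetical service principal name):

    GRANT USE CATALOG ON CATALOG system TO `immuta-sp`;
    GRANT USE SCHEMA ON SCHEMA system.access TO `immuta-sp`;
    GRANT USE SCHEMA ON SCHEMA system.query TO `immuta-sp`;
    GRANT SELECT ON TABLE system.access.table_lineage TO `immuta-sp`;
    GRANT SELECT ON TABLE system.access.column_lineage TO `immuta-sp`;
    GRANT SELECT ON TABLE system.access.audit TO `immuta-sp`;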

    hashtag
    Register a connection

    circle-exclamation

    Create a separate Immuta catalog for each Immuta tenant

    If multiple Immuta tenants are connected to your Databricks environment, create a separate Immuta catalog for each of those tenants. Having multiple Immuta tenants use the same Immuta catalog causes failures in policy enforcement.

    1. Click Data and select the Connections tab in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the Databricks data platform tile.

    4. Enter the connection information:

      • Host: The hostname of your Databricks workspace.

      • Port: Your Databricks port.

      • HTTP Path: The HTTP path of your Databricks cluster or SQL warehouse.

      • Immuta Catalog: The name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.

      • Display Name: The display name represents the unique name of your connection and will be used as prefix in the name for all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection.

    5. Click Next.

    6. Select your authentication method from the dropdown:

      • Access Token: Enter the Access Token in the Immuta System Account Credentials section. This is the access token for the Immuta service principal. This service principal must have the privileges listed above for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the connection to continue to function. This authentication information will be included in the script populated later on the page.

      • OAuth M2M:

    7. Copy the provided script and run it in Databricks as a user with the privileges listed in the requirements section.

    8. Click Validate Connection.

    9. If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.

    10. Ensure all the details are correct in the summary and click Complete Setup.

    circle-exclamation

    Databricks Unity Catalog behavior

    If you register a connection and a data object has no subscription policy set on it, Immuta will REVOKE access to the data in Databricks for all Immuta users, even if they had been directly granted access to the table in Unity Catalog.

    If you disable a Unity Catalog data source in Immuta, all existing grants and policies on that object will be removed in Databricks for all Immuta users. All existing grants and policies will be removed, regardless of whether they were set in Immuta or in Unity Catalog directly.

    If a user is not registered in Immuta, Immuta will have no effect on that user's access to data in Unity Catalog.

    See the connection reference guide for more details about permissions Immuta revokes and how to configure this behavior for your connection.


    hashtag
    Requirements

    • A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)

    • A cluster that uses one of these supported Databricks Runtimes:

      • 11.3 LTS

      • 14.3 LTS

    • Supported languages

      • Python

      • R (not supported for Databricks Runtime 14.3 LTS)

      • Scala (not supported for Databricks Runtime 14.3 LTS)

      • SQL

    • A Databricks cluster that is one of these supported compute types:

    • Custom access mode

    • A Databricks workspace and cluster with the ability to directly make HTTP calls to the Immuta web service. The Immuta web service also must be able to connect to and perform queries on the Databricks cluster, and to call the Databricks API.

    hashtag
    Prerequisites

    • Enable OAuth M2M authenticationarrow-up-right (recommended) or personal access tokensarrow-up-right.

    • Disable Photonarrow-up-right by setting runtime_engine to STANDARD using the Clusters APIarrow-up-right. Immuta does not support clusters with Photon enabled. Photon is enabled by default on compute running Databricks Runtime 9.1 LTS or newer and must be manually disabled before setting up the integration with Immuta.

    • Restrict the set of Databricks principals who have CAN MANAGE privileges on Databricks clustersarrow-up-right where the Spark plugin is installed. This is to prevent editing Spark environment variables, editing cluster policies, or removing the Spark plugin from the cluster, all of which would cause the Spark plugin to stop working.

    • If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster. See the Configure cluster policies section below for guidance.

    • If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:

      1. Navigate to the App Settings page and click Integration Settings.

      2. Uncheck the Enable Unity Catalog checkbox.

      3. Click Save.

    hashtag
    Add the integration on the app settings page

    1. Click the App Settings icon in Immuta.

    2. Navigate to HDFS > System API Key and click Generate Key.

    3. Click Save and then Confirm. If you do not save and confirm, the system API key will not be saved.

    4. Scroll to the Integration Settings section.

    5. Click + Add Native Integration and select Databricks Spark Integration from the dropdown menu.

    6. Complete the Hostname field.

    7. Enter a Unique ID for the integration. The unique ID is used to name cluster policies clearly, which is important when managing several Databricks Spark integrations. As cluster policies are workspace-scoped, but multiple integrations might be made in one workspace, this ID lets you distinguish between different sets of cluster policies.

    8. Select the identity manager that should be used when mapping the current Spark user to their corresponding identity in Immuta from the Immuta IAM dropdown menu. This should be set to reflect the identity manager you use in Immuta (such as Entra ID or Okta).

    9. Choose an Access Model. The Protected until made available by policy option blocks access to registered data until a policy grants it, whereas the Available until protected by policy option allows it.

    circle-exclamation

    Behavior change

    If a table is registered in Immuta and does not have a subscription policy applied to it, that data will be visible to users, even if the Protected until made available by policy setting is enabled.

    If you have enabled this setting, author an "Allow individually selected users" global subscription policy that applies to all data sources.

    1. Select the Storage Access Type from the dropdown menu.

    2. Opt to add any Additional Hadoop Configuration Files.

    3. Click Add Native Integration, and then click Save and Confirm. This will restart the application and save your Databricks Spark integration. (It is normal for this restart to take some time.)

    The Databricks Spark integration will not do anything until your cluster policies are configured, so even though your integration is saved, continue to the next section to configure your cluster policies so the Spark plugin can manage authorization on the Databricks cluster.

    hashtag
    Configure cluster policies

    1. Click Configure Cluster Policies.

    2. Select one or more cluster policies in the matrix. Clusters running Immuta with Databricks Runtime 14.3 can only use Python and SQL. You can make changes to the policy by clicking Additional Policy Changes and editing the environment variables in the text field or by downloading it. See the Spark environment variables reference guide for information about each variable and its default value. Some common settings are linked below:

      1. Audit all queries


    3. Select your Databricks Runtime.

    4. Use one of the two installation types described below to apply the policies to your cluster:

      • Automatically push cluster policies: This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.

        1. Select the Automatically Push Cluster Policies radio button.

    5. Click Close, and then click Save and Confirm.

    6. Apply the cluster policy generated by Immuta to the cluster with the Spark plugin installed by following the Databricks documentationarrow-up-right.

    hashtag
    Map users and grant them access to the cluster

    1. Map external user IDs from Databricks to Immuta.

    2. Give users the Can Attach To permission on the cluster.

    hashtag
    What does Immuta do in my environment?

    hashtag
    Registering a connection

    MySQL is configured and data is registered through connections, an Immuta feature that allows you to register your data objects through a single connection to make data registration more scalable for your organization. Instead of registering schema and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data in your data platform.

    When the connection is registered, Immuta ingests and stores connection metadata in the Immuta metadata database. In the example below, the Immuta application administrator connects the database that contains marketing-data, research-data, and cs-data tables. Immuta registers these tables as data sources and stores the table metadata in the Immuta metadata database.

    Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in MySQL after registration is complete:

    • Host

    • Database

    • Data object

    Beyond making the registration of your data more intuitive, connections provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

    See the Connections reference guide for details about connections and how to manage them. To configure your MySQL connection, see the Register a MySQL connection guide.

    hashtag
    Required MySQL privileges

    The privileges that the MySQL integration requires align with the least privilege security principle. The table below describes each privilege required by the setup user and the IMMUTA_SYSTEM_ACCOUNT user.

    | MySQL privilege | User requiring the privilege | Explanation |
    | --- | --- | --- |
    | Root user or GRANT OPTION privilege | Setup user | This privilege is required so that the setup user can grant privileges to the Immuta system account. |
    | SHOW DATABASES on all databases in the server | Immuta system account | This privilege allows the Immuta system account to discover new databases to keep data in MySQL and Immuta in sync. |
    | SHOW VIEW on all views in the server | Immuta system account | This privilege allows the Immuta system account to access view definitions. |
    | SELECT on all databases, tables, and views in the server | Immuta system account | This privilege allows the Immuta system account to connect to MySQL and register the databases and their objects. |
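    To illustrate what each privilege permits, the statements below use hypothetical database and object names:

    SHOW DATABASES;                               -- requires SHOW DATABASES
    SHOW CREATE VIEW analytics.active_customers;  -- requires SHOW VIEW (and SELECT)
    SELECT * FROM analytics.orders LIMIT 10;      -- requires SELECT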

    hashtag
    Maintaining state with MySQL

    The following user actions spur various processes in the MySQL integration so that Immuta data remains synchronous with data in MySQL:

    • Data source created or updated: Immuta registers data source metadata and stores that metadata in the Immuta metadata database.

    • Data source deleted: Immuta deletes the data source metadata from the metadata database.

    hashtag
    Supported object types

    While you can author and apply subscription and data policies on MySQL data sources in Immuta, these policies will not be enforced natively in the MySQL platform.

    | Object type | Subscription policy support | Data policy support | Marketplace support |
    | --- | --- | --- | --- |
    | Base tables | ❌ | ❌ | ✅ |
    | Views | ❌ | ❌ | ✅ |

    hashtag
    Immuta policies

    Immuta will not apply policies in this integration.

    hashtag
    Security and compliance

    hashtag
    Authentication methods

    The MySQL integration supports the following authentication methods when registering a connection:

    • Access using AWS IAM role (recommended): Immuta will assume this IAM role from Immuta's AWS account to request temporary credentials that it can use to perform operations in the registered MySQL database. This option allows you to provide Immuta with an IAM role from your AWS account that is granted a trust relationship with Immuta's IAM role.

    • Access using access key and secret access key: These credentials are used by Immuta to register the connection and maintain state between Immuta and MySQL. The access key ID and secret access key provided must be for an AWS account with the privileges listed in the Register a MySQL connection guide.

    • Username and password: These credentials are used by Immuta to register the connection and maintain state between Immuta and MySQL. The credentials provided must be for a MySQL user account with the privileges listed in the Register a MySQL connection guide.

Limitations and known issues

    The following Immuta features are unsupported:

    • Subscription and data policies

    • Identification

    • Tag ingestion

    • Query audit

• Immuta webhooks

    Register an AWS Lake Formation Connection

Public preview: This feature is available to all accounts.

Requirements

• AWS Lake Formation set up in your AWS account. The account in which this is set up is referred to as the admin account. This is the account that you will use to initially configure IAM and AWS Lake Formation permissions to give the Immuta service principal access to perform operations. The user in this account must be able to manage IAM permissions and Lake Formation permissions for all data in the Glue Data Catalog.

    • No AWS Lake Formation connections configured in the same Immuta instance for the same Glue Data Catalog.

• The databases and tables you want Immuta to govern must be managed by Lake Formation permissions. Immuta cannot govern resources that use IAM access control or hybrid access mode. To ensure Immuta can govern your resources, verify that the default Data Catalog settings in AWS are unchecked; see the AWS documentation for instructions on changing these settings.

• Enable AWS IAM Identity Center (IDC) (recommended): AWS IAM Identity Center is the best approach for user provisioning because it treats users as individual users rather than mapping them to roles. Consequently, access controls are enforced for the querying user, nothing more. This approach eliminates over-provisioning and permits granular access control. Furthermore, IDC uses trusted identity propagation, meaning AWS propagates a user's identity wherever that user operates within the AWS ecosystem. As a result, a user's identity remains known and consistent as they navigate across AWS services, which is a key requirement for organizations to properly govern that user. Enabling IDC does not impact any existing access controls; it is additive. See the Map users section for instructions on mapping users from AWS IDC to user accounts in Immuta.

Permissions

The user registering the connection must have the following permissions to complete setup:

    • APPLICATION_ADMIN Immuta permission to register the connection

    • Create LF-Tag AWS permission

• DESCRIBE AWS permission

Set up the Immuta service principal

    The Immuta service principal is the AWS IAM role that Immuta will assume to perform operations in your AWS account. This role must have all the necessary permissions in AWS Glue and AWS Lake Formation to allow Immuta to register data sources and apply policies.

1. Create an IAM policy with the required AWS Lake Formation and AWS Glue permissions. You will attach this policy to your service principal once it is created.

1. Create an IAM role and select AWS Account as the trusted entity type. This role will be used by Immuta to set up the connection and orchestrate AWS Lake Formation policies. Immuta will assume this IAM role from Immuta's AWS account in order to perform any operations in your AWS account. (A sketch of this role setup follows this list.)

    2. Add the IAM policy from step 1 to your service principal. These permissions will allow the service principal to register data sources and apply policies on Immuta's behalf.

    3. Add the service principal as an .
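The sketch below shows what the role setup in this list can look like through the API rather than the console. The Immuta role ARN, role name, and policy ARN are placeholders; use the values shown in the Immuta UI and the policy you created in step 1.

```python
# Hypothetical sketch of creating the service principal role and attaching
# the permissions policy. All ARNs and names below are placeholders.
import json
import boto3

iam = boto3.client("iam")

# Trust policy granting Immuta's IAM role the ability to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::999988887777:role/immuta"},  # placeholder
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="immuta-lf-service-principal",  # assumed name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the AWS Lake Formation / AWS Glue permissions policy from step 1
iam.attach_role_policy(
    RoleName="immuta-lf-service-principal",
    PolicyArn="arn:aws:iam::111122223333:policy/immuta-lf-permissions",  # placeholder
)
```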

This method follows the principle of least privilege and is the most flexible way of granting permissions to the service principal. LF-Tags cascade down from databases to tables while allowing for exceptions: when you apply the tag to a database, it automatically applies to all tables within that database, and you can remove it from any tables that should be out of scope of Immuta's governance.

1. Create a new LF-Tag and give yourself permission to grant that tag to a user, which will ultimately be your service principal.

      1. In the Lake Formation console, navigate to LF-Tags and permissions and click Add LF-Tag. You will need the Create LF-Tag permission to do this.
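The API equivalent of this console step, plus the grant to the service principal, looks roughly like the following. The tag key and values are illustrative, and the grantee ARN stands in for the service principal role created earlier.

```python
# Sketch of creating an LF-Tag and granting the service principal access to
# everything carrying it. Tag key/values and the role ARN are illustrative.
import boto3

lf = boto3.client("lakeformation")

# Create the LF-Tag (requires the Create LF-Tag permission)
lf.create_lf_tag(TagKey="ImmutaGoverned", TagValues=["true"])

# Because LF-Tags cascade, a grant expressed at the database level also
# reaches the tables in tagged databases.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier":
            "arn:aws:iam::111122223333:role/immuta-lf-service-principal"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "DATABASE",
            "Expression": [{"TagKey": "ImmutaGoverned", "TagValues": ["true"]}],
        }
    },
    Permissions=["DESCRIBE"],
)
```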

Register an AWS Lake Formation connection

    1. Click Data and select Connections in the navigation menu.

    2. Click the + Add Connection button.

    3. Select the AWS Lake Formation tile.

4. Enter the host connection information.

Map users

    Requirement: USER_ADMIN Immuta permission

    Map AWS IAM principals to each Immuta user to ensure Immuta properly enforces policies.

    1. Click People and select Users in the navigation menu.

    2. Click the user's name to navigate to their page and scroll to the External User Mapping section.

    3. Click Edit in the AWS User row.

4. Use the dropdown menu to select the AWS principal to map to the user.

See the reference guide for details about supported principals.

    Registering and Protecting Data

    In the Databricks Spark integration, Immuta installs an Immuta-maintained Spark plugin on your Databricks cluster. When a user queries data that has been registered in Immuta as a data source, the plugin injects policy logic into the plan Spark builds so that the results returned to the user only include data that specific user should see.

The sequence diagram below breaks down the sequence of events when an Immuta user queries data in Databricks.

    Immuta intercepts Spark calls to the Metastore. Immuta then modifies the logical plan so that policies are applied to the data for the querying user.
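Conceptually, the effect is as if the user's query were rewritten before execution. The sketch below is purely illustrative, with hypothetical table, column, and policy names; the plugin actually operates on the logical plan, not on SQL text.

```python
# Conceptual illustration only; runs in a Databricks notebook where `spark`
# is predefined. Names and policies are hypothetical.

# What the user runs:
df = spark.sql("SELECT * FROM immuta.claims")

# What effectively executes for a user subject to a column masking policy on
# `ssn` and a row-level policy limiting rows to the user's region:
df_effective = spark.sql("""
    SELECT claim_id,
           claim_amount,
           sha2(ssn, 256) AS ssn    -- masking policy applied to the column
    FROM immuta.claims
    WHERE region = 'US'             -- row-level policy predicate injected
""")
```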

Registering data

When data owners register Databricks securables in Immuta, the securable metadata is registered and Immuta creates a corresponding data source for those securables. The data source metadata is stored in the Immuta metadata database so that it can be referenced in policy definitions.

    The image below illustrates what happens when a data owner registers the Accounts, Claims, and Customers securables in Immuta.

    Users who are subscribed to the data source in Immuta can then query the corresponding securable directly in their Databricks notebook or workspace.

Authentication methods

See the reference guide for details about the authentication methods supported for registering data.

Schema monitoring

When schema monitoring is enabled, Immuta monitors your servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. Any global policies and tags set in Immuta are then applied to these new or updated data sources. The Immuta data dictionary is updated with any column changes so that the Immuta environment stays in sync with your data environment.
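Conceptually, schema monitoring amounts to diffing the current state of a catalog against the snapshot recorded on the previous run. The sketch below illustrates that idea only; the database name and snapshot are assumptions, and this is not Immuta's actual detection job.

```python
# Conceptual sketch of the detect-and-diff idea behind schema monitoring,
# for a single (hypothetical) database, run from a Databricks notebook.
previous = {"accounts", "claims"}  # table snapshot from the last run (assumed)

rows = spark.sql("SHOW TABLES IN claims_db").collect()
current = {row.tableName for row in rows}

new_tables = current - previous    # would be registered as new data sources
dropped    = previous - current    # their data sources would be disabled
```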

For Databricks Spark, automatic schema detection is disabled because of the ephemeral nature of Databricks clusters. In this case, Immuta requires you to download a schema detection job template (a Python script) and import that into your Databricks workspace.

See the how-to guide for instructions on enabling schema monitoring.

Ephemeral overrides

    In Immuta, a Databricks data source is considered ephemeral, meaning that the compute resources associated with that data source will not always be available.

    Ephemeral data sources allow the use of ephemeral overrides, user-specific connection parameter overrides that are applied to Immuta metadata operations.

    When a user runs a Spark job in Databricks, the Immuta plugin automatically submits ephemeral overrides for that user to Immuta. Consequently, subsequent metadata operations for that user will use the current cluster as compute.

See the reference guide for more details about ephemeral overrides and how to configure or disable them.

Ephemeral override requests

The Spark plugin can send ephemeral override requests to Immuta. These requests are distinct from ephemeral overrides themselves: ephemeral overrides cannot be turned off, but the Spark plugin can be configured not to send ephemeral override requests.

Tag ingestion

    Tags can be used in Immuta in a variety of ways:

• Use tags in global subscription or data policies that apply to all data sources in the organization. In doing this, company-wide data security restrictions can be controlled by administrators and governors, while users and data owners need only ensure the data is tagged correctly.

    • Generate Immuta reports from tags for insider threat surveillance or data access monitoring.

    • Filter search results with tags in the Immuta UI.

The Databricks Spark integration cannot ingest tags from Databricks, but you can connect an external catalog to work with your integration.

You can also manage tags in Immuta by manually adding them to your data sources and columns. Alternatively, you can use identification to automatically tag your sensitive data.

Protecting data

    Immuta allows you to author subscription and data policies to automate access controls on your Databricks data.

• Subscription policies: After registering data sources in Immuta, you can control who has access to specific securables in Databricks through Immuta subscription policies or by manually adding users to the data source. Data users will only see the immuta database with no tables until they are granted access to those tables as Immuta data sources. See the subscription policies reference guide for a list of supported policy types.

• Data policies: You can create data policies to apply fine-grained access controls (such as restricting rows or masking columns) that manage what users can see in each table after they are subscribed to a data source. See the data policies reference guide for details about the specific types of data policies supported.

    The image below illustrates how Immuta enforces a subscription policy that only allows users in the Analysts group to access the yellow-table.
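As a hypothetical illustration of that scenario (the table and group names mirror the example, and the behavior shown is what users observe, not an Immuta API):

```python
# Illustrative only. A member of the Analysts group is subscribed by the
# policy and can read the table from a Databricks notebook:
spark.table("immuta.`yellow-table`").show()

# A user outside the Analysts group is never subscribed, so the table does
# not appear in the immuta database for them and the same call fails to
# resolve the table.
```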

See the reference guide for details about the benefits of using Immuta subscription and data policies.

Policy enforcement in Databricks

Once a Databricks user who is subscribed to the data source in Immuta queries that data directly in their workspace, Spark analysis initiates and the following events take place:

    1. Spark calls down to the Metastore to get table metadata.

    2. Immuta intercepts the call to retrieve table metadata from the Metastore.

    3. Immuta modifies the Logical Plan to enforce policies that apply to that user.