
Connect Your Data

Immuta integrates with your data platforms and external catalogs so you can register your data and effectively manage access controls on that data.

This section includes concept, reference, and how-to guides for registering and managing data sources.

Registering a connection

This section includes reference and how-to guides for configuring Immuta to manage data through a single connection between Immuta and your data platform.

Registering metadata

This section covers concepts related to registering your metadata with Immuta.

Registering a Connection

Connections allow you to register your data objects in a technology through a single connection, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your data platform for changes so that data sources are added and removed automatically to reflect the state of data on your data platform.

How-to guides

Register a Snowflake connection: Register a connection with a Snowflake account and register the data objects within it.
Register a Databricks Unity Catalog connection: Register a connection with a Databricks Unity Catalog metastore and register the data objects within it.
Run object sync on a connection or object: Trigger object sync manually for the entire connection or a single object to sync your remote data platform objects with Immuta.
Use the connection upgrade manager: Complete the upgrade path from the existing integrations and data sources to a connection.

Reference guides

Connections: This reference guide discusses the major concepts, design, and settings of connections.
Upgrading to a connection: This reference guide discusses the differences when upgrading from the existing integrations and data sources to a connection.

How-to Guides

Register a Snowflake Connection

Public preview

Connections allow you to register your data objects in a technology through a single connection, instead of registering data sources and an integration separately.

This feature is in public preview. It is enabled by default on all tenants created after February 26, 2025, and is available to select tenants created before that date. Reach out to your Immuta support professional to enable it on your tenant.

Requirements

The following permissions and personas are used in the registration process:

Immuta permission: APPLICATION_ADMIN
Snowflake permissions for the user registering the connection and running the script:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
- APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
- APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
- REFERENCES on all tables
- USAGE on the databases and schemas that contain the data sources to register
Snowflake permissions for the new Immuta system user that is created:
- APPLY MASKING POLICY ON ACCOUNT
- APPLY ROW ACCESS POLICY ON ACCOUNT
- Additional grants associated with the IMMUTA database
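If you want to prepare the registering role ahead of time, the grants listed above can be rendered as Snowflake SQL. The following is a minimal Python sketch, not an Immuta tool: the role name `IMMUTA_SETUP_ROLE` and the database and schema names are hypothetical placeholders, identifiers are assumed to be unquoted (uppercase), and a bulk `GRANT ... ON ALL TABLES IN SCHEMA` only covers tables that exist when it runs.

```python
"""Render the Snowflake grants listed above for the role that registers the
connection. A sketch only: role, database, and schema names are hypothetical."""

# Account-level privileges the registering user needs WITH GRANT OPTION.
ACCOUNT_PRIVILEGES = [
    "CREATE DATABASE",
    "CREATE ROLE",
    "CREATE USER",
    "MANAGE GRANTS",
    "APPLY MASKING POLICY",
    "APPLY ROW ACCESS POLICY",
]


def setup_role_grants(role: str, schemas: list[tuple[str, str]]) -> list[str]:
    """Return GRANT statements for `role`.

    `schemas` holds (database, schema) pairs containing the data to register.
    Identifiers are interpolated as-is, so pass unquoted uppercase names.
    """
    stmts = [
        f"GRANT {priv} ON ACCOUNT TO ROLE {role} WITH GRANT OPTION;"
        for priv in ACCOUNT_PRIVILEGES
    ]
    # USAGE on each database once, then USAGE and REFERENCES per schema.
    for database in sorted({db for db, _ in schemas}):
        stmts.append(f"GRANT USAGE ON DATABASE {database} TO ROLE {role};")
    for database, schema in schemas:
        qualified = f"{database}.{schema}"
        stmts.append(f"GRANT USAGE ON SCHEMA {qualified} TO ROLE {role};")
        stmts.append(
            f"GRANT REFERENCES ON ALL TABLES IN SCHEMA {qualified} TO ROLE {role};"
        )
    return stmts


if __name__ == "__main__":
    for statement in setup_role_grants("IMMUTA_SETUP_ROLE", [("SALES", "PUBLIC")]):
        print(statement)
```

The output would need to be run by a role that itself holds these privileges with grant option, typically ACCOUNTADMIN.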

Prerequisite

No Snowflake integration configured in Immuta. If your Snowflake integration is already configured on the app settings page, follow the Use the connection upgrade manager guide.

Register a connection

To register a Snowflake connection, follow the instructions below.

Click Data and select the Connections tab in the navigation menu.
Click the + Add Connection button.
Select the Snowflake data platform tile.
Enter the connection information:
- Host: The URL of your Snowflake account.
- Port: Your Snowflake port.
- Warehouse: The warehouse the Immuta system account user will use to run queries and perform Snowflake operations.
- Immuta Database: The new, empty database for Immuta to manage. This is where system views, user entitlements, row access policies, column-level policies, procedures, and functions managed by Immuta will be created and stored.
- Role: The default Snowflake role for the Immuta system account user.
- Display Name: The display name represents the unique name of your connection and will be used as a prefix in the names of all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection.
Click Next.
Select an authentication method from the dropdown menu. This authentication information will be included in the script populated later on the page.
1. Username and password: Choose one of the following options.
  1. Select Immuta Generated to have Immuta populate the system account name and password.
  2. Select User Provided to enter your own name and password for the Immuta system account.
2. Snowflake External OAuth:
  1. Fill out the Token Endpoint, which is where the generated token is sent. It is also known as aud (audience) and iss (issuer).
  2. Fill out the Client ID, which is the subject of the generated token. It is also known as sub (subject).
  3. Optionally, fill out the Resource field with the URI of the resource where the requested token will be used.
  4. Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called kid (key identifier).
  5. Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
3. Key Pair Authentication:
  1. Complete the Username field. This username will be used to connect to the remote database and retrieve records for this data source.
  2. If your private key is encrypted, enter the Private Key Password.
  3. Click Select a File, and upload a Snowflake key pair file.
The Role is prepopulated from the entry on the previous page.
Copy the provided script and run it in Snowflake with the following Snowflake permissions:
- CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
- CREATE ROLE ON ACCOUNT WITH GRANT OPTION
- CREATE USER ON ACCOUNT WITH GRANT OPTION
- MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
- APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
- APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
Click Test Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.

Register a Databricks Unity Catalog Connection

Public preview

Connections allow you to register your data objects in a technology through a single connection, instead of registering data sources and an integration separately.

Requirements

The following permissions and personas are used in the registration process.

An Immuta user with the APPLICATION_ADMIN Immuta permission must register the Databricks Unity Catalog connection.

A Databricks user authorized to create a Databricks service principal must create one for Immuta. This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:

USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources, and USE SCHEMA on all schemas containing those securables.
MODIFY and SELECT on all securables registered as Immuta data sources.

MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on a securable. To do so, the service principal must also have SELECT on the securable, USE CATALOG on its parent catalog, and USE SCHEMA on its parent schema. Because privileges are inherited, you can grant the service principal MODIFY and SELECT on all catalogs or schemas containing Immuta data sources, which automatically grants it MODIFY and SELECT on all current and future securables in those catalogs or schemas. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but MANAGE must be granted directly on the parent catalog for grants to be fully applied.

See the Databricks documentation for more details about Unity Catalog privileges and securable objects.
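The privileges above can be granted at catalog level so that MODIFY and SELECT inherit to current and future securables. A minimal Python sketch that renders the Unity Catalog GRANT statements; the service principal application ID and the catalog and schema names are hypothetical, and the output should be reviewed against the Databricks documentation before it is run.

```python
"""Render Unity Catalog grants for the Immuta service principal (sketch)."""


def quote(identifier: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def service_principal_grants(
    principal: str, schemas: list[tuple[str, str]]
) -> list[str]:
    """Return GRANT statements for `principal` (a service principal application ID).

    `schemas` holds (catalog, schema) pairs containing Immuta data sources.
    MODIFY and SELECT are granted on the catalog so they inherit to current
    and future securables; MANAGE is set directly on the catalog.
    """
    sp = quote(principal)
    stmts = []
    for catalog in sorted({cat for cat, _ in schemas}):
        stmts.append(f"GRANT USE CATALOG, MANAGE ON CATALOG {quote(catalog)} TO {sp};")
        stmts.append(f"GRANT MODIFY, SELECT ON CATALOG {quote(catalog)} TO {sp};")
    for catalog, schema in schemas:
        stmts.append(
            f"GRANT USE SCHEMA ON SCHEMA {quote(catalog)}.{quote(schema)} TO {sp};"
        )
    return stmts


if __name__ == "__main__":
    # Hypothetical application ID and catalog/schema names.
    for s in service_principal_grants(
        "00000000-0000-0000-0000-000000000000", [("main", "finance")]
    ):
        print(s)
```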

Optionally, to enable audit, the service principal needs the following additional privileges:

USE CATALOG on the system catalog
USE SCHEMA on the system.access schema
SELECT on the system.access.audit table
SELECT on the system.access.table_lineage table
SELECT on the system.access.column_lineage table

Access to system tables is governed by Unity Catalog. No user has access to these system schemas by default. To grant access, a user that is both a metastore admin and an account admin must grant USE and SELECT permissions on the system schemas to the service principal. See Manage privileges in Unity Catalog. The system.access schema must also be enabled on the metastore before it can be used.
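The optional audit privileges can be rendered the same way. A minimal Python sketch; the service principal application ID is a hypothetical placeholder, and the statements would still need to be run by a user who is both a metastore admin and an account admin, after system.access has been enabled.

```python
"""Render the optional audit grants for the Immuta service principal (sketch)."""

# (privilege, securable type, securable name) as listed above.
AUDIT_PRIVILEGES = [
    ("USE CATALOG", "CATALOG", "system"),
    ("USE SCHEMA", "SCHEMA", "system.access"),
    ("SELECT", "TABLE", "system.access.audit"),
    ("SELECT", "TABLE", "system.access.table_lineage"),
    ("SELECT", "TABLE", "system.access.column_lineage"),
]


def audit_grants(principal: str) -> list[str]:
    """Return one GRANT statement per audit privilege for `principal`."""
    sp = "`" + principal.replace("`", "``") + "`"
    return [
        f"GRANT {privilege} ON {kind} {name} TO {sp};"
        for privilege, kind, name in AUDIT_PRIVILEGES
    ]


if __name__ == "__main__":
    print("\n".join(audit_grants("00000000-0000-0000-0000-000000000000")))
```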

Prerequisites

Unity Catalog metastore created and attached to a Databricks workspace. See the Databricks Unity Catalog reference guide for information on workspaces and catalog isolation support with Immuta.
Unity Catalog enabled on your Databricks cluster or SQL warehouse. All SQL warehouses have Unity Catalog enabled if your workspace is attached to a Unity Catalog metastore. Immuta recommends linking a SQL warehouse to your Immuta tenant rather than a cluster for both performance and availability reasons.

Register a connection

Click Data and select the Connections tab in the navigation menu.
Click the + Add Connection button.
Select the Databricks data platform tile.
Enter the connection information:
- Host: The hostname of your Databricks workspace.
- Port: Your Databricks port.
- HTTP Path: The HTTP path of your Databricks cluster or SQL warehouse.
- Immuta Catalog: The name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
- Display Name: The display name represents the unique name of your connection and will be used as a prefix in the names of all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection.
Click Next.
Select your authentication method from the dropdown:
- Access Token: Enter the Access Token in the Immuta System Account Credentials section. This is the access token for the Immuta service principal, which can be an on-behalf token created in Databricks. This service principal must have the metastore privileges listed in the requirements section for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the connection to continue to function. This authentication information will be included in the script populated later on the page.
- OAuth M2M:
  - AWS Databricks:
    Follow the Databricks documentation to create a client secret for the Immuta service principal, and assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
    Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
    Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the client ID displayed in Databricks when creating the client secret for the service principal.
    Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
  - Azure Databricks:
    Follow the Databricks documentation to create a service principal in Azure and then add it to your Databricks account and workspace.
    Assign this service principal the privileges listed above for the metastore associated with the Databricks workspace.
    Within Databricks, create an OAuth client secret for the service principal. This completes your Databricks-based service principal setup.
    Within Immuta, fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.azuredatabricks.net/oidc/v1/token.
    Fill out the Client ID. This is a combination of letters, numbers, or symbols used as a public identifier, and is the client ID displayed in Databricks when creating the client secret for the service principal. Azure Databricks uses the Azure service principal's client ID, so the two values are identical.
    Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
    Enter the Client Secret you created above. Immuta uses this secret to authenticate with the authorization server when it requests a token.
Copy the provided script and run it in Databricks as a user with the CREATE CATALOG privilege on the Unity Catalog metastore.
Click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
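Before filling out the form, you may want to sanity-check two of its inputs: the Immuta Catalog name rule and the OAuth M2M Token Endpoint. The Python sketch below is illustrative only and is not how Immuta validates the form. It assumes ASCII letters for the catalog name rule, builds the default token endpoints stated above from a hypothetical workspace name, and builds (without sending) a standard OAuth 2.0 client credentials request. The scope `all-apis` is an example value used in Databricks OAuth M2M examples; confirm the scope your setup requires.

```python
"""Pre-flight checks for the Databricks connection form (illustrative sketch)."""
import base64
import re
import urllib.parse
import urllib.request

# Letters, numbers, and underscores only; cannot start with a number.
# ASCII letters are assumed here.
_CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_immuta_catalog_name(name: str) -> bool:
    return _CATALOG_NAME.fullmatch(name) is not None


def default_token_endpoint(workspace: str, cloud: str = "aws") -> str:
    """Build the default Token Endpoint described in the steps above."""
    domains = {"aws": "cloud.databricks.com", "azure": "azuredatabricks.net"}
    return f"https://{workspace}.{domains[cloud]}/oidc/v1/token"


def build_token_request(
    endpoint: str, client_id: str, client_secret: str, scope: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    print(is_valid_immuta_catalog_name("immuta_catalog"))
    req = build_token_request(
        default_token_endpoint("my-workspace"), "client-id", "secret", "all-apis"
    )
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) should return a JSON body
    # containing an access_token if the client ID and secret are valid.
```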

Manually Run Object Sync

Requirement: The GOVERNANCE or APPLICATION_ADMIN global permission, or the Infrastructure Admin or Data Owner role within the hierarchy

Prerequisite: A connection for Snowflake or Databricks Unity Catalog

Run object sync on a connection

Click Data and select the Connections tab in the navigation menu.
Click the more actions menu for the connection you want and select Run Object Sync.
Optionally, select the Also scan all inactive objects checkbox.
Click Run Object Sync.

Run object sync on a database

Click Data and select the Connections tab in the navigation menu.
Select the connection.
Click the more actions menu in the Action column for the database you want to sync and select Run Object Sync.
Optionally, select the Also scan all inactive objects checkbox.
Click Run Object Sync.

Run object sync on a schema

Click Data and select the Connections tab in the navigation menu.
Select the connection.
Select the database.
Click the more actions menu in the Action column for the schema you want to sync and select Run Object Sync.
Optionally, select the Also scan all inactive objects checkbox.
Click Run Object Sync.

Run object sync on a data source

You can run object sync on a data source from the data source health check or from the connection. To run it from the connection:

Click Data and select the Connections tab in the navigation menu.
Select the connection.
Select the database.
Select the schema.
Click the more actions menu in the Action column for the data object you want to sync and select Run Object Sync.
Optionally, select the Also scan all inactive objects checkbox.
Click Run Object Sync.
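Each procedure above runs object sync at a different level of the hierarchy. The Python sketch below is a hypothetical model of that scoping, not Immuta's implementation: objects are identified by a (database, schema, table) path, and it assumes that inactive objects are skipped unless Also scan all inactive objects is selected.

```python
"""Hypothetical model of manual object sync scope (not Immuta's implementation)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    path: tuple[str, ...]  # (database, schema, table) inside one connection
    active: bool


def objects_to_scan(
    objects: list[DataObject],
    scope: tuple[str, ...] = (),
    include_inactive: bool = False,
) -> list[DataObject]:
    """Select objects under `scope`.

    scope=() is the whole connection, (db,) a database, (db, schema) a schema,
    and (db, schema, table) a single data source. Assumes inactive objects are
    skipped unless "Also scan all inactive objects" is selected.
    """
    return [
        obj
        for obj in objects
        if obj.path[: len(scope)] == scope and (include_inactive or obj.active)
    ]
```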

Use the Connection Upgrade Manager

Public preview

This feature is public preview and available to select accounts. Reach out to your Immuta support professional to enable it on your tenant.

Supported technologies

Databricks Unity Catalog
Snowflake

Requirements

An integration enabled on the Immuta app settings page
Data sources registered
Immuta global GOVERNANCE and APPLICATION_ADMIN permissions

To complete your upgrade:

Select Upgrade Manager in the navigation menu. This tab will only be available if you have integrations ready for upgrade.
Click Start Upgrade.
Enter a Display Name. The display name represents the unique name of your connection and will be used as a prefix in the names of all data objects associated with this connection. It will also appear as the display name in the UI and will be used in all API calls made to update or delete the connection.
Click Next.
Ensure Immuta has the correct credentials to connect to Databricks Unity Catalog or Snowflake:

Databricks Unity Catalog:
Click Validate Credentials to ensure the access token can connect Immuta and Databricks Unity Catalog.

Snowflake:
Create a Snowflake role with a minimum of the following permissions:
- USAGE on all databases and schemas with registered data sources
- REFERENCES on all tables and views registered in Immuta
Grant the new Snowflake role to the Immuta system account user in your Snowflake environment.
Enter the new Snowflake role in the textbox.
Click Validate Credentials to ensure the role has been granted to the right user.

Click Next.
Click Upgrade Connection.
Click the documentation link to understand the impact of the upgrade.
Click the checkbox to confirm understanding of the upgrade effects, and click Yes, Upgrade Connection.
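For the Snowflake credentials step above, the minimum role setup can be rendered as SQL. A minimal Python sketch; `IMMUTA_UPGRADE_ROLE`, `IMMUTA_SYSTEM_USER`, and the database and schema names are hypothetical placeholders, identifiers are assumed to be unquoted, and the bulk `ON ALL` grants only cover tables and views that exist when they run.

```python
"""Render SQL for the minimum Snowflake role used by the upgrade (sketch)."""


def upgrade_role_sql(role: str, user: str, schemas: list[tuple[str, str]]) -> list[str]:
    """Return statements creating `role`, granting it access, and granting it to `user`.

    `schemas` holds (database, schema) pairs with registered data sources.
    Identifiers are interpolated as-is (unquoted, uppercase assumed).
    """
    stmts = [f"CREATE ROLE IF NOT EXISTS {role};"]
    for database in sorted({db for db, _ in schemas}):
        stmts.append(f"GRANT USAGE ON DATABASE {database} TO ROLE {role};")
    for database, schema in schemas:
        qualified = f"{database}.{schema}"
        stmts += [
            f"GRANT USAGE ON SCHEMA {qualified} TO ROLE {role};",
            f"GRANT REFERENCES ON ALL TABLES IN SCHEMA {qualified} TO ROLE {role};",
            f"GRANT REFERENCES ON ALL VIEWS IN SCHEMA {qualified} TO ROLE {role};",
        ]
    # Finally, give the role to the Immuta system account user.
    stmts.append(f"GRANT ROLE {role} TO USER {user};")
    return stmts


if __name__ == "__main__":
    for s in upgrade_role_sql("IMMUTA_UPGRADE_ROLE", "IMMUTA_SYSTEM_USER", [("SALES", "PUBLIC")]):
        print(s)
```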

Troubleshooting

If you attempted the upgrade and receive the message that your upgrade is Partially Complete, find the un-upgraded data sources by navigating to the Upgrade Manager and clicking the number in the Available column for the relevant connection.

Use the options below to resolve those un-upgraded data sources in order to finish your upgrade. See the linked how-to guides for more details on the actions to take.

Note that these un-upgraded data sources still exist and are still protected by policy.

Delete the remaining data sources: The easiest solution is to delete the data sources that did not upgrade. Note that disabled data sources that no longer exist in your data platform will never be upgraded. Only do this if you no longer need these data sources in Immuta.
Adjust the privileges of the system user used to connect Immuta and your data platform: Ensure that the Immuta system user can also access all remaining un-upgraded data sources in your data platform.
1. Expand permissions in Snowflake or Databricks (recommended): Extend the Immuta system user's permissions in your data platform by granting it access to all remaining un-upgraded data sources.
2. Change the system user credentials used by Immuta: You can also provide Immuta with a different set of credentials that already have the required permissions on the un-upgraded data sources.

Required Snowflake permissions

Ensure that the Snowflake role you specified has the required permissions to register a Snowflake connection and has been granted to the Immuta system account user.

Required Databricks Unity Catalog permissions

Ensure the Databricks service principal you created and connected with Immuta has the required permissions to register a Databricks Unity Catalog connection.

Delete the data sources

Schema monitoring must be turned off in the schema project to disable and delete data sources that did not upgrade.

View the data sources that were not upgraded

Find the un-upgraded data sources by navigating to the Upgrade Manager and clicking the number in the Available column.

Disable the data sources

From this data source list page, disable all the data sources to delete.

Check the top checkbox in the data source list table. Deselect the checkbox for any data sources you do not want to delete.
Click More Actions.
Click Disable and then Confirm.

Delete the data sources

From this data source list page, delete the data sources.

Check the top checkbox in the data source list table. Deselect the checkbox for any data sources you do not want to delete.
Click More Actions.
Click Delete and then Confirm.

Finalize the upgrade

Once the un-upgraded data sources are deleted, you should be able to complete the upgrade.

Navigate to the Upgrade Manager.
Click Finalize.

Expand permissions in Snowflake

Check your role permissions

To find the role you specified, do the following in the Immuta UI:

Navigate to Connections.
Select the connection you are trying to upgrade.
Navigate to the Connections tab.
Note the Role listed.

Now, ensure that role has the required permissions for each data source that was not successfully upgraded. Add the permissions where needed.
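One way to check is to list the role's grants in Snowflake with `SHOW GRANTS TO ROLE <role>` and compare its `privilege`, `granted_on`, and `name` columns with what each un-upgraded data source needs. The comparison step is sketched in Python below; the sample rows are made up, and the case-insensitive matching assumes unquoted identifiers (quoted Snowflake identifiers are case-sensitive).

```python
"""Compare required Snowflake grants with SHOW GRANTS output (sketch)."""

Grant = tuple[str, str, str]  # (privilege, granted_on, name)


def _normalize(grant: Grant) -> Grant:
    privilege, granted_on, name = grant
    return (privilege.upper(), granted_on.upper(), name.upper())


def missing_grants(required: list[Grant], granted: list[Grant]) -> list[Grant]:
    """Return required grants that do not appear among the granted rows."""
    have = {_normalize(g) for g in granted}
    return [g for g in required if _normalize(g) not in have]


if __name__ == "__main__":
    required = [
        ("USAGE", "DATABASE", "SALES"),
        ("REFERENCES", "TABLE", "SALES.PUBLIC.ORDERS"),
    ]
    granted = [("USAGE", "DATABASE", "SALES")]  # e.g. rows from SHOW GRANTS
    print(missing_grants(required, granted))
```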

Grant your role to the system account

To find the system account you specified, do the following in the Immuta UI:

Navigate to Connections.
Select the connection you are trying to upgrade.
Navigate to the Connections tab.
Note the username listed under Setup: Username.

Now, in Snowflake, grant the role to the system account:

GRANT ROLE <name> TO USER <user_name>;

Run object sync

Navigate to Connections.
Click on the more actions menu for the connection you are trying to upgrade.
Select Run Object Sync.
Click the checkbox to Also scan all inactive objects.
Click Run Object Sync.

Now, navigate back to the Upgrade Manager tab, and if all your data sources are successfully upgraded, finalize the upgrade.

Finalize the upgrade

Once the un-upgraded data sources are resolved, you can complete the upgrade.

Navigate to the Upgrade Manager.
Click Finalize.

Expand permissions in Databricks Unity Catalog

Check your service principal privileges

To find the service principal you specified, do the following in the Immuta UI:

Navigate to Connections.
Select the connection you are trying to upgrade.
Navigate to the Connections tab.

Now, ensure that service principal has the required privileges for each data source that was not successfully upgraded. Add the privileges where needed.

Run object sync

Navigate to Connections.
Click on the more actions menu for the connection you are trying to upgrade.
Select Run Object Sync.
Click the checkbox to Also scan all inactive objects.
Click Run Object Sync.

Now, navigate back to the Upgrade Manager tab, and if all your data sources are successfully upgraded, finalize the upgrade.

Finalize the upgrade

Once the un-upgraded data sources are resolved, you can complete the upgrade.

Navigate to the Upgrade Manager.
Click Finalize.

Change the system user credentials used by Immuta

If you have another set of credentials on hand with wider permissions, you can edit the connection to use these credentials instead to resolve the un-upgraded data sources.

Edit the connection

Navigate to Connections.
Select the connection you are trying to upgrade.
Navigate to the Connections tab.
Click Edit and then Next.
Enter the new credentials in the textbox and continue to the end to save.

Run object sync

Navigate to Connections.
Click on the more actions menu for the connection you are trying to upgrade.
Select Run Object Sync.
Click the checkbox to Also scan all inactive objects.
Click Run Object Sync.

Now, navigate back to the Upgrade Manager tab, and if all your data sources are successfully upgraded, finalize the upgrade.

Finalize the upgrade

Once the un-upgraded data sources are resolved, you can complete the upgrade.

Navigate to the Upgrade Manager.
Click Finalize.

Reference Guides

Connections

Public preview

Once you register your connection, Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in your data platform:

Account (Snowflake) or Metastore (Databricks Unity Catalog)
Database
Schema
Tables: These represent the individual objects in your data platform, and when active, become data sources.

Beyond making the registration of your data more intuitive, connections provide more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object sync) at the connection level.

Requirements

See the Snowflake or Databricks Unity Catalog connection registration how-to guides for a list of requirements.

Object sync

Immuta keeps the objects in your data platform in sync with the registered objects in Immuta. To do this, Immuta uses the account credentials provided during registration to check the remote technology for object changes, such as a table being created, new columns being added to a table, or a table being deleted.

If tables are added, new data sources are created in Immuta. If remote tables are deleted, the corresponding data sources and data objects will be removed from Immuta. And if a column changes in a table, those changes will be reflected in the Immuta data source data dictionary.
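The comparison described above can be pictured as a set difference between what the credentials can see remotely and what is registered in Immuta. A simplified Python sketch, not Immuta's implementation: tables are keyed by a hypothetical fully qualified name mapped to their column names, and active/inactive status is ignored.

```python
"""Simplified model of an object sync comparison (hypothetical)."""

Tables = dict[str, set[str]]  # fully qualified table name -> column names


def reconcile(remote: Tables, registered: Tables):
    """Return (tables to add, tables to remove, per-table column changes)."""
    to_add = sorted(remote.keys() - registered.keys())
    to_remove = sorted(registered.keys() - remote.keys())
    column_changes = {}
    for table in sorted(remote.keys() & registered.keys()):
        added = sorted(remote[table] - registered[table])
        deleted = sorted(registered[table] - remote[table])
        if added or deleted:
            # These changes would be reflected in the data dictionary.
            column_changes[table] = {"added": added, "deleted": deleted}
    return to_add, to_remove, column_changes


if __name__ == "__main__":
    remote = {"SALES.PUBLIC.ORDERS": {"ID", "EMAIL"}, "SALES.PUBLIC.RETURNS": {"ID"}}
    registered = {"SALES.PUBLIC.ORDERS": {"ID"}, "SALES.PUBLIC.OLD": {"ID"}}
    print(reconcile(remote, registered))
```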

Your connection can be synced in two ways:

Periodic object sync: This happens once every 24 hours (at 1:00 AM UTC). This schedule is not currently configurable.
Manual object sync: You can manually run object sync on your whole connection or on any object in your connection.
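To estimate when the next periodic run will start, the fixed 1:00 AM UTC schedule can be computed from any timezone-aware time. A small Python helper for illustration only; it does not query Immuta and does not account for manual runs.

```python
"""Compute the next periodic object sync time (daily at 1:00 AM UTC)."""
from datetime import datetime, time, timedelta, timezone

SYNC_TIME_UTC = time(hour=1, minute=0)


def next_periodic_sync(now: datetime) -> datetime:
    """Return the first 1:00 AM UTC strictly after `now` (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), SYNC_TIME_UTC, tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_periodic_sync(datetime.now(timezone.utc)))
```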

Tracking new data source columns

When new columns are detected and added to Immuta, they will be automatically tagged with the New tag. This allows governors to use the seeded New Column Added global policy to mask columns with the New tag, since they could contain sensitive data.

The New Column Added global policy is staged (inactive) by default. See the Clone, activate, or stage a global policy guide to activate this seeded global policy if you want any columns with the New tag to be automatically masked.

Without connections, schema monitoring also tags new data sources with the New tag. However, this behavior is exclusive to schema monitoring and does not occur with object sync. Object sync only tags new columns of known data sources with the New tag.

Data source requests

When there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:

Column added: Immuta applies the New tag on the column that has been added and sends a request to the data owner to validate if the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it and as a result any policy that targets the New column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
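The tagging and request flow above can be summarized as a small state update on a data source. The Python sketch below is a hypothetical model, not Immuta's API: the `DataSource` class and request strings are illustrative. New columns always receive the New tag, while validation requests are only queued when an active policy targets that tag.

```python
"""Hypothetical model of New-tag handling during object sync."""
from dataclasses import dataclass, field

NEW_TAG = "New"


@dataclass
class DataSource:
    name: str
    column_tags: dict[str, set[str]] = field(default_factory=dict)
    requests: list[str] = field(default_factory=list)


def apply_column_changes(
    ds: DataSource, added: list[str], deleted: list[str], new_tag_policy_active: bool
) -> None:
    """Tag added columns, drop deleted ones, and queue owner validation requests."""
    for column in added:
        ds.column_tags.setdefault(column, set()).add(NEW_TAG)
        if new_tag_policy_active:
            ds.requests.append(f"validate added column {column}")
    for column in deleted:
        ds.column_tags.pop(column, None)  # removed from the data dictionary
        if new_tag_policy_active:
            ds.requests.append(f"validate deleted column {column}")


def confirm_validated(ds: DataSource, column: str) -> None:
    """The data owner validated the column: remove New so New-tag policies stop applying."""
    ds.column_tags.get(column, set()).discard(NEW_TAG)
```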

For instructions on how to view and manage your tasks and requests in the Immuta UI, see the Manage access requests guide. To view and manage your tasks and requests via the Immuta API, see the Manage data source requests section of the API documentation.

Default settings

When registering a connection, Immuta applies the recommended default settings to it. These settings are described below:

Object sync: This setting allows Immuta to monitor the connection for changes. When Immuta identifies a new table, a data source will automatically be created. Similarly, if remote tables are deleted, the corresponding data sources and data objects will be deleted in Immuta. This setting is enabled by default and cannot be disabled.
Default run schedule: This sets the time interval for Immuta to check for new objects. By default, this schedule is set to 24 hours.
Sensitive data discovery: This setting enables sensitive data discovery and allows you to select the sensitive data discovery framework that Immuta will apply to your data objects. This setting is enabled by default to use the preconfigured or global framework.
Impersonation: This setting enables and defines the role for user impersonation in Snowflake. User impersonation is not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
Project workspaces: This setting enables Snowflake project workspaces. If you use Snowflake secure data sharing with Immuta, enable this setting, as project workspaces are required. If you use Snowflake table grants, disable this setting; project workspaces cannot be used when Snowflake table grants are enabled. Project workspaces are not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
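The defaults and constraints above can be expressed as a small settings object. The Python sketch below is hypothetical and does not mirror Immuta's API or configuration format; in particular, `snowflake_table_grants` is an illustrative flag added only to encode the stated conflict between project workspaces and Snowflake table grants.

```python
"""Hypothetical representation of the default connection settings above."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSettings:
    platform: str  # "snowflake" or "databricks"
    object_sync: bool = True  # cannot be disabled
    run_schedule_hours: int = 24
    sensitive_data_discovery: bool = True
    impersonation: bool = False
    project_workspaces: bool = False
    snowflake_table_grants: bool = False  # hypothetical flag for the constraint below

    def __post_init__(self) -> None:
        if not self.object_sync:
            raise ValueError("object sync is always enabled for connections")
        if self.platform == "databricks" and (self.impersonation or self.project_workspaces):
            raise ValueError("impersonation and project workspaces are Snowflake-only")
        if self.project_workspaces and self.snowflake_table_grants:
            raise ValueError("project workspaces cannot be used with Snowflake table grants")
```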

Deregistering a connection

Deregistering a connection automatically deletes all of its child objects in Immuta. However, Immuta will not remove the objects in your Snowflake or Databricks account.

Limitations

Snowflake and Databricks Unity Catalog are currently the only integrations that support connections.
Databricks Unity Catalog: Delta shares are unsupported.


Upgrading to Connections

Public preview

This feature is public preview and available to select accounts. Reach out to your Immuta support professional to enable it on your tenant.

Exceptions

Do not upgrade to Connections if you meet any of the criteria below:

You are using Snowflake alternative OAuth claims for authentication
You are using the Databricks Spark integration
You are not on SaaS
You are using the V2 /data endpoint to register data sources and attach tags automatically

Integrations

Integrations are now connections. Once the upgrade is complete, you will control most integration settings at the connection level via the Connections tab in Immuta.

Integrations (existing): Integrations are set up from the Immuta app settings page or via the API. These integrations establish a relationship between Immuta and your data platform for policy orchestration. Then tables are registered as data sources through an additional step with separate credentials. Schemas and databases are not reflected in the UI.

Connections (new): Integrations and data sources are set up together with a single connection per account between Immuta and your data platform. Based on the privileges granted to the Immuta system user, metadata from databases, schemas, and tables is automatically pulled into Immuta and continuously monitored for any changes.

Supported technology and authorization methods

Snowflake

Snowflake OAuth (client secret and certificate are supported, but not alternate claims)
Username and password
Key pair

Databricks

Personal Access Token
M2M OAuth

Unsupported technologies

The following technologies are not yet supported with connections:

Azure Synapse Analytics
Databricks Spark
Google BigQuery
Redshift
S3
Starburst (Trino)

Supported features

The tables below outline Immuta features, their availability with integrations, and their availability with connections.

Snowflake

Feature

Integrations (existing)

Connections (new)

User impersonation

Project workspaces

Snowflake lineage

Supported

Query audit

Supported

Tag ingestion

Supported

Databricks Unity Catalog

Feature

Integrations (existing)

Connections (new)

User impersonation

Not supported

Project workspaces

Not supported

Query audit

Supported

Tag ingestion

Supported

Catalog isolation support

Supported

Not supported

Data sources

There will be no policy downtime on your data sources while performing the upgrade.

Supported object types

The supported object types are the same for both data sources with integration and data sources with connections. When applying read and write access policies to these data sources, the privileges granted by Immuta vary depending on the object type. See an outline of privileges granted by Immuta on Snowflake and Databricks Unity Catalog object types on the Subscription policy access types page.

Snowflake

Table
View
Materialized view
External table
Event table
Iceberg table
Dynamic table

Databricks Unity Catalog

Table
View
Materialized view
Streaming table
External table
Foreign table

Hierarchy

With connections, your data sources are ingested and presented to reflect the infrastructure hierarchy of your connected data platform. For example, this is what the new hierarchy will look like for a Snowflake connection:

Integrations (existing): Integration > Data source

Connections (new): Connection > Database > Schema > Data source (once activated, becomes available for policy enforcement)

Users and permissions

With integrations

- APPLICATION_ADMIN: Configure integration (object: integration)
- CREATE_DATA_SOURCE: Register data sources (object: data source)
- Data owner: Manage data sources (object: data source)

With connections

- APPLICATION_ADMIN: Register a connection (objects: connection, database, schema, data source)
- GOVERNANCE or APPLICATION_ADMIN: Manage all connections (objects: connection, database, schema, data source)
- Infrastructure admin: Manage a connection (objects: connection, database, schema, data source)
- Data owner: Manage data objects (objects: connection, database, schema, data source)

Schema monitoring

With connections, schema monitoring is renamed object sync, because it can also monitor for changes at the database and connection levels.

During object sync, Immuta crawls your connection to ingest metadata for every database, schema, and table that the Snowflake role or Databricks account credentials you provided during configuration have access to. When the upgrade completes, each table's state depends on your previous schema monitoring settings:

If you had schema monitoring enabled on a schema: All tables from that schema will be registered in Immuta as active data sources.
If you had schema monitoring disabled on a schema: All tables from that schema (that were not already registered in Immuta) will be registered as inactive data sources. They are visible from the Data Objects tab in Immuta, but are not listed as data sources until they are activated.
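The rules above can be captured in a small decision function, which may help when auditing which tables will arrive active or inactive. This is a hypothetical helper (the name `post_upgrade_state` is illustrative, not an Immuta API), and it assumes tables that were already data sources stay active, as the "that were not already registered" wording implies.

```python
def post_upgrade_state(schema_monitoring_on: bool, already_registered: bool) -> str:
    """Return the state a table is registered in after upgrading to a connection."""
    if schema_monitoring_on:
        # Schema monitoring was on: every table in the schema becomes an active data source.
        return "active"
    if already_registered:
        # Schema monitoring was off, but the table was already a data source: it stays active.
        return "active"
    # Schema monitoring was off and the table was not registered: it arrives inactive.
    return "inactive"


# Usage: classify a few hypothetical tables before planning an upgrade.
for table, monitored, registered in [("ORDERS", True, True), ("CUSTOMERS", False, True), ("TMP_LOAD", False, False)]:
    print(table, post_upgrade_state(monitored, registered))
```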

After the initial upgrade, object sync runs on your connection every 24 hours (at 1:00 AM UTC) to keep your tables in Immuta in sync. Users can also run object sync manually through the UI or API.
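To estimate when a change made in your data platform will be picked up without a manual run, you can compute the next 1:00 AM UTC boundary. This is a minimal sketch using only the schedule stated above; it does not query Immuta, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

SYNC_HOUR_UTC = 1  # scheduled object sync runs daily at 1:00 AM UTC


def next_object_sync(now: datetime) -> datetime:
    """Return the next scheduled object sync strictly after `now` (pass a timezone-aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=SYNC_HOUR_UTC, minute=0, second=0, microsecond=0)
    if candidate <= now_utc:
        # Today's run has already started or passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


print(next_object_sync(datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)))
```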

Additional settings

Object sync provides additional controls compared to schema monitoring:

Object status: Connections, databases, schemas, and tables can be marked active or inactive; an active table appears as a data source. Lower objects inherit these statuses by default, but you can override them. For example, if you make a database inactive, all schemas and tables within that database will also be inactive. However, if you want one of those tables to be a data source, you can manually activate it.
Activate new data objects: This setting controls the state in which object sync registers newly found objects in Immuta.
- Active: New data objects found by object sync are automatically activated, and new tables are registered as data sources.
- Inactive: This is the default. New data objects found by object sync are inactive.
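The inheritance-with-override behavior described under Object status resembles a "nearest explicit setting wins" lookup up the hierarchy. The sketch below models it under assumptions of our own: objects are identified by hypothetical path tuples, and `default=False` stands in for the Inactive default. It illustrates the rule, not how Immuta stores statuses.

```python
from typing import Dict, Tuple

# An object path lists the hierarchy from the connection down, e.g.
# ("My Connection", "MY_DATABASE", "MY_SCHEMA", "MY_TABLE").
ObjectPath = Tuple[str, ...]


def effective_status(path: ObjectPath, explicit: Dict[ObjectPath, bool], default: bool = False) -> bool:
    """Return True (active) or False (inactive) for the object at `path`.

    Starts at the object itself and walks up toward the connection; the first
    explicitly set status wins, so a lower-level override beats an inherited value.
    """
    for depth in range(len(path), 0, -1):
        status = explicit.get(path[:depth])
        if status is not None:
            return status
    return default


# Usage: the database is inactive, but one table inside it is manually activated.
settings = {("conn", "DB"): False, ("conn", "DB", "S", "T1"): True}
print(effective_status(("conn", "DB", "S", "T1"), settings))
print(effective_status(("conn", "DB", "S", "T2"), settings))
```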

Comparison

Integrations (existing)

Connections (new)

Name

Schema monitoring and column detection

Object sync

Where to turn on

Enable (optionally) when configuring a data source

Enabled by default

Where to update the feature

Enable or disable from the schema project

Object sync cannot be disabled

Default schedule

Every 24 hours

Every 24 hours (at 1:00 AM UTC)

Can you adjust the default schedule?

New tags applied automatically

New tags are applied automatically for a data source being created, a column being added, or a column type being updated on an existing data source

New tags are applied automatically for a column being added or a column type being updated on an existing data source

Performance

Connections use a new architectural pattern that improves performance when monitoring for changes in your data platform, particularly with large numbers of data sources. The following scenarios are regularly tested in an isolated environment to provide a benchmark. Note that these numbers can vary based on factors such as (but not limited to) the number and type of policies applied, overall API and user activity in the system, and connection latency to your data platform.

Databricks Unity Catalog

With data sources registered through integrations, users had to manually create the schema monitoring job in Databricks. With connections, this job is fully automated, so this step is no longer necessary.

APIs

Consolidating integration setup and data source registration into a single connection significantly simplifies programmatic interaction with the Immuta APIs. Actions that used to require several different endpoints can now be performed through a single, standardized one. As a result, multiple API endpoints are blocked once a user has upgraded their connection.

Calls to blocked endpoints return an error stating "400 Bad Request - [...]. Use the /data endpoint." This error means you need to update any processes that call the Immuta APIs to use the new /data endpoint instead. For details, see the API changes page.
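When auditing existing automation, it can help to flag responses that match the blocked-endpoint error quoted above so those calls can be migrated. This is a hypothetical helper that only inspects a status code and response text you already have; it assumes the message text appears verbatim in the response body, which you should confirm against a real response.

```python
BLOCKED_HINT = "Use the /data endpoint."


def needs_data_endpoint_migration(status_code: int, body: str) -> bool:
    """True if a response looks like the error returned by a blocked legacy endpoint."""
    return status_code == 400 and BLOCKED_HINT in body


# Usage: log legacy calls that should be rewritten against /data.
print(needs_data_endpoint_migration(400, "400 Bad Request - [...]. Use the /data endpoint."))
```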

Before You Begin

Connections improve on the existing process for both onboarding your data sources and managing the integration. However, there are some differences between the two processes that you should understand before you start the upgrade.

API changes: See the API changes page for a complete breakdown of the APIs that will not work once you begin the upgrade. These changes mostly affect users with automated API calls around schema monitoring and data source registration.
Automated data source names: Previously, you could name data sources manually. However, data sources from connections are automatically named using the information (database, schema, table) and casing from your data platform. For example, on Snowflake this will typically mean that my_table will become My Connection.MY_DATABASE.MY_SCHEMA.MY_TABLE.
If you are leveraging Immuta APIs, you may need to adjust code to allow for the new data source names.
Schema projects phased out: With integrations, many settings and the connection information for data sources were controlled in the schema project. Connections no longer need this functionality; instead, you control connection details in one central place.
New hierarchy display: With integrations, tables were brought in as data sources and presented as a flat list on the data source list page. With connections, databases and schemas are displayed as objects too.
Change from schema monitoring to object sync: Object metadata synchronization between Immuta and your data platform is no longer optional but always required:
1. If schema monitoring is off before the upgrade: Once the connection is registered, everything the system user can see will be pulled into Immuta and, if it didn't already exist in Immuta, it will be an inactive object. These inactive objects exist so you can see them, but policy is not protecting the objects, and they will not appear as data sources.
2. If schema monitoring is on before the upgrade: Once the connection is registered, everything the system user can see will be pulled into Immuta. If it already existed in Immuta, it will be an active object and continue to appear as a data source.
Enabling a connection enables all databases, schemas, and tables in the hierarchy: If the connection is disabled after you complete the upgrade, only enable the host if you want to enable all databases, schemas, and tables within it.
Enabling a disabled table elevates it to a data source: Immuta then applies data and subscription policies to that data source.
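If your scripts look up data sources by name, a rename map can help you update them. The sketch below assumes the connection-generated name is always the connection, database, schema, and table joined with dots, in platform casing, which is inferred from the single `my_table` example above; verify it against the names in your own environment. The function names are hypothetical.

```python
from typing import Dict, Tuple


def connection_data_source_name(connection: str, database: str, schema: str, table: str) -> str:
    """Join the hierarchy with dots, keeping each part's casing as reported by the platform."""
    return ".".join([connection, database, schema, table])


def build_rename_map(old_names: Dict[str, Tuple[str, str, str]], connection: str) -> Dict[str, str]:
    """Map each old data source name to the name a connection would generate for it."""
    return {old: connection_data_source_name(connection, *parts) for old, parts in old_names.items()}


# Usage: the example from the Automated data source names item.
print(build_rename_map({"my_table": ("MY_DATABASE", "MY_SCHEMA", "MY_TABLE")}, "My Connection"))
```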

API Changes

Action

Deprecated endpoint

Use this with connections instead

Create a single data source

Step 1: Ensure your system user has been granted access to the relevant object in the data platform.

Step 2: Wait until the next object sync or manually trigger a metadata crawl using .

Step 3: If the parent schema has activateNewChildren: false,

with settings: isActive: true.

Bulk create data sources

Step 1: Ensure your system user has been granted access to the relevant object in the data platform.

Step 2: Wait until the next object sync or manually trigger a metadata crawl using .

Step 3: If the parent schema has activateNewChildren: false,

with settings: isActive: true.

Edit a data source connection

No substitute. Data sources no longer have their own separate connection details but are tied to the parent connection.

Bulk edit data source's connections

No substitute. Data sources no longer have their own separate connection details but are tied to the parent connection.

Run schema detection (object sync)

Delete a data source

Bulk delete data sources

Enable a single data source

with settings: isActive: true

Bulk enable data sources

with settings: isActive: true

Disable a single data source

with settings: isActive: false

Bulk disable data sources

with settings: isActive: false

Edit a data source name

No substitute. Data source names are automatically generated based on information from your data platform.

Edit a display name

No substitute. Data sources no longer have their own separate connection details but are tied to the parent connection.

Override a host name

No substitute. Data sources no longer have their own separate connection details but are tied to the parent connection.

Create an integration/connection

Update an integration/connection

Delete an integration/connection

Delete and update a data dictionary

PUT

No substitute. Data source dictionaries are automatically generated based on information from your data platform.

Update a data source owner

with settings: dataOwners

Respond to a data source owner request

with settings: dataOwners
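Several substitutes in the table above send a request "with settings" such as `isActive` or `dataOwners`. The sketch below builds such a JSON body, assuming the fields nest under a top-level `settings` object as the table's wording suggests. It deliberately omits the endpoint path, which is not reproduced here, and it passes `dataOwners` entries through unchanged because their shape is not specified in this section. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def settings_body(is_active: Optional[bool] = None, data_owners: Optional[List[Any]] = None) -> str:
    """Build a JSON request body carrying the `settings` fields named in the API changes table."""
    settings: Dict[str, Any] = {}
    if is_active is not None:
        settings["isActive"] = is_active  # True to enable, False to disable
    if data_owners is not None:
        settings["dataOwners"] = data_owners  # element shape not specified here
    if not settings:
        raise ValueError("at least one setting is required")
    return json.dumps({"settings": settings})


# Usage: body for enabling a data source (replacing "Enable a single data source").
print(settings_body(is_active=True))
```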

FAQ

What are connections?

What will change with connections?

There are three high-level changes:

Automatic table registration: All unregistered tables that the configured credentials have access to will be registered into Immuta in a disabled state. All tables and schemas under this connection with schema monitoring on will continue to be monitored with object sync.
Simplified table names: All data source names will now reflect the connection and hierarchy. If your tables were not already named this way, the names will be changed.
Fewer API endpoints: When this upgrade begins, a select number of data and integration API endpoints will be blocked for this connection and its tables. See the documentation, linked below, for a complete list of the impacted endpoints.

For a more in-depth look at the differences, see the Upgrading to a connection guide and Before you begin page.

How will connections affect my existing integrations?

Your integrations will continue to work throughout the upgrade process with zero downtime. The integrations will continue to be visible in the Integrations tab on the Immuta app settings page.

After the upgrade, some configuration options (credentials, enabling, and disabling) are part of the connection menu.

How will connections affect my existing data sources?

All pre-existing data sources will continue to exist. If you have used a custom naming template, you will see names updated, because the connection uses the information from your data platform to generate data source names.

How will connections affect my policies?

Connections do not impact any policies or user access in your data platform.

How will connections affect my users?

Connections will not affect your registered users or their access in your data platform.

However, Immuta administrators will see notable differences in the UI, including a new Connections tab.

Do I need to change my scripts running against the Immuta APIs if I want to use connections?

Most likely. There are a number of API changes for data sources and integrations; see the API changes guide for details about each affected API endpoint and its substitute.

Are the permissions required for the system user different with connections?

No, the Immuta system user still requires the same privileges in your data platform. See the Upgrading to a connection guide for more details.

What is going to happen with the integrations?

You can continue to use the integrations. However, we strongly recommend upgrading to connections due to their many benefits.

Is my environment the right choice for the connections upgrade?

Connections support Snowflake and Databricks Unity Catalog. See the Upgrading to a connection guide for more details, and reach out to your Immuta support professional if you are interested in the upgrade.

Can I run object sync on data sources not registered with a connection?

No. Object sync is only for data sources registered through connections. Continue to use schema monitoring for any existing data sources that are not upgraded.

Registering Metadata

A data source is how data owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. An Immuta data source is a virtual representation of data that exists in a remote data platform.

This section includes reference and how-to guides for registering and managing data sources.

Data sources in Immuta

This reference guide describes Immuta data sources and their major components.

Register data sources

These how-to guides illustrate how to register data in Immuta.

Data source settings

The guides in this section illustrate how to manage and edit data sources and data dictionaries.

Schema monitoring

The reference and how-to guides in this section describe schema monitoring and illustrate how to configure it for your integration.

Data Sources in Immuta

Data owners expose their data across their organization to other users by registering that data in Immuta as a data source.

By default, data owners can register data in Immuta without affecting existing policies on those tables in their remote system, so users who had access to a table before it was registered can still access that data without interruption. If this default behavior is disabled on the app settings page, a subscription policy that requires data owners to manually add subscribers to data sources will automatically apply to new data sources (unless a global policy you create applies), blocking access to those tables.

For information about the default subscription policy and how to manage it, see the Subscription policies guide.

Click a link below to navigate to a tutorial that details how to create a data source:

Snowflake data sources
Bulk create Snowflake data sources
Databricks data sources
Google BigQuery data sources
Starburst data sources
Redshift data sources
Azure Synapse Analytics data sources
Amazon S3 data sources

Data sources with nested columns

You can create Databricks data sources with nested columns when you enable complex data types. When complex types are enabled, Databricks data sources can have columns that are arrays, maps, or structs that can be nested. These columns get parsed into a nested data dictionary.

Data source user roles

Users and groups can hold various roles on each data source. These roles are managed through the Members tab of the data source. Roles include the following types:

Owners: Those who create and manage new data sources and their users, documentation, and data dictionaries.
Subscribers: Those who have access to the data source data. With the appropriate data accesses and attributes, these users and groups can view files, run queries, and generate analytics against the data source data. All users and groups granted access to a data source have subscriber status.
Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation and data dictionary tags and descriptions.

See Manage data source members for a tutorial on modifying user roles.

Data dictionary

The data dictionary provides information about the columns within the data source, including column names and value types.

Dictionary columns are automatically generated when the data source is created. However, data owners and experts can tag columns in the data dictionary and add descriptions to these entries.

Register Data Sources

When a data source is exposed, policies are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes or groups of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner, allowing reproducibility and collaboration.

This section includes how-to guides for registering data sources in Immuta:

Register an Amazon S3 data source
Register an Azure Synapse Analytics data source
Register a Databricks data source
Register a Google BigQuery data source
Register a Redshift data source
Register a Snowflake data source
Bulk create Snowflake data sources
Register a Starburst (Trino) data source

Amazon S3 Data Source

Private preview

The Amazon S3 integration is available to select accounts. Reach out to your Immuta representative for details.

Requirement

CREATE_S3_DATA_SOURCE Immuta permission

Prerequisite

Configure the Amazon S3 integration

Create a data source

Navigate to the Data Sources list page in Immuta.
Click Register Data Source.
Select the S3 tile in the data platform section.
Select your AWS Account/Region from the dropdown menu.
Optionally, select a default domain to which data sources will be assigned.
Optionally, add default tags to the data sources.
Click Next.
The prefix field is populated with the base path. Add to this prefix to create a data source for a prefix, bucket, or object.
- If the data source prefix ends in a wildcard (*), it protects all items starting with that prefix. For example, a base location of s3:// and a data source prefix surveys/2024* would protect paths like s3://surveys/2024-internal/research-dept.txt or s3://surveys/2024-customer/april/us.csv.
- If the data source prefix ends without a wildcard (*), it protects a single object. For example, a base location path of s3:// and a data source prefix of research-data/demographics would only protect the object that exactly matches s3://research-data/demographics.
Click Add Prefix, and then click Next.
Verify that your prefixes are correct and click Complete Setup.
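The wildcard rule in the prefix step can be illustrated as plain string matching. This is a simplified sketch under our own assumption that coverage is a literal prefix or exact comparison of the full path; it is not Immuta's implementation, and the function name is hypothetical. The test paths come from the examples above.

```python
def is_protected(path: str, base: str, prefix: str) -> bool:
    """Return True if `path` is covered by a data source defined as base + prefix."""
    if prefix.endswith("*"):
        # Wildcard prefix: protects every path that starts with the text before the "*".
        return path.startswith(base + prefix[:-1])
    # No wildcard: protects only the single object whose path matches exactly.
    return path == base + prefix


print(is_protected("s3://surveys/2024-customer/april/us.csv", "s3://", "surveys/2024*"))
print(is_protected("s3://research-data/demographics/2024.csv", "s3://", "research-data/demographics"))
```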

Azure Synapse Analytics Data Source

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Azure Synapse Analytics tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Azure Synapse Analytics
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Optionally, click Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the rest of the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
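The Data Source Name Format options apply the case of each macro to the value it stands for. The sketch below reproduces the examples given for the Custom option under our own assumptions: macros are written `<tablename>` and `<schema>` in any case, an all-uppercase macro uppercases the value, an all-lowercase macro lowercases it, and any other casing applies Python's `str.title()`. Immuta's exact casing rules for names with underscores or mixed case may differ, and the helper names are hypothetical.

```python
import re

MACRO = re.compile(r"<(tablename|schema)>", re.IGNORECASE)


def apply_case(value: str, macro_text: str) -> str:
    """Case `value` to match how the macro itself is written."""
    if macro_text.isupper():
        return value.upper()
    if macro_text.islower():
        return value.lower()
    return value.title()


def render_name(template: str, schema: str, table: str) -> str:
    """Replace each macro in `template` with the schema or table name, cased per the macro."""
    values = {"tablename": table, "schema": schema}

    def substitute(match):
        written = match.group(1)  # the macro exactly as typed, e.g. "Tablename"
        return apply_case(values[written.lower()], written)

    return MACRO.sub(substitute, template)


print(render_name("<Tablename>", "sales", "data source name"))
```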

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to register those new tables directly.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks through newly added columns that have not been reviewed.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.
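To make the effect of the new column added policy from the best practices concrete, the sketch below shows what query results look like while a column is unreviewed: its values come back as NULL (`None` here). It only illustrates the outcome on sample rows; Immuta enforces the policy in your data platform, not in client code, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def mask_unreviewed_columns(rows: List[Dict[str, object]], reviewed: Iterable[str]) -> List[Dict[str, Optional[object]]]:
    """Return copies of `rows` with every column not yet reviewed replaced by None (NULL)."""
    reviewed_set = set(reviewed)
    return [{col: (val if col in reviewed_set else None) for col, val in row.items()} for row in rows]


# Usage: "email" was just added to the table and has not been reviewed yet.
print(mask_unreviewed_columns([{"id": 1, "email": "a@example.com"}], reviewed=["id"]))
```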

Opt to configure advanced settings

These steps are optional but help maximize the utility of your data source. To skip them, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column enables:
- more statistics to be calculated for this data source, including the most recent record time, which is used to determine the freshness of the data source
- the creation of time-based restrictions in the policy builder
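The most recent record time and freshness statistics mentioned above reduce to a maximum and a subtraction over the Event Time column. This is a minimal sketch of that arithmetic on sample timestamps, not the query Immuta runs; function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def most_recent_record_time(event_times: Iterable[datetime]) -> datetime:
    """Latest value in the Event Time column (raises ValueError if there are no records)."""
    return max(event_times)


def freshness(event_times: Iterable[datetime], now: datetime) -> timedelta:
    """How long ago the newest record was produced."""
    return now - most_recent_record_time(event_times)


times = [datetime(2024, 5, 1, tzinfo=timezone.utc), datetime(2024, 5, 3, 12, tzinfo=timezone.utc)]
print(freshness(times, datetime(2024, 5, 4, tzinfo=timezone.utc)))  # 12:00:00
```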

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
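The latency behavior resembles a time-to-live cache over the distinct values of the policy column: a newly added country stays invisible until the cached set expires and is refreshed. The class below models that behavior with an injectable clock so it can be tested; it is an illustration of the described effect, not Immuta's internal code, and the class name and `fetch` callback are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Set

UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


class DistinctValueCache:
    """Cache the distinct values of a policy column; refresh only after the latency period expires."""

    def __init__(self, fetch: Callable[[], Iterable[str]], amount: int, unit: str,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = amount * UNIT_SECONDS[unit]
        self._clock = clock
        self._values: Optional[Set[str]] = None
        self._fetched_at = 0.0

    def values(self) -> Set[str]:
        now = self._clock()
        if self._values is None or now - self._fetched_at >= self._ttl:
            # First call, or the latency period has expired: re-read the column.
            self._values = set(self._fetch())
            self._fetched_at = now
        return self._values
```

With a latency of 1 HOURS, a country added 30 minutes after the last check is not seen until the hour has elapsed.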

Data source tags

Adding tags to your data source allows users to search for the data source by those tags and allows Governors to apply Global policies to it. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Databricks Data Source

This page details how to register Databricks data sources using the existing workflow. To register data sources using connections, see this how-to guide.

Requirements

Databricks Spark integration

When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
The user exposing the tables is a Databricks workspace administrator.

Databricks Unity Catalog integration

When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.

The following privileges on the parent catalogs and schemas of those tables:
- SELECT
- USE CATALOG
- USE SCHEMA
USE SCHEMA on system.information_schema

Azure Databricks Unity Catalog limitation

Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the Azure Databricks Unity Catalog limitation for details.

Enter connection information

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Navigate to the Data Sources list page and click Register Data Source.
Select the Databricks tile in the Data Platform section.
Complete the first four fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Databricks, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
Select your authentication method from the dropdown:
- Access Token:
  1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.
  2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
- OAuth machine-to-machine (M2M):
  1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
  2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
  3. Fill out the Client ID. This is a combination of letters, numbers, or symbols, used as a public identifier and is the same as the service principal's application ID.
  4. Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
  5. Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.
If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:
```
UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789
```
Click Test Connection.
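The OAuth machine-to-machine fields map onto a standard OAuth 2.0 client-credentials request: a POST to the Token Endpoint with the Client ID and Client Secret as HTTP Basic credentials and `grant_type=client_credentials` plus the Scope in the form body, which is how Databricks documents M2M token requests. Immuta performs this exchange itself; the hypothetical helper below only builds the request so you can sanity-check credentials outside Immuta. The workspace URL and scope value in the usage are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_endpoint: str, client_id: str, client_secret: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": scope}).encode()
    return urllib.request.Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_token_request("https://example.cloud.databricks.com/oidc/v1/token", "my-client-id", "my-secret", "all-apis")
# Sending it with urllib.request.urlopen(req) should return JSON containing an access_token.
```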

Further considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Optionally, click Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the rest of the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
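The case rule for name formats can be illustrated with a small template renderer. This is a minimal sketch, not Immuta's implementation: the helper names are hypothetical, and treating `<Tablename>` as title case (each word capitalized) is an assumption based on the examples above.

```python
import re

# Matches the <Schema> and <Tablename> macros in any letter case.
MACRO = re.compile(r"<(tablename|schema)>", re.IGNORECASE)


def apply_macro_case(macro: str, value: str) -> str:
    """Return value styled after the case in which the macro was written."""
    if macro.isupper():       # <TABLENAME> -> DATA SOURCE NAME
        return value.upper()
    if macro.islower():       # <tablename> -> data source name
        return value.lower()
    return value.title()      # <Tablename> -> Data Source Name (assumed title case)


def render_name(template: str, schema: str, table: str) -> str:
    """Hypothetical renderer: substitute the macros in a data source name template."""
    values = {"tablename": table, "schema": schema}

    def substitute(match: re.Match) -> str:
        macro = match.group(1)
        return apply_macro_case(macro, values[macro.lower()])

    return MACRO.sub(substitute, template)


print(render_name("<Tablename>", "sales", "data source name"))  # Data Source Name
print(render_name("<Schema><Tablename>", "sales", "orders"))    # SalesOrders
print(render_name("raw_<tablename>", "sales", "Orders"))        # raw_orders
```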

Enable or disable schema monitoring

Note: This step will only appear if all tables within a server have been selected for creation.

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration, when tables are not yet in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, so newly added columns cannot leak data before they have been reviewed.
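If your ETL pipeline triggers schema monitoring through the API, the call can be a single authenticated HTTP request. The sketch below only builds that request with Python's standard library. The `/dataSource/detectRemoteChanges` path, the `PUT` method, and the `hostname`/`database`/`table` body fields are assumptions for illustration, and the `Bearer` header mirrors the curl example elsewhere on this page. Confirm all of them against the API reference for your Immuta version before use.

```python
import json
import urllib.request

# ASSUMPTION: illustrative endpoint path; verify it in your Immuta API reference.
DETECT_PATH = "/dataSource/detectRemoteChanges"


def build_schema_detection_request(base_url, api_token, hostname, database, table=None):
    """Build (but do not send) a request asking Immuta to run schema detection."""
    body = {"hostname": hostname, "database": database}  # assumed field names
    if table is not None:
        body["table"] = table  # narrow detection to a single table
    return urllib.request.Request(
        url=base_url.rstrip("/") + DETECT_PATH,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


# At the end of an ETL run that created new tables, you might send it with:
# urllib.request.urlopen(build_schema_detection_request(
#     "https://your-immuta-url.com", api_token, "db.example.com", "analytics"))
```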

Generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta admin or the user who owns the schema detection groups that are being targeted.
On the data source creation page, click the checkbox to enable Schema Monitoring or Detect Column Changes.
Click Download Schema Job Detection Template, and then click the Click Here To Download text.
Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented-out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. If you need to add proxy configuration to the job template, note that the template uses the Python requests library, which lets you configure proxies for each request.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
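To add proxy settings to the job template, you can pass the requests library a `proxies` mapping keyed by URL scheme. (requests also honors the standard `HTTP_PROXY` and `HTTPS_PROXY` environment variables.) The following is a minimal sketch of building that mapping. `build_proxies` is a hypothetical helper. The proxy host, port, and secret scope and key names are placeholders, because the template's own variable names are not shown here.

```python
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies mapping in the shape the requests library accepts."""
    credentials = ""
    if username is not None:
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        credentials = f"{quote(username, safe='')}:{quote(password or '', safe='')}@"
    proxy_url = f"http://{credentials}{host}:{port}"
    # requests picks the entry that matches the scheme of the target URL.
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical use inside the downloaded notebook, where requests is already imported:
# api_key = dbutils.secrets.get(scope="immuta", key="api-key")  # your scope/key names
# proxies = build_proxies("proxy.example.com", 3128)
# requests.post(url, headers=headers, json=payload, proxies=proxies)
```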

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Google BigQuery Data Source

Private preview: Google BigQuery is available to select accounts. Reach out to your Immuta representative for details.

Requirements

CREATE_DATA_SOURCE Immuta permission
Google BigQuery roles:
- roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
- roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
- roles/bigquery.jobUser on the project

Prerequisite

Configure the Google BigQuery integration

Create a Google Cloud service account for creating Google BigQuery data sources

Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. If you do not already have a service account for the Google Cloud project that is separate from the one you created when configuring the Google BigQuery integration, create a Google Cloud service account with privileges to view and run queries against the tables you are protecting.

You have two options to create the required Google Cloud service account:

Create a service account by using Google Cloud Console.
Create a service account by using gcloud.

Create a service account using the Google Cloud web console

Using the Google Cloud documentation, create a service account with the following roles:
- BigQuery User
- BigQuery Data Viewer
Using the Google Cloud documentation, generate a service account key for the account you just created.

Create a service account using gcloud

Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.
- SERVICE_ACCOUNT is the name for the new service account.
- PROJECT_ID is the project ID for the Google Cloud Project that is integrated with Immuta.
- IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.

Use the script below in the gcloud command line. This script is a template; change values as necessary:

```bash
# Fill these out
# Use a .json extension for the key file
export SERVICE_ACCOUNT=datasource-account
export PROJECT_ID=project123
export IMMUTA_GCP_KEY_FILE=~/GCP_${SERVICE_ACCOUNT}_key.json

# Create service account for creating data sources
gcloud iam service-accounts create "${SERVICE_ACCOUNT}" --project "${PROJECT_ID}"

# Generate keyfile
gcloud iam service-accounts keys create "${IMMUTA_GCP_KEY_FILE}" \
  --iam-account="${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com"

# Allow account to execute queries
# (Alternative if you use a custom project-level role instead of roles/bigquery.user:)
# gcloud projects add-iam-policy-binding ${PROJECT_ID} \
#   --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=projects/${PROJECT_ID}/roles/bigquery.user
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role=roles/bigquery.user

# Allow account to view data
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role=roles/bigquery.dataViewer

echo "If something went wrong and you want to delete the service account, run:"
echo "gcloud iam service-accounts delete ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --project ${PROJECT_ID}"
```

Register data sources in Immuta

Required Google BigQuery roles

Ensure that the user creating the data source has these Google BigQuery roles:

roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project

Navigate to the Data Sources list page.
Click Register Data Source.
Select the Google BigQuery tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Account Email Address: Enter the email address of a service account with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.
- Project: Enter the name of the project that has been integrated with Immuta.
- Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.
Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.
Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Decide how to virtually populate the data source by selecting one of the options:
- Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
- Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Provide basic information about your data source to make it discoverable to users.
- Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. For BigQuery, the schema will be the BigQuery dataset. The format must include a schema macro, but you may personalize it using lowercase letters, numbers, and underscores. It can have up to 255 characters.
- Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.
  - When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.
  - When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.
- Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
  - <Tablename>: The Immuta data source will have the same name as the original table.
  - <Schema><Tablename>: The Immuta data source will have both the dataset and original table name.
  - Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
- Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.
Optional Advanced Settings:
- Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.
- Data Source Tags: Adding tags to your data source allows users to search for the data source by tag and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.
  - Click the Edit button in the Data Source Tags section.
  - Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Click Create to save the data source(s).
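The account in the uploaded key file must match the Account Email Address, so it can help to check the file before you upload it. Google Cloud service account key files are JSON documents, and their `client_email` field holds the account's email address. The sketch below assumes the naming used by the gcloud script earlier on this page. The helper names are hypothetical.

```python
import json


def expected_service_account_email(service_account: str, project_id: str) -> str:
    """Email address Google Cloud assigns to a service account created in project_id."""
    return f"{service_account}@{project_id}.iam.gserviceaccount.com"


def key_file_matches(key_json_text: str, account_email: str) -> bool:
    """True if the key file belongs to the account entered as Account Email Address."""
    key = json.loads(key_json_text)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key.get("client_email", "").lower() == account_email.strip().lower()


# Example, using the values from the gcloud script:
# from pathlib import Path
# text = Path("~/GCP_datasource-account_key.json").expanduser().read_text()
# email = expected_service_account_email("datasource-account", "project123")
# print(key_file_matches(text, email))
```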

Next steps

With data sources registered in Immuta, your organization can now start

building global subscription and data policies to govern data.
creating projects to collaborate.

Redshift Data Source

Redshift data sources

Redshift Spectrum data sources must be registered via the Immuta CLI or V2 API using this payload.
Registering Redshift datashares as Immuta data sources is unsupported.

Requirement

The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Redshift tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Redshift, typically port 5439
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Further considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.
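The matching rule for join pushdown amounts to comparing five connection fields. Below is an illustrative check you might run over your own records of data source connections. It is not Immuta's internal logic, and the dictionary keys are hypothetical.

```python
# Connection fields that must be identical for joins to be pushed down.
PUSHDOWN_FIELDS = ("host", "port", "ssl", "username", "password")


def pushdown_key(connection: dict) -> tuple:
    """Collect the fields that must match across the joined data sources."""
    return tuple(connection.get(field) for field in PUSHDOWN_FIELDS)


def joins_can_push_down(*connections: dict) -> bool:
    """True only if every data source in the join shares the same connection information."""
    return len({pushdown_key(c) for c in connections}) == 1


orders = {"host": "redshift.example.com", "port": 5439, "ssl": True,
          "username": "immuta", "password": "secret"}
customers = dict(orders)
print(joins_can_push_down(orders, customers))                    # True
print(joins_can_push_down(orders, {**customers, "port": 5440}))  # False
```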

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
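You can check the SQL schema and table name format rules above before you submit the form. Those rules are: a required macro; only lowercase letters, numbers, and underscores; and at most 255 characters. This is a minimal sketch that assumes macros are written in angle brackets, such as `<schema>` and `<tablename>`, and applies the 255-character limit to the format string itself. The function name is hypothetical.

```python
import re

MAX_LENGTH = 255
MACRO = re.compile(r"<[A-Za-z]+>")   # e.g. <schema>, <tablename> (assumed syntax)
LITERAL = re.compile(r"[a-z0-9_]*")  # characters allowed outside macros


def check_sql_name_format(template: str, required_macro: str) -> list:
    """Return a list of problems with a SQL name format; an empty list means it passes."""
    problems = []
    if len(template) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    macros = [m.lower() for m in MACRO.findall(template)]
    if required_macro.lower() not in macros:
        problems.append(f"missing the {required_macro} macro")
    if not LITERAL.fullmatch(MACRO.sub("", template)):
        problems.append("use only lowercase letters, numbers, and underscores outside macros")
    return problems


print(check_sql_name_format("analytics_<schema>", "<schema>"))  # []
print(check_sql_name_format("Prod-<tablename>", "<tablename>"))
```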

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration, when tables are not yet in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, so newly added columns cannot leak data before they have been reviewed.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Snowflake Data Source

This page details how to register Snowflake data sources using the existing workflow. To register data sources using a connection, see this how-to guide.

Requirements

CREATE_DATA_SOURCE Immuta permission
USAGE Snowflake privilege on the schema and database
REFERENCES Snowflake privilege on the tables

Snowflake imported databases

Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.

Enter connection information

Use SSL

Although not required, all connections should use SSL. Additional connection string arguments may also be provided.

Navigate to the Data Sources list page and click Register Data Source.
Select the Snowflake tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Snowflake, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Warehouse: Snowflake warehouse used to run queries against the remote database
- Database: remote database
From the Select Authentication Method dropdown, select Username and Password, Key Pair Authentication, or Snowflake External OAuth:
- Username and Password
  1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
  2. Enter a Password. This password will be used with the above username to connect to the remote database.
  3. You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
- Key Pair Authentication
  1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
  2. Opt to enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>.
  3. Click Select a File, and upload a Snowflake key pair file.
- Snowflake External OAuth
  1. Fill out the Token Endpoint, which is the authorization server endpoint from which the access token is requested.
  2. Fill out the Client ID, which is the subject of the generated token.
  3. To use a certificate, keep the Use Certificate checkbox enabled and complete the steps below. You cannot pass a client secret if you use this method for obtaining the access token.
    Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
    Enter the x509 Certificate Thumbprint. This identifies the key that corresponds to the token and is often abbreviated as x5t or called sub (Subject).
    Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
  4. To pass a client secret, uncheck the Use Certificate checkbox and complete the fields below. You cannot use a certificate if you use this method for obtaining the access token.
    Scope (string): The scope limits the operations and roles allowed in Snowflake by the access token. See the Snowflake documentation for details about creating scopes for External OAuth.
    Client Secret (string): Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click the Test Connection button.
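The two External OAuth options map to standard building blocks. The sketch below implements them with Python's standard library so you can sanity-check the values you enter; it is not Immuta's implementation. The thumbprint helper computes the SHA-1 certificate thumbprint in two common encodings: the hexadecimal form that many identity provider consoles display, and the base64url `x5t` form defined for JSON Web Signatures. Which form your Immuta version expects is an assumption you should verify. The token helper builds a standard OAuth 2.0 client credentials request. You could send it yourself to confirm that the token endpoint, client ID, secret, and scope are accepted. The scope string in the example is hypothetical.

```python
import base64
import hashlib
import ssl
import urllib.parse
import urllib.request


def certificate_thumbprints(pem_certificate: str):
    """Return (hex SHA-1 thumbprint, base64url x5t) for a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_certificate)
    digest = hashlib.sha1(der).digest()
    hex_thumbprint = digest.hex().upper()
    x5t = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return hex_thumbprint, x5t


def client_credentials_request(token_endpoint, client_id, client_secret, scope):
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # sent in the form body (client_secret_post)
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example (values are placeholders):
# with open("client_cert.pem") as f:
#     print(certificate_thumbprints(f.read()))
# request = client_credentials_request(
#     "https://login.example.com/oauth2/token", "my-client-id", "my-secret",
#     "session:role:analyst")
# print(urllib.request.urlopen(request).read())
```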

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

File naming convention

If you are uploading more than one file, ensure the certificate used for the OAuth authentication has the key name "oauth client certificate."

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration, when tables are not yet in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, so newly added columns cannot leak data before they have been reviewed.

When selecting the Schema/Table option, opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to register your data source.

Bulk Create Snowflake Data Sources

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirements

Snowflake Enterprise Edition
Snowflake X-Large or Large warehouse is strongly recommended

Create Snowflake data sources

Set the default subscription policy to None for bulk data source creation. This will simplify the data source creation process by not automatically applying policies.
Make a request to the Immuta V2 API create data source endpoint, as the Immuta UI does not support creating more than 1000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The Skip Stats Job tag is only required if you are using specific policies that require stats; otherwise, Snowflake data sources automatically skip the stats job.
```
"options": {
    "disableSensitiveDataDiscovery": true,
    "tableTags": [
        "Skip Stats Job"
    ]
}
```

Specifying disableSensitiveDataDiscovery as true ensures that sensitive data discovery will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.

Applying the Skip Stats Job tag using the tableTags value ensures that some jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.
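If you build the request body programmatically, you can merge the options above into whatever payload you already send to the V2 create data source endpoint. The sketch below only touches the `options` object. The rest of the payload (connection details, name templates, and so on) is outside its scope, and the helper name is hypothetical.

```python
import copy

SKIP_STATS_TAG = "Skip Stats Job"


def with_bulk_options(payload: dict, skip_stats: bool = True) -> dict:
    """Return a copy of payload with the bulk-creation options set."""
    result = copy.deepcopy(payload)  # leave the caller's payload untouched
    options = result.setdefault("options", {})
    options["disableSensitiveDataDiscovery"] = True
    if skip_stats:  # only needed if you use policies that require stats
        tags = options.setdefault("tableTags", [])
        if SKIP_STATS_TAG not in tags:
            tags.append(SKIP_STATS_TAG)
    return result
```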

When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.

Monitor progress

To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:

```bash
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example_payload.json \
    "https://your-immuta-url.com/jobs?bulkId=<your-bulkId>"
```

The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:

```json
{
  "total": "99893",
  "completed": "99892",
  "failed": "0",
  "pending": "1",
  "errors": null
}
```

With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.
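Given that runtime, you will probably want to poll automatically rather than check by hand. Note that the counters in the response are strings. A minimal polling sketch follows. How `fetch` retrieves the response (for example, by issuing the monitoring request above) is left to you, and the default polling interval is an arbitrary placeholder.

```python
import time


def summarize_progress(response: dict) -> dict:
    """Convert the string counters in a bulk job response to integers."""
    counts = {key: int(response[key]) for key in ("total", "completed", "failed", "pending")}
    counts["errors"] = response.get("errors") or []
    counts["done"] = counts["pending"] == 0
    return counts


def wait_for_bulk_jobs(fetch, interval_seconds=300, max_polls=None):
    """Call fetch() until no jobs are pending; fetch must return the parsed JSON response."""
    polls = 0
    while True:
        progress = summarize_progress(fetch())
        if progress["done"]:
            return progress
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{progress['pending']} jobs still pending")
        time.sleep(interval_seconds)


example = {"total": "99893", "completed": "99892", "failed": "0", "pending": "1", "errors": None}
print(summarize_progress(example)["done"])  # False
```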

Create a Starburst (Trino) Data Source

Using OAuth authentication to create Starburst (Trino) data sources

If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, work with your Immuta representative to configure the globalAdminUsername property. See the Starburst (Trino) reference page for details.

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Starburst (Trino) tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Starburst (Trino)
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Catalog: the remote catalog
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
If you are using a proxy server with Starburst (Trino), specify it in the Additional Connection String Options:
```
UseProxy=1;ProxyHost=my.host.com;ProxyUID=your-username;ProxyPort=6789;ProxyPwd=your-password
```
Opt to Upload Certificates to connect to the database.
Click the Test Connection button.
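The proxy option in the step above is a semicolon-separated list of `Key=Value` pairs. If you generate this value from a script, a small helper keeps the keys in the same order as the example. The function below, `proxy_connection_options`, is hypothetical: its name and argument order are illustrative assumptions and not part of Immuta or the Starburst (Trino) driver.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Additional Connection String Options value
# for a proxy, using the same keys as the example above.
# Usage: proxy_connection_options HOST PORT USERNAME PASSWORD
proxy_connection_options() {
  local host="$1" port="$2" user="$3" password="$4"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "proxy_connection_options: proxy host and port are required" >&2
    return 1
  fi
  # Values containing ';' are not escaped by this sketch.
  printf 'UseProxy=1;ProxyHost=%s;ProxyUID=%s;ProxyPort=%s;ProxyPwd=%s\n' \
    "$host" "$user" "$port" "$password"
}
```

Because the pairs are separated by semicolons, a value that itself contains `;` would break the string; this sketch does not handle that case.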

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format, which will be the name of the SQL schema that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format, which will be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name, which you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
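If you plan naming formats in advance (for example, for many schemas at once), you can check them against the rules above before entering them in the form. The following Bash sketch is a hypothetical pre-check, not Immuta's own validation. It assumes the macros are written as `<schema>` and `<tablename>`, that the 255-character limit applies to the whole format string including the macro, and that the data source name rule only changes letter case, as in the `<Tablename>`, `<tablename>`, and `<TABLENAME>` examples above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for the SQL Schema/Table Name Format rules above.
# Assumptions: macros are written as <schema> or <tablename>, and the
# 255-character limit counts the whole format string, macro included.

# Usage: valid_sql_name_format FORMAT MACRO   (MACRO is "schema" or "tablename")
# Returns 0 if FORMAT follows the rules, 1 otherwise.
valid_sql_name_format() {
  local format="$1" macro="<$2>" rest
  [ "${#format}" -le 255 ] || return 1
  case "$format" in
    *"$macro"*) ;;                # the required macro is present
    *) return 1 ;;
  esac
  rest=${format//"$macro"/}       # drop every occurrence of the macro
  case "$rest" in
    # Anything other than lowercase letters, digits, or underscores is invalid.
    *[!abcdefghijklmnopqrstuvwxyz0123456789_]*) return 1 ;;
  esac
  return 0
}

# Usage: render_data_source_name MACRO_AS_WRITTEN NAME
# Applies the case of the macro to NAME (for example <Tablename> -> Title Case).
render_data_source_name() {
  local macro="$1" name="$2"
  case "$macro" in
    "<TABLENAME>") printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]' ;;
    "<tablename>") printf '%s\n' "$name" | tr '[:upper:]' '[:lower:]' ;;
    "<Tablename>")
      printf '%s\n' "$name" |
        awk '{ for (i = 1; i <= NF; i++) $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2)); print }' ;;
    *) return 1 ;;
  esac
}
```

With these helpers, `prod_<schema>_v2` passes the schema check, while `Prod_<schema>` (uppercase letter) and `prod_schema` (no macro) fail.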

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without being reviewed first.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable:

more statistics to be calculated for this data source, including the most recent record time, which is used to determine the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Data Source Settings

Once a data source is created, the data owner can manage data source policies, members, data dictionary, and tags.

The reference and how-to guides in this section cover topics related to managing existing data sources.

How-to guides

Manage data source settings: Edit data source settings or disable and delete a data source.
Manage data source members: Add, remove, or modify users on a data source.
Manage data source access requests: Approve and deny subscription requests on a data source.
Disable data sampling: Disable metadata collection that requires sampling data.
Data dictionary: Manage the data dictionary descriptions and tags.

Reference guide

Data source health checks: This reference guide defines the data source health check jobs that are run when a data source is created.

How-to Guides

Manage Data Source Settings

As a data owner, you can edit your data source settings and disable, delete, and re-enable a data source.

For other guides related to data source members and management, see the Related guides section.

Bulk edit data sources

Data owners can bulk edit data sources.

Navigate to the data sources list page.
Select the checkboxes for the data sources you want to edit. Note that when editing a connection string using bulk edit, all data sources from that connection must be selected.
Select the action you want or click More Actions for additional options.
Confirm your edits by following the prompts in the modals that appear.

Disable a data source

Disabling a data source hides it and its data from all users except the data owner. While in this state, the data source will display as disabled in the console for the data owner, and other users will not be able to see it at all.

Navigate to the data source.
Click on the more actions icon and select Disable.

A label will appear next to the data source indicating it is now disabled, and a notification will be sent to all users of the data source informing them that the data source has been disabled.

Enable a disabled data source

Navigate to the data source.
Click on the more actions icon and select Enable.

A notification will be sent out to all users of the data source informing them that the data source has been enabled.

Delete a data source

Deleting a data source permanently removes it from Immuta. Data sources must first be disabled before they can be deleted.

Disable the data source.
Navigate to the data source, click the more actions icon and select Delete.
Confirm that the data source should be deleted by clicking Delete.

A notification will be sent out to all users of the data source informing them that the data source has been deleted.

Reference guides

For information about data sources and policies, see the following guides:

How-to guides

In addition to adding and managing data source settings as outlined above, data owners can manage data source

Manage Data Source Members

In addition to creating and managing data sources, data owners can add and manage data source members manually. While this is supported, it is not recommended; managing user access through subscription policies is much more scalable.

For other guides related to data source members and management, see the Related guides section.

Add members to a data source

Navigate to the data source and click the Members tab.
Click Add Members and enter the group name or username.
Select their Role:
- Subscriber: The role can have read or write access to the table. This role is only available if there are read access policies on the data source.
- Owner: The role can manage data source members and policies and have read or write access to the table.
- Expert: The role can manage the data dictionary descriptions and have read or write access to the table. This role is only available if there are read access policies on the data source.
You can also opt to specify an expiration date for when the user’s access should expire.
Select Read or Write from the Access Grant dropdown. This option is only available if write policies have been enabled.
Click Add.

Bulk add users to multiple data sources

Navigate to the data sources list page.
Select the data sources you want to add users to by clicking the checkbox next to the data source.
Select Add Users.
In the modal, type the user name or group name and select the user or group you want to add from the dropdown menu.
Opt to set an Expiration for the users' subscriptions. Additionally, you can change the role from Subscriber to Expert or Owner for the users or groups using the dropdown menu in the Role column.
Click Add. All users and groups will be added to the data sources you selected.

Set user access expiration date for a data source

As a data owner, you can limit the amount of time a user or group has access to your data source by setting an access expiration date.

Navigate to the Members tab.
Adjust the number of days under the Expires column for the user or group whose access you want to limit. The limit counts from today: 0 days left means access will be revoked by the end of today, and 1 day left means access will be revoked by the end of tomorrow.
Save your changes.

To remove the limit (or set the limit to Never), delete the number from the field and save your changes.
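To make the counting rule concrete, the following hypothetical Bash helper computes the last day of access from the day you set the limit and the number of days entered in the Expires column. It is a sketch, not an Immuta feature, and it assumes days are counted as calendar days in UTC. It relies on GNU `date` for the `-d` option and `@epoch` input.

```bash
#!/usr/bin/env bash
# Hypothetical helper: last day of access for a given Expires value.
# Rule from the text: 0 days left = revoked by the end of today,
# 1 day left = revoked by the end of tomorrow.
# Assumptions: calendar days in UTC; GNU date (-d and @epoch input).
# Usage: last_access_day START_DATE DAYS_LEFT   (START_DATE as YYYY-MM-DD)
last_access_day() {
  local start="$1" days="$2" start_epoch
  case "$days" in
    '' | *[!0-9]*) echo "last_access_day: days must be a non-negative integer" >&2; return 1 ;;
  esac
  start_epoch=$(date -u -d "$start" +%s) || return 1
  # Whole days in UTC are exactly 86400 seconds, so no DST adjustment is needed.
  date -u -d "@$(( start_epoch + days * 86400 ))" +%F
}
```

For example, a limit of 31 days set on 2024-05-01 means access lasts through the end of 2024-06-01.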

Modify user or group roles within a data source

Navigate to the Members tab.
Click the drop-down arrow under the Role column next to the user/group whose role you’d like to change.
Select another role (subscriber, expert, owner, or ingest user, if applicable).

Notifications about the change will be sent to the affected users and groups (as well as alternative Owners).

View user or group subscription history

Navigate to the Members tab.
Click the Name of the user or group whose history you want to review.

Remove users or groups from a data source

As a data owner, you can deny access to any users or groups at any time.

Navigate to the Members tab.
To remove a user or group from a data source, click Deny in the Actions column next to the user or group you want to remove.
Complete the Deny Access form, including a reason for revoking the access.

This action will immediately update users' or groups' subscription status, and they will no longer have any access to the data source. Notifications will be sent to the affected users (as well as alternative data owners) informing them of the change in subscription status.

Reference guide

For information about data source members and subscriptions, see the data source user roles section.

How-to guides

In addition to adding and managing data source members as outlined above, data owners can manage data source

column tags
data dictionaries
settings

Manage Access Requests and Tasks

Your outgoing and incoming requests are consolidated on the requests tab on your user profile page. Similar to notifications, a red dot displays on the request icon whenever you have pending requests. The sections below guide you through managing these requests.

Manage access requests

Navigate to your Profile page, and then click the Requests tab. The names of the users who have submitted requests are displayed in the Requests section. Once a user is selected, the corresponding Pending Requests are displayed.
To view more information about the request, click the Details button in the Actions column of a request.
Click the Approve or Deny button in the Actions column of the request.

Bulk approvals

To approve or deny multiple access requests simultaneously,

Navigate to your Profile page, and then click the Requests tab.
Select the checkbox next to each request you want to address, and then click the Approve Selected or Deny Selected button.

Manage data source requests

If a policy that includes the New tag is active and schema monitoring is enabled or you have registered a connection, Immuta applies a New tag to new data sources, new columns, or changed columns and sends data owners a request to validate those changes.

Navigate to your Profile page, and then click the Requests tab.
Click the approvals count in the Request Information column to view information about the change to the data source. The change will be one of the following:
- Column added
- Column changed
- Column deleted
- Data source created
After verifying the change, click Validate.

For more information about these requests, see the Schema monitoring guide or the Connections guide.

Manage unmask requests

Deprecation notice

Support for this feature has been deprecated.

If users make an unmask request, a tasks tab will appear for the data source listing the target and requesting users, the task type, and the state of the task. From this tab, users can view and manage two different task views:

Your Created Tasks: This page lists the status and information of the unmask requests you've submitted.
Tasks For You: This page lists the status and information of the unmask requests that have been submitted to you.

To complete a task,

Navigate to the Tasks tab from the Data Source Overview page, and then click the Tasks For You toggle.
Click the Unmask Values icon in the Actions column of the task.
A dialog box will appear with the masked and unmasked value. Note: You can view information about this request, including the reason for the request and the date it was created, by clicking the Task Info button in the Actions column.

To delete a task, click Delete Task in the Actions column of the relevant task.

How-to guides

In addition to managing data source requests as outlined above, data owners can manage data source

column tags
data dictionaries
policies
members
settings

Manage Data Dictionary Descriptions

The data dictionary provides information about the columns within the data source, including column names and value types.

As a data owner, you can manage data dictionary descriptions and column tags. For other guides related to the data dictionary, see the Related guides section.

Manage data dictionary descriptions

Navigate to the Data Dictionary tab.
To add or edit column descriptions, click the menu icon in the Actions column next to the entry you want to change and select Edit.
Complete the fields in the form that appears, and then click Save.

Reference guide

For information about the data dictionary, see the Data sources in Immuta overview.

How-to guide

In addition to managing data dictionary descriptions as outlined above, data owners or experts can also manage column tags.

Disable Immuta from Sampling Raw Data

If you want to disable the metadata collection that requires sampling data, you must

Stop all data source health checks.
Add the Skip Stats Job tag to all data sources.

These steps will ensure that Immuta queries no data, under any circumstances. Without this sample data, some Immuta features will be unavailable. Sensitive data discovery (SDD) cannot be used to automatically detect sensitive data in your data sources, and the following masking policies (which are only available in the Snowflake integration) will not work:

Masking with format preserving masking
Masking with k-anonymization
Masking using randomized response

Data Source Health Checks

Reach out to your Immuta representative to disable health checks on all data sources.

Skip Stats Job Tag

Tag each data source with the seeded Skip Stats Job tag to stop Immuta from collecting a sample and running table stats on the sample. You can tag data sources as you create them in the UI or via the Immuta API.

Note that even without the Skip Stats Job tag, data sources automatically skip the stats job upon registration as long as there are no active policies that require stats. The following policies require stats:

Column masking with randomized response
Column masking with format preserving masking
Column masking with k-anonymization
Column masking with rounding
Column masking with reversibility
Row minimization
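Taken together, the two paragraphs above describe a simple decision: the stats job runs only if the data source does not have the Skip Stats Job tag and at least one active policy appears in the list. The following Bash function is a hypothetical illustration of that rule only. It is not an Immuta API, and it represents policies as plain strings copied from the list.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of when the stats job runs on registration.
# Policy labels are copied from the list above; this is not an Immuta API.
STATS_POLICIES=(
  "Column masking with randomized response"
  "Column masking with format preserving masking"
  "Column masking with k-anonymization"
  "Column masking with rounding"
  "Column masking with reversibility"
  "Row minimization"
)

# Usage: stats_job_runs HAS_SKIP_TAG [ACTIVE_POLICY...]
# HAS_SKIP_TAG is "true" or "false". Returns 0 if the job runs, 1 if skipped.
stats_job_runs() {
  local has_skip_tag="$1" active required
  shift
  # The Skip Stats Job tag always prevents sampling.
  if [ "$has_skip_tag" = "true" ]; then
    return 1
  fi
  for active in "$@"; do
    for required in "${STATS_POLICIES[@]}"; do
      if [ "$active" = "$required" ]; then
        return 0
      fi
    done
  done
  # No active policy requires stats, so the job is skipped.
  return 1
}
```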

Data Source Health Checks Reference Guide

When an Immuta data source is created, background jobs use the connection information provided to compute health checks dependent on the type of data source created and how it was configured. These data source health checks include the

blob crawl status: indicates whether the blob was successfully crawled.
column detection status: indicates whether the job run to determine if a column was added or removed from the remote table registered as an Immuta data source was successful.
external catalog link status: indicates whether or not the external catalog was successfully linked to the data source.
fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated. Fingerprints are only available for Snowflake data sources.
framework classification status: indicates whether classification was successfully run on the data source to determine the sensitivity of the data source.
global policy applied status: indicates whether global policies were successfully applied to the data source.
high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
SQL view creation status (for Redshift data sources): indicates whether views were properly created for Redshift tables registered in Immuta.
row count status: indicates whether the number of rows in the data source was successfully calculated.
schema detection status: indicates whether the job run to determine if a remote table was added or removed from the schema was successful.
sensitive data discovery status: indicates whether sensitive data discovery was successfully run on the data source.

After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.

These background jobs can be disabled during data source creation by adding a specific tag that prevents automatic table statistics. A system administrator can set this prevent-statistics tag on the app settings page. However, with automatic table statistics disabled, the following policies will be unavailable until the data source owner manually generates the fingerprint (available only for Snowflake data sources):

Masking with format preserving masking
Masking with k-anonymization
Masking using randomized response

Unhealthy Databricks data sources

Databricks data sources may be reported as unhealthy because their row count queries fail when run against a cluster that has the Databricks query watchdog enabled.

Limitations

Data sources with over 1600 columns will not have health checks run, but will still appear as healthy. The health check cannot be run automatically or manually.

Schema Monitoring

With schema monitoring enabled, Immuta monitors your organization's servers to find when new tables or columns are created or deleted and automatically registers (or disables) those tables in Immuta.

How-to guides

Manage schema monitoring: Edit connection information, schema project owner, or the naming conventions of data registered in the schema.
Run schema monitoring jobs: Manually trigger schema monitoring.

Reference guides

Schema monitoring: This reference guide describes the design and components of schema monitoring.
Schema projects: This reference guide describes schema projects, which group all the data sources of a schema.

Concept guide

Why use schema monitoring?: This explanatory guide provides a conceptual overview of schema monitoring. It offers a discussion of the benefits of the feature, context for why it was developed, and insights into the features schema monitoring pairs with. This guide is designed to deepen your understanding of schema monitoring's purpose as you implement it.

How-to Guides

Manage Schema Monitoring

Edit Schema Project Connection

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Connection.
Use the Connection Information modal to make any necessary changes.
Click Save.

Edit Schema Monitoring Naming Convention

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the Basic Information modal to make any necessary changes to naming formats.
Click Save.

Add New Schema Monitoring Owner

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the dropdown menu in the Schema Monitoring modal to select a new schema detection owner. The new owner must be an owner of one or more of the data sources belonging to that schema.
Click Save.

Run Schema Monitoring and Column Detection Jobs

Manually Run Schema Monitoring Jobs

Manually Run Schema Monitoring Job for All Data Sources

Requirement: Immuta permission USER_ADMIN

You can manually run a schema monitoring job globally using the /dataSource/detectRemoteChanges endpoint of the Immuta API with an empty payload.

Manually Run Schema Monitoring Job as a Data Owner

You can manually run a schema monitoring job for all data sources that you own using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data sources or their individual IDs.

Manually Run Schema Monitoring Job as a Data User

You can manually run a schema monitoring job for data sources you are subscribed to using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data source and the table name or data source ID.
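The three requests above differ only in their payload. The following Bash sketch shows one way to call the endpoint with curl. It rests on assumptions you should confirm against the Immuta API reference before use: the request method (POST), the placeholder variables `IMMUTA_URL` and `IMMUTA_API_KEY`, and the payload field names used in the usage examples (`hostname`, `table`). The payload builder takes arbitrary `KEY=VALUE` pairs, so you can substitute whatever field names the API reference specifies.

```bash
#!/usr/bin/env bash
# Sketch: trigger schema monitoring via /dataSource/detectRemoteChanges.
# Assumptions: POST method; IMMUTA_URL and IMMUTA_API_KEY are placeholders;
# payload field names must be confirmed against the Immuta API reference.

# Build a flat JSON object of string values from KEY=VALUE arguments.
# With no arguments it prints {}, the empty payload for a global run.
json_payload() {
  local pair key value out="" sep=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "json_payload: expected KEY=VALUE, got '$pair'" >&2; return 1 ;;
    esac
    key=${pair%%=*}
    value=${pair#*=}
    # This sketch does not escape JSON, so reject characters that would need it.
    case "$key$value" in
      *'"'* | *'\'*) echo "json_payload: quotes and backslashes are not supported" >&2; return 1 ;;
    esac
    out+="${sep}\"${key}\":\"${value}\""
    sep=","
  done
  printf '{%s}\n' "$out"
}

# POST the payload built from the given KEY=VALUE arguments.
detect_remote_changes() {
  local payload
  payload=$(json_payload "$@") || return 1
  curl --silent --show-error --fail \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IMMUTA_API_KEY}" \
    --data "$payload" \
    "${IMMUTA_URL}/dataSource/detectRemoteChanges"
}
```

For example, with both variables exported, `detect_remote_changes` with no arguments sends the empty payload for a global run (USER_ADMIN), `detect_remote_changes hostname=your-host.example.com` targets data sources on that host, and adding `table=your_table` narrows the run to a single table.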

Manually Run a Column Detection Job

Navigate to the data source overview page.
Click on the health check icon.
Scroll to Column Detection, and click Trigger Detection.

Reference Guides

Schema Monitoring

Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with the organization's data environment. This automated process helps organizations keep compliant without the need to manually keep data sources up to date.

Schema monitoring is enabled while creating or editing a data source and only registers new tables and columns within known schemas. It does not register new schemas. Data owners or governors can edit the naming convention for newly detected data sources and the schema detection owner from the schema project page after it has been enabled.

See the Register a data source guides for instructions on enabling schema monitoring or Manage schema monitoring for instructions on editing the schema monitoring settings.

Column detection

Column detection is a part of schema monitoring, but it can also be enabled on its own to detect column changes in a select group of tables. Column detection monitors when columns are added to or removed from a table and when column types change, and it updates the appropriate Immuta data source's data dictionary to reflect those changes.

See one of the Register a data source guides for instructions on enabling column detection.

Tracking new data sources and columns

When new data sources and columns are detected and added to Immuta, or when column types have changed, they will always automatically be tagged with the New tag. This allows governors to use the seeded New Column Added global policy to mask columns with the New tag, since they could contain sensitive data.

The New Column Added global policy is staged (inactive) by default.

See the Clone, activate, or stage a global policy guide to activate this seeded global policy if you want any columns with the New tag to be automatically masked.

Data source requests

When schema monitoring is enabled and there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:

Column added: Immuta applies the New tag on the column that has been added and sends a request to the data owner to validate if the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it and as a result any policy that targets the New column tag no longer applies.
Column data type changed: Immuta applies the New tag on the column where the data type has been changed and sends a request to the data owner to validate if the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it and as a result any policy that targets the New column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
Data source created: Immuta applies the New tag on the data source that has been newly created and sends a request to the data owner to validate if the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New tag from it and as a result any policy that targets the New data source tag no longer applies.

For instructions on how to view and manage your assigned tasks in the Immuta UI, see the Manage data source requests guide. To view and manage your assigned tasks via the Immuta API, see the Manage data source requests section of the API documentation.

Workflow

Immuta user registers a data source with schema monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.
If Immuta finds a change, it will update the appropriate Immuta data source or column:
1. If Immuta finds a new table, then Immuta creates an Immuta data source for that table and tags it New.
2. If Immuta finds a table has been deleted, then Immuta disables that table's data source.
3. If Immuta finds a previously deleted table has been re-created, then Immuta restores that table's data source and tags it New.
4. If Immuta finds that the backing object type of a data source has been changed (for example, from a TABLE to a VIEW) in Snowflake or Databricks Unity Catalog, Immuta will reapply existing policies on the data source. Note that because of policy limitations on Unity Catalog views, changing a Databricks Unity Catalog object type from a table to a view could result in some types of data policies being removed. See the Databricks Unity Catalog integration reference guide for a list of data policies that are not supported for views.
5. If Immuta finds a new column within a table, then Immuta adds that column to the data dictionary and tags it New.
6. If Immuta finds a column has been deleted, then Immuta deletes that column from the data dictionary.
7. If Immuta finds a column type has changed, then Immuta updates the column type in the data dictionary and tags it New.
Active policies that target the New data source or column tag will be applied until a data owner validates the changes.

To run schema monitoring or column detection manually, see the Run schema monitoring and column detection jobs page.

Schedule

The default schedule for schema monitoring to run is every 24 hours. Some organizations may need to schedule it to run more often; however, this needs careful consideration as it can impact performance and compute costs.

Schema monitoring best practices

Manually trigger schema monitoring (filtered down to the database) after your dbt or other transform workflows run. For more information, see the dbt and transform workflow for limited policy downtime guide.
When manually triggering schema monitoring, specify a table or database for maximum performance efficiency and to reduce data or policy downtime. For more information on triggering schema monitoring, see the Manually run schema monitoring guide.
If you are manually managing data tags, activate the "New Column Added" global policy to protect newly found and potentially sensitive data. This policy sets all columns with the tag New to NULL until a data owner reviews and validates their content. Using this workflow protects your data and avoids data leaks on new columns getting automatically added. This recommendation is unnecessary for users leveraging sensitive data discovery (SDD) or using an external data catalog.

Schema Projects

Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.

Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission for the schema project to be created automatically, because the project owner cannot add data sources to it; new data sources are managed by Immuta instead. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.

The schema settings, such as schema evolution and connection information, can be edited from the project overview tab. Note: Deleting the project will delete all of the data sources within it as well.

Schema Project Actions

Schema settings are edited from the project overview tab:

Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.
Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention will change how these newly detected data sources are named by Immuta.
Schema Detection Owner: When schema monitoring is enabled, a user is assigned to be the owner of any data source that Immuta detects and creates.
Disable or delete your schema project: Deleting the project will delete all of the data sources within it as well.
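The following sketch shows how a naming convention might be applied to newly detected data sources. The template syntax (`{database}`, `{schema}`, `{table}` placeholders) and the class names are hypothetical and do not reflect Immuta's actual naming-convention format. Following the description above, the sketch assumes that changing the convention only affects data sources detected afterward.

```python
from dataclasses import dataclass, field


@dataclass
class NamingConvention:
    template: str  # hypothetical format, e.g. "{database}_{schema}_{table}"

    def render(self, database: str, schema: str, table: str) -> str:
        return self.template.format(database=database, schema=schema, table=table)


@dataclass
class SchemaProjectNaming:
    convention: NamingConvention
    names: dict[str, str] = field(default_factory=dict)  # table -> data source name

    def on_table_detected(self, database: str, schema: str, table: str) -> str:
        # Only newly detected tables are named with the current convention;
        # names already assigned are left untouched.
        if table not in self.names:
            self.names[table] = self.convention.render(database, schema, table)
        return self.names[table]
```

For example, after switching the template from `"{database}_{schema}_{table}"` to `"{schema}.{table}"`, a newly detected `refunds` table is named `sales.refunds`. An existing `orders` data source keeps its original name.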

Why Use Schema Monitoring Concept Guide

Immuta is a live metadata aggregator: it collects metadata about your data and your users. For metadata about your data specifically, Immuta can monitor changes in your database and reflect those changes in your Immuta tenant through schema monitoring.

When schema monitoring is enabled, Immuta monitors your organization's servers to identify when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. The newly updated data sources then have global policies and tags applied to them, and the Immuta data dictionary is updated with column changes.
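The comparison at the core of this process can be sketched as a set difference between what exists remotely and what is registered in Immuta. The sketch assumes both sides are available as a mapping from table name to column names. The function and class names are hypothetical. A real run would also apply tags and global policies to the results; that step is omitted here.

```python
from dataclasses import dataclass


@dataclass
class SchemaChanges:
    tables_to_register: set[str]
    tables_to_disable: set[str]
    columns_added: dict[str, set[str]]
    columns_removed: dict[str, set[str]]


def detect_changes(
    remote: dict[str, set[str]],
    registered: dict[str, set[str]],
) -> SchemaChanges:
    """Compare remote tables/columns with the data sources registered in Immuta."""
    remote_tables, registered_tables = set(remote), set(registered)
    shared = remote_tables & registered_tables
    return SchemaChanges(
        # New remote tables become data sources; vanished tables are disabled.
        tables_to_register=remote_tables - registered_tables,
        tables_to_disable=registered_tables - remote_tables,
        # Column-level drift on tables that exist on both sides.
        columns_added={t: remote[t] - registered[t] for t in shared if remote[t] - registered[t]},
        columns_removed={t: registered[t] - remote[t] for t in shared if registered[t] - remote[t]},
    )
```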

Schema monitoring keeps Immuta in sync with your data environment, helping you remain compliant without having to manually update individual data sources.

Anti-patterns: Using Immuta without schema monitoring

Without schema monitoring, data owners must manually add and remove Immuta data sources whenever users add or remove tables in their data platforms. At worst, data owners are unaware of these changes. At best, they are aware of them but must update Immuta manually, which is a time-consuming, error-prone process.

Beyond draining data owners' time, manual updates are easy to get wrong. Data owners must not only notice when a new table appears but also remember to tag it and protect it appropriately. This leaves organizations vulnerable to data leaks as new data is created across the business, perhaps daily.

Schema monitoring, by contrast, is scalable and accounts for the evolution of your schemas and policies. You no longer need to manually manage access to these tables or add and remove data sources. Instead, you can register a schema, create policies, and let Immuta manage those policies and your schema changes, keeping your data in sync and restricting access appropriately.

Business value

Both monitoring for new data and discovering and tagging sensitive data align with the concepts of scalability and evolvability, removing redundant and arduous work. Once tables are registered and tagged, policies can be applied immediately. This means humans can be removed from the process entirely by creating tag-based policies that dynamically apply themselves to new tables.
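The sketch below illustrates why tag-based policies need no human step. The policy is written once against a tag, not against a table, so it covers any table registered later whose columns carry that tag. The `TagPolicy` class, the `enforce` function, and the tag name `Email` are hypothetical. The sketch uses the first matching policy per column, which is a simplification and not a statement of Immuta's policy-merging rules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical global data policy: mask any column carrying `tag`."""
    tag: str
    mask: Callable[[Any], Any]


def enforce(
    policies: list[TagPolicy],
    row: dict[str, Any],
    column_tags: dict[str, set[str]],
) -> dict[str, Any]:
    """Apply the first policy whose tag matches each column; leave untagged columns as-is."""
    result = {}
    for column, value in row.items():
        tags = column_tags.get(column, set())
        policy = next((p for p in policies if p.tag in tags), None)
        result[column] = policy.mask(value) if policy else value
    return result
```

Because `enforce` reads tags at query time, a table registered after the policy was defined is masked as soon as its columns are tagged.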

Then, your business reaps the following benefits:

Increased revenue: Accelerate data access and reduce time to data, because it is well understood where sensitive data lives.
Decreased cost: Operate efficiently and move with agility at scale.
Decreased risk: Discover and protect sensitive data immediately.

What features does it pair with?

Schema monitoring pairs with the following features:

Column detection: Column detection identifies when a column has been added to or removed from a table and adds or removes that column from the data source in Immuta.
New column added templated global policy: When paired with column detection or schema monitoring, this policy locks down access to those newly added columns and tables to prevent data leaks.
Sensitive data discovery: When the tables are discovered through the registration process, Immuta evaluates the table data for sensitive information and tags it as such. These tags are critical for scaling tag-based policies.
Global data and subscription policies: Global data and subscription policies can be created using tags so that they immediately enforce appropriate access restrictions on tables and columns when they are added.
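To make the sensitive data discovery pairing concrete, here is a toy classifier that tags a column when most of its sampled values look like email addresses. It is only an illustration of the idea that discovery evaluates table data and emits tags. It is not Immuta's SDD implementation, and the regular expression, the 0.8 threshold, and the `Email` tag name are all hypothetical choices.

```python
import re
from typing import Iterable

# Deliberately simple email pattern; real classifiers are far more thorough.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def discover_tags(
    columns: dict[str, Iterable[str]],
    threshold: float = 0.8,
) -> dict[str, set[str]]:
    """Tag a column "Email" when at least `threshold` of its non-empty sample values match."""
    tags: dict[str, set[str]] = {}
    for column, values in columns.items():
        sample = [v for v in values if v]
        hits = sum(1 for v in sample if EMAIL.match(v))
        tags[column] = {"Email"} if sample and hits / len(sample) >= threshold else set()
    return tags
```

The resulting tags are what tag-based global data and subscription policies key on, so a newly registered table can be protected without manual review.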