This feature is being gradually rolled out to customers and may not be available to your account yet.
Requirement: Immuta permission CREATE_DATA_SOURCE
Prerequisite: Host registered using the enhanced onboarding and data registration workflow for Snowflake or Databricks Unity Catalog
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Click the user action menu and select Re-crawl Host.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Click the menu icon in the Action column for the database you want to crawl and select Re-crawl Database.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Select the database in the data objects table.
Click the menu icon in the Action column for the schema you want to crawl and select Re-crawl Schema.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Select the database and then the schema in the data objects table.
Click the menu icon in the Action column for the table you want to crawl and select Re-crawl Table.
This feature is being gradually rolled out to customers and may not be available to your account yet.
The enhanced onboarding and data source registration workflow allows you to register your data at the host level, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your host for changes, so that data sources are automatically added and removed to reflect the state of data on your host.
Once you register your host, Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in your data platform:
Host: This first tier represents your server or data platform account.
Folder: This second tier represents your database or schema (depending on the structure of your remote platform).
Data source: This third tier represents individual data objects within your schema or database.
For example, the following object hierarchy for Snowflake hosts would be displayed on the Immuta infrastructure page:
Host
Database
Schema
Data source
Beyond making the registration of your data more intuitive, enhanced onboarding provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object discovery) at the host level.
See the Snowflake or Databricks Unity Catalog host registration how-to guides for a list of requirements.
In this enhanced onboarding workflow, you configure the integration and register data sources simultaneously. Once you save your configuration, Immuta manages and applies Snowflake or Unity Catalog governance features to data registered in Immuta.
Then, Immuta crawls your host to register all tables within every schema and database that the Snowflake role or Databricks account credentials you provided during configuration can access. The object metadata, user metadata, and policy definitions are stored in the Immuta metadata database, and this metadata is used to enforce policies for users accessing this data.
After initial registration, your host can be crawled in two ways:
Periodic crawl: This crawl runs once every 24 hours. This schedule is not currently configurable.
Manual crawl: You can manually trigger a crawl of your host.
During these subsequent crawls of your host, Immuta identifies tables, schemas, or databases that have been added or removed. If tables are added, new data sources are created in Immuta. If remote tables are deleted, the corresponding data sources will be disabled in Immuta.
For more information about the Snowflake or Databricks Unity Catalog integration and how policies are enforced, see the Snowflake integration reference guide or the Databricks Unity Catalog integration reference guide.
When there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:
Column added: Immuta applies the New tag to the added column and sends a request to the data owner to validate whether the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column data type changed: Immuta applies the New tag to the column whose data type has changed and sends a request to the data owner to validate whether the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
Data source created: Immuta applies the New tag to the newly created data source and sends a request to the data owner to validate whether the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New tag from it, and as a result any policy that targets the New data source tag no longer applies.
For instructions on how to view and manage your tasks and requests in the Immuta UI, see the Manage access requests guide. To view and manage your tasks and requests via the Immuta API, see the Manage data source requests section of the API documentation.
When registering a host, Immuta sets the configuration to the recommended default settings to protect your data. The recommended settings are described below:
Infrastructure object discovery: This setting allows Immuta to monitor schemas for changes. When Immuta identifies a new table, a data source will automatically be created. Similarly, if remote tables are deleted, the corresponding data sources will be disabled. This setting is enabled by default.
Default run schedule: This sets the time interval for Immuta to check for new objects. By default, this schedule is set to 24 hours.
Sensitive data discovery: This setting enables sensitive data discovery and allows you to select the sensitive data discovery framework that Immuta will apply to your data objects. This setting is enabled by default to use the preconfigured or global framework.
Impersonation: This setting enables and defines the role for user impersonation in Snowflake. User impersonation is not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
Project workspaces: This setting enables Snowflake project workspaces. If you use Snowflake secure data sharing with Immuta, enable this setting, as project workspaces are required. If you use Snowflake table grants, disable this setting; project workspaces cannot be used when Snowflake table grants are enabled. Project workspaces are not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
Unregistering a host automatically deletes all of its child objects in Immuta. However, Immuta will not remove the objects in your Snowflake or Databricks account.
Users can currently register a host, unregister a host, and update the connection information for a host.
Snowflake and Databricks Unity Catalog are currently the only integrations that support the simplified data registration workflow.
Databricks Unity Catalog:
Only managed and external tables will be registered as data sources.
Delta shares are unsupported.
This feature is being gradually rolled out to customers and may not be available to your account yet.
No Databricks Unity Catalog integration may already be configured in Immuta. If your Databricks Unity Catalog integration is already configured on the app settings page, register your data sources using the legacy method.
Several different accounts are used to set up and maintain the Databricks Unity Catalog integration. The permissions required for each are outlined below.
Immuta account (required): This user configures the integration and registers the host. This user needs the CREATE_DATA_SOURCE Immuta permission.
Databricks service principal (required): This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:
OWNER permission on the Immuta catalog you configure.
OWNER privilege on one of the securables below so that Immuta can administer Unity Catalog row-level and column-level security controls:
OWNER on catalogs with schemas and tables registered as Immuta data sources. This permission could also be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta service principal to allow for multiple owners.
OWNER on schemas with tables registered as Immuta data sources.
OWNER on all tables registered as Immuta data sources, if the OWNER permission cannot be applied at the catalog or schema level. In this case, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta service principal.
USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can SELECT and MODIFY securables within the parent catalog and schema.
SELECT and MODIFY on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
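As an illustrative sketch only, the service principal privileges above could be granted in Databricks SQL along these lines; the catalog, schema, table, and service principal names are placeholders, not values from this guide:

```sql
-- Illustrative names only: substitute your own catalog/schema/table names
-- and the Immuta service principal's application ID.
GRANT USE CATALOG ON CATALOG analytics TO `immuta-service-principal`;
GRANT USE SCHEMA ON SCHEMA analytics.sales TO `immuta-service-principal`;
GRANT SELECT, MODIFY ON TABLE analytics.sales.orders TO `immuta-service-principal`;

-- Assigning OWNER at the catalog level covers every schema and table within it,
-- so per-schema and per-table OWNER grants become unnecessary.
ALTER CATALOG analytics OWNER TO `immuta-service-principal`;
```

Granting OWNER to a Databricks group that contains the service principal achieves the same effect while allowing multiple owners.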
Databricks account (required): This user account can manually configure the integration in Databricks to create the Immuta-managed catalog. To do so, this account requires the following Databricks privileges:
CREATE CATALOG on the Unity Catalog metastore
ACCOUNT ADMIN on the Unity Catalog metastore for native query audit (optional)
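As a sketch, the CREATE CATALOG privilege could be granted with a statement like the following; the user name is a placeholder. The ACCOUNT ADMIN role, by contrast, is assigned in the Databricks account console rather than through SQL:

```sql
-- Illustrative only: replace with the account of the user performing setup.
GRANT CREATE CATALOG ON METASTORE TO `setup.user@example.com`;
```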
Click the App Settings icon in the navigation menu.
Scroll to the Native Integration Settings section and check the Enable Databricks Unity Catalog support in Immuta checkbox. The additional settings in this section are only relevant to the Databricks Spark with Unity Catalog integration and will not have any effect on the Unity Catalog integration. These can be left with their default values.
Click Save and confirm your changes.
Click Data and select the Infrastructure tab in the navigation menu.
Click the + Add Host button.
Select the Databricks data platform tile.
Enter the host connection information:
Host: The hostname of your Databricks workspace.
Port: Your Databricks port.
HTTP Path: The HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog: The name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
Connection Key: A unique name for your host. This connection key will be used to create data source names for this host.
Click Next.
Select Access Token authentication method from the dropdown menu.
Enter the Access Token in the Immuta System Account Credentials section. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the requirements section at the top of this page for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function. This authentication information will be included in the script populated later on the page.
Copy the provided script and run it in Databricks as a user with the CREATE CATALOG privilege on the Unity Catalog metastore.
Click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
The enhanced onboarding and data source registration workflow allows you to register your data at the host level, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your host for changes, so that data sources are automatically added and removed to reflect the state of data on your host.
Connect a Snowflake host: Configure a Snowflake integration and register a host.
Connect a Databricks host: Configure a Databricks Unity Catalog integration and register a host.
Crawl a host or object: Trigger a manual crawl of a host or object to sync the infrastructure of your remote data platform with Immuta.
Enhanced onboarding and data source registration: This reference guide discusses the major concepts, design, and settings of the enhanced onboarding and data source registration workflow.
This feature is being gradually rolled out to customers and may not be available to your account yet.
Requirements:
Immuta permission CREATE_DATA_SOURCE
USAGE Snowflake privilege on the schema and database to register data sources
No Snowflake integrations configured in Immuta. If your Snowflake integration is already configured on the app settings page, register your data sources using the legacy method.
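For illustration, the USAGE requirement above could be satisfied with grants like the following; the database, schema, and role names are placeholders you would replace with your own:

```sql
-- Illustrative only: grant USAGE on each database and schema you plan to register.
GRANT USAGE ON DATABASE analytics TO ROLE immuta_system_role;
GRANT USAGE ON SCHEMA analytics.sales TO ROLE immuta_system_role;
```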
To register a Snowflake host with all its schemas and data sources, follow the instructions below.
Click Data and select the Infrastructure tab in the navigation menu.
Click the + Add Host button.
Select the Snowflake data platform tile.
Enter the host connection information:
Host: The URL of your Snowflake account.
Port: Your Snowflake port.
Warehouse: The warehouse the Immuta system account user will use to run queries and perform Snowflake operations.
Immuta Database: The new, empty database for Immuta to manage. This is where system views, user entitlements, row access policies, column-level policies, procedures, and functions managed by Immuta will be created and stored.
Role: The default Snowflake role for the Immuta system account user.
Connection Key: A unique name for your host. This connection key will be used to create data source names for this host.
Click Next.
Select an authentication method from the dropdown menu. This authentication information will be included in the script populated later on the page.
Username and password: Choose one of the following options.
Select Immuta Generated to have Immuta populate the system account name and password.
Select User Provided to enter your own name and password for the Immuta system account.
Snowflake External OAuth:
Fill out the Token Endpoint, which is where the generated token is sent. It is also known as aud (audience) and iss (issuer).
Fill out the Client ID, which is the subject of the generated token. It is also known as sub (subject).
Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or called kid (key identifier).
Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
Key Pair Authentication:
Complete the Username field. This username will be used to connect to the remote database and retrieve records for this data source.
If the private key is encrypted, enter the Private Key Password.
Click Select a File, and upload a Snowflake key pair file.
The Role is prepopulated from the entry on the previous page.
Copy the provided script and run it in Snowflake with the following Snowflake permissions:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
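If the role you intend to run the script with lacks any of these privileges, an administrator (for example, using ACCOUNTADMIN) could grant them roughly as follows; the role name is a placeholder:

```sql
-- Illustrative only: replace immuta_setup_role with the role that will run the script.
GRANT CREATE DATABASE ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
GRANT CREATE ROLE ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
GRANT CREATE USER ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
GRANT MANAGE GRANTS ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE immuta_setup_role WITH GRANT OPTION;
```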
Click Test Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.