This feature is being gradually rolled out to customers and may not be available to your account yet.
Requirements:
Immuta permission CREATE_DATA_SOURCE
USAGE Snowflake privilege on the schema and database to register data sources
REFERENCES Snowflake privilege on all tables
No Snowflake integrations configured in Immuta. If your Snowflake integration is already configured on the app settings page, register your data sources using the legacy method.
To register a Snowflake host with all its schemas and data sources, follow the instructions below.
Click Data and select the Infrastructure tab in the navigation menu.
Click the + Add Host button.
Select the Snowflake data platform tile.
Enter the host connection information:
Host: The URL of your Snowflake account.
Port: Your Snowflake port.
Warehouse: The warehouse the Immuta system account user will use to run queries and perform Snowflake operations.
Immuta Database: The new, empty database for Immuta to manage. This is where system views, user entitlements, row access policies, column-level policies, procedures, and functions managed by Immuta will be created and stored.
Role: The default Snowflake role for the Immuta system account user.
Connection Key: A unique name for your host. This connection key will be used to create data source names for this host.
Click Next.
Select an authentication method from the dropdown menu. This authentication information will be included in the script populated later on the page.
Username and password: Choose one of the following options.
Select Immuta Generated to have Immuta populate the system account name and password.
Select User Provided to enter your own name and password for the Immuta system account.
Snowflake External OAuth:
Fill out the Token Endpoint, which is where the generated token is sent. It is also known as aud (audience) and iss (issuer).
Fill out the Client ID, which is the subject of the generated token. It is also known as sub (subject).
Optionally, fill out the Resource field with a URI of the resource where the requested token will be used.
Enter the x509 Certificate Thumbprint. This identifies the key that corresponds to the token and is often abbreviated as x5t or called kid (key identifier).
Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
Key Pair Authentication:
Complete the Username field. This username will be used to connect to the remote database and retrieve records for this data source.
If your private key is encrypted, enter the Private Key Password.
Click Select a File, and upload a Snowflake key pair file.
The Role is prepopulated from the entry on the previous page.
Copy the provided script and run it in Snowflake with the following Snowflake permissions:
CREATE DATABASE ON ACCOUNT WITH GRANT OPTION
CREATE ROLE ON ACCOUNT WITH GRANT OPTION
CREATE USER ON ACCOUNT WITH GRANT OPTION
MANAGE GRANTS ON ACCOUNT WITH GRANT OPTION
APPLY MASKING POLICY ON ACCOUNT WITH GRANT OPTION
APPLY ROW ACCESS POLICY ON ACCOUNT WITH GRANT OPTION
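If the user who will run the script does not yet hold these privileges, an account administrator could grant them to a dedicated role first. The following Python sketch only prints one Snowflake GRANT statement per privilege listed above; it is not the script Immuta generates, and the role name is a hypothetical placeholder.

```python
# Sketch: print Snowflake GRANT statements for the account-level privileges
# listed above. The role name used below is a hypothetical placeholder.
ACCOUNT_PRIVILEGES = [
    "CREATE DATABASE",
    "CREATE ROLE",
    "CREATE USER",
    "MANAGE GRANTS",
    "APPLY MASKING POLICY",
    "APPLY ROW ACCESS POLICY",
]


def setup_grants(role: str) -> list[str]:
    """Return one GRANT ... ON ACCOUNT ... WITH GRANT OPTION statement per privilege."""
    return [
        f"GRANT {privilege} ON ACCOUNT TO ROLE {role} WITH GRANT OPTION;"
        for privilege in ACCOUNT_PRIVILEGES
    ]


if __name__ == "__main__":
    for statement in setup_grants("IMMUTA_SETUP_ROLE"):
        print(statement)
```

Review the printed statements before running them in a Snowflake worksheet.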
Click Test Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
This feature is being gradually rolled out to customers and may not be available to your account yet.
No Databricks Unity Catalog integrations configured in Immuta. If your Databricks Unity Catalog integration is already configured on the app settings page, register your data sources using the legacy method.
Several different accounts are used to set up and maintain the Databricks Unity Catalog integration. The permissions required for each are outlined below.
Immuta account (required): This user configures the integration and registers the host. This user needs the following permission:
CREATE_DATA_SOURCE Immuta permission
Databricks service principal (required): This service principal is used continuously by Immuta to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks. This service principal needs the following Databricks privileges:
OWNER permission on the Immuta catalog you configure.
OWNER privilege on one of the securables below so that Immuta can administer Unity Catalog row-level and column-level security controls:
On catalogs with schemas and tables registered as Immuta data sources. This permission could also be applied by granting OWNER on a catalog to a Databricks group that includes the Immuta service principal to allow for multiple owners.
On schemas with tables registered as Immuta data sources.
On all tables registered as Immuta data sources, if the OWNER permission cannot be applied at the catalog or schema level. In this case, each table registered as an Immuta data source must individually have the OWNER permission granted to the Immuta service principal.
USE CATALOG and USE SCHEMA on parent catalogs and schemas of tables registered as Immuta data sources so that the Immuta service principal can SELECT and MODIFY securables within the parent catalog and schema.
SELECT and MODIFY on all tables registered as Immuta data sources so that the Immuta service principal can grant and revoke access to tables and apply Unity Catalog row- and column-level security controls.
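To make the service principal requirements concrete, the Python sketch below assembles Databricks SQL for the catalog-level OWNER option, followed by the USE CATALOG, USE SCHEMA, SELECT, and MODIFY grants listed above. The principal ID and all catalog, schema, and table names are hypothetical placeholders, and the output is a starting point for review rather than an Immuta-provided script.

```python
# Sketch: build Databricks SQL for the catalog-level OWNER option described
# above. Principal IDs and object names passed in are hypothetical.


def quote(identifier: str) -> str:
    """Backtick-quote a Databricks identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def service_principal_grants(
    principal: str,
    immuta_catalog: str,
    tables: list[tuple[str, str, str]],
) -> list[str]:
    """Return SQL statements; `tables` holds (catalog, schema, table) triples."""
    p = quote(principal)
    # OWNER on the Immuta catalog itself.
    statements = [f"ALTER CATALOG {quote(immuta_catalog)} OWNER TO {p};"]
    # OWNER plus USE CATALOG on each parent catalog of a registered table.
    for catalog in sorted({c for c, _, _ in tables}):
        statements.append(f"ALTER CATALOG {quote(catalog)} OWNER TO {p};")
        statements.append(f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {p};")
    # USE SCHEMA on each parent schema.
    for catalog, schema in sorted({(c, s) for c, s, _ in tables}):
        statements.append(
            f"GRANT USE SCHEMA ON SCHEMA {quote(catalog)}.{quote(schema)} TO {p};"
        )
    # SELECT and MODIFY on every registered table.
    for catalog, schema, table in tables:
        statements.append(
            f"GRANT SELECT, MODIFY ON TABLE "
            f"{quote(catalog)}.{quote(schema)}.{quote(table)} TO {p};"
        )
    return statements


if __name__ == "__main__":
    for line in service_principal_grants(
        "00000000-0000-0000-0000-000000000000",
        "immuta",
        [("main", "sales", "orders"), ("main", "sales", "customers")],
    ):
        print(line)
```

The explicit grants mirror the requirement list one to one, which keeps the output easy to audit against this page.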
Databricks account (required): This user account runs the provided script in Databricks to create the Immuta-managed catalog. To do so, this account requires the following Databricks privileges:
CREATE CATALOG on the Unity Catalog metastore
ACCOUNT ADMIN on the Unity Catalog metastore for native query audit (optional)
Click Data and select the Infrastructure tab in the navigation menu.
Click the + Add Host button.
Select the Databricks data platform tile.
Enter the host connection information:
Host: The hostname of your Databricks workspace.
Port: Your Databricks port.
HTTP Path: The HTTP path of your Databricks cluster or SQL warehouse.
Immuta Catalog: The name of the catalog Immuta will create to store internal entitlements and other user data specific to Immuta. This catalog will only be readable for the Immuta service principal and should not be granted to other users. The catalog name may only contain letters, numbers, and underscores and cannot start with a number.
Connection Key: A unique name for your host. This connection key will be used to create data source names for this host.
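The Immuta Catalog naming rule above (letters, numbers, and underscores only, not starting with a number) can be checked before you submit the form. This Python sketch reads "letters" as ASCII letters, which is an assumption.

```python
import re

# Letters, digits, and underscores only; the first character may not be a digit.
# "Letters" is interpreted as ASCII letters here (an assumption).
_IMMUTA_CATALOG_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_immuta_catalog_name(name: str) -> bool:
    """Return True if `name` satisfies the Immuta Catalog naming rule."""
    return _IMMUTA_CATALOG_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("immuta", "immuta_catalog_2", "2immuta", "immuta-catalog"):
        print(candidate, is_valid_immuta_catalog_name(candidate))
```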
Click Next.
Select the Access Token authentication method from the dropdown menu.
Enter the Access Token in the Immuta System Account Credentials section. This is the access token for the Immuta service principal. This service principal must have the metastore privileges listed in the requirements section at the top of this page for the metastore associated with the Databricks workspace. If this token is configured to expire, update this field regularly for the integration to continue to function. This authentication information will be included in the script populated later on the page.
Copy the provided script and run it in Databricks as a user with the CREATE CATALOG privilege on the Unity Catalog metastore.
Click Validate Connection.
If the connection is successful, click Next. If there are any errors, check the connection details and credentials to ensure they are correct and try again.
Ensure all the details are correct in the summary and click Complete Setup.
This feature is being gradually rolled out to customers and may not be available to your account yet.
The enhanced onboarding and data source registration workflow allows you to register your data at the host level, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your host for changes so that data sources are added and removed automatically to reflect the state of data on your host.
Once you register your host, Immuta presents a hierarchical view of your data that reflects the hierarchy of objects in your data platform:
Host: This first tier represents your server or data platform account.
Folder: This second tier represents your database or schema (depending on the structure of your remote platform).
Data source: This third tier represents individual data objects within your schema or database.
For example, the following object hierarchy for Snowflake hosts would be displayed on the Immuta infrastructure page:
Host
Database
Schema
Data source
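The tiered view can be pictured as a tree. The Python sketch below is a hypothetical in-memory model (not an Immuta type or API) of the Snowflake hierarchy above; it lists the full host-to-table path of each data source and counts the child objects beneath a node.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class InfraNode:
    """One object in the infrastructure hierarchy (hypothetical model)."""
    name: str
    kind: str  # "host", "database", "schema", or "data source"
    children: list["InfraNode"] = field(default_factory=list)

    def add(self, name: str, kind: str) -> "InfraNode":
        """Attach a child object and return it so calls can be chained."""
        child = InfraNode(name, kind)
        self.children.append(child)
        return child


def data_source_paths(node: InfraNode, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the full host-to-table path of every data source under `node`."""
    path = prefix + (node.name,)
    if node.kind == "data source":
        yield path
    for child in node.children:
        yield from data_source_paths(child, path)


def count_descendants(node: InfraNode) -> int:
    """Count every object below `node` in the tree."""
    return sum(1 + count_descendants(child) for child in node.children)


if __name__ == "__main__":
    host = InfraNode("example-account", "host")
    schema = host.add("SALES", "database").add("PUBLIC", "schema")
    schema.add("ORDERS", "data source")
    schema.add("CUSTOMERS", "data source")
    print(list(data_source_paths(host)))
```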
Beyond making the registration of your data more intuitive, enhanced onboarding provides more control. Instead of performing operations on individual schemas or tables, you can perform operations (such as object discovery) at the host level.
See the Snowflake or Databricks Unity Catalog host registration how-to guides for a list of requirements.
In this enhanced onboarding workflow, you configure the integration and register data sources simultaneously. Once you save your configuration, Immuta manages and applies Snowflake or Unity Catalog governance features to data registered in Immuta.
Then, Immuta crawls your host to register all tables within every schema and database that the Snowflake role or Databricks account credentials you provided during configuration have access to. The object metadata, user metadata, and policy definitions are stored in the Immuta metadata database, and this metadata is used to enforce policies for users accessing this data.
After initial registration, your host can be crawled in two ways:
Periodic crawl: This crawl happens once every 24 hours. Currently, this schedule is not configurable.
Manual crawl: You can manually trigger a crawl of your host.
During these subsequent crawls of your host, Immuta identifies tables, schemas, or databases that have been added or removed. If tables are added, new data sources are created in Immuta. If remote tables are deleted, the corresponding data sources will be disabled in Immuta.
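The add/disable behavior of a re-crawl amounts to a set comparison between what Immuta has registered and what the crawl finds. This Python sketch models it with hypothetical names; the page does not describe what happens when a table reappears after its data source was disabled, so the sketch leaves that case unchanged.

```python
def reconcile(registered: dict[str, bool], remote: set[str]) -> dict[str, list[str]]:
    """Compare registered data sources with the tables found by a crawl.

    `registered` maps a table path to whether its data source is enabled;
    `remote` is the set of table paths the crawl found. `registered` is
    updated in place. A previously disabled table that reappears is left
    unchanged, because that case is not described in this guide.
    """
    # Tables found remotely that Immuta has never registered become new data sources.
    created = sorted(remote - set(registered))
    # Enabled data sources whose remote table is gone are disabled.
    disabled = sorted(t for t, enabled in registered.items() if enabled and t not in remote)
    for table in created:
        registered[table] = True
    for table in disabled:
        registered[table] = False
    return {"created": created, "disabled": disabled}


if __name__ == "__main__":
    state = {"db.sales.orders": True, "db.sales.legacy": True}
    print(reconcile(state, {"db.sales.orders", "db.sales.returns"}))
    print(state)
```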
For more information about the Snowflake or Databricks Unity Catalog integration and how policies are enforced, see the Snowflake integration reference guide or Databricks Unity Catalog integration reference guide.
When there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:
Column added: Immuta applies the New tag to the column that has been added and sends a request to the data owner to validate whether the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column data type changed: Immuta applies the New tag to the column whose data type has changed and sends a request to the data owner to validate whether the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
Data source created: Immuta applies the New tag to the newly created data source and sends a request to the data owner to validate whether the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New tag from it, and as a result any policy that targets the New data source tag no longer applies.
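The lifecycle of the New tag can be summarized as a small state model. The Python sketch below is hypothetical (none of these classes or methods are Immuta APIs) and assumes an active policy targets the New tag; it shows when the tag is applied, when a validation request is queued, and how owner validation stops a New-tag policy from applying.

```python
from dataclasses import dataclass, field

NEW = "New"


@dataclass
class Column:
    name: str
    data_type: str
    tags: set[str] = field(default_factory=set)


@dataclass
class DataSource:
    """Hypothetical model of one data source and its validation requests."""
    name: str
    columns: dict[str, Column] = field(default_factory=dict)
    tags: set[str] = field(default_factory=set)
    requests: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Data source created: tag it New and ask the owner to validate it.
        self.tags.add(NEW)
        self.requests.append(f"validate new data source {self.name}")

    def column_added(self, name: str, data_type: str) -> None:
        self.columns[name] = Column(name, data_type, {NEW})
        self.requests.append(f"validate new column {name}")

    def column_type_changed(self, name: str, data_type: str) -> None:
        column = self.columns[name]
        column.data_type = data_type
        column.tags.add(NEW)
        self.requests.append(f"validate changed column {name}")

    def column_deleted(self, name: str) -> None:
        # The column leaves the data dictionary, and a request is still sent.
        del self.columns[name]
        self.requests.append(f"validate deleted column {name}")

    def owner_validated_column(self, name: str) -> None:
        self.columns[name].tags.discard(NEW)

    def owner_validated_data_source(self) -> None:
        self.tags.discard(NEW)


def new_tag_policy_applies(tags: set[str]) -> bool:
    """A policy that targets the New tag applies only while the tag is present."""
    return NEW in tags
```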
For instructions on how to view and manage your tasks and requests in the Immuta UI, see the Manage access requests guide. To view and manage your tasks and requests via the Immuta API, see the Manage data source requests section of the API documentation.
When registering a host, Immuta sets the configuration to the recommended default settings to protect your data. The recommended settings are described below:
Infrastructure object discovery: This setting allows Immuta to monitor schemas for changes. When Immuta identifies a new table, a data source will automatically be created. Similarly, if remote tables are deleted, the corresponding data sources will be disabled. This setting is enabled by default.
Default run schedule: This sets the time interval for Immuta to check for new objects. By default, this schedule is set to 24 hours.
Sensitive data discovery: This setting enables sensitive data discovery and allows you to select the sensitive data discovery framework that Immuta will apply to your data objects. This setting is enabled by default to use the preconfigured or global framework.
Impersonation: This setting enables and defines the role for user impersonation in Snowflake. User impersonation is not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
Project workspaces: This setting enables Snowflake project workspaces. If you use Snowflake secure data sharing with Immuta, enable this setting, as project workspaces are required. If you use Snowflake table grants, disable this setting; project workspaces cannot be used when Snowflake table grants are enabled. Project workspaces are not supported in the Databricks Unity Catalog integration. This setting is disabled by default.
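The defaults and compatibility rules above can be expressed as a settings model with a consistency check. In this Python sketch, the class, field names, and platform labels are hypothetical; the defaults and the rules they encode come from the list above.

```python
from dataclasses import dataclass


@dataclass
class HostSettings:
    """Hypothetical model of the host settings above, set to the stated defaults."""
    platform: str  # "snowflake" or "databricks" (labels chosen for this sketch)
    object_discovery: bool = True
    run_schedule_hours: int = 24
    sensitive_data_discovery: bool = True
    impersonation: bool = False
    project_workspaces: bool = False
    # How the Snowflake account is used (assumed inputs, not Immuta settings).
    uses_secure_data_sharing: bool = False
    uses_table_grants: bool = False

    def problems(self) -> list[str]:
        """Return the rule violations for this combination of settings."""
        issues = []
        if self.platform == "databricks":
            if self.impersonation:
                issues.append("impersonation is not supported for Databricks Unity Catalog")
            if self.project_workspaces:
                issues.append("project workspaces are not supported for Databricks Unity Catalog")
        if self.uses_secure_data_sharing and not self.project_workspaces:
            issues.append("secure data sharing requires project workspaces")
        if self.uses_table_grants and self.project_workspaces:
            issues.append("project workspaces cannot be used with Snowflake table grants")
        return issues


if __name__ == "__main__":
    print(HostSettings("snowflake", project_workspaces=True, uses_table_grants=True).problems())
```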
Unregistering a host automatically deletes all of its child objects in Immuta. However, Immuta will not remove the objects in your Snowflake or Databricks account.
Users can currently register a host, unregister a host, and update the connection information for a host.
Snowflake and Databricks Unity Catalog are currently the only integrations that support the simplified data registration workflow.
Databricks Unity Catalog:
Only managed and external tables will be registered as data sources.
Delta shares are unsupported.
A data source is how data owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. An Immuta data source is a virtual representation of data that exists in a remote data platform.
This section includes reference and how-to guides for registering and managing data sources.
This reference guide describes Immuta data sources and their major components.
These how-to guides illustrate how to register data in Immuta.
The guides in this section illustrate how to manage and edit data sources and data dictionaries.
The reference and how-to guides in this section describe schema monitoring and illustrate how to configure it for your integration.
Immuta integrates with your data platforms and external catalogs so you can register your data and effectively manage access controls on that data.
This section includes concept, reference, and how-to guides for registering data sources so that you can discover, monitor, and protect sensitive data.
This section includes reference and how-to guides for configuring your integration and registering data at the host level in a single workflow.
This section covers concepts related to registering your data with Immuta.
The how-to and reference guides in this section illustrate how to use domains. Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups.
This section covers concepts related to tags and how to use them in Immuta.
The enhanced onboarding and data source registration workflow allows you to register your data at the host level, making data registration more scalable for your organization. Instead of registering schemas and databases individually, you can register them all at once and allow Immuta to monitor your host for changes so that data sources are added and removed automatically to reflect the state of data on your host.
: Configure a Snowflake integration and register a host.
: Configure a Databricks Unity Catalog integration and register a host.
: Trigger a manual crawl of a host or object to sync the infrastructure of your remote data platform with Immuta.
: This reference guide discusses the major concepts, design, and settings of the enhanced onboarding and data source registration workflow.
When a data source is exposed, policies are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes or groups of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner, allowing reproducibility and collaboration.
This section includes how-to guides for registering data sources in Immuta:
Data owners expose their data across their organization to other users by registering that data in Immuta as a data source.
By default, data owners can register data in Immuta without affecting existing policies on those tables in their remote system, so users who had access to a table before it was registered can still access that data without interruption. If this default behavior is disabled on the app settings page, a subscription policy that requires data owners to manually add subscribers to data sources will automatically apply to new data sources (unless a global policy you create applies), blocking access to those tables.
For information about the default subscription policy and how to manage it, see the Subscription policies guide.
Click a link below to navigate to a tutorial that details how to create a data source:
You can create Databricks data sources with nested columns when you enable complex data types. When complex types are enabled, Databricks data sources can have columns that are arrays, maps, or structs that can be nested. These columns get parsed into a nested data dictionary.
There are various roles users and groups can play relating to each data source. These roles are managed through the members tab of the data source. Roles include the following types:
Owners: Those who create and manage new data sources and their users, documentation, and data dictionaries.
Subscribers: Those who have access to the data source data. With the appropriate data accesses and attributes, these users and groups can view files, run queries, and generate analytics against the data source data. All users and groups granted access to a data source have subscriber status.
Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation and data dictionary tags and descriptions.
See Manage data source members for a tutorial on modifying user roles.
The data dictionary provides information about the columns within the data source, including column names and value types.
Dictionary columns are automatically generated when the data source is created. However, data owners and experts can tag columns in the data dictionary and add descriptions to these entries.
This feature is being gradually rolled out to customers and may not be available to your account yet.
Requirement: Immuta permission CREATE_DATA_SOURCE
Prerequisite: Host registered using the enhanced onboarding and data registration workflow for Snowflake or Databricks Unity Catalog
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Click the user action menu and select Re-crawl Host.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Click the menu icon in the Action column for the database you want to crawl and select Re-crawl Database.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Select the database in the data objects table.
Click the menu icon in the Action column for the schema you want to crawl and select Re-crawl Schema.
Click Data and select the Infrastructure tab in the navigation menu.
Select the host.
Select the database and then the schema in the data objects table.
Click the menu icon in the Action column for the table you want to crawl and select Re-crawl Table.
Private preview: Google BigQuery is available to select accounts. Reach out to your Immuta representative for details.
CREATE_DATA_SOURCE Immuta permission
Google BigQuery roles:
roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project
Configure the Google BigQuery integration
Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. If you do not currently have a service account for the Google Cloud project separate from the Google Cloud service account you created when configuring the Google BigQuery integration, you must create a Google Cloud service account with privileges to view and run queries against the tables you are protecting.
You have two options to create the required Google Cloud service account:
Using the Google Cloud documentation, create a service account with the following roles:
BigQuery User
BigQuery Data Viewer
Using the Google Cloud documentation, generate a service account key for the account you just created.
Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.
SERVICE_ACCOUNT is the name for the new service account.
PROJECT_ID is the project ID for the Google Cloud project that is integrated with Immuta.
IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.
Use the script below in the gcloud command line. This script is a template; change values as necessary:
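The Immuta template script is not reproduced here. As a hedged illustration of the first option (create the account, grant BigQuery User and BigQuery Data Viewer, generate a key), this Python sketch assembles equivalent gcloud commands and prints them for review. The sample values stand in for SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE and are hypothetical.

```python
import shlex


def service_account_commands(service_account: str, project_id: str, key_file: str) -> list[list[str]]:
    """Build gcloud argument lists that create a service account, grant roles, and write a key."""
    email = f"{service_account}@{project_id}.iam.gserviceaccount.com"
    commands = [
        ["gcloud", "iam", "service-accounts", "create", service_account,
         f"--project={project_id}"],
    ]
    # BigQuery User and BigQuery Data Viewer, granted at the project level.
    for role in ("roles/bigquery.user", "roles/bigquery.dataViewer"):
        commands.append(["gcloud", "projects", "add-iam-policy-binding", project_id,
                         f"--member=serviceAccount:{email}", f"--role={role}"])
    # Writes a private key for the new account to key_file.
    commands.append(["gcloud", "iam", "service-accounts", "keys", "create", key_file,
                     f"--iam-account={email}"])
    return commands


if __name__ == "__main__":
    # Hypothetical placeholder values.
    for cmd in service_account_commands("immuta-sources", "my-project", "immuta-key.json"):
        print(shlex.join(cmd))
```

Printing rather than executing the commands keeps the sketch safe to run and lets you paste the reviewed commands into your own shell.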
Required Google BigQuery roles
Ensure that the user creating the data source has these Google BigQuery roles:
roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project
Click the + button in the top-left corner of the screen and select New Data Source.
Select the Google BigQuery tile in the Data Platform section.
Complete these fields in the Connection Information box:
Account Email Address: Enter the email address of a user with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.
Project: Enter the name of the project that has been integrated with Immuta.
Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.
Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.
Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Provide basic information about your data source to make it discoverable to users.
Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. For BigQuery, the schema will be the BigQuery dataset. The format must include a schema macro, but you may personalize it using lowercase letters, numbers, and underscores. It can have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.
When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The Immuta data source will have the same name as the original table.
<Schema><Tablename>: The Immuta data source will have both the dataset and original table name.
Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
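The SQL schema and table name format rules (required macro, lowercase letters, numbers, and underscores, at most 255 characters) can be checked with a short function. This Python sketch assumes the macros are spelled <schema> and <tablename> and that the 255-character limit applies to the format string itself; both are assumptions rather than documented behavior.

```python
KNOWN_MACROS = ("<schema>", "<tablename>")  # assumed macro spellings
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789_")
MAX_LENGTH = 255


def validate_sql_name_format(fmt: str, required_macro: str) -> list[str]:
    """Return the rule violations for a name format; an empty list means it passes."""
    errors = []
    if required_macro not in fmt:
        errors.append(f"must include the {required_macro} macro")
    if len(fmt) > MAX_LENGTH:
        errors.append(f"must be at most {MAX_LENGTH} characters")
    # Strip the macros; the remaining literal text must use only allowed characters.
    literal = fmt
    for macro in KNOWN_MACROS:
        literal = literal.replace(macro, "")
    if not set(literal) <= ALLOWED:
        errors.append("may only contain lowercase letters, numbers, and underscores")
    return errors


if __name__ == "__main__":
    print(validate_sql_name_format("bq_<schema>", "<schema>"))
    print(validate_sql_name_format("BQ-<schema>", "<schema>"))
```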
When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.
Optional Advanced Settings:
Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.
Data Source Tags: Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.
Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Click Create to save the data source(s).
With data sources registered in Immuta, your organization can now start
building global subscription and data policies to govern data.
creating projects to collaborate.
This page details how to register Databricks data sources using the legacy workflow. To register data sources using the simplified workflow, see this how-to guide.
Databricks Spark integration
When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:
The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
The user exposing the tables is a Databricks workspace administrator.
Databricks Unity Catalog integration
When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.
The following privileges on the parent catalogs and schemas of those tables:
SELECT
USE CATALOG
USE SCHEMA
USE SCHEMA on system.information_schema
Azure Databricks Unity Catalog limitation
Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the Azure Databricks Unity Catalog limitation for details.
Use SSL
Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.
Navigate to the My Data Sources page.
Click New Data Source.
Select the Databricks tile in the Data Platform section.
Complete the first four fields in the Connection Information box:
Server: hostname or IP address
Port: port configured for Databricks, typically port 443
SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
Database: the remote database
Select your authentication method from the dropdown:
Access Token:
Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.
Enter the HTTP Path of your Databricks cluster or SQL warehouse.
OAuth machine-to-machine (M2M):
Enter the HTTP Path of your Databricks cluster or SQL warehouse.
Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
Fill out the Client ID. This is a combination of letters, numbers, or symbols used as a public identifier, and it is the same as the service principal's application ID.
Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.
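For context on how the Token Endpoint, Client ID, Scope, and Client Secret fit together, the Python sketch below builds (but does not send) a standard OAuth 2.0 client credentials token request using HTTP Basic client authentication. This illustrates the general M2M flow, not Immuta's internal implementation; the workspace URL and credentials are placeholders, and all-apis is used as an example Databricks scope.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_endpoint: str, client_id: str,
                        client_secret: str, scope: str) -> urllib.request.Request:
    """Build an OAuth 2.0 client credentials request without sending it."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    # The client ID and secret are sent as HTTP Basic credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request(
        "https://example.cloud.databricks.com/oidc/v1/token",  # placeholder workspace
        "my-client-id", "my-client-secret", "all-apis",
    )
    print(req.get_method(), req.full_url, req.data)
```

Sending the request (for example with urllib.request.urlopen) would return a JSON body containing an access token, which is the token the integration then uses.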
If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:
Click Test Connection.
Further considerations
Immuta pushes down joins to be processed on the native database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Opt to Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.
Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format, which will be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table, this field is prepopulated with the recommended project name and you can edit it freely.
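The automatic increment can be pictured with a small helper. The text only confirms the first step (Customer table becomes Customer table 2); continuing to 3, 4, and so on is an assumption of this sketch, and `next_available_name` is a hypothetical function, not part of Immuta.

```python
def next_available_name(requested, existing):
    """Return the requested name, or the first '<name> N' (N >= 2) that is not taken."""
    if requested not in existing:
        return requested
    suffix = 2  # the first duplicate gets " 2", as in the Customer table example
    while f"{requested} {suffix}" in existing:
        suffix += 1  # assumption: later duplicates keep counting upward
    return f"{requested} {suffix}"


print(next_available_name("Customer table", {"Customer table"}))  # Customer table 2
```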
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
<Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (e.g., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
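The case rule for macros can be sketched as a small renderer. This is a hypothetical illustration: it assumes literal text in a custom template is copied through unchanged, and, because the text does not say how the built-in <Schema><Tablename> option joins the two names, the example uses a custom template with an explicit separator.

```python
import re

MACRO = re.compile(r"<(schema|tablename)>", re.IGNORECASE)


def apply_macro_case(macro, value):
    # The casing of the macro text decides the casing of the substituted value.
    if macro.isupper():
        return value.upper()
    if macro.islower():
        return value.lower()
    return value.title()


def render_name(template, schema, table):
    """Hypothetical renderer for a Data Source Name Format template."""
    if not re.search(r"<tablename>", template, re.IGNORECASE):
        raise ValueError("template must include a tablename macro")
    values = {"schema": schema, "tablename": table}

    def substitute(match):
        macro = match.group(1)
        return apply_macro_case(macro, values[macro.lower()])

    return MACRO.sub(substitute, template)


print(render_name("<Tablename>", "sales", "data source name"))  # Data Source Name
print(render_name("<Schema> - <Tablename>", "sales", "orders"))  # Sales - Orders
```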
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
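The SQL Schema Name Format and SQL Table Name Format share the same constraints: a required macro, only lowercase letters, numbers, and underscores around it, and at most 255 characters. The checker below is a hypothetical pre-flight sketch; it assumes the macros are written `<schema>` and `<tablename>` and that the length limit applies to the template as entered. Immuta's own validation is authoritative.

```python
import re

# Characters allowed outside the macros: lowercase letters, digits, underscores.
ALLOWED_LITERALS = re.compile(r"[a-z0-9_]*")


def validate_sql_name_format(template, required_macro):
    """Return a list of problems with a SQL name format template (empty if valid)."""
    problems = []
    if len(template) > 255:
        problems.append("longer than 255 characters")
    if required_macro not in template:
        problems.append(f"missing {required_macro} macro")
    # Strip the (assumed) macro syntax, then check the remaining literal text.
    literals = re.sub(r"<(schema|tablename)>", "", template)
    if not ALLOWED_LITERALS.fullmatch(literals):
        problems.append("contains characters other than a-z, 0-9, and _")
    return problems


print(validate_sql_name_format("immuta_<schema>", "<schema>"))  # []
print(validate_sql_name_format("Immuta-<schema>", "<schema>"))
```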
Schema monitoring best practices
Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.
Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks from columns that are added without being reviewed first.
When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.
Note: This step will only appear if all tables within a server have been selected for creation.
Generate your Immuta API key
Before you can run the script referenced in this tutorial, generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta Admin or the user who owns the schema detection groups that are being targeted.
Enable Schema Monitoring or Detect Column Changes on the Data Source creation page.
Click Download Schema Job Detection Template.
Click the Click Here To Download text.
Before you can run the script, create the correct scope and secret by running these commands in the CLI using the Immuta API Key generated on your user profile page:
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented-out lines for specifying a particular database or table. While those two lines remain commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, note that the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
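Because the downloaded template uses the Python requests library, a proxy is configured by passing a `proxies` mapping from URL scheme to proxy URL on the request call (requests also honors the standard `HTTP_PROXY` and `HTTPS_PROXY` environment variables). The sketch below builds that mapping; the proxy address and the `IMMUTA_PROXY_URL` environment variable are hypothetical, and where you pass the mapping depends on how the template's request call is written.

```python
import os
from typing import Optional


def build_proxies(proxy_url: Optional[str] = None) -> dict:
    """Return a requests-style proxies mapping, or an empty dict when no proxy is set."""
    # Hypothetical fallback environment variable used only in this sketch.
    proxy_url = proxy_url or os.environ.get("IMMUTA_PROXY_URL")
    if not proxy_url:
        return {}
    # requests selects the proxy entry by the scheme of the target URL.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("http://proxy.internal.example:3128")
print(proxies)
# In the notebook, pass the mapping to the template's existing call, for example:
# requests.post(immuta_url, headers=headers, json=payload, proxies=proxies)
```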
Although not required, completing these steps will help maximize the utility of your data source. Otherwise, skip to the next step.
This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.
To enable, select the checkbox in this section.
See the Schema projects overview page to learn more about column detection.
An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.
Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.
Selecting an Event Time column will enable
more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.
Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.
This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
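As a rough model of the latency setting, a new value (such as a new country) becomes visible to the row-level policy only at the next check, and checks happen once per latency period. This sketch assumes checks run on a fixed schedule from a known last check time; the function names are hypothetical and Immuta's actual scheduling may differ.

```python
from datetime import datetime, timedelta

UNITS = {"MINUTES": "minutes", "HOURS": "hours", "DAYS": "days"}


def latency_period(value, unit):
    """Convert the Set Time value and unit from the Latency section into a timedelta."""
    if value <= 0:
        raise ValueError("latency must be positive")
    return timedelta(**{UNITS[unit]: value})


def first_check_that_sees(new_value_at, last_check, period):
    """Earliest scheduled check at or after the moment a new value appears."""
    # Ceiling division of timedeltas: how many whole periods must pass.
    checks_needed = -(-(new_value_at - last_check) // period)
    return last_check + max(checks_needed, 0) * period


# A new country appears at 09:20; the last check ran at 09:00 with a 1-hour latency.
period = latency_period(1, "HOURS")
print(first_check_that_sees(datetime(2024, 5, 1, 9, 20), datetime(2024, 5, 1, 9, 0), period))
# 2024-05-01 10:00:00
```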
Data owners can disable sensitive data discovery for their data sources in this section.
Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.
Adding tags to your data source allows users to search for the data source using the tags and allows Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.
To add tags,
Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.
Click Create to save the data source(s).
This page details how to register Snowflake data sources using the legacy workflow. To register data sources by registering a Snowflake host instead, see the guide for that workflow.
Requirements:
Immuta permission CREATE_DATA_SOURCE
USAGE Snowflake privilege on the schema and database
REFERENCES Snowflake privilege on the tables
Snowflake imported databases
Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.
Use SSL
Although not required, all connections should use SSL. Additional connection string arguments may also be provided.
Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.
Click the plus button in the top left of the Immuta console.
Select New Data Source.
Select the Snowflake tile in the Data Platform section.
Complete these fields in the Connection Information box:
Server: hostname or IP address
Port: port configured for Snowflake, typically port 443
SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
Warehouse: Snowflake warehouse that contains the remote database
Database: remote database
From the Select Authentication Method dropdown, select Username and Password, Key Pair Authentication, or Snowflake External OAuth:
Username and Password
Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
Enter a Password. This password will be used with the above username to connect to the remote database.
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Key Pair Authentication
Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
Opt to enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>.
Click Select a File, and upload a Snowflake key pair file.
Snowflake External OAuth
Fill out the Token Endpoint, which is where the generated token is sent.
Fill out the Client ID, which is the subject of the generated token.
To use a certificate, keep the Use Certificate checkbox enabled and complete the steps below. You cannot pass a client secret if you use this method for obtaining the access token.
Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
Enter the x509 Certificate Thumbprint. This identifies the key corresponding to the token and is often abbreviated as x5t or called sub (Subject).
Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
To pass a client secret, uncheck the Use Certificate checkbox and complete the fields below. You cannot use a certificate if you use this method for obtaining the access token.
Scope (string): The scope limits the operations and roles allowed in Snowflake by the access token. See the Snowflake documentation for details about creating scopes for External OAuth.
Client Secret (string): Immuta uses this secret to authenticate with the authorization server when it requests a token.
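Identity providers differ in how they display a certificate thumbprint. In JOSE (RFC 7515, section 4.1.7), `x5t` is the base64url-encoded SHA-1 digest of the DER-encoded certificate, while many consoles show the same digest as hex. The sketch below derives both forms from a PEM certificate using only the standard library. Which form the Certificate Thumbprint field expects is not stated here, so compare the result with what your identity provider shows; the `client_cert.pem` path is hypothetical.

```python
import base64
import hashlib
import ssl
from pathlib import Path


def certificate_thumbprints(pem_certificate):
    """Return the SHA-1 thumbprint of a PEM certificate as hex and as base64url (x5t)."""
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_certificate)
    digest = hashlib.sha1(der_bytes).digest()
    return {
        "hex": digest.hex().upper(),
        # x5t per RFC 7515: base64url encoding with the padding removed.
        "x5t": base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii"),
    }


cert_path = Path("client_cert.pem")  # hypothetical path to your PEM client certificate
if cert_path.exists():
    print(certificate_thumbprints(cert_path.read_text()))
```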
Click the Test Connection button.
If the connection is successful, a check mark and successful connection notification will appear and you can proceed. You must be able to connect to this data source using the connection information that you just entered to proceed.
Considerations
Immuta pushes down joins to be processed on the native database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.
File naming convention
If you are uploading more than one file, ensure the certificate used for the OAuth authentication has the key name "oauth client certificate."
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Optionally, click Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.
Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format, which will be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table, this field is prepopulated with the recommended project name and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
<Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (e.g., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Schema monitoring best practices
Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.
Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Note: This step will only appear if all tables within a server have been selected for creation.
Although not required, completing these steps will help maximize the utility of your data source.
This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.
To enable, select the checkbox in this section.
An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.
Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.
Selecting an Event Time column will enable
more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.
Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.
This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
Data owners can disable sensitive data discovery for their data sources in this section.
Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.
Adding tags to your data source allows users to search for the data source using the tags and allows Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.
To add tags,
Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.
Click Create to register your data source.
Consider using Immuta's API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks from columns that are added without being reviewed first.
When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.
See the Schema projects overview page to learn more about column detection.
As a data owner, you can edit your data source settings and disable, delete, and re-enable a data source.
For other guides related to data source members and management, see the Related guides section.
Navigate to the Overview tab.
Click the more actions icon in the upper right corner of the page and select Edit.
Change your settings in the data source workflow.
Note: Some settings cannot be changed once the data source has been created. In these cases, simply create a new data source with the new settings.
When completed, navigate to the end of the workflow and click Save.
Note: Some data sources may require the data owner to reconnect to the remote database before any changes to the data source can be saved.
For information on specific data source settings, see the guides below:
Data owners can bulk edit data sources.
Navigate to the data sources list page.
Select the checkboxes for the data sources you want to edit. Note that when editing a connection string using bulk edit, all data sources from that connection must be selected.
Select the action you want or click More Actions for additional options.
Confirm your edits by following the prompts in the modals that appear.
Disabling a data source hides it and its data from all users except the data owner. While in this state, the data source will display as disabled in the console for the data owner, and other users will not be able to see it at all.
Navigate to the Overview tab.
Click on the more actions icon in the upper right corner and select Disable.
A label will appear next to the data source indicating it is now disabled, and a notification will be sent to all users of the data source informing them that the data source has been disabled.
Navigate to the Overview tab.
Click on the more actions icon in the upper right corner and select Enable.
A notification will be sent out to all users of the data source informing them that the data source has been enabled.
Deleting a data source permanently removes it from Immuta. Data sources must first be disabled before they can be deleted.
Navigate to the Overview tab and click the more actions icon and select Delete.
Confirm that the data source should be deleted by clicking Delete.
A notification will be sent out to all users of the data source informing them that the data source has been deleted.
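The enable, disable, and delete steps above follow a simple rule: deletion is only possible from the disabled state. The state machine below is a hypothetical illustration of that rule, not an Immuta API.

```python
from enum import Enum


class State(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    DELETED = "deleted"


# Allowed transitions described above; delete is reachable only from DISABLED.
TRANSITIONS = {
    (State.ENABLED, "disable"): State.DISABLED,
    (State.DISABLED, "enable"): State.ENABLED,
    (State.DISABLED, "delete"): State.DELETED,
}


def apply_action(state, action):
    """Return the next state, or raise if the action is not allowed from this state."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a data source that is {state.value}") from None


state = apply_action(State.ENABLED, "disable")
state = apply_action(state, "delete")
print(state)  # State.DELETED
```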
For information about data sources and policies, see the following guides:
In addition to adding and managing data source settings as outlined above, data owners can manage data source
When an Immuta data source is created, background jobs use the connection information provided to compute health checks dependent on the type of data source created and how it was configured. These data source health checks include the following:
blob crawl status: indicates whether the blob was successfully crawled.
column detection status: indicates whether the job run to determine if a column was added or removed from the remote table registered as an Immuta data source was successful.
external catalog link status: indicates whether or not the external catalog was successfully linked to the data source.
fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated.
framework classification status: indicates whether classification was successfully run on the data source to determine the sensitivity of the data source.
global policy applied status: indicates whether global policies were successfully applied to the data source.
high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
native SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
native SQL view creation status (for Redshift data sources): indicates whether native views were properly created for Redshift tables registered in Immuta.
row count status: indicates whether the number of rows in the data source was successfully calculated.
schema detection status: indicates whether the job run to determine if a remote table was added or removed from the schema was successful.
sensitive data discovery status: indicates whether sensitive data discovery was successfully run on the data source.
After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.
These background jobs can be disabled during data source creation by adding a specific tag to prevent automatic table statistics. This prevent statistics tag can be set on the app settings page by a system administrator. However, with automatic table statistics disabled, these policies will be unavailable until the data source owner manually generates the fingerprint:
Masking with format preserving masking
Masking with k-anonymization
Masking using randomized response
Unhealthy data sources may fail their row count queries if they run against a cluster that has the Databricks query watchdog enabled.
Data sources with over 1600 columns will not have health checks run, but will still appear as healthy. The health check cannot be run automatically or manually.
Once a data source is created, the data owner can manage data source policies, members, data dictionary, and tags.
The reference and how-to guides in this section cover topics related to managing existing data sources.
Manage data source settings: Edit data source settings or disable and delete a data source.
Manage data source members: Add, remove, or modify users on a data source.
Manage data source access requests: Approve and deny subscription requests on a data source.
Disable data sampling: Disable metadata collection that requires sampling data.
Data dictionary: Manage the data dictionary descriptions and tags.
Data source health checks: This reference guide defines the data source health check jobs that are run when a data source is created.
The data dictionary provides information about the columns within the data source, including column names and value types.
As a data owner, you can manage data dictionary descriptions and column tags. For other guides related to the data dictionary, see the Related guides section.
Navigate to the Data Dictionary tab.
To add or edit column descriptions, click the menu icon in the Actions column next to the entry you want to change and select Edit.
Complete the fields in the form that appears, and then click Save.
For information about the data dictionary, see the Data sources in Immuta overview.
In addition to managing data dictionary descriptions as outlined above, data owners or experts can also manage column tags.
In addition to creating and managing data sources, data owners can add and manage data source members manually. While this is supported, it is not recommended; managing user access through subscription policies is much more scalable.
For other guides related to data source members and management, see the Related guides section.
Navigate to the data source and click the Members tab.
Click Add Members and enter the group name or username.
Select their Role:
Subscriber: The role can have read or write access to the table. This role is only available if there are read access policies on the data source.
Owner: The role can manage data source members and policies and have read or write access to the table.
Expert: The role can manage the data dictionary descriptions and have read or write access to the table. This role is only available if there are read access policies on the data source.
You can also opt to specify an expiration date for when the user’s access should expire.
Select Read or Write from the Access Grant dropdown. This option is only available if write policies have been enabled.
Click Add.
Navigate to the data sources list page.
Select the data sources you want to add users to by clicking the checkbox next to the data source.
Select Add Users.
In the modal, type the user name or group name and select the user or group you want to add from the dropdown menu.
Opt to set an Expiration for the users' subscriptions. Additionally, you can change the role from Subscriber to Expert or Owner for the users or groups using the dropdown menu in the Role column.
Click Add. All users and groups will be added to the data sources you selected.
As a data owner, you can limit the amount of time a user or group has access to your data source by setting an access expiration date.
Navigate to the Members tab.
Adjust the number of days under the Expires column for the user/group whose access you want to limit (the limit is counting from today, so users/groups with 0 days left means their access will be revoked by the end of today and users with 1 day left means their access will be revoked by the end of tomorrow).
Save your changes.
To remove the limit (or set the limit to Never), delete the number from the field and save your changes.
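The day-counting rule above (0 days left means access ends at the end of today, 1 day left means the end of tomorrow) works out as follows. This hypothetical helper assumes dates are evaluated in a single timezone, which the text does not specify.

```python
from datetime import date, timedelta
from typing import Optional


def last_day_of_access(days_left: Optional[int], today: date) -> Optional[date]:
    """Return the last calendar day on which access still exists, or None for Never."""
    if days_left is None:  # empty Expires field: the limit is Never
        return None
    if days_left < 0:
        raise ValueError("days_left cannot be negative")
    # 0 days left: access ends at the end of today; 1 day left: end of tomorrow.
    return today + timedelta(days=days_left)


print(last_day_of_access(0, date(2024, 5, 1)))  # 2024-05-01
print(last_day_of_access(1, date(2024, 5, 1)))  # 2024-05-02
```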
Navigate to the Members tab.
Click the drop-down arrow under the Role column next to the user/group whose role you’d like to change.
Select another role (subscriber, expert, owner, or ingest user, if applicable).
Notifications about the change will be sent to the affected users and groups (as well as alternative Owners).
Navigate to the Members tab.
Click the Name of the user or group whose history you want to review.
As a data owner, you can deny access to any users or groups at any time.
Navigate to the Members tab.
To remove a user or group from a data source, click Deny in the Actions column next to the user or group you want to remove.
Complete the Deny Access form, including a reason for revoking the access.
This action will immediately update users' or groups' subscription status, and they will no longer have any access to the data source. Notifications will be sent to the affected users (as well as alternative data owners) informing them of the change in subscription status.
For information about data source members and subscriptions, see the data source user roles section.
In addition to adding and managing data source members as outlined above, data owners can manage data source
If you want to disable the metadata collection that requires sampling data, you must complete all of the steps below.
These steps will ensure that Immuta queries no data, under any circumstances. Without this sample data, some Immuta features will be unavailable. Sensitive data discovery (SDD) cannot be used to automatically detect sensitive data in your data sources, and the following masking policies will not work:
Masking with format preserving masking
Masking with k-anonymization
Masking using randomized response
Reach out to your Immuta representative to disable fingerprinting.
Reach out to your Immuta representative to disable health checks on all data sources.
Tag each data source with the seeded Skip Stats Job tag to stop Immuta from collecting a sample and running table stats on the sample. You can tag data sources as you create them in the UI or via the Immuta API.
Note that data sources automatically skip the stats job upon registration, even without the Skip Stats Job tag, as long as no active policies require stats. The following policies require stats:
Column masking with randomized response
Column masking with format preserving masking
Column masking with k-anonymization
Column masking with rounding
Column masking with reversibility
Row minimization
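The rule above can be summarized as: the stats job runs only when an active policy needs it and the data source is not tagged Skip Stats Job. The sketch below encodes that reading; the policy-type labels are hypothetical strings, and treating the tag as overriding policies is an assumption based on the earlier note that these policies become unavailable without stats.

```python
# Hypothetical labels for the policy types listed above as requiring stats.
POLICIES_REQUIRING_STATS = frozenset({
    "randomized response",
    "format preserving masking",
    "k-anonymization",
    "rounding",
    "reversibility",
    "row minimization",
})


def stats_job_runs(active_policy_types, tags):
    """Rough model of whether Immuta collects a sample and runs table stats."""
    if "Skip Stats Job" in tags:
        return False  # assumption: the tag always suppresses the stats job
    # Otherwise the job runs only if at least one active policy needs stats.
    return bool(POLICIES_REQUIRING_STATS & set(active_policy_types))


print(stats_job_runs({"rounding"}, set()))               # True
print(stats_job_runs({"rounding"}, {"Skip Stats Job"}))  # False
```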
Your outgoing and incoming requests are consolidated on the requests tab on your user profile page. Similar to notifications, a red dot displays on the request icon whenever you have pending requests. The sections below guide you through managing these requests.
Navigate to your Profile page, and then click the Requests tab. The names of the users who have submitted requests are displayed in the left pane. Once a user is selected, the corresponding pending requests are displayed on the right.
To view more information about the request, click the Details button in the Actions column of a request.
Click the Approve or Deny button in the Actions column of the request.
To approve or deny multiple access requests simultaneously,
Navigate to your Profile page, and then click the Requests tab.
Select the checkbox next to each request you want to address, and then click the Approve Selected or Deny Selected button.
If a policy that includes the New tag is active and schema monitoring is enabled or you have registered a host, Immuta applies a New tag to new data sources, new columns, or changed columns and sends data owners a request to validate those changes.
Navigate to your Profile page, and then click the Requests tab.
Click the approvals count in the Request Information column to view information about the change to the data source. The change will be one of the following:
Column added
Column changed
Column deleted
Data source created
After verifying the change, click Validate.
For more information about these requests, see the Schema monitoring guide or the Enhanced onboarding and data source registration guide.
Deprecation notice
Support for this feature has been deprecated.
If users make an unmask request, a tasks tab will appear on the data source overview page for the user making the request and the user receiving the request. From this tab, users can view and manage two different task views:
Your Created Tasks: This page lists the status and information of the unmask requests you've submitted.
Tasks For You: This page lists the status and information of the unmask requests that have been submitted to you.
To complete a task,
Navigate to the Tasks tab from the Data Source Overview page, and then click the Tasks For You toggle.
Click the Unmask Values icon in the Actions column of the task.
A dialog box will appear with the masked and unmasked value. Note: You can view information about this request, including the reason for the request and the date it was created, by clicking the Task Info button in the Actions column.
To delete a task, click Delete Task in the Actions column of the relevant task.
In addition to managing data source requests as outlined above, data owners can manage data source
Private preview
The Amazon S3 integration is available to select accounts. Reach out to your Immuta representative for details.
Requirement: Immuta permission CREATE_S3_DATA_SOURCE
Configure the Amazon S3 integration
Navigate to the My Data Sources page in Immuta.
Click New Data Source.
Select the Native S3 tile in the data platform section.
Select your AWS Account/Region from the dropdown menu.
Opt to select a default domain to which data sources will be assigned.
Opt to add default tags to the data sources.
Click Next.
The prefix field is populated with the base path. Add to this prefix to create a data source for a prefix, bucket, or object.
If the data source prefix ends in a wildcard (*), it protects all items starting with that prefix. For example, a base location of s3:// and a data source prefix of surveys/2024* would protect paths like s3://surveys/2024-internal/research-dept.txt or s3://surveys/2024-customer/april/us.csv.
If the data source prefix ends without a wildcard (*), it protects a single object. For example, a base location path of s3:// and a data source prefix of research-data/demographics would only protect the object that exactly matches s3://research-data/demographics.
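The two prefix rules can be checked with a small matcher that uses the examples from the text. It is a hypothetical sketch that assumes the data source path is the base location concatenated with the prefix and that `*` is only meaningful as the final character; it is not Immuta's implementation.

```python
def protects(base_location, data_source_prefix, object_path):
    """Return True if a data source prefix covers the given S3 path."""
    target = base_location + data_source_prefix
    if target.endswith("*"):
        # Trailing wildcard: every path that starts with the prefix is protected.
        return object_path.startswith(target[:-1])
    # No wildcard: only the exactly matching object is protected.
    return object_path == target


print(protects("s3://", "surveys/2024*", "s3://surveys/2024-customer/april/us.csv"))  # True
print(protects("s3://", "research-data/demographics", "s3://research-data/demographics-2024.csv"))  # False
```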
Click Add Prefix, and then click Next.
Verify that your prefixes are correct and click Complete Setup.
Navigate to the My Data Sources page.
Click the New Data Source button in the top right corner.
Select the Azure Synapse Analytics tile in the Data Platform section.
Complete these fields in the Connection Information box:
Server: hostname or IP address
Port: port configured for Azure Synapse Analytics
SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
Database: the remote database
Username: the username to use to connect to the remote database and retrieve records for this data source
Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.
If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Use SSL
Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.
Considerations
Immuta pushes joins down to the native database for processing when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. Joins against the same database will perform worse if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.
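You can check the join pushdown consideration above before registering several data sources by comparing their connection details field by field. The sketch below is a hypothetical pre-flight check and is not an Immuta API. The ConnectionInfo type and the example host are assumptions, and the sketch does not show how Immuta itself decides whether to push down a join.

```python
# Hypothetical pre-flight check: do two data sources share the connection
# details that must match for Immuta to push joins down?
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class ConnectionInfo:
    host: str
    port: int
    ssl: bool
    username: str
    password: str = field(repr=False)  # compared, but kept out of repr/logs


def mismatched_fields(a: ConnectionInfo, b: ConnectionInfo) -> list:
    """Names of the connection fields that differ between two data sources."""
    return [f.name for f in fields(ConnectionInfo) if getattr(a, f.name) != getattr(b, f.name)]


def can_push_down_join(a: ConnectionInfo, b: ConnectionInfo) -> bool:
    """Treat a join as a pushdown candidate only when every field matches."""
    return not mismatched_fields(a, b)


if __name__ == "__main__":
    orders = ConnectionInfo("synapse.example.com", 1433, True, "immuta_svc", "secret")
    customers = ConnectionInfo("synapse.example.com", 1433, False, "immuta_svc", "secret")
    print(can_push_down_join(orders, customers), mismatched_fields(orders, customers))
```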
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Optionally, click Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.
Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
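The auto-increment behavior for duplicate schema project names can be sketched as follows. The documentation only shows that Customer table becomes Customer table 2. Counting on to 3, 4, and so on when those names are also taken is an assumption, and next_project_name is a hypothetical helper.

```python
# Sketch of the documented auto-increment behavior for duplicate schema
# project names. Counting beyond "2" is an assumption.


def next_project_name(requested: str, existing: set) -> str:
    """Return `requested`, or the first free "<requested> N" for N = 2, 3, ..."""
    if requested not in existing:
        return requested
    n = 2
    while f"{requested} {n}" in existing:
        n += 1
    return f"{requested} {n}"


if __name__ == "__main__":
    print(next_project_name("Customer table", {"Customer table"}))
```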
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
<Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (e.g., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
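The naming rules above can be sketched in two parts. The first part expands a name template so that the case of each macro sets the case of the output. The second part checks a SQL name format against the documented constraints: it must include the required macro, may otherwise contain only lowercase letters, digits, and underscores, and may be at most 255 characters long. Both helpers are hypothetical and are not Immuta code. Three details are assumptions: that <Tablename> maps to Python's str.title(), that macros match case-insensitively, and that the 255-character limit applies to the template itself.

```python
import re

# Hypothetical helpers illustrating the naming rules above; not Immuta code.
MACRO = re.compile(r"<(schema|tablename)>", re.IGNORECASE)


def expand_name(template: str, schema: str, table: str) -> str:
    """Expand <schema>/<tablename> macros, letting the macro's case set the output case."""

    def substitute(match: re.Match) -> str:
        token = match.group(1)
        value = schema if token.lower() == "schema" else table
        if token.isupper():   # <TABLENAME> -> DATA SOURCE NAME
            return value.upper()
        if token.islower():   # <tablename> -> data source name
            return value.lower()
        return value.title()  # <Tablename> -> Data Source Name (assumed title case)

    return MACRO.sub(substitute, template)


def is_valid_sql_format(fmt: str, required_macro: str) -> bool:
    """Check a SQL schema/table name format against the documented constraints."""
    if len(fmt) > 255:
        return False
    if not re.search(f"<{required_macro}>", fmt, re.IGNORECASE):
        return False
    # Outside the macros, only lowercase letters, digits, and underscores are allowed.
    literal_part = MACRO.sub("", fmt)
    return re.fullmatch(r"[a-z0-9_]*", literal_part) is not None


if __name__ == "__main__":
    for template in ("<Tablename>", "<tablename>", "<TABLENAME>"):
        print(template, "->", expand_name(template, "public", "data source name"))
    print(is_valid_sql_format("<schema>_governed", "schema"))        # True
    print(is_valid_sql_format("Governed_<tablename>", "tablename"))  # False
```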
Schema monitoring best practices
Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.
Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add the new tables to Immuta directly.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, so columns added without review cannot leak data.
When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.
Note: This step will only appear if all tables within a server have been selected for creation.
Although not required, completing these steps will help maximize the utility of your data source. Otherwise, skip to the next step.
This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.
To enable, select the checkbox in this section.
See the Schema projects overview page to learn more about column detection.
An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.
Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.
Selecting an Event Time column will enable:
More statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
The creation of time-based restrictions in the policy builder.
Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.
This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
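The country example above comes down to simple date arithmetic: a new value is picked up at the next check, which happens one latency period after the previous one. The sketch below makes that arithmetic concrete. It assumes checks run exactly once per period, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Sketch: how the Latency setting bounds when a new value in a
# row-redaction column (e.g., a new country) becomes visible to policies.
# Assumes checks run exactly once per latency period after the last check.

_UNITS = {"MINUTES": "minutes", "HOURS": "hours", "DAYS": "days"}


def latency_period(amount: int, unit: str) -> timedelta:
    """Convert the Set Time field and unit dropdown into a timedelta."""
    return timedelta(**{_UNITS[unit]: amount})


def next_check(last_check: datetime, amount: int, unit: str) -> datetime:
    """Earliest time a newly added value can be picked up by the policy."""
    return last_check + latency_period(amount, unit)


if __name__ == "__main__":
    last = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
    # A new country added at 10:00 UTC is not seen until the next check.
    print(next_check(last, 6, "HOURS"))  # 2024-03-01 15:00:00+00:00
```

A shorter latency shrinks this window, but Immuta then has to check for new values more often.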
Adding tags to your data source allows users to search for the data source using the tags and allows Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.
To add tags:
Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.
Click Create to save the data source(s).
Immuta is a live metadata aggregator: it collects metadata about your data and your users. For data metadata specifically, Immuta can monitor changes in your database and reflect those changes in your Immuta tenant through schema monitoring.
When schema monitoring is enabled, Immuta monitors your organization's servers to identify when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. The newly updated data sources then have global policies and tags applied to them, and the Immuta data dictionary is updated with column changes.
Schema monitoring keeps Immuta in sync with your data environment, helping you remain compliant without having to manually update individual data sources.
Without schema monitoring, data owners have to manually add and remove Immuta data sources when users add or remove tables from databases in their data platforms. At worst, data owners are not aware of these changes; at best, they are aware of the changes and have to update Immuta manually, which is a time-consuming, error-prone process.
Beyond draining data owners' time, manually updating data sources to reflect the state of the data platform also complicates the process: not only must they understand when a new table is present, but they then must remember to tag it and protect it appropriately. This leaves organizations ripe for data leaks as new data is created across the business, perhaps daily.
Schema monitoring, by contrast, is scalable and accounts for the evolution of your schemas and policies. Instead of manually managing access to these tables or adding and removing data sources, you are empowered to register a schema, create policies, and allow Immuta to manage those policies and changes to your schema for you to keep your data in sync and restrict access appropriately.
Both monitoring for new data and discovering and tagging sensitive data align with the concepts of scalability and evolvability, removing redundant and arduous work. Once tables are registered and tagged, policies can be applied immediately. This means humans can be removed from the process entirely by creating tag-based policies that dynamically apply themselves to new tables.
Then, your business reaps the following benefits:
Increased revenue: Accelerate time to data access, because where sensitive data lives is well understood.
Decreased cost: Operate efficiently and move with agility at scale.
Decreased risk: Discover and protect sensitive data immediately.
Schema monitoring pairs with the following features:
Column detection: Column detection identifies when a column has been added to or removed from a table and adds or removes that column from the data source in Immuta.
New column added templated global policy: When paired with column detection or schema monitoring, this policy locks down access to those newly added columns and tables to prevent data leaks.
Sensitive data discovery: When the tables are discovered through the registration process, Immuta evaluates the table data for sensitive information and tags it as such. These tags are critical for scaling tag-based policies.
Global data and subscription policies: Global data and subscription policies can be created using tags so that they immediately enforce appropriate access restrictions on tables and columns when they are added.
Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.
Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission for the project to be created automatically, because the project owner cannot add data sources to it; new data sources are managed by Immuta. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.
Schema settings, such as schema evolution and connection information, are edited from the project overview tab:
Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.
Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention will change how these newly detected data sources are named by Immuta.
Schema Detection Owner: When schema monitoring is enabled, a user is assigned to be the owner of any data source that Immuta detects and creates.
Disable or delete your schema project: Deleting the project will delete all of the data sources within it as well.
Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with the organization's data environment. This automated process helps organizations keep compliant without the need to manually keep data sources up to date.
Schema monitoring is enabled while creating or editing a data source and only registers new tables and columns within known schemas. It does not register new schemas. After schema monitoring has been enabled, data owners or governors can edit the naming convention for newly detected data sources and the schema detection owner from the project overview tab.
See the guides for instructions on enabling schema monitoring or for instructions on editing the schema monitoring settings.
Column detection is a part of schema monitoring, but it can also be enabled on its own to detect column changes in a select group of tables. Column detection monitors when columns are added to or removed from a table and when column types are changed, and it updates the appropriate Immuta data source's data dictionary with those changes.
See one of the guides for instructions on enabling column detection.
When new data sources and columns are detected and added to Immuta, or when column types have changed, they are always automatically tagged with the New tag. This allows governors to use the New Column Added global policy to mask columns with the New tag, since they could contain sensitive data.
The New Column Added global policy is staged (inactive) by default.
Activate this seeded global policy if you want any columns with the New tag to be automatically masked.
When schema monitoring is enabled and there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:
Column added: Immuta applies the New tag to the column that has been added and sends a request to the data owner to validate whether the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column data type changed: Immuta applies the New tag to the column where the data type has been changed and sends a request to the data owner to validate whether the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it, and as a result any policy that targets the New column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
Data source created: Immuta applies the New tag to the data source that has been newly created and sends a request to the data owner to validate whether the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New tag from it, and as a result any policy that targets the New data source tag no longer applies.
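The column-level validation flow above can be modeled as a small state machine. The class and method names below are hypothetical, and the model is not Immuta code. It reflects two documented points: the New tag is always applied, but validation requests are sent only when an active policy targets the New tag.

```python
from dataclasses import dataclass, field

NEW = "New"


@dataclass
class MonitoredDataSource:
    """Hypothetical model of the New-tag validation flow (not Immuta code)."""

    new_tag_policy_active: bool = True
    columns: dict = field(default_factory=dict)  # column name -> set of tags
    pending: dict = field(default_factory=dict)  # column name -> reason for review

    def _request_validation(self, column: str, reason: str) -> None:
        # Requests are only sent when an active policy targets the New tag.
        if self.new_tag_policy_active:
            self.pending[column] = reason

    def column_added(self, column: str) -> None:
        self.columns[column] = {NEW}
        self._request_validation(column, "added")

    def column_type_changed(self, column: str) -> None:
        self.columns.setdefault(column, set()).add(NEW)
        self._request_validation(column, "type changed")

    def column_deleted(self, column: str) -> None:
        self.columns.pop(column, None)  # removed from the data dictionary
        self._request_validation(column, "deleted")

    def owner_validates(self, column: str) -> None:
        # Validation closes the request and removes the New tag.
        self.pending.pop(column, None)
        self.columns.get(column, set()).discard(NEW)

    def new_tag_policy_applies(self, column: str) -> bool:
        return self.new_tag_policy_active and NEW in self.columns.get(column, set())


if __name__ == "__main__":
    ds = MonitoredDataSource()
    ds.column_added("ssn")
    print(ds.new_tag_policy_applies("ssn"), ds.pending)  # True {'ssn': 'added'}
    ds.owner_validates("ssn")
    print(ds.new_tag_policy_applies("ssn"), ds.pending)  # False {}
```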
Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.
If Immuta finds a change, it will update the appropriate Immuta data source or column:
If Immuta finds a new table, then Immuta creates an Immuta data source for that table and tags it New.
If Immuta finds a table has been deleted, then Immuta disables that table's data source.
If Immuta finds a previously deleted table has been re-created, then Immuta restores that table's data source and tags it New.
If Immuta finds a new column within a table, then Immuta adds that column to the data dictionary and tags it New.
If Immuta finds a column has been deleted, then Immuta deletes that column from the data dictionary.
If Immuta finds a column type has changed, then Immuta updates the column type in the data dictionary and tags it New.
Active policies that target the New data source or column tag will be applied until a data owner validates the changes.
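One monitoring pass, as described in the list above, amounts to comparing the registered state with the remote state. The sketch below is a hypothetical illustration of the documented outcomes and is not Immuta's code. Tables are keyed as "schema.table" and map column names to types. Tables in schemas that are not already known are skipped, because schema monitoring does not register new schemas.

```python
# Hypothetical sketch of one schema monitoring pass, following the rules
# listed above. Tables are keyed "schema.table"; each maps column -> type.


def reconcile(registered: dict, disabled: set, remote: dict, known_schemas: set) -> list:
    """Return (action, target, tag_new) tuples for one monitoring pass."""
    actions = []
    for table, columns in remote.items():
        if table.split(".", 1)[0] not in known_schemas:
            continue  # schema monitoring does not register new schemas
        if table in disabled:
            actions.append(("restore data source", table, True))
        elif table not in registered:
            actions.append(("create data source", table, True))
        else:
            old_columns = registered[table]
            for column, col_type in columns.items():
                if column not in old_columns:
                    actions.append(("add column", f"{table}.{column}", True))
                elif old_columns[column] != col_type:
                    actions.append(("update column type", f"{table}.{column}", True))
            for column in old_columns:
                if column not in columns:
                    actions.append(("delete column", f"{table}.{column}", False))
    for table in registered:
        if table not in remote:
            actions.append(("disable data source", table, False))
    return actions


if __name__ == "__main__":
    registered = {"hr.employees": {"id": "int", "name": "text"}, "hr.tmp": {"x": "int"}}
    remote = {
        "hr.employees": {"id": "bigint", "email": "text"},
        "hr.old_payroll": {"id": "int"},
        "hr.reviews": {"id": "int"},
        "finance.ledger": {"id": "int"},
    }
    for action in reconcile(registered, {"hr.old_payroll"}, remote, {"hr"}):
        print(action)
```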
The default schedule for schema monitoring to run is every 24 hours. Some organizations may need to schedule it to run more often; however, this needs careful consideration as it can impact performance and compute costs.
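The schedule arithmetic can be sketched as follows: find the first run after a given time, starting from the default daily anchor of 12:30 a.m. UTC. This only illustrates the timing and does not show how to change the schedule in Immuta. The shorter-interval option is a hypothetical parameter, and the sketch assumes the interval divides 24 hours evenly.

```python
from datetime import datetime, time, timedelta, timezone

# Sketch: compute the next schema monitoring run from a daily UTC anchor
# (12:30 a.m. by default). The `interval` option is hypothetical and
# assumes the interval divides 24 hours evenly.


def next_run(now: datetime, anchor: time = time(0, 30),
             interval: timedelta = timedelta(hours=24)) -> datetime:
    """Return the first scheduled run strictly after `now` (an aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    base = datetime.combine(now_utc.date(), anchor, tzinfo=timezone.utc)
    if base > now_utc:
        base -= timedelta(days=1)  # start from the previous day's anchor
    steps = (now_utc - base) // interval + 1
    return base + steps * interval


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
    print(next_run(now))                               # 2024-05-02 00:30:00+00:00
    print(next_run(now, interval=timedelta(hours=6)))  # 2024-05-01 12:30:00+00:00
```

Halving the interval doubles the number of runs per day, which is the performance and compute-cost trade-off noted above.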
Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Within a domain, specific users are assigned a domain permission to manage policies on or to audit the data sources in that domain, which eliminates the problem of centralizing your data governance and giving users too much power over all data in your organization. Instead, you can control how much power these governance and audit users have over data by restricting their privileges to the domains you specify.
Once a data owner registers data as Immuta data sources, users with the GOVERNANCE permission can add data sources to domains based on business units, business goals, or policy management strategy within their organization. Then, a user with the USER_ADMIN permission can assign additional users to audit activity on data sources or manage policies on data within those domains.
Domains are also integral to delegating policy management of data products within a data mesh without breaking your organization's security and compliance standards. To learn more about how you can integrate Immuta in your data mesh framework (and domains' role in that process), see the .
Data sources can be assigned to domains based on business units in your organization or any other method that suits your business goals and policy management strategy. Users with the GOVERNANCE permission can change the domain that a data source belongs to or remove a data source from a domain. Once a data source is assigned to a particular domain, users with a domain permission can start performing actions on those data sources.
Whether created within or outside a domain, Immuta policies enforce access as usual: users who meet the restrictions outlined in the policy applied to a data source may access that data source. However, domains restrict who can author policies that apply to data sources assigned to that domain. This policy management restriction gives organizations more control of how much power governance users have over data. Furthermore, it can make Immuta easier to use by allowing more people to author policies.
Users with the Manage Policies permission in a domain can set global policies to apply to the domains for which they have the Manage Policies permission. In the example below, the domain policy manager can only apply global policies to the HR domain:
If that same user had the Manage Policies permission for the HR and Marketing domains, they could write global policies to apply to data sources within both of those domains. If that user had the Manage Policies permission on all domains in their organization or the global GOVERNANCE permission, they could set global policies to apply to all data sources.
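The HR and Marketing example above can be expressed as a set computation over domain assignments. The sketch below is hypothetical and is not an Immuta API. It also assumes that data sources without a domain can be targeted only by users with the global GOVERNANCE permission, which the text above does not state explicitly.

```python
# Hypothetical sketch of which data sources a user may target with global
# policies, based on the domain rules above. Data sources with no domain
# are assumed to be targetable only by GOVERNANCE users.


def targetable_data_sources(assignments: dict, manage_policies_domains: set,
                            has_governance: bool) -> set:
    """`assignments` maps data source name -> domain name (or None)."""
    if has_governance:
        return set(assignments)
    return {ds for ds, domain in assignments.items()
            if domain is not None and domain in manage_policies_domains}


if __name__ == "__main__":
    assignments = {"payroll": "HR", "campaigns": "Marketing", "ledger": "Finance", "scratch": None}
    print(sorted(targetable_data_sources(assignments, {"HR"}, False)))
    print(sorted(targetable_data_sources(assignments, {"HR", "Marketing"}, False)))
```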
When a data source is removed from or added to a domain, Immuta recomputes the policies that apply to the domain. Any policies associated with a domain will be added to or removed from a data source when it is added to or removed from the domain.
These three different policy levels are authored by separate users:
Global policy level: Users with the GOVERNANCE permission can create global policies that apply to all data sources in an organization.
Domain policy level: Users with the Manage Policies permission can apply policies to data sources within the domains for which they have that permission.
Local policy level: Data owners can apply policies directly to a data source, even if a domain or global policy applies to it as well. Because they are the most knowledgeable of their data, data owners can disable and apply the most relevant policy to their data source.
Since policies apply to domain data sources at multiple levels, there are instances in which two or three policies could apply to a single domain data source. Data owners can disable and enable policies that are most appropriate for their data source when a conflict occurs. For details about policy conflicts, merges, and policy conflict management, see the following pages:
Users with the global AUDIT permission can see all audit events throughout Immuta. To scope this audit information to just a single domain, users can be given the domain Audit Activity permission.
If a user has the domain Audit Activity permission on a particular domain, they can audit the following activities:
Domain: Events from when the domain was created, users were added, permissions were updated, and the domain was updated.
Data sources: Events from the data sources within the specific domain.
Policies: Events for policies created, edited, or deleted that affect data sources within the domain.
Queries: Events for user queries against data sources within the domain.
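The difference between global AUDIT and domain Audit Activity can be sketched as a filter over audit events. The event model and category labels are hypothetical. Each event carries the set of domains it affects, because a policy event can touch data sources in more than one domain.

```python
from dataclasses import dataclass

# Hypothetical model of audit scoping; category labels are illustrative.


@dataclass(frozen=True)
class AuditEvent:
    category: str       # e.g. "domain", "data source", "policy", "query"
    domains: frozenset  # domains the event affects (a policy may span several)


def visible_events(events: list, has_global_audit: bool, audit_domains: set) -> list:
    """Global AUDIT sees everything; domain Audit Activity sees events touching its domains."""
    if has_global_audit:
        return list(events)
    return [event for event in events if event.domains & audit_domains]


if __name__ == "__main__":
    events = [
        AuditEvent("query", frozenset({"HR"})),
        AuditEvent("policy", frozenset({"Marketing"})),
        AuditEvent("policy", frozenset({"HR", "Finance"})),
    ]
    for event in visible_events(events, False, {"HR"}):
        print(event.category, sorted(event.domains))
```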
Users with the GOVERNANCE permission can delete any domain that has zero data sources assigned to it.
Existing data sources can be assigned to a domain by a user with the GOVERNANCE permission. Once added to a domain, domain policies will be enforced on the data sources.
Private preview
This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.
Requirements:
Snowflake Enterprise Edition
Snowflake X-Large or Large warehouse is strongly recommended
Set the to None for bulk data source creation. This will simplify the data source creation process by not automatically applying policies.
Make a request to the Immuta V2 API, as the Immuta UI does not support creating more than 1000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The Skip Stats Job tag is only required if you are using ; otherwise, Snowflake data sources automatically skip the stats job.
Specifying disableSensitiveDataDiscovery as true ensures that sensitive data discovery will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.
Applying the Skip Stats Job tag using the tableTag value ensures that some jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.
When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.
To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:
The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:
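Because the jobs run in the background, you will typically repeat the progress request until no jobs remain unfinished. The sketch below is a generic polling loop. It does not make the HTTP request itself. fetch_status is a hypothetical callable that you write around the real progress request, and it must return the job counts by state and the list of errors. The sketch does not assume any state names; you pass the ones that appear in your responses as terminal_states. The default of 720 polls at 60-second intervals covers 12 hours, which leaves headroom over the six-to-seven-hour estimate for 100,000 data sources.

```python
import time
from typing import Callable, Dict, List, Tuple

# Generic polling sketch; `fetch_status` and the state names are supplied
# by the caller and are hypothetical here.
Status = Tuple[Dict[str, int], List[str]]


def wait_for_bulk_jobs(fetch_status: Callable[[str], Status], bulk_id: str,
                       terminal_states: set, poll_seconds: float = 60.0,
                       max_polls: int = 720,
                       sleep: Callable[[float], None] = time.sleep) -> Status:
    """Poll until no jobs remain in a non-terminal state, then return the last status."""
    for _ in range(max_polls):
        counts, errors = fetch_status(bulk_id)
        unfinished = sum(n for state, n in counts.items() if state not in terminal_states)
        if errors:
            print(f"{len(errors)} error(s) reported so far for {bulk_id}")
        if unfinished == 0:
            return counts, errors
        sleep(poll_seconds)
    raise TimeoutError(f"bulk {bulk_id} still running after {max_polls} polls")
```

The sleep parameter is injectable so that the loop can be exercised without waiting.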
With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.
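As a rough planning figure, the estimate above implies the following average end-to-end rate, counting all associated jobs (this is plain arithmetic on the stated numbers, not a separately measured throughput):

```latex
\frac{100{,}000}{7 \times 3600\ \text{s}} \approx 3.97\ \text{data sources/s}
\qquad\text{to}\qquad
\frac{100{,}000}{6 \times 3600\ \text{s}} \approx 4.63\ \text{data sources/s}
```

This is roughly 14,300 to 16,700 data sources per hour.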
With schema monitoring enabled, Immuta monitors your organization's servers to find when new tables or columns are created or deleted and automatically registers (or disables) those tables in Immuta.
Edit connection information, schema project owner, or the naming conventions of data registered in the schema.
Manually trigger schema monitoring.
This reference guide describes the design and components of schema monitoring.
This reference guide describes schema projects, which group all the data sources of a schema.
This explanatory guide provides a conceptual overview of schema monitoring. It offers a discussion of the benefits of the feature, context for why it was developed, and insights into the features schema monitoring pairs with. This guide is designed to deepen your understanding of schema monitoring's purpose as you implement it.
Redshift data sources
Redshift Spectrum data sources must be registered using .
Registering Redshift datashares as Immuta data sources is unsupported.
The must be set to false (default setting) for your Redshift cluster.
Navigate to the My Data Sources page.
Click the New Data Source button in the top right corner.
Select the Redshift tile in the Data Platform section.
Complete these fields in the Connection Information box:
Server: hostname or IP address
Port: port configured for Redshift, typically port 5439
SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
Database: the remote database
Username: the username to use to connect to the remote database and retrieve records for this data source
Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.
If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Use SSL
Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.
Further considerations
Immuta pushes joins down to the native database for processing when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. Joins against the same database will perform worse if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Optionally, click Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.
Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
<Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (e.g., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Schema monitoring best practices
Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.
Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Note: This step will only appear if all tables within a server have been selected for creation.
This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.
To enable, select the checkbox in this section.
An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.
Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.
Selecting an Event Time column will enable:
More statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
The creation of time-based restrictions in the policy builder.
Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.
This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
Data owners can disable sensitive data discovery for their data sources in this section.
Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.
Adding tags to your data source allows users to search for the data source using the tags and allows Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.
To add tags:
Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.
Click Create to save the data source(s).
Using OAuth authentication to create Starburst (Trino) data sources
If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, work with your Immuta representative to configure the globalAdminUsername property. See the for details.
Navigate to the My Data Sources page.
Click the New Data Source button in the top right corner.
Select the Starburst (Trino) tile in the Data Platform section.
Complete these fields in the Connection Information box:
Server: hostname or IP address
Port: port configured for Starburst (Trino)
SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
Catalog: the remote catalog
Username: the username to use to connect to the remote database and retrieve records for this data source
Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.
If the connection is successful, a check mark and a successful connection notification will appear, and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. To proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Use SSL
Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.
Considerations
Immuta pushes down joins to be processed on the native database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, SSL, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.
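The join pushdown consideration boils down to an all-fields equality check on the connection information. The sketch below models that rule with a hypothetical ConnectionInfo dataclass; the host name and credentials are placeholders, and Immuta's actual pushdown decision is made internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionInfo:
    """Hypothetical model of the connection fields that must match for join pushdown."""
    host: str
    port: int
    ssl: bool
    username: str
    password: str


def can_push_down_join(left: ConnectionInfo, right: ConnectionInfo) -> bool:
    # Dataclass equality compares every field, so any mismatch (even the username) returns False.
    return left == right


sales = ConnectionInfo("trino.example.com", 443, True, "immuta_svc", "example-password")
orders = ConnectionInfo("trino.example.com", 443, True, "immuta_svc", "example-password")
legacy = ConnectionInfo("trino.example.com", 443, True, "other_user", "example-password")
print(can_push_down_join(sales, orders))  # True
print(can_push_down_join(sales, legacy))  # False
```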
Decide how to virtually populate the data source by selecting one of the options:
Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Click Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.
Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format, which will be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
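The automatic incrementing of duplicate schema project names can be sketched as a search for the first free suffix. This is a hypothetical helper, not Immuta code; it assumes later collisions continue as "Customer table 3", "Customer table 4", and so on, while the documentation only describes the "2" case.

```python
def next_available_name(requested: str, existing: set[str]) -> str:
    """Hypothetical sketch of auto-incrementing a schema project name.

    Assumes later collisions continue as "<name> 3", "<name> 4", and so on;
    only the "<name> 2" case is documented.
    """
    if requested not in existing:
        return requested
    n = 2
    # Try "<name> 2", "<name> 3", ... until an unused name is found.
    while f"{requested} {n}" in existing:
        n += 1
    return f"{requested} {n}"


print(next_available_name("Customer table", {"Customer table"}))  # Customer table 2
```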
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
<Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
<Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name", <tablename> will result in "data source name", and <TABLENAME> will result in "DATA SOURCE NAME").
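The macro case rules above can be sketched as a small template renderer. This is a hypothetical illustration, not Immuta's implementation: it treats an all-uppercase macro as uppercase, an all-lowercase macro as lowercase, and any other spelling as capitalizing each space-separated word, and it uses a remote table literally named "data source name" to reproduce the examples.

```python
import re

MACRO = re.compile(r"<(schema|tablename)>", re.IGNORECASE)


def apply_macro_case(value: str, macro: str) -> str:
    """Apply the case of a macro (e.g. TABLENAME, tablename, Tablename) to a value."""
    if macro.isupper():
        return value.upper()
    if macro.islower():
        return value.lower()
    # Any other spelling is treated as capitalized: each space-separated word is capitalized.
    return " ".join(word.capitalize() for word in value.split(" "))


def render_data_source_name(template: str, schema: str, table: str) -> str:
    """Render a hypothetical data source name template such as "<Schema><Tablename>"."""
    found = [m.lower() for m in MACRO.findall(template)]
    if "tablename" not in found:
        raise ValueError("the template must include a tablename macro")
    values = {"schema": schema, "tablename": table}
    return MACRO.sub(lambda m: apply_macro_case(values[m.group(1).lower()], m.group(1)), template)


print(render_data_source_name("<Tablename>", "sales", "data source name"))  # Data Source Name
print(render_data_source_name("<tablename>", "sales", "Data Source Name"))  # data source name
print(render_data_source_name("<TABLENAME>", "sales", "data source name"))  # DATA SOURCE NAME
print(render_data_source_name("<Schema><Tablename>", "sales", "orders"))    # SalesOrders
```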
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
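The constraints on the SQL Schema Name Format and SQL Table Name Format (a required macro, lowercase letters, numbers, and underscores elsewhere, and at most 255 characters) can be checked before submitting the form. The sketch below is a hypothetical validator; it assumes the macros are spelled <schema> and <tablename> and applies the length limit to the template itself, both of which may differ from what Immuta actually enforces.

```python
import re

# Assumed macro spellings; the exact tokens Immuta accepts may differ.
MACRO = re.compile(r"<(schema|tablename)>", re.IGNORECASE)
ALLOWED_LITERAL = re.compile(r"[a-z0-9_]*")
MAX_LENGTH = 255


def validate_sql_name_format(template: str, required_macro: str) -> list[str]:
    """Return a list of problems with an SQL name format; an empty list means it looks valid."""
    problems = []
    if required_macro not in (m.lower() for m in MACRO.findall(template)):
        problems.append(f"must include the <{required_macro}> macro")
    # Outside the macros, only lowercase letters, numbers, and underscores are allowed.
    if not ALLOWED_LITERAL.fullmatch(MACRO.sub("", template)):
        problems.append("only lowercase letters, numbers, and underscores are allowed")
    if len(template) > MAX_LENGTH:
        problems.append(f"must be {MAX_LENGTH} characters or fewer")
    return problems


print(validate_sql_name_format("immuta_<schema>", "schema"))      # []
print(validate_sql_name_format("<tablename>_v2", "tablename"))    # []
print(validate_sql_name_format("Immuta-<schema>", "schema"))
```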
For instructions on how to view and manage your assigned tasks in the Immuta UI, see the . To view and manage your assigned tasks via the Immuta API, see the section of the API documentation.
Immuta user with schema monitoring enabled.
If Immuta finds that the backing object type of a data source has been changed (for example, from a TABLE to a VIEW) in Snowflake or Databricks Unity Catalog, Immuta will reapply existing policies on the data source. Note that because of policy limitations on Unity Catalog views, changing a Databricks Unity Catalog object type from a table to a view could result in some types of data policies being removed. See the for a list of data policies that are not supported for views.
To run schema monitoring or column detection manually, see the .
Manually trigger schema monitoring (filtered down to the database) after your dbt or other transform workflows run. For more information, see the .
When manually triggering schema monitoring, specify a table or database for maximum performance efficiency and to reduce data or policy downtime. For more information on triggering schema monitoring, see the .
If you are manually managing data tags, activate the to protect newly found and potentially sensitive data. This policy sets all columns with the tag New to NULL until a data owner reviews and validates their content. Using this workflow protects your data and avoids data leaks on new columns getting automatically added. This recommendation is unnecessary for users leveraging sensitive data discovery (SDD) or using an external data catalog.
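Conceptually, the New-column policy replaces the value of every column tagged New with NULL until a data owner reviews it. The sketch below models that masking on a single row represented as a Python dict, with None standing in for NULL; the row, column names, and tag layout are hypothetical, and Immuta enforces the real policy natively in the data platform.

```python
def apply_new_column_policy(row: dict, column_tags: dict[str, set[str]]) -> dict:
    """Return a copy of row in which every column tagged "New" is replaced by None (NULL)."""
    return {
        column: None if "New" in column_tags.get(column, set()) else value
        for column, value in row.items()
    }


row = {"customer_id": 17, "region": "EMEA", "loyalty_score": 0.82}
tags = {"region": {"Discovered . Location"}, "loyalty_score": {"New"}}
print(apply_new_column_policy(row, tags))
# {'customer_id': 17, 'region': 'EMEA', 'loyalty_score': None}
```

Once a data owner reviews the column and removes the New tag, the same function returns the original value.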
See the for instructions on creating a domain and adding data sources to it.
See the for instructions on authoring domain policies.
When integrating domains to distribute policy management across your organization, data governance and access control must be applied horizontally (globally across data in your organization) and vertically (locally within specific domains or data products). Global policies should be authored and applied in line with your ecosystem’s most generic and all-encompassing principles, regardless of the data’s domain. For example, a global policy could be used to mask all PII data across an organization. Domain or , on the other hand, should be fine-grained and applicable to only context-specific purposes or use cases. For example, a domain or local policy could be used to only show rows in the Sales table where the value in the country column matches the user's office location.
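The row-level example above can be sketched as a filter that compares each row's country with an attribute of the querying user. The office_country attribute and the sample rows are hypothetical; in practice Immuta compiles such a policy into a native row access policy rather than filtering rows in application code.

```python
def visible_sales_rows(rows: list[dict], user: dict) -> list[dict]:
    """Keep only the Sales rows whose country matches the querying user's office location."""
    office_country = user["office_country"]  # hypothetical user attribute
    return [row for row in rows if row["country"] == office_country]


sales = [
    {"order_id": 1, "country": "US", "amount": 120},
    {"order_id": 2, "country": "DE", "amount": 80},
    {"order_id": 3, "country": "US", "amount": 45},
]
print(visible_sales_rows(sales, {"name": "avery", "office_country": "US"}))
```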
See the for instructions on deleting a domain.
See the for instructions on adding existing data sources to a domain.
Domain permissions can be added to users or groups by a user with the USER_ADMIN Immuta permission. See the for a list of global and domain permissions necessary to manage domains.
Consider using to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the to protect potentially sensitive data. This policy will null the new columns until a data owner reviews new columns that have been added, protecting your data to avoid data leaks on new columns getting added without being reviewed first.
When selecting the Schema/Table option, you can opt to enable by selecting the checkbox in this section.
Although not required, completing these steps will help maximize the utility of your data source. Otherwise, skip to .
See the page to learn more about column detection.
Domains are containers of data sources that allow you to map your users and data into your organizational or governance structure. Within a domain you can assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups.
Getting started with domains: Create and manage a domain.
Domains: This reference guide describes the design and components of domains.
Immuta uses tags primarily to enforce policies, but tags can also be used for generating Immuta reports and search results in the Immuta UI.
Create tags: Create tags in Immuta or import tags from an external catalog.
Add tags to data sources and projects: Add and manage tags on Immuta data sources and projects.
Tags: This reference guide describes how Immuta uses tags.
Tags have several uses, mainly to drive policies, but they can be used for the following purposes:
Use tags for global subscription or data policies that will apply to all data sources in the organization. In doing this, company-wide data security restrictions can be controlled by the administrators and governors, while the users and data owners need only to worry about tagging the data correctly.
Generate Immuta reports from tags for anything from insider threat surveillance to data access monitoring.
Drive search results with tags in the Immuta UI.
Every user within Immuta can see tags, but they will all interact with them differently as their roles require. Governors create, manage, and delete tags or import tags from external catalogs. Data owners, data source experts, and governors apply these tags to or remove them from projects, data sources, and columns within the data sources. Data users view tags and tag metadata on data sources they have access to.
Managing tags best practice: Use the minimum number of tags possible to achieve the data privacy needed.
When navigating tags in the Immuta UI, there are several helpful features:
Side sheets: Clicking on a tag in the data dictionary, on the data overview page, or on a project page will open the tag side sheet with valuable information about the tag. This information depends on the kind of tag it is and where it is applied. The side sheet can include a link to the tag details page, a description of the tag, the context of the tag (i.e., where the tag was created and added from), the columns the tag is applied to, and actions that can be done to the tag (e.g., disabling or deleting the tag from its object).
Tooltips: When you hover over a tag, a tooltip will appear. It contains information about the tag, including where it was created (e.g., Immuta or an external catalog), whether the tag was applied by sensitive data discovery, and the full name of the tag.
Simplified names: When fully articulated, tags are presented as Parent . Child . Grandchild with " . " between each level. However, tags will usually appear as the lowest name level (e.g., Discovered . Entity . Person Name will appear as Person Name), and the full name can be seen in the tooltip.
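The relationship between a full tag name and its simplified name can be sketched with plain string operations on the " . " separator. The helpers below are illustrative only; parent_chain additionally lists the full name of each ancestor in the tag hierarchy.

```python
SEPARATOR = " . "


def display_name(full_tag: str) -> str:
    """Return the lowest level of a hierarchical tag name, as usually shown in the UI."""
    return full_tag.split(SEPARATOR)[-1]


def parent_chain(full_tag: str) -> list[str]:
    """Return each ancestor's full name, from the top level down to the tag itself."""
    parts = full_tag.split(SEPARATOR)
    return [SEPARATOR.join(parts[: i + 1]) for i in range(len(parts))]


print(display_name("Discovered . Entity . Person Name"))  # Person Name
print(parent_chain("Discovered . Entity . Person Name"))
# ['Discovered', 'Discovered . Entity', 'Discovered . Entity . Person Name']
```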
Use sensitive data discovery
Sensitive data discovery can improve your ability to secure your data by automatically tagging sensitive entities, enabling the scalable implementation of global policies. Use this feature in tandem with verification of tags on all data sources.
Sensitive data discovery (SDD) helps to ensure sensitive data is properly managed and governed, providing fast identification for entities in columns such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, financial data, and more.
Navigate to the Project Overview tab.
Click Edit Connection.
Use the Connection Information modal to make any necessary changes.
Click Save.
Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the Basic Information modal to make any necessary changes to naming formats.
Click Save.
Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the dropdown menu in the Schema Monitoring modal to select a new schema detection owner. The new owner must be an owner of one or more of the data sources belonging to that schema.
Click Save.
Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Click the Add Tags button at the bottom of the Overview tab.
Begin typing Environment.dev in the Search by Name field and select the tag from the dropdown list.
Click Add. A list of the applied tags will populate at the bottom of the Overview tab.
Repeat as necessary for other data sources and tags.
Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Scroll to the Tags section at the bottom of the Overview tab, and click on the tag you want to remove.
Click Delete in the side sheet and then click Confirm.
The data dictionary lists the columns within the data source and the value type of the data within each column. From this page, governors can add tags to or remove them from specific columns in a data source.
Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to add a tag to and click Add Tags.
Begin typing in the Search by Name field and select the tag from the dropdown list.
Click Add. The applied tag will appear below the column name in the data dictionary.
Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to remove the tag from and click on the tag you want to delete.
Click Delete in the side sheet and then click Confirm.
Click the Data icon and select Projects in the left sidebar.
Select a project.
Click the Add Tags button at the bottom of the Project Overview tab.
Begin typing in the Search by Name field that appears, and then select the tag from the dropdown list.
Click Add. A list of the applied tags will populate at the bottom of the project overview.
Click the Data icon and select Projects in the left sidebar.
Select a project.
Scroll to the Tags section at the bottom of the Overview tab, and then click the tag you want to delete.
Click Delete in the side sheet and then click Confirm.
For information about data sources and tags, see the following guides:
In addition to adding and managing data source tags as outlined above, data owners can manage data source
Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Instead of centralizing your data governance and giving users too much control over all your data, you control how much power they have over data sources by granting them permissions within domains in Immuta.
Required Immuta permission: GOVERNANCE
Navigate to the Domains page.
Click + New Domain.
Enter a Name and Description for your domain.
Click Save.
To create a domain using the API, see the Domains API guide. For more information about domains, see the Domains reference guide.
Required Immuta permission: USER_ADMIN
User administrators can assign domain permissions from the domain permissions tab or the people page. See instructions for both methods below.
Click Domains and navigate to the domain.
Go to the Permissions tab and click + Grant Permissions.
Opt to select additional domains to apply the permission assignments to.
Choose how to assign the permission:
Individual selected users: Select this option from the dropdown and then search for individual users to grant the permission to.
Users in group: Select this option from the dropdown and then search for groups to grant the permission to.
Choose the permission to assign:
Manage Policies permission to allow them to create policies that will apply to the data sources within the domain.
Audit Activity permission to allow them to view audit events within the domain.
Review your changes and click Grant Permissions.
To assign permissions using the API, see the Domains API guide. For a list of permissions associated with domains, see the Domains reference guide.
Click People in the left navigation menu and select Users or Groups.
Select your user or group and then click the Settings tab.
Click + Add Domain Permissions.
Select the Domain for which the user or group should have the permission.
Opt to select additional users or groups to grant the permission to within the selected domains.
Choose the permission to assign:
Manage Policies permission to allow them to create policies that will apply to the data sources within the domain.
Audit Activity permission to allow them to view audit events within the domain.
Review your changes and click Grant Permissions.
Required Immuta permission: GOVERNANCE
Navigate to the Domains page and select your domain.
Click the Data Sources tab, and then click + Add Data Sources.
Select the checkboxes for the data sources you want to add to your domain.
Click + Add to Domain.
To assign data sources using the API, see the Domains API guide. For more information about domain data sources, see the Domains reference guide.
Required Immuta permission: GOVERNANCE or Manage Policies
Navigate to the Domains page and select your domain.
Click the Subscription Policies or Data Policies tab.
Click Create Policy and select Subscription Policy or Data Policy.
Write your subscription policy or data policy as outlined in the policies how-to guide.
When building your policy, your domain should automatically be added in the What domain(s) should this policy be restricted to? section. However, you can also select other domains for which you have the Manage Policies permission. This step will assign the policy to all data sources added to those domains.
For more information about domain policies, see the Domains reference guide.
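The domain restriction step can be modeled as a set union over the selected domains, guarded by the Manage Policies requirement. The sketch below is a simplified, hypothetical model: the domain-to-data-source mapping is an in-memory dict, and it ignores the GOVERNANCE permission, which also allows authoring domain policies.

```python
def targeted_data_sources(
    policy_domains: set[str],
    domain_members: dict[str, set[str]],
    manage_policies_domains: set[str],
) -> set[str]:
    """Return the data sources a domain-restricted policy would be assigned to."""
    missing = policy_domains - manage_policies_domains
    if missing:
        # In this model, a policy can only be restricted to domains the author holds Manage Policies on.
        raise PermissionError(f"Manage Policies is missing for: {sorted(missing)}")
    targets: set[str] = set()
    for domain in policy_domains:
        targets |= domain_members.get(domain, set())
    return targets


domains = {"Finance": {"ledger", "invoices"}, "Sales": {"orders"}}
print(sorted(targeted_data_sources({"Finance"}, domains, {"Finance", "Sales"})))  # ['invoices', 'ledger']
```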
Required Immuta permission: Audit Activity
Domain-related activity can be audited from the domain page, the audit page, the people page, or the data sources overview page. To find a specific audit record:
Navigate to the Audit page. Records are automatically filtered to your authorized domains only.
Optional: Use filters to narrow down the search for activities.
Click on a record to see details about a specific activity.
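The automatic domain filtering and the optional filters on the Audit page can be pictured as two predicates applied to each record. The AuditRecord fields and the action names below are hypothetical and do not reflect Immuta's audit schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditRecord:
    """Hypothetical, simplified audit record."""
    record_id: int
    action: str
    domain: str


def authorized_records(
    records: list[AuditRecord], authorized_domains: set[str], action: Optional[str] = None
) -> list[AuditRecord]:
    """Keep records from the user's authorized domains, optionally narrowed to one action."""
    return [
        record
        for record in records
        if record.domain in authorized_domains and (action is None or record.action == action)
    ]


records = [
    AuditRecord(1, "dataSourceQuery", "Finance"),
    AuditRecord(2, "policyUpdate", "Finance"),
    AuditRecord(3, "dataSourceQuery", "HR"),
]
print([r.record_id for r in authorized_records(records, {"Finance"})])  # [1, 2]
```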
Required Immuta permission: GOVERNANCE
Navigate to the Domains page and select your domain.
Click Remove Domain.
Confirm your changes.
To delete a domain using the API, see the Domains API guide.
Tag shortcuts
You can use keyboard shortcuts when creating tags.
Add sibling tag: "Enter"
Add child tag: "Shift"+"Enter"
Previous/next tag: "▼"/"▲"
Click the Governance icon in the navigation menu and select the Tags tab.
Click Add Tags in the top right corner.
Complete the Enter tag name field.
Additional nested tags are optional. These nested tags follow a tree structure. There are parent, sibling, and child tags. Click Remove Tag to remove a nested tag.
Click Save.
Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Navigate to the Data Dictionary tab.
Hover over tags for metadata or click on a tag to open the side sheet with information about the tag.
Click the Governance icon in the navigation menu and select the Tags tab.
A list of all top-level tags will be displayed. Click the expand arrow to view nested tags.
Click the tag itself or the icon in the Actions column to edit tags, generate tag reports, or delete tags.
You can pull external tags that you had previously defined in the external catalog (e.g., Collibra, Snowflake, etc.).
Click the Governance icon in the navigation menu and select the Tags tab.
Click Refresh External Tags.
External tags will be automatically detected when you create a new data source that originates in an external catalog, or they can be linked directly from the data source overview page.
When using custom REST catalogs, the GET /dataSource/page/{id} endpoint returns a human-readable information page from the REST catalog for the data source associated with {id}. Immuta provides this as a mechanism for allowing the REST catalog to provide additional information about the data source that may not be directly ingested by or visible within Immuta. This link is accessed in the Immuta UI when a user clicks the catalog logo associated with the data source on the data source overview page.
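For a custom REST catalog, the page URL is the catalog's base URL plus the /dataSource/page/{id} path described above. The sketch below builds that URL with the standard library; the base URL is a hypothetical placeholder, and percent-encoding the ID is a defensive choice of this sketch rather than a documented requirement.

```python
from urllib.parse import quote


def data_source_page_url(catalog_base_url: str, data_source_id: str) -> str:
    """Build the URL for GET /dataSource/page/{id} on a custom REST catalog."""
    # The base URL is hypothetical; the path comes from the endpoint described above.
    encoded_id = quote(str(data_source_id), safe="")
    return f"{catalog_base_url.rstrip('/')}/dataSource/page/{encoded_id}"


print(data_source_page_url("https://catalog.example.com/api/", "42"))
# https://catalog.example.com/api/dataSource/page/42
```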
Requirement: Immuta permission USER_ADMIN
You can manually run a schema monitoring job globally using the with an empty payload.
You can manually run a schema monitoring job for all data sources that you own using the with a payload containing the hostname for your data sources or their individual IDs.
You can manually run a schema monitoring job for data sources you are subscribed to using the with a payload containing the hostname for your data source and the table name or data source ID.
Navigate to the data source overview page.
Click on the health check icon.
Scroll to Column Detection, and click Trigger Detection.