
Register Data Sources

When a data source is exposed, policies are dynamically enforced on the data, redacting and masking information according to the attributes or groups of the user accessing it. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner, which supports reproducibility and collaboration.

This section includes how-to guides for registering data sources in Immuta:

Amazon S3 Data Source

Private preview

The Amazon S3 integration is available to select accounts. Reach out to your Immuta representative for details.

Requirement

CREATE_S3_DATA_SOURCE Immuta permission

Prerequisite

Configure the Amazon S3 integration

Create a data source

Navigate to the Data Sources list page in Immuta.
Click Register Data Source.
Select the S3 tile in the data platform section.
Select your AWS Account/Region from the dropdown menu.
Opt to select a default domain to which data sources will be assigned.
Opt to add default tags to the data sources.
Click Next.
The prefix field is populated with the base path. Add to this prefix to create a data source for a prefix, bucket, or object.
- If the data source prefix ends in a wildcard (*), it protects all items starting with that prefix. For example, a base location of s3:// and a data source prefix surveys/2024* would protect paths like s3://surveys/2024-internal/research-dept.txt or s3://surveys/2024-customer/april/us.csv.
- If the data source prefix ends without a wildcard (*), it protects a single object. For example, a base location path of s3:// and a data source prefix of research-data/demographics would only protect the object that exactly matches s3://research-data/demographics.
Click Add Prefix, and then click Next.
Verify that your prefixes are correct and click Complete Setup.
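
The wildcard rule described in the prefix step can be illustrated with a short Python sketch. The helper name `is_protected` and its matching logic are assumptions made for illustration; they model only the two cases described above and are not Immuta's implementation.

```python
def is_protected(base_location: str, data_source_prefix: str, object_path: str) -> bool:
    """Hypothetical helper: does object_path fall under the data source prefix?"""
    full_prefix = base_location + data_source_prefix
    if full_prefix.endswith("*"):
        # Trailing wildcard: every path that starts with the prefix is covered.
        return object_path.startswith(full_prefix[:-1])
    # No wildcard: only the object that matches exactly is covered.
    return object_path == full_prefix


# Paths taken from the examples in this section.
print(is_protected("s3://", "surveys/2024*", "s3://surveys/2024-customer/april/us.csv"))  # True
print(is_protected("s3://", "research-data/demographics", "s3://research-data/demographics"))  # True
print(is_protected("s3://", "research-data/demographics", "s3://research-data/demographics-2"))  # False
```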

Azure Synapse Analytics Data Source

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Azure Synapse Analytics tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Azure Synapse Analytics
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.
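
Because join pushdown depends on matching connection information, it can help to compare the values you plan to enter before registering a second data source. The sketch below is a minimal illustration; the `ConnectionInfo` class and `mismatched_fields` helper are hypothetical names, and whether a join is actually pushed down is decided by Immuta, not by this check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ConnectionInfo:
    """Hypothetical container for the fields named in the consideration above."""
    host: str
    port: int
    ssl: bool
    username: str
    password: str


def mismatched_fields(a: ConnectionInfo, b: ConnectionInfo) -> list[str]:
    # Compare field by field so the caller can see exactly what differs.
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


first = ConnectionInfo("db.example.com", 1433, True, "svc_immuta", "secret")
second = ConnectionInfo("db.example.com", 1434, True, "svc_immuta", "secret")
print(mismatched_fields(first, second))  # ['port']
```

An empty list means the two data sources share the connection details listed above.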

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
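
The way the case of a macro carries over to the data source name can be sketched as follows. This is an illustration only: `render_name` is a hypothetical helper, and treating a mixed-case macro as title case is an assumption inferred from the "Data Source Name" example above, not a documented algorithm.

```python
import re

# Matches <schema> and <tablename> macros in any letter case.
_MACRO = re.compile(r"<(schema|tablename)>", re.IGNORECASE)


def _apply_case(macro: str, value: str) -> str:
    if macro.isupper():
        return value.upper()   # <TABLENAME> -> "DATA SOURCE NAME"
    if macro.islower():
        return value.lower()   # <tablename> -> "data source name"
    return value.title()       # <Tablename> -> "Data Source Name" (assumed title case)


def render_name(template: str, schema: str, table: str) -> str:
    values = {"schema": schema, "tablename": table}

    def substitute(match: re.Match) -> str:
        macro = match.group(1)
        return _apply_case(macro, values[macro.lower()])

    return _MACRO.sub(substitute, template)


print(render_name("<Tablename>", "sales", "data source name"))  # Data Source Name
print(render_name("<TABLENAME>", "sales", "data source name"))  # DATA SOURCE NAME
```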

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
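
To make the effect of the latency setting concrete, the sketch below models it as a simple refresh interval. The function names and the refresh model are assumptions for illustration; Immuta's internal caching behavior is not described in this guide.

```python
from datetime import datetime, timedelta

# Maps the dropdown choices named above to timedelta keyword arguments.
_UNITS = {"MINUTES": "minutes", "HOURS": "hours", "DAYS": "days"}


def latency_to_timedelta(set_time: int, unit: str) -> timedelta:
    return timedelta(**{_UNITS[unit]: set_time})


def refresh_due(last_checked: datetime, now: datetime, latency: timedelta) -> bool:
    # Assumed model: column values are re-read once the latency period has elapsed.
    return now - last_checked >= latency


latency = latency_to_timedelta(2, "HOURS")
last_checked = datetime(2024, 1, 1, 10, 0)
print(refresh_due(last_checked, datetime(2024, 1, 1, 11, 0), latency))  # False
print(refresh_due(last_checked, datetime(2024, 1, 1, 12, 0), latency))  # True
```

Under this model, a country added at 10:30 would not be seen by the policy until the 12:00 check.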

Data source tags

Adding tags to your data source allows users to search for the data source using the tags and allows Governors to apply Global policies to the data source. Note that if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Databricks Data Source

This page details how to register Databricks data sources using the existing workflow. To register data sources using connections, see this how-to guide.

Requirements

Databricks Spark integration

When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
The user exposing the tables is a Databricks workspace administrator.

Databricks Unity Catalog integration

When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.

The following privileges on the parent catalogs and schemas of those tables:
- SELECT
- USE CATALOG
- USE SCHEMA
USE SCHEMA on system.information_schema
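
As a rough aid, the privileges listed above could be turned into Databricks SQL `GRANT` statements with a sketch like the one below. The helper name `grant_statements` and the example principal, catalog, and schema are hypothetical, and granting `SELECT` at the schema level is one assumed reading of the list; verify the statements against the Databricks documentation before running them.

```python
def grant_statements(principal: str, catalog: str, schema: str) -> list[str]:
    """Hypothetical helper: build GRANT statements for the privileges listed above."""
    return [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{principal}`",
        f"GRANT SELECT ON SCHEMA `{catalog}`.`{schema}` TO `{principal}`",
        # Required in addition to the privileges on the parent catalog and schema.
        f"GRANT USE SCHEMA ON SCHEMA `system`.`information_schema` TO `{principal}`",
    ]


for statement in grant_statements("immuta-service-principal", "main", "sales"):
    print(statement + ";")
```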

Azure Databricks Unity Catalog limitation

Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the Azure Databricks Unity Catalog limitation for details.

Enter connection information

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Navigate to the Data Sources list page and click Register Data Source.
Select the Databricks tile in the Data Platform section.
Complete the first four fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Databricks, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
Select your authentication method from the dropdown:
- Access Token:
  1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.
  2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
- OAuth machine-to-machine (M2M):
  1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
  2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
  3. Fill out the Client ID. This is a combination of letters, numbers, or symbols used as a public identifier, and it is the same as the service principal's application ID.
  4. Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
  5. Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.
If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:
```
UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789
```
Click Test Connection.
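
For readers unfamiliar with the OAuth M2M fields, the sketch below shows how they map onto a standard OAuth 2.0 client credentials request. Immuta performs the token request itself, so this is illustration only: the function names are hypothetical, `all-apis` is just an example scope value, and sending the client credentials in a Basic authorization header is one of the options the OAuth 2.0 specification allows, not a statement about how Immuta sends them.

```python
import base64
from urllib.parse import urlencode


def default_token_endpoint(workspace_name: str) -> str:
    # Mirrors the default value shown for the Token Endpoint field.
    return f"https://{workspace_name}.cloud.databricks.com/oidc/v1/token"


def client_credentials_request(client_id: str, client_secret: str, scope: str):
    """Build the headers and form body of a client credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials", "scope": scope})
    return headers, body


print(default_token_endpoint("my-workspace"))
headers, body = client_credentials_request("example-client-id", "example-secret", "all-apis")
print(body)  # grant_type=client_credentials&scope=all-apis
```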

Further considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
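
The automatic increment of a duplicate schema project name can be sketched as below. Only the first collision ("Customer table" becoming "Customer table 2") is described in this guide; the helper name `next_project_name` and the behavior for later collisions are assumptions.

```python
def next_project_name(requested: str, existing: set[str]) -> str:
    """Hypothetical helper: return the name a new schema project would receive."""
    if requested not in existing:
        return requested
    suffix = 2
    # Assumed: keep counting upward until an unused name is found.
    while f"{requested} {suffix}" in existing:
        suffix += 1
    return f"{requested} {suffix}"


print(next_project_name("Customer table", {"Customer table"}))  # Customer table 2
```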

Enable or disable schema monitoring

Note: This step will only appear if all tables within a server have been selected for creation.

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

Generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta admin or the user who owns the schema detection groups that are being targeted.
On the data source creation page, click the checkbox to enable Schema Monitoring or Detect Column Changes.
Click Download Schema Job Detection Template and then the Click Here To Download text.
Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
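
If the notebook job needs to go through a proxy, the `requests` library accepts a `proxies` dictionary on each call. The sketch below only builds that dictionary; the helper name `build_proxies`, the proxy host and port, and the secret scope and key names in the comments are placeholders, and the actual request lines should be taken from the downloaded template rather than from this example.

```python
def build_proxies(proxy_host: str, proxy_port: int) -> dict[str, str]:
    """Hypothetical helper: build the mapping that requests expects for proxies."""
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    # requests selects the entry whose key matches the scheme of the target URL.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("my.host.com", 6789)
print(proxies)

# Inside the Databricks notebook, the API key would come from the secret you created,
# and the dictionary would be passed to the template's existing request call, e.g.:
#   api_key = dbutils.secrets.get(scope="<your scope>", key="<your key>")
#   requests.post(url, headers=headers, json=payload, proxies=proxies)
```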

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Google BigQuery Data Source

Private preview: Google BigQuery is available to select accounts. Reach out to your Immuta representative for details.

Requirements

CREATE_DATA_SOURCE Immuta permission
Google BigQuery roles:
- roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
- roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
- roles/bigquery.jobUser on the project

Prerequisite

Configure the Google BigQuery integration

Create a Google Cloud service account for creating Google BigQuery data sources

Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. If you do not already have a service account for the Google Cloud project that is separate from the one you created when configuring the Google BigQuery integration, you must create one with privileges to view and run queries against the tables you are protecting.

You have two options to create the required Google Cloud service account:

Create a service account by using Google Cloud Console.
Create a service account by using gcloud.

Create a service account using the Google Cloud web console

Using the Google Cloud documentation, create a service account with the following roles:
- BigQuery User
- BigQuery Data Viewer
Using the Google Cloud documentation, generate a service account key for the account you just created.

Create a service account using gcloud

Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.
- SERVICE_ACCOUNT is the name for the new service account.
- PROJECT_ID is the project ID for the Google Cloud Project that is integrated with Immuta.
- IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.

Use the script below in the gcloud command line. This script is a template; change values as necessary:

```
# Fill these out
# Use a .json extension for the key file
export SERVICE_ACCOUNT=datasource-account
export PROJECT_ID=project123
export IMMUTA_GCP_KEY_FILE=~/GCP_${SERVICE_ACCOUNT}_key.json

# Create the service account used for creating data sources
gcloud iam service-accounts create "${SERVICE_ACCOUNT}" --project "${PROJECT_ID}"

# Generate the key file
gcloud iam service-accounts keys create "${IMMUTA_GCP_KEY_FILE}" \
  --iam-account="${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com"

# Allow the account to execute queries
#gcloud projects add-iam-policy-binding ${PROJECT_ID} \
#--member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=projects/${PROJECT_ID}/roles/bigquery.user
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role=roles/bigquery.user

# Allow the account to view data
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role=roles/bigquery.dataViewer

# Print the cleanup command in case something went wrong
echo "If something went wrong and you want to delete the service account, run:"
echo "gcloud iam service-accounts delete ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --project ${PROJECT_ID}"
```

Register data sources in Immuta

Required Google BigQuery roles

Ensure that the user creating the data source has these Google BigQuery roles:

roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project

Navigate to the Data Sources list page.
Click Register Data Source.
Select the Google BigQuery tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Account Email Address: Enter the email address of a user with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.
- Project: Enter the name of the project that has been integrated with Immuta.
- Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.
Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.
Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Decide how to virtually populate the data source by selecting one of the options:
- Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
- Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Provide basic information about your data source to make it discoverable to users.
- Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. For BigQuery, the schema will be the BigQuery dataset. The format must include a schema macro, but you may personalize it using lowercase letters, numbers, and underscores. It can have up to 255 characters.
- Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.
  - When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.
  - When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.
- Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
  - <Tablename>: The Immuta data source will have the same name as the original table.
  - <Schema><Tablename>: The Immuta data source will have both the dataset and original table name.
  - Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
- Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.
Optional Advanced Settings:
- Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.
- Data Source Tags: Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.
  - Click the Edit button in the Data Source Tags section.
  - Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Click Create to save the data source(s).
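
Since the account in the key file must match the Account Email Address you enter, a quick local check before uploading can save a failed connection test. The sketch below is a minimal example; the helper names are hypothetical, and it relies on the `client_email` field that Google Cloud service account key files contain.

```python
import json


def expected_service_account_email(service_account: str, project_id: str) -> str:
    # Same address format used in the gcloud script earlier in this guide.
    return f"{service_account}@{project_id}.iam.gserviceaccount.com"


def key_file_matches(key_file_path: str, account_email: str) -> bool:
    """Return True if the key file belongs to the given service account email."""
    with open(key_file_path, encoding="utf-8") as handle:
        key = json.load(handle)
    return key.get("client_email") == account_email


print(expected_service_account_email("datasource-account", "project123"))
# datasource-account@project123.iam.gserviceaccount.com
```

Call `key_file_matches` with the path to your downloaded key file and the email address you plan to enter in the Connection Information box.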

Next steps

With data sources registered in Immuta, your organization can now start

building global subscription and data policies to govern data.
creating projects to collaborate.

Redshift Data Source

Redshift data sources

Redshift Spectrum data sources must be registered via the Immuta CLI or V2 API using this payload.
Registering Redshift datashares as Immuta data sources is unsupported.

Requirement

The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Redshift tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Redshift, typically port 5439
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
Click the Test Connection button.

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Further considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format, which will be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
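
The effect of macro case on the data source name can be illustrated with a short sketch. It assumes the macro is written literally as `<tablename>` in some letter case and that title case means capitalizing each word; the `render_name` function is hypothetical and only mimics the behavior described above.

```python
import re


def render_name(template: str, table: str) -> str:
    """Replace the tablename macro, applying the macro's own letter case."""

    def substitute(match: re.Match) -> str:
        macro = match.group(1)
        if macro.isupper():      # <TABLENAME>
            return table.upper()
        if macro.islower():      # <tablename>
            return table.lower()
        return table.title()     # <Tablename> and other mixed-case spellings

    return re.sub(r"<(tablename)>", substitute, template, flags=re.IGNORECASE)


for template in ("<Tablename>", "<tablename>", "<TABLENAME>"):
    print(template, "->", render_name(template, "data source name"))
```

A custom template such as `prod <tablename>` would pass through the same function, with only the macro portion replaced.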

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

- more statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
- the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Snowflake Data Source

This page details how to register Snowflake data sources using the existing workflow. To register data sources using connections, see this how-to guide.

Requirements

CREATE_DATA_SOURCE Immuta permission
USAGE Snowflake privilege on the schema and database
REFERENCES Snowflake privilege on the tables

Snowflake imported databases

Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.

Enter connection information

Use SSL

Although not required, all connections should use SSL. Additional connection string arguments may also be provided.

Navigate to the Data Sources list page and click Register Data Source.
Select the Snowflake tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Snowflake, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Warehouse: Snowflake warehouse that contains the remote database
- Database: remote database
From the Select Authentication Method dropdown, select Username and Password, Key Pair Authentication, or Snowflake External OAuth:
- Username and Password
  1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
  2. Enter a Password. This password will be used with the above username to connect to the remote database.
  3. You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.
- Key Pair Authentication
  1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.
  2. Opt to enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>.
  3. Click Select a File, and upload a Snowflake key pair file.
- Snowflake External OAuth
  1. Fill out the Token Endpoint, which is where the generated token is sent.
  2. Fill out the Client ID, which is the subject of the generated token.
  3. To use a certificate, keep the Use Certificate checkbox enabled and complete the steps below. You cannot pass a client secret if you use this method for obtaining the access token.
    - Opt to fill out the Resource field with a URI of the resource where the requested token will be used.
    - Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called sub (Subject).
    - Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.
  4. To pass a client secret, uncheck the Use Certificate checkbox and complete the fields below. You cannot use a certificate if you use this method for obtaining the access token.
    - Scope (string): The scope limits the operations and roles allowed in Snowflake by the access token. See the Snowflake documentation for details about creating scopes for External OAuth.
    - Client Secret (string): Immuta uses this secret to authenticate with the authorization server when it requests a token.
Click the Test Connection button.
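
For the x509 Certificate Thumbprint field, a thumbprint can be derived from the PEM certificate itself. The sketch below computes the SHA-1 digest of the DER-encoded certificate in two common spellings: base64url without padding (the `x5t` form defined for JSON Web Signatures) and uppercase hex. Which spelling Immuta expects is not stated here, so treat that as something to confirm. The helper names are hypothetical.

```python
import base64
import hashlib


def pem_to_der(pem_text: str) -> bytes:
    """Drop the BEGIN/END lines of a PEM block and decode the base64 body."""
    lines = [line.strip() for line in pem_text.splitlines()]
    body = [line for line in lines if line and not line.startswith("-----")]
    return base64.b64decode("".join(body))


def x5t_thumbprint(der: bytes) -> str:
    """base64url-encoded SHA-1 digest of the DER certificate, without padding."""
    digest = hashlib.sha1(der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def hex_thumbprint(der: bytes) -> str:
    """The same SHA-1 digest written as uppercase hex."""
    return hashlib.sha1(der).hexdigest().upper()


# Placeholder bytes stand in for a real certificate in this illustration.
placeholder_der = b"not a real certificate"
placeholder_pem = (
    "-----BEGIN CERTIFICATE-----\n"
    + base64.b64encode(placeholder_der).decode("ascii")
    + "\n-----END CERTIFICATE-----\n"
)

der = pem_to_der(placeholder_pem)
print(x5t_thumbprint(der))
print(hex_thumbprint(der))
```

With a real certificate, `placeholder_pem` would be replaced by the contents of the PEM file that is uploaded in the same step.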

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

File naming convention

If you are uploading more than one file, ensure the certificate used for the OAuth authentication has the key name "oauth client certificate."

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
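
The automatic incrementing of schema project names described above can be sketched as follows. The first collision matches the documented example (Customer table becomes Customer table 2); continuing to 3, 4, and so on for later collisions is an assumption, and `next_available_name` is a hypothetical helper rather than Immuta's implementation.

```python
def next_available_name(requested: str, existing: set[str]) -> str:
    """Return the requested name, or the name with the lowest unused numeric suffix."""
    if requested not in existing:
        return requested
    suffix = 2  # the documented example starts numbering at 2
    while f"{requested} {suffix}" in existing:
        suffix += 1
    return f"{requested} {suffix}"


projects = {"Customer table"}
print(next_available_name("Customer table", projects))  # prints Customer table 2
```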

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

When selecting the Schema/Table option, opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

- more statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
- the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to register your data source.

Bulk Create Snowflake Data Sources

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirements

Snowflake Enterprise Edition
Snowflake X-Large or Large warehouse is strongly recommended

Create Snowflake data sources

Set the to None for bulk data source creation. This will simplify the data source creation process by not automatically applying policies.
Make a request to the Immuta V2 API, as the Immuta UI does not support creating more than 1,000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The Skip Stats Job tag is only required if you are using ; otherwise, Snowflake data sources automatically skip the stats job.
```
"options": {
    "disableSensitiveDataDiscovery": true,
    "tableTags": [
        "Skip Stats Job"
    ]
}
```

Specifying disableSensitiveDataDiscovery as true ensures that sensitive data discovery will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.

Applying the Skip Stats Job tag using the tableTags value ensures that some jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.

When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.
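
The options fragment shown earlier can be generated programmatically so that the Skip Stats Job tag is included only when it is needed. This is a minimal sketch: it builds only the `options` portion of the request body, the rest of the payload is not covered here, and `bulk_create_options` is a hypothetical helper name.

```python
import json


def bulk_create_options(needs_skip_stats_tag: bool) -> dict:
    """Build the options fragment recommended for bulk data source creation."""
    options = {"disableSensitiveDataDiscovery": True}
    if needs_skip_stats_tag:
        # Skips the fingerprint and high cardinality check jobs.
        options["tableTags"] = ["Skip Stats Job"]
    return {"options": options}


print(json.dumps(bulk_create_options(True), indent=4))
```

The resulting dictionary would be merged into the full request body before it is sent to the create data source endpoint.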

Monitor progress

To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example_payload.json \
    "https://your-immuta-url.com/jobs?bulkId=<your-bulkId>"

The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:

    {
      "total":"99893",
      "completed":"99892",
      "failed":"0",
      "pending":"1",
      "errors":null
    }

With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.
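
A script that polls the jobs endpoint needs to interpret the response. The sketch below parses a response shaped like the sample above, where the counts arrive as strings, and reports overall progress. It assumes that the four states shown are the only ones returned; `summarize_bulk_jobs` is a hypothetical helper.

```python
import json


def summarize_bulk_jobs(response_text: str) -> dict:
    """Turn the job-state response into numeric progress information."""
    payload = json.loads(response_text)
    total = int(payload["total"])
    completed = int(payload["completed"])
    return {
        "percent_complete": round(100 * completed / total, 3) if total else 0.0,
        "finished": int(payload["pending"]) == 0,
        "failed": int(payload["failed"]),
        "errors": payload["errors"] or [],  # null becomes an empty list
    }


sample = (
    '{"total":"99893","completed":"99892","failed":"0",'
    '"pending":"1","errors":null}'
)
print(summarize_bulk_jobs(sample))
```

A polling loop could call this function after each request and stop once `finished` is true or `failed` is greater than zero.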

Create a Starburst (Trino) Data Source

Using OAuth authentication to create Starburst (Trino) data sources

If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources, work with your Immuta representative to configure the globalAdminUsername property. See the Starburst (Trino) reference page for details.

Enter connection information

Navigate to the Data Sources list page and click Register Data Source.
Select the Starburst (Trino) tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Starburst (Trino)
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Catalog: the remote catalog
- Username: the username to use to connect to the remote database and retrieve records for this data source
- Password: the password to use with the above username to connect to the remote database
If you are using a proxy server with Starburst (Trino), specify it in the Additional Connection String Options:
```
UseProxy=1;ProxyHost=my.host.com;ProxyUID=your-username;ProxyPort=6789;ProxyPwd=your-password
```
Opt to Upload Certificates to connect to the database.
Click the Test Connection button.
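
The proxy string for the Additional Connection String Options field follows a `key=value;key=value` pattern, so it can be assembled from its parts. The sketch below reproduces the key names and order from the example above. The `proxy_connection_options` helper is hypothetical, and because the escaping rules for the connection string are not described here, it simply rejects values that contain a semicolon.

```python
def proxy_connection_options(host: str, port: int, uid: str = "", pwd: str = "") -> str:
    """Assemble the proxy portion of the additional connection string options."""
    parts = [("UseProxy", "1"), ("ProxyHost", host)]
    if uid:
        parts.append(("ProxyUID", uid))
    parts.append(("ProxyPort", str(port)))
    if pwd:
        parts.append(("ProxyPwd", pwd))
    for key, value in parts:
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return ";".join(f"{key}={value}" for key, value in parts)


print(proxy_connection_options("my.host.com", 6789, "your-username", "your-password"))
```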

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
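
The rules for the SQL Schema Name Format and SQL Table Name Format (a required macro, lowercase letters, numbers, and underscores, and at most 255 characters) can be checked before the form is submitted. The sketch below is illustrative only: the macro spellings `<schema>` and `<tablename>` are assumptions, the length limit is applied to the format string rather than to the rendered name, and `check_sql_name_format` is a hypothetical helper.

```python
import re

MAX_LENGTH = 255
LITERAL_CHARS = re.compile(r"[a-z0-9_]*")


def check_sql_name_format(name_format: str, required_macro: str) -> list[str]:
    """Return a list of problems; an empty list means the format looks valid."""
    problems = []
    if required_macro not in name_format:
        problems.append(f"missing required macro {required_macro}")
    if len(name_format) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    # Everything outside the macro must be lowercase letters, digits, or underscores.
    literal_text = name_format.replace(required_macro, "")
    if not LITERAL_CHARS.fullmatch(literal_text):
        problems.append("contains characters other than a-z, 0-9, and _")
    return problems


print(check_sql_name_format("prod_<schema>", "<schema>"))
print(check_sql_name_format("Prod-<tablename>", "<tablename>"))
```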

Enable or disable schema monitoring

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

When selecting the Schema/Table option, you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

- more statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
- the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).

Databricks Data Source

This page details how to register Databricks data sources using the existing workflow. To register data sources using connections, see this how-to guide.

Requirements

Databricks Spark integration

When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
The user exposing the tables is a Databricks workspace administrator.

Databricks Unity Catalog integration

When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.

The following privileges on the parent catalogs and schemas of those tables:
- SELECT
- USE CATALOG
- USE SCHEMA
USE SCHEMA on system.information_schema

Azure Databricks Unity Catalog limitation

Enter connection information

Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Navigate to the Data Sources list page and click Register Data Source.
Select the Databricks tile in the Data Platform section. When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:
- The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
- The user exposing the tables is listed in the `immuta.spark.acl.whitelist` configuration on the target cluster.
- The user exposing the tables is a Databricks workspace administrator.
Complete the first four fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Databricks, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
Select your authentication method from the dropdown:
- Access Token:
  1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.
  2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
- OAuth machine-to-machine (M2M):
  1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.
  2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.
  3. Fill out the Client ID. This is a combination of letters, numbers, or symbols that is used as a public identifier and is the same as the service principal's application ID.
  4. Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.
  5. Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.
If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:
```
UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789
```
Click Test Connection.
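
The OAuth machine-to-machine fields above (Token Endpoint, Client ID, Scope, and Client Secret) correspond to a standard OAuth 2.0 client credentials request. The sketch below builds such a request without sending it, which can help when verifying the values outside of Immuta. It illustrates the generic grant rather than how Immuta itself requests tokens; the workspace name, credentials, and `all-apis` scope are placeholder values, and `build_token_request` is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_endpoint: str, client_id: str, client_secret: str, scope: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    # The client ID and secret travel in an HTTP Basic Authorization header.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_token_request(
    "https://example-workspace.cloud.databricks.com/oidc/v1/token",
    client_id="example-client-id",
    client_secret="example-client-secret",
    scope="all-apis",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.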

Further considerations

Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

Select virtual population

Decide how to virtually populate the data source by selecting one of the options:

Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
1. Opt to Edit in the table selection box that appears.
2. By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
3. After making your selection(s), click Apply.

Enter basic information

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.
1. When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table, this field is prepopulated with the recommended project name and you can edit it freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.
- <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.
- Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

Enable or disable schema monitoring

Note: This step will only appear if all tables within a server have been selected for creation.

Schema monitoring best practices

Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.
Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to add new tables.
Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, which prevents data leaks from columns that are added without review.

Generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta admin or the user who owns the schema detection groups that are being targeted.
On the data source creation page, click the checkbox to enable Schema Monitoring or Detect Column Changes.
Click Download Schema Job Detection Template, and then click the Click Here To Download text.
Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented-out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. If you need to add proxy configuration to the job template, note that the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
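
For the proxy configuration mentioned in the steps above, the requests library accepts a `proxies` mapping from URL scheme to proxy URL on each call. The sketch below only builds that mapping; the host and port are placeholders, `build_proxies` is a hypothetical helper, and where the mapping is passed inside the downloaded job template depends on the template's contents, which are not shown here.

```python
def build_proxies(host: str, port: int) -> dict:
    """Build the scheme-to-proxy mapping that requests accepts as `proxies`."""
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("my.host.com", 6789)

# With the requests library available, the mapping is passed per request:
#   requests.post(url, json=payload, proxies=proxies)
print(proxies)
```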

Opt to configure advanced settings

Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

Column detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See the Schema projects overview page to learn more about column detection.

Event time

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

- more statistics to be calculated for this data source, including the most recent record time, which is used for determining the freshness of the data source.
- the creation of time-based restrictions in the policy builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

Sensitive data discovery

Data owners can disable sensitive data discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data source tags

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

Create the data source

Click Create to save the data source(s).