Amazon S3 Data Source


Private preview: The Amazon S3 integration is available to select accounts. Contact your Immuta representative for details.

Requirement

CREATE_S3_DATA_SOURCE Immuta permission

Prerequisite

Create a data source

  1. Navigate to the Data Sources list page in Immuta.

  2. Click Register Data Source.

  3. Select the S3 tile in the data platform section.

  4. Select your AWS Account/Region from the dropdown menu.

  5. Opt to select a default domain to which data sources will be assigned.

  6. Opt to add default tags to the data sources.

  7. Click Next.

  8. The prefix field is populated with the base path. Add to this prefix to create a data source for a prefix, bucket, or object.

     • If the data source prefix ends in a wildcard (*), it protects all items starting with that prefix. For example, a base location of s3:// and a data source prefix of surveys/2024* would protect paths like s3://surveys/2024-internal/research-dept.txt or s3://surveys/2024-customer/april/us.csv.

     • If the data source prefix ends without a wildcard (*), it protects a single object. For example, a base location of s3:// and a data source prefix of research-data/demographics would protect only the object that exactly matches s3://research-data/demographics.

  9. Click Add Prefix, and then click Next.

  10. Verify that your prefixes are correct and click Complete Setup.

  11. Configure the Amazon S3 integration.
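The two prefix-matching rules above can be sketched in a few lines. This is an illustrative model of the documented behavior, not Immuta code:

```python
def prefix_protects(data_source_prefix: str, object_path: str) -> bool:
    """Model of the documented S3 prefix rules: a trailing * protects every
    object starting with the prefix; otherwise only the exact object."""
    if data_source_prefix.endswith("*"):
        return object_path.startswith(data_source_prefix[:-1])
    return object_path == data_source_prefix


# The examples from the documentation:
assert prefix_protects("s3://surveys/2024*", "s3://surveys/2024-internal/research-dept.txt")
assert prefix_protects("s3://surveys/2024*", "s3://surveys/2024-customer/april/us.csv")
assert prefix_protects("s3://research-data/demographics", "s3://research-data/demographics")
assert not prefix_protects("s3://research-data/demographics", "s3://research-data/demographics-2024")
```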

    Snowflake Data Source


    Deprecation notice

    Support for registering Snowflake data sources using this legacy workflow has been deprecated. Instead, register your data using connections.

    Requirements

    • CREATE_DATA_SOURCE Immuta permission

    • The Snowflake user registering data sources must have the following privileges on all securables:

      • USAGE on all databases and schemas with registered data sources

      • REFERENCES on all tables and views registered in Immuta

      • SELECT on all tables and views registered in Immuta


    Snowflake imported databases

    Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.

    Enter connection information


    Use SSL

    Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

    Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

    1. Navigate to the Data Sources list page and click Register Data Source.

    2. Select the Snowflake tile in the Data Platform section.

    3. Complete these fields in the Connection Information box:


    Considerations

    • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.


    File naming convention

    If you are uploading more than one file, ensure the certificate used for the OAuth authentication has the key name "oauth client certificate."

    Select virtual population

    Decide how to virtually populate the data source by selecting one of the options:

    • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the database. New tables will be automatically detected and new Immuta views will be created.

    • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

      1. Opt to click Edit in the table selection box that appears.

    Enter basic information

    1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

    2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    Enable or disable schema monitoring


    Schema monitoring best practices

    Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

    • Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.

    When selecting the Schema/Table option, opt to enable by selecting the checkbox in this section.

    Note: This step will only appear if all tables within a server have been selected for creation.

    Opt to configure advanced settings

    Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

    Column detection

    This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

    To enable, select the checkbox in this section.

    See the column detection documentation to learn more.

    Event time

    An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

    1. Click the Edit button in the Event Time section.

    2. Select the column(s).

    3. Click Apply.

    Selecting an Event Time column will enable

    • more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.

    • the creation of time-based restrictions in the policy builder.

    Latency

    1. Click Edit in the Latency section.

    2. Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.

    3. Click Apply.

    This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.

    Sensitive data discovery

    Data owners can disable identification for their data sources in this section.

    1. Click Edit in this section.

    2. Select Enabled or Disabled in the window that appears, and then click Apply.

    Data source tags

    Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.

    To add tags,

    1. Click the Edit button in the Data Source Tags section.

    2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

    Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

    Create the data source

    Click Create to register your data source.


  • Server: hostname or IP address

  • Port: port configured for Snowflake, typically port 443

  • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted

  • Warehouse: Snowflake warehouse that contains the remote database

  • Database: remote database

  • From the Select Authentication Method dropdown, select Username and Password, Key Pair Authentication, or Snowflake External OAuth:

    • Username and Password

      1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.

      2. Enter a Password. This password will be used with the above username to connect to the remote database.

      3. You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.

    • Key Pair Authentication

      1. Enter a Username. This username will be used to connect to the remote database and retrieve records for this data source.

      2. If using an encrypted private key, enter the private key file password in the Additional Connection String Options. Use the following format: PRIV_KEY_FILE_PWD=<your_pw>.
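Options like PRIV_KEY_FILE_PWD are passed as key=value pairs in the Additional Connection String Options field. As a sketch of how such a string decomposes into individual options (a hypothetical helper for illustration, not Immuta's actual parser):

```python
def parse_connection_options(options: str) -> dict:
    """Split a connection-options string such as
    "PRIV_KEY_FILE_PWD=my_pw;CLIENT_SESSION_KEEP_ALIVE=true"
    into a {key: value} dict. Assumes ;- or &-separated pairs."""
    pairs = options.replace("&", ";").split(";")
    return dict(pair.split("=", 1) for pair in pairs if pair)


opts = parse_connection_options("PRIV_KEY_FILE_PWD=my_pw;CLIENT_SESSION_KEEP_ALIVE=true")
# opts["PRIV_KEY_FILE_PWD"] == "my_pw"
```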

    • Snowflake External OAuth

      1. Fill out the Token Endpoint, which is where the generated token is sent.

      2. Fill out the Client ID, which is the subject of the generated token.

  • Click the Test Connection button.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

    By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

  • After making your selection(s), click Apply.

  • When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.

  • When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.

  • Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

    • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

    • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    • Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  • Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to register new tables directly.

  • Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks from columns that are added without review.
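The schema monitoring API call mentioned above could be wired into an ETL job along these lines. The host, API key, and endpoint path below are all placeholders, not the documented Immuta API; consult your Immuta API reference for the actual route:

```python
import urllib.request

IMMUTA_URL = "https://your-immuta-host"      # placeholder host
IMMUTA_API_KEY = "your-immuta-api-key"       # placeholder key
# Placeholder route, not the documented Immuta endpoint:
ENDPOINT = "/dataSource/detectRemoteChanges"


def schema_monitoring_request() -> urllib.request.Request:
    """Build (but do not send) the POST that an ETL job could issue to
    trigger schema monitoring after new tables are added."""
    return urllib.request.Request(
        url=IMMUTA_URL + ENDPOINT,
        method="POST",
        headers={"Authorization": IMMUTA_API_KEY,
                 "Content-Type": "application/json"},
    )

# urllib.request.urlopen(schema_monitoring_request()) would send it.
```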


    Click Select a File, and upload a Snowflake key pair file.

    To use a certificate, keep the Use Certificate checkbox enabled and complete the steps below. You cannot pass a client secret if you use this method for obtaining the access token.
    1. Opt to fill out the Resource field with a URI of the resource where the requested token will be used.

    2. Enter the x509 Certificate Thumbprint. This identifies the corresponding key to the token and is often abbreviated as x5t or is called sub (Subject).

    3. Upload the PEM Certificate, which is the client certificate that is used to sign the authorization request.

  • To pass a client secret, uncheck the Use Certificate checkbox and complete the fields below. You cannot use a certificate if you use this method for obtaining the access token.

    1. Scope (string): The scope limits the operations and roles allowed in Snowflake by the access token. See the Snowflake documentation for details about creating scopes for External OAuth.

    2. Client Secret (string): Immuta uses this secret to authenticate with the authorization server when it requests a token.
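Under the client-secret flow described above, the request sent to the token endpoint is a standard OAuth 2.0 client-credentials request. A sketch of what that request body looks like (field names follow RFC 6749; the endpoint, client ID, and scope values are made up for illustration):

```python
import urllib.parse


def build_token_request(token_endpoint: str, client_id: str,
                        scope: str, client_secret: str):
    """Return the (URL, urlencoded form body) of a client-credentials
    token request like the External OAuth flow described above."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        "client_secret": client_secret,  # omitted in the certificate flow
    }
    return token_endpoint, urllib.parse.urlencode(form)


url, body = build_token_request(
    "https://idp.example.com/oauth/token",  # made-up endpoint
    "immuta-client", "session:role-any", "s3cr3t")
```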


    Azure Synapse Analytics Data Source

    Prerequisites

    If you are using the OAuth authentication method,

    • Ensure that Microsoft Entra ID is on the same account as the Azure Synapse Analytics workspace and dedicated SQL pool.

    • Set up OAuth via Microsoft Entra ID app registration with a client secret.

    • Select Accounts in this organizational directory only as the account type.

    Enter connection information

    1. Navigate to the Data Sources list page and click Register Data Source.

    2. Select the Azure Synapse Analytics tile in the Data Platform section.

    3. Complete these fields in the Connection Information box:


    Use SSL

    Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

    Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.


    Considerations

    • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, and credentials. You will see performance degradation on joins against the same database if this information doesn't match.

    Select virtual population

    Decide how to virtually populate the data source by selecting one of the options:

    • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the database. New tables will be automatically detected and new Immuta views will be created.

    • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

      1. Opt to click Edit in the table selection box that appears.

    Enter basic information

    1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

    2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    Enable or disable schema monitoring


    Schema monitoring best practices

    Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

    • Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.

    When selecting the Schema/Table option, you can opt to enable by selecting the checkbox in this section.

    Note: This step will only appear if all tables within a server have been selected for creation.

    Opt to configure advanced settings

    Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

    Column detection

    This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

    To enable, select the checkbox in this section.

    See the column detection documentation to learn more.

    Data source tags

    Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.

    To add tags,

    1. Click the Edit button in the Data Source Tags section.

    2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

    Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

    Create the data source

    Click Create to save the data source(s).

    Server: hostname or IP address

  • Port: port configured for Azure Synapse Analytics

  • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted

  • Database: the remote database

  • Select the authentication method:

    1. Username and Password:

      1. Username: The username to use to connect to the remote database and retrieve records for this data source

      2. Password: The password to use with the above username to connect to the remote database

    2. Entra ID OAuth Client Secret: The values below can be found on the overview page of the application you created in Microsoft Entra ID. Before you enter this information, ensure you have completed the prerequisites for OAuth authentication listed above.

      1. Tenant ID

      2. Client ID

      3. Client Secret: Enter the Value of the secret, not the secret ID.

  • You can then choose to enter Additional Connection String Options or Upload Certificates to connect to the database.

  • Click the Test Connection button.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

    By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

  • After making your selection(s), click Apply.

  • When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.

  • When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.

  • Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

    • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

    • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    • Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  • Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  • Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to register new tables directly.

  • Activate the New Column Added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks from columns that are added without review.


    Databricks Data Source


    Deprecation notice

    Support for registering Databricks Unity Catalog data sources using this legacy workflow has been deprecated. Instead, register your data using connections.

    Requirements

    Databricks Spark integration

    When exposing a table or view from an Immuta-enabled Databricks cluster, be sure that at least one of these traits is true:

    • The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).

    • The user exposing the tables is listed in the immuta.spark.acl.allowlist configuration on the target cluster.

    • The user exposing the tables is a Databricks workspace administrator.

    Databricks Unity Catalog integration

    When registering Databricks Unity Catalog securables in Immuta, use the service principal from the integration configuration and ensure it has the privileges listed below. Immuta uses this service principal continuously to orchestrate Unity Catalog policies and maintain state between Immuta and Databricks.

    • USE CATALOG and MANAGE on all catalogs containing securables registered as Immuta data sources.

    • USE SCHEMA on all schemas containing securables registered as Immuta data sources.

    • MODIFY and SELECT on all securables you want registered as Immuta data sources. The MODIFY privilege is not required for materialized views registered as Immuta data sources, since MODIFY is not a supported privilege on that object type.


    MANAGE and MODIFY are required so that the service principal can apply row filters and column masks on the securable; to do so, the service principal must also have SELECT on the securable as well as USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. Since privileges are inherited, you can grant the service principal the MODIFY and SELECT privilege on all catalogs or schemas containing Immuta data sources, which automatically grants the service principal the MODIFY and SELECT privilege on all current and future securables in the catalog or schema. The service principal also inherits MANAGE from the parent catalog for the purpose of applying row filters and column masks, but that privilege must be set directly on the parent catalog in order for grants to be fully applied.


    Azure Databricks Unity Catalog limitation

    Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog.

    Enter connection information


    Performance recommendations

    • Register entire databases with Immuta and run jobs through the Python script provided during data source registration.

    • Use a Databricks administrator account to register data sources with Immuta using the UI or API; however, you should not test Immuta policies using a Databricks administrator account, as administrators are able to bypass controls.

    1. Navigate to the Data Sources list page and click Register Data Source.

    2. Select the Databricks tile in the Data Platform section.


    Further considerations

    • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.

    Select virtual population

    Decide how to virtually populate the data source by selecting one of the options:

    • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the database. New tables will be automatically detected and new Immuta views will be created.

    • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

      1. Opt to click Edit in the table selection box that appears.

    Enter basic information

    1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

    2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    Enable or disable schema monitoring

    Note: This step will only appear if all tables within a server have been selected for creation.


    Schema monitoring best practices

    Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

    • Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.

    1. Generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta admin or the user who owns the schema detection groups that are being targeted.

    2. On the data source creation page, click the checkbox to enable Schema Monitoring or Detect Column Changes.

    3. Click Download Schema Job Detection Template and then the Click Here To Download text.

    Opt to configure advanced settings

    Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

    Column detection

    This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

    To enable, select the checkbox in this section.

    See the column detection documentation to learn more.

    Event time

    An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

    1. Click the Edit button in the Event Time section.

    2. Select the column(s).

    3. Click Apply.

    Selecting an Event Time column will enable

    • more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.

    • the creation of time-based restrictions in the policy builder.

    Latency

    1. Click Edit in the Latency section.

    2. Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.

    3. Click Apply.

    This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.

    Sensitive data discovery

    Data owners can disable identification for their data sources in this section.

    1. Click Edit in this section.

    2. Select Enabled or Disabled in the window that appears, and then click Apply.

    Data source tags

    Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.

    To add tags,

    1. Click the Edit button in the Data Source Tags section.

    2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

    Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

    Create the data source

    Click Create to save the data source(s).



  • Complete the first four fields in the Connection Information box:

    • Server: hostname or IP address

    • Port: port configured for Databricks, typically port 443

    • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted. Immuta recommends that all connections use SSL. Additional connection string arguments may also be provided below. Only Immuta uses the connection you provide and injects all policy controls when users query the system. Users always connect through Immuta with policies enforced and have no direct association with this connection.

    • Database: the remote database

  • Select your authentication method from the dropdown:

    • Access Token:

      1. Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.

      2. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

    • OAuth machine-to-machine (M2M):

      1. Enter the HTTP Path of your Databricks cluster or SQL warehouse.

      2. Fill out the Token Endpoint with the full URL of the identity provider. This is where the generated token is sent. The default value is https://<your workspace name>.cloud.databricks.com/oidc/v1/token.

  • If you are using a proxy server with Databricks, specify it in the Additional Connection String Options.

  • Click Test Connection.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

    By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

  • After making your selection(s), click Apply.

  • When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.

  • When selecting Schema/Table, this field is prepopulated with the recommended project name, and you can edit it freely.

  • Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

    • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

    • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    • Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  • Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  • Consider using Immuta’s API either to run the schema monitoring job when your ETL process adds new tables or to register the new tables directly.

  • Activate the new column added templated global policy to protect potentially sensitive data. This policy masks new columns with NULL until a data owner reviews them, preventing data leaks from newly added columns that have not yet been reviewed.

  • Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.

  • Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented-out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request.

  • Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.
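    As a sketch of how the notebook job's API call and proxy configuration could fit together: the snippet below assembles the keyword arguments for a requests call to the /dataSource/detectRemoteChanges endpoint (the Immuta URL, API key, and proxy address are placeholders, not real values).

    ```python
    def build_request_kwargs(immuta_url: str, api_key: str, proxy: str) -> dict:
        """Assemble keyword arguments for requests.post(**kwargs)."""
        return {
            "url": f"{immuta_url}/dataSource/detectRemoteChanges",
            "headers": {"Authorization": f"Bearer {api_key}"},
            "json": {},                   # empty payload: scan all databases/tables
            "proxies": {"https": proxy},  # requests' proxy mechanism
        }

    kwargs = build_request_kwargs(
        "https://your-immuta-url.com", "<api-key>", "http://my.host.com:6789"
    )
    # Inside the notebook job you would then run:
    # import requests
    # requests.post(**kwargs)
    print(kwargs["url"])
    ```

    The proxies mapping is passed straight through to requests, which is the same mechanism the job template uses.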


    Fill out the Client ID. This is a combination of letters, numbers, or symbols used as a public identifier; it is the same as the service principal's application ID.

  • Enter the Scope (string). The scope limits the operations and roles allowed in Databricks by the access token. See the OAuth 2.0 documentation for details about scopes.

  • Enter the Client Secret. Immuta uses this secret to authenticate with the authorization server when it requests a token.
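    The OAuth machine-to-machine flow above exchanges the client ID and client secret for a token at the token endpoint. The sketch below builds that request using the standard client_credentials grant; it is an illustration only, and the workspace name, credentials, and scope are placeholders you must replace with your own values.

    ```python
    from urllib.parse import urlencode

    def build_token_request(workspace: str, client_id: str,
                            client_secret: str, scope: str) -> dict:
        """Build a client_credentials token request for the default
        Databricks token endpoint format described above."""
        return {
            "url": f"https://{workspace}.cloud.databricks.com/oidc/v1/token",
            "auth": (client_id, client_secret),  # HTTP basic auth
            "body": urlencode({"grant_type": "client_credentials",
                               "scope": scope}),
        }

    req = build_token_request("my-workspace", "<client-id>",
                              "<client-secret>", "<scope>")
    print(req["url"])
    ```

    A library such as requests could then POST req["body"] to req["url"] with req["auth"] to obtain the token.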

  • Manage Data Sources

    As a data owner, you can edit your data source settings and disable, delete, and re-enable a data source.

    For other guides related to data source members and management, see the Related guides section.

    hashtag
    Bulk edit data sources

    Data owners can bulk edit data sources.

    1. Navigate to the Data Sources page.

    2. Select the checkboxes for the data sources you want to edit. Note that when editing a connection string using bulk edit, all data sources from that connection must be selected.

    3. Select the action you want or click More Actions for additional options.

    4. Confirm your edits by following the prompts in the modals that appear.

    hashtag
    Resync policies

    hashtag
    Resync data policies

    Users can manually resync data policies for all Immuta integrations.

    1. Navigate to the Data Sources page.

    2. Select the checkboxes for the data sources you want to sync data policies for.

    3. Click More Actions and select Sync Data Policies.

    hashtag
    Resync grants and data policies

    For Databricks Unity Catalog integrations, users can manually resync both subscription and data policies through the data source health check.

    1. Navigate to the data source you would like to resync policies for.

    2. Click the health status in the top corner.

    3. Click Sync All Policies.

    To sync grants and data policies using the API, see the Immuta API documentation.

    hashtag
    Disable a data source

    Disabling a data source hides it and its data from all users except the data owner. While in this state, the data source will display as disabled in the console for the data owner and other users will not be able to see it at all.

    1. Navigate to the data source.

    2. Click on the more actions icon and select Disable.

    A label will appear next to the data source indicating it is now disabled, and a notification will be sent to all users of the data source informing them that the data source has been disabled.

    circle-info

    Disabled data sources and Immuta policies

    Disabling a data source for one of the integrations below removes subscription and data policies from that data source; policies will not be applied until the data source is re-enabled:

    • Azure Synapse Analytics

    • Databricks Unity Catalog

    • Google BigQuery

    • Redshift

    • Snowflake

    hashtag
    Enable a disabled data source

    1. Navigate to the data source.

    2. Click on the more actions icon and select Enable.

    A notification will be sent out to all users of the data source informing them that the data source has been enabled. After you enable a data source, existing policies on that data source will take effect.

    hashtag
    Delete a data source

    Deleting a data source permanently removes it from Immuta. Data sources must first be disabled before they can be deleted.

    1. Disable the data source.

    2. Navigate to the data source, click the more actions icon, and select Delete.

    3. Confirm that the data source should be deleted by clicking Delete.

    A notification will be sent out to all users of the data source informing them that the data source has been deleted.

    hashtag
    Related guides

    hashtag
    Reference guides

    For information about data sources and policies, see the following guides:

    • Data sources in Immuta overview

    • Policies in Immuta overview

    hashtag
    How-to guides

    In addition to adding and managing data source settings as outlined above, data owners can manage data source column tags, data dictionaries, policies, and members.

    • Subscribe to and manage data sources guide

    Run Schema Monitoring and Column Detection Jobs

    hashtag
    Manually Run Schema Monitoring Jobs

    hashtag
    Manually Run Schema Monitoring Job for All Data Sources

    Requirement: Immuta permission USER_ADMIN

    You can manually run a schema monitoring job globally using the /dataSource/detectRemoteChanges endpoint of the Immuta API with an empty payload.

    hashtag
    Manually Run Schema Monitoring Job as a Data Owner

    You can manually run a schema monitoring job for all data sources that you own using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data sources or their individual IDs.

    hashtag
    Manually Run Schema Monitoring Job as a Data User

    You can manually run a schema monitoring job for data sources you are subscribed to using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data source and the table name or data source ID.
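    The three cases above differ only in the payload sent to the endpoint. A sketch of the payloads follows; the hostname and table values are placeholders, and the exact field names (hostname, tableName) are assumptions based on the descriptions above rather than a verified schema.

    ```python
    import json

    # Global run (requires USER_ADMIN): an empty payload.
    global_payload = {}

    # Data owner: all owned data sources on a connection, by hostname.
    owner_payload = {"hostname": "example.us-east-1.redshift.amazonaws.com"}

    # Data user: a single table on the connection (field name assumed).
    user_payload = {"hostname": "example.us-east-1.redshift.amazonaws.com",
                    "tableName": "public.claims"}

    for payload in (global_payload, owner_payload, user_payload):
        print(json.dumps(payload))
    ```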

    hashtag
    Manually Run a Column Detection Job

    1. Navigate to the data source overview page.

    2. Click on the health check icon.

    3. Scroll to Column Detection, and click Trigger Detection.


    Bulk Create Snowflake Data Sources

    circle-info

    Private preview: This feature is available to select accounts. Contact your Immuta representative to enable this feature.

    hashtag
    Requirements

    • Snowflake Enterprise Edition

    • Snowflake X-Large or Large warehouse is strongly recommended

    hashtag
    Create Snowflake data sources

    Make a request to the Immuta V2 API create data source endpoint, as the Immuta UI does not support creating more than 1000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The Skip Stats Job tag is only required if you are using specific policies that require stats; otherwise, Snowflake data sources automatically skip the stats job.

    Specifying disableSensitiveDataDiscovery as true ensures that identification will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling identification improves performance during data source creation.

    Applying the Skip Stats Job tag using the tableTags value will ensure that some jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.

    When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.

    hashtag
    Monitor progress

    To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:

    The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:

    With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.

    ```json
    "options": {
        "disableSensitiveDataDiscovery": true,
        "tableTags": [
            "Skip Stats Job"
        ]
    }
    ```
    curl \
        --request GET \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer dea464c07bd07300095caa8" \
        https://your-immuta-url.com/jobs?bulkId=<your-bulkId>
        {
          "total":"99893",
          "completed":"99892",
          "failed":"0",
          "pending":"1",
          "errors":null
        }
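    As a sketch, the options fragment above can be assembled programmatically before it is merged into the create-data-source payload, and the polling URL follows the /jobs?bulkId= format shown in the monitoring request. The base URL and bulkId below are placeholders.

    ```python
    import json

    def bulk_create_options() -> dict:
        """Options that skip identification and the stats jobs, as the
        bulk-create instructions above require."""
        return {
            "disableSensitiveDataDiscovery": True,
            "tableTags": ["Skip Stats Job"],
        }

    def monitor_url(base_url: str, bulk_id: str) -> str:
        """URL for polling background-job progress with the returned bulkId."""
        return f"{base_url}/jobs?bulkId={bulk_id}"

    payload_fragment = {"options": bulk_create_options()}
    print(json.dumps(payload_fragment))
    print(monitor_url("https://your-immuta-url.com", "<your-bulkId>"))
    ```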

    Register Data Sources

    When a data source is exposed, policies are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes or groups of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner, allowing reproducibility and collaboration.

    This section includes how-to guides for registering data sources in Immuta:

    • Amazon Redshift Spectrum data source

    • Amazon S3 data source

    • Azure Synapse Analytics data source

    • Databricks data source

    • Google BigQuery data source

    • Snowflake data source

    • Starburst (Trino) data source

    Manage Data Dictionary Descriptions

    The data dictionary provides information about the columns within the data source, including column names and value types.

    As a data owner, you can manage data dictionary descriptions and column tags. For other guides related to the data dictionary, see the Related guides section.

    hashtag
    Manage data dictionary descriptions

    1. Navigate to the Data Dictionary tab.

    2. To add or edit column descriptions, click the menu icon in the Actions column next to the entry you want to change and select Edit.

    3. Complete the fields in the form that appears, and then click Save.

    hashtag
    Related guides

    hashtag
    Reference guide

    For information about the data dictionary, see the Data sources in Immuta overview guide.

    hashtag
    How-to guide

    In addition to managing data dictionary descriptions as outlined above, data owners or experts can also manage column tags.


    Schema Projects

    Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.

    Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission for the project to be auto-created, because the owner cannot add data sources to the project themselves; instead, new data sources are managed by Immuta. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.

    The schema settings, such as schema evolution and connection information, can be edited from the project overview tab. Note: Deleting the project will delete all of the data sources within it as well.

    hashtag
    Schema Project Actions

    Schema settings are edited from the project overview tab:

    • Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.

    • Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention will change how these newly detected data sources are named by Immuta.

    • Schema Detection Owner: When schema monitoring is enabled, a user is assigned to be the owner of any detected and Immuta-created data source.

    • Disable or delete your schema project: Deleting the project will delete all of the data sources within it as well.


    How-to Guides

    Data Source Health Checks Reference Guide

    When an Immuta data source is created, background jobs use the connection information provided to compute health checks, which depend on the type of data source created and how it was configured. These data source health checks include the following:

    • blob crawl status: indicates whether the blob was successfully crawled.

    • column detection status: indicates whether the job run to determine if a column was added or removed from the remote table registered as an Immuta data source was successful.

    • external catalog link status: indicates whether or not the external catalog was successfully linked to the data source.

    • fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated. Fingerprints are only available for Snowflake data sources.

    • framework classification status: indicates whether classification was successfully run on the data source to determine the sensitivity of the data source.

    • global policy applied status: indicates whether global policies were successfully applied to the data source.

    • high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.

    • SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.

    • SQL view creation status (for Amazon Redshift Spectrum, Azure Synapse Analytics, or Google BigQuery data sources): indicates whether views were properly created for tables registered in Immuta.

    • row count status: indicates whether the number of rows in the data source was successfully calculated.

    • schema detection status: indicates whether the job run to determine if a remote table was added or removed from the schema was successful.

    • sensitive data discovery status: indicates whether identification was successfully run on the data source.

    After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.

    These background jobs can be disabled during data source creation by adding a specific tag to prevent automatic table statistics. This prevent statistics tag can be set on the app settings page by a system administrator. However, with automatic table statistics disabled, these policies will be unavailable until the data source owner manually generates the fingerprint (available only for Snowflake data sources):

    • Masking with format preserving masking

    • Masking using randomized response

    hashtag
    Unhealthy Databricks data sources

    Unhealthy data sources may fail their row count queries if they run against a cluster that has the Databricks query watchdog enabled.

    hashtag
    Limitations

    Data sources with over 1600 columns will not have health checks run, but will still appear as healthy. The health check cannot be run automatically or manually.


    How-to Guides

    Why Use Schema Monitoring Concept Guide

    Immuta is a live metadata aggregator - metadata about your data and your users. With data metadata specifically, Immuta can monitor changes in your database and reflect those changes in your Immuta tenant through schema monitoring.

    When schema monitoring is enabled, Immuta monitors your organization's servers to identify when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. The newly updated data sources then have global policies and tags applied to them, and the Immuta data dictionary is updated with column changes.

    Schema monitoring keeps Immuta in sync with your data environment, helping you remain compliant without having to manually update individual data sources.

    hashtag
    Anti-patterns: Using Immuta without schema monitoring

    Without schema monitoring, data owners have to manually add and remove Immuta data sources when users add or remove tables from databases in their data platforms. At worst, data owners are not aware of these changes; at best they are aware of the changes and have to manually update Immuta with those changes, which is a time-consuming, error-prone process.

    Beyond draining data owners' time, manually updating data sources to reflect the state of the data platform also complicates the process: not only must they understand when a new table is present, but they then must remember to tag it and protect it appropriately. This leaves organizations ripe for data leaks as new data is created across the business, perhaps daily.

    Schema monitoring, by contrast, is scalable and accounts for the evolution of your schemas and policies. Instead of manually managing access to these tables or adding and removing data sources, you are empowered to register a schema, create policies, and allow Immuta to manage those policies and changes to your schema for you to keep your data in sync and restrict access appropriately.

    hashtag
    Business value

    Both monitoring for new data and discovering and tagging sensitive data align with the concepts of scalability and evolvability, removing redundant and arduous work. Once tables are registered and tagged, policies can immediately be applied - this means humans can be completely removed from the process by creating tag-based policies that dynamically apply themselves to new tables.

    Then, your business reaps the following benefits:

    • Increased revenue: Accelerate data access and time-to-data access because where sensitive data lives is well understood.

    • Decreased cost: Operate efficiently and move with agility at scale.

    • Decreased risk: Discover and protect sensitive data immediately.

    hashtag
    What features does it pair with?

    Schema monitoring pairs with the following features:

    • Column detection: Column detection identifies when a column has been added to or removed from a table and adds or removes that column from the data source in Immuta.

    • New column added templated global policy: When paired with column detection or schema monitoring, this policy locks down access to those newly added columns and tables to prevent data leaks.

    • Identification: When the tables are discovered through the registration process, Immuta evaluates the table data for sensitive information and tags it as such. These tags are critical for scaling tag-based policies.

    • Global data and subscription policies: Global data and subscription policies can be created using tags so that they immediately enforce appropriate access restrictions on tables and columns when they are added.


    Data Source Settings

    Once a data source is created, the data owner can manage data source policies, members, data dictionary, and tags.

    The reference and how-to guides in this section cover topics related to managing existing data sources.

    hashtag
    How-to guides

    • Manage data sources: Edit data source settings or disable and delete a data source.

    • Manage data source members: Add, remove, or modify users on a data source.

    • Manage data source access requests: Approve or deny subscription requests on a data source.

    • Data dictionary: Manage the data dictionary descriptions and tags.

    • Disable data sampling: Disable metadata collection that requires sampling data.

    hashtag
    Reference guide

    Data source health checks: This reference guide defines the data source health check jobs that are run when a data source is created.


    Google BigQuery Data Source

    circle-info

    Public preview: The Google BigQuery integration is available to all accounts.

    hashtag
    Requirements

    • CREATE_DATA_SOURCE Immuta permission

    • Google BigQuery roles:

      • roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset

      • roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset

      • roles/bigquery.jobUser on the project

    hashtag
    Prerequisite

    hashtag
    Create a Google Cloud service account for creating Google BigQuery data sources

    Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. If you do not currently have a service account for the Google Cloud project separate from the Google Cloud service account you created when configuring the Google BigQuery integration, you must create a Google Cloud service account with privileges to view and run queries against the tables you are protecting.

    You have two options to create the required Google Cloud service account:

    • Create a service account by using Google Cloud Console.

    • Create a service account by using gcloud.

    hashtag
    Create a service account using the Google Cloud web console

    1. Using the Google Cloud console, create a service account with the following roles:

      • BigQuery User

      • BigQuery Data Viewer

    hashtag
    Create a service account using gcloud

    1. Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.

      • SERVICE_ACCOUNT is the name for the new service account.

      • PROJECT_ID is the project ID for the Google Cloud Project that is integrated with Immuta.

      • IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.

    hashtag
    Register data sources in Immuta

    circle-info

    Required Google BigQuery roles

    Ensure that the user creating the data source has these Google BigQuery roles:

    • roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset

    • roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset

    • roles/bigquery.jobUser on the project

    1. Navigate to the Data Sources list page.

    2. Click Register Data Source.

    3. Select the Google BigQuery tile in the Data Platform section.

    hashtag
    Next steps

    With data sources registered in Immuta, your organization can now start

    • building global subscription and data policies to govern data.

    • creating projects to collaborate.

    Using the Google Cloud documentation, generate a service account key for the account you just created.

  • Use the script below in the gcloud command line. This script is a template; change values as necessary:

  • Complete these fields in the Connection Information box:
    • Account Email Address: Enter the email address of a user with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.

    • Project: Enter the name of the project that has been integrated with Immuta.

    • Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.

  • Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.

  • Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.

  • Decide how to virtually populate the data source by selecting one of the options:

    • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.

    • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

  • Provide basic information about your data source to make it discoverable to users.

    • Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. For BigQuery the schema will be the BigQuery dataset. The format must include a schema macro, but you may personalize it using lowercase letters, numbers, and underscores. It can have up to 255 characters.

    • Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.

      • When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.

      • When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.

    • Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

      • <Tablename>: The Immuta data source will have the same name as the original table.

      • <Schema><Tablename>: The Immuta data source will have both the dataset and original table name.

      • Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

    • Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  • When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.

  • Optional Advanced Settings:

    • Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.

    • Data Source Tags: Adding tags to your data source allows users to search for the data source using the tags and governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.

      • Click the Edit button in the Data Source Tags section.

      • Begin typing in the Search by Tag Name box to select your tag, and then click Add.

  • Click Create to save the data source(s).
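    The macro casing rules described above can be illustrated with a small hypothetical helper; it is not part of Immuta, just a sketch of the documented behavior, simplified to capitalize only the first letter of the substituted value.

    ```python
    import re

    def expand_name_template(template: str, schema: str, table: str) -> str:
        """Expand <schema>/<tablename> macros; the casing of the macro
        sets the casing of the substituted value."""
        values = {"schema": schema, "tablename": table}

        def substitute(match):
            macro = match.group(1)
            value = values[macro.lower()]
            if macro.isupper():          # <TABLENAME> -> "ORDERS"
                return value.upper()
            if macro[0].isupper():       # <Tablename> -> "Orders"
                return value.capitalize()
            return value.lower()         # <tablename> -> "orders"

        return re.sub(r"<(schema|tablename)>", substitute, template,
                      flags=re.IGNORECASE)

    print(expand_name_template("<Schema>_<tablename>", "Sales", "Orders"))
    ```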

  • Configure the Google BigQuery integration
    # Fill these out
    # Please use .json extension for key
    export SERVICE_ACCOUNT=datasource-account
    export PROJECT_ID=project123
    export IMMUTA_GCP_KEY_FILE=~/GCP_${SERVICE_ACCOUNT}_key.json
    
    # Create service account for creating data sources
    gcloud iam service-accounts create ${SERVICE_ACCOUNT} --project ${PROJECT_ID}
    
    # Generate keyfile
    gcloud iam service-accounts keys create ${IMMUTA_GCP_KEY_FILE} --iam-account=${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com
    
    # Allow account to execute queries
    #gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    #--member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=projects/${PROJECT_ID}/roles/bigquery.user
    gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=roles/bigquery.user
    
    # Allow account to view data
    gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=roles/bigquery.dataViewer
    
    echo if something went wrong and you want to delete the service account, run:
    echo gcloud iam service-accounts delete ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --project ${PROJECT_ID}

  • Amazon Redshift Spectrum Data Source

    You can create an Amazon Redshift Spectrum data source using the Immuta CLI or the Immuta V2 API.

    hashtag
    Requirements

    • The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster.

    • CREATE_DATA_SOURCE Immuta permission

    • The Redshift user registering data sources must have the following privileges on all securables:

      • USAGE on all schemas with registered data sources

      • SELECT on all tables within those schemas

    hashtag
    Create a data source using the Immuta CLI

    1. Copy the snippet above and save it as a YAML file. Replace the configuration values with your own, where

      1. hostname is the URL of your Redshift account.

      2. database is the name of the that the Immuta system user will manage and store metadata in.

    For additional configuration options, see the connection object details for Redshift Spectrum data sources on the .

    circle-exclamation

    Avoid schema name conflicts

    Your nativeSchemaFormat must contain _immuta to avoid schema name conflicts.

    hashtag
    Create a data source using the Immuta V2 API

    1. Copy the request example. The example provided uses JSON format, but the request also accepts YAML.

    2. Replace the Immuta URL and with your own.

    3. Change the config values to your own, where

    circle-exclamation

    Avoid schema name conflicts

    Your nativeSchemaFormat must contain _immuta to avoid schema name conflicts.

    hashtag
    Path parameters

    The endpoint accepts the optional dryRun and wait parameters described below.

    hashtag
    Body parameters

    The request accepts a JSON or YAML payload with the parameters outlined below.

      3. username and password are the credentials for the system account that can act on Redshift objects and configure the integration.

      4. schema is the name of the external schema in the database.

  • Run immuta datasource save <filepath> [--wait int] [--dryRun], referencing the file you just created. The options you can specify include

    • -d or --dryRun: No updates will actually be made to the data source(s).

    • -h or --help: Get more information about the command.

    • -w or --wait int: Specify how long to wait for data source creation.

  • hostname is the URL of your Redshift account.

  • database is the name of the existing or new database that the Immuta system user will manage and store metadata in.

  • username and password are the credentials for the system account that can act on Redshift objects and configure the integration.

  • schema is the name of the external schema in the database.

  • owners (object): Specify owners for all data sources created. See the owners object description for attribute details. Optional.

  • sources (array): Configure which data sources are created. If not provided, all objects from the given connection will be created. See the sources array description for attribute details. Optional.

  • dryRun (boolean): If true, no updates will actually be made. Optional; default false.

  • wait (number): The number of seconds to wait for data sources to be created before returning. Anything less than 0 will wait indefinitely. Optional; default 0.

  • connectionKey (string): A key/name to uniquely identify this collection of data sources. Required.

  • connection (object): Connection information. See the connection object description for attribute details. Required.

  • nameTemplate (object): A template to override naming conventions. If not provided, system defaults will be used. See the nameTemplate object description for attribute details. Optional.

  • options (object): Override options for these data sources. If not provided, system defaults will be used. See the options object description for attribute details. Optional.

    connectionKey: redshift
    connection:
      hostname: your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com
      port: 5439
      ssl: true
      database: your_database_with_external_schema
      username: awsuser
      password: your_password
      handler: Redshift
      schema: external_schema
    nameTemplate:
      dataSourceFormat: <Tablename>
      schemaFormat: <schema>
      tableFormat: <tablename>
      schemaProjectNameFormat: <Schema>
      nativeSchemaFormat: <schema>_immuta
      nativeViewFormat: <tablename>
    sources:
      - all: true
    curl -X 'POST' \
      'https://www.organization.immuta.com/api/v2/data' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -H 'Authorization: Bearer b64dbdcd29e24ae88a5b3ce0507df019' \
      -d '{
      "connectionKey": "redshift",
      "connection": {
          "hostname": "your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com",
          "port": "5439",
          "ssl": true,
          "database": "your_database_with_external_schema",
          "username": "awsuser",
          "password": "your_password",
          "handler": "Redshift",
          "schema": "external_schema"
      },
      "nameTemplate": {
          "dataSourceFormat": "<Tablename>",
          "schemaFormat": "<schema>",
          "tableFormat": "<tablename>",
          "schemaProjectNameFormat": "<Schema>",
          "nativeSchemaFormat": "<schema>_immuta",
          "nativeViewFormat": "<tablename>"
      },
      "sources": [
        {
          "all": true
        }
      ]
    }'
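The same request body can be assembled programmatically before sending it with any HTTP client. This is a sketch mirroring the curl example above; the hostname, credentials, and API key are the same placeholders, and `build_redshift_spectrum_payload` is a hypothetical helper, not part of any Immuta SDK.

```python
import json

def build_redshift_spectrum_payload(hostname, database, username, password,
                                    schema, connection_key="redshift"):
    """Assemble the V2 /api/v2/data payload shown in the curl example."""
    return {
        "connectionKey": connection_key,
        "connection": {
            "hostname": hostname,
            "port": "5439",
            "ssl": True,
            "database": database,
            "username": username,
            "password": password,
            "handler": "Redshift",
            "schema": schema,
        },
        "nameTemplate": {
            "dataSourceFormat": "<Tablename>",
            "schemaFormat": "<schema>",
            "tableFormat": "<tablename>",
            "schemaProjectNameFormat": "<Schema>",
            "nativeSchemaFormat": "<schema>_immuta",  # must contain _immuta
            "nativeViewFormat": "<tablename>",
        },
        "sources": [{"all": True}],
    }

body = json.dumps(build_redshift_spectrum_payload(
    "your-redshift-cluster.djie25k.us-east-1.redshift.amazonaws.com",
    "your_database_with_external_schema", "awsuser", "your_password",
    "external_schema"))
# POST `body` to https://<your-immuta>/api/v2/data with your
# Authorization: Bearer <API key> header, e.g. via requests.post(...).
```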

    Reference Guides

    Schema Monitoring

    With schema monitoring enabled, Immuta monitors your organization's servers to find when new tables or columns are created or deleted and automatically registers (or disables) those tables in Immuta.

    hashtag
    How-to guides

    • Manage schema monitoring: Edit connection information, schema project owner, or the naming conventions of data registered in the schema.

    • Run schema monitoring jobs: Manually trigger schema monitoring.

    hashtag
    Reference guides

    • Schema monitoring: This reference guide describes the design and components of schema monitoring.

    • Schema projects: This reference guide describes schema projects, which group all the data sources of a schema.

    hashtag
    Concept guide

    Why use schema monitoring?: This explanatory guide provides a conceptual overview of schema monitoring. It offers a discussion of the benefits of the feature, context for why it was developed, and insights into the features schema monitoring pairs with. This guide is designed to deepen your understanding of schema monitoring's purpose as you implement it.


    Registering Metadata

    A data source is how data owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. An Immuta data source is a virtual representation of data that exists in a remote data platform.

    This section includes reference and how-to guides for registering and managing data sources.

    hashtag
    Data sources in Immuta

    This reference guide describes Immuta data sources and their major components.

    hashtag
    Register data sources

    These how-to guides illustrate how to register data in Immuta.

    hashtag
    Data source settings

    The guides in this section illustrate how to manage and edit data sources and data dictionaries.

    hashtag
    Schema monitoring

    The reference and how-to guides in this section describe schema monitoring and illustrate how to configure it for your integration.


    Manage Schema Monitoring

    hashtag
    Edit schema project connection

    Requirement: Must be an owner of the schema project

    1. Navigate to the Project Overview tab.

    2. Click Edit Connection.

    3. Use the modal to edit the connection information or column detection.

    4. Click Save.

    hashtag
    Edit schema monitoring

    Requirement: Must be an owner of the schema project

    1. Navigate to the Project Overview tab.

    2. Click Edit Schema Monitoring.

    3. Use the modal to make any necessary changes:

    Basic Information: Use this section to edit the format that data sources will be named with when a new table is discovered in your platform by schema monitoring.
    1. Opt to edit the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

    2. The Schema Project Name Format cannot be changed in the schema monitoring settings.

    3. Opt to edit the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

      • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

      • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    4. Opt to edit the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  • Schema Monitoring:

    1. Select the toggle to enable or disable schema monitoring.

    2. If schema monitoring is enabled, there must be a schema detection owner. Use the dropdown menu in the modal to select a new schema detection owner. The new owner must be an owner of one or more of the data sources belonging to that schema.

  • Click Save.

  • Custom: Enter a custom template for the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
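The macro casing rule described above can be sketched as a small helper. This is illustrative only, assuming the three documented macro spellings; `apply_name_template` is a hypothetical name, not Immuta's implementation.

```python
import re

def apply_name_template(template: str, tablename: str) -> str:
    """Render a name template; the case of the tablename macro decides
    the case of the substituted name, per the documented rule."""
    def substitute(match):
        macro = match.group(0)
        if macro == "<TABLENAME>":
            return tablename.upper()
        if macro == "<tablename>":
            return tablename.lower()
        return tablename.title()  # <Tablename>
    return re.sub(r"<tablename>", substitute, template, flags=re.IGNORECASE)

# "<Tablename>" title-cases, "<tablename>" lower-cases,
# "<TABLENAME>" upper-cases the remote table name.
```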

    Schema Monitoring

    Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, keeping the Immuta environment in sync with the organization's data environment. This automated process helps organizations stay compliant without having to manually keep data sources up to date.

    Schema monitoring is enabled while creating or editing a data source and only registers new tables and columns within known schemas. It does not register new schemas. Data owners or governors can edit the naming convention for newly detected data sources and the schema detection owner from the schema project page after it has been enabled.

    See the Register a data source guides for instructions on enabling schema monitoring or Manage schema monitoring for instructions on editing the schema monitoring settings.

    hashtag
    Column detection

    Column detection is a part of schema monitoring, but can also be enabled on its own to detect the column changes of a select group of tables. Column detection monitors when columns are added or removed from a table and when column types are changed and updates those changes in the appropriate Immuta data source's data dictionary.

    See one of the Register a data source guides for instructions on enabling column detection.

    hashtag
    Tracking new data sources and columns

    When new data sources and columns are detected and added to Immuta, or when column types have changed, they will always automatically be tagged with the New tag. This allows governors to use the seeded New Column Added global policy to mask columns with the New tag, since they could contain sensitive data.

    The New Column Added global policy is staged (inactive) by default.

    See the Clone, activate, or stage a global policy guide to activate this seeded global policy if you want any columns with the New tag to be automatically masked.

    hashtag
    Data source requests

    When schema monitoring is enabled and there is an active policy that targets the New tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:

    • Column added: Immuta applies the New tag on the column that has been added and sends a request to the data owner to validate if the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it and as a result any policy that targets the New column tag no longer applies.

    • Column data type changed: Immuta applies the New tag on the column where the data type has been changed and sends a request to the data owner to validate if the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New tag from it and, as a result, any policy that targets the New column tag no longer applies.

    For instructions on how to view and manage your assigned tasks in the Immuta UI, see the Manage data source requests guide. To view and manage your assigned tasks via the Immuta API, see the Manage data source requests section of the API documentation.

    hashtag
    Workflow

    1. An Immuta user registers a data source with schema monitoring enabled.

    2. Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.

    3. If Immuta finds a change, it will update the appropriate Immuta data source or column:

    To run schema monitoring or column detection manually, see the Run schema monitoring and column detection jobs page.

    hashtag
    Schedule

    The default schedule for schema monitoring to run is every 24 hours. Some organizations may need to schedule it to run more often; however, this needs careful consideration as it can impact performance and compute costs.

    hashtag
    Schema monitoring best practices

    • Manually trigger schema monitoring (filtered down to the database) after your dbt or other transform workflows run. For more information, see the dbt and transform workflow for limited policy downtime guide.

    • When manually triggering schema monitoring, specify a table or database for maximum performance efficiency and to reduce data or policy downtime. For more information on triggering schema monitoring, see the Manually run schema monitoring guide.

    • If you are manually managing data tags, activate the "New Column Added" global policy to protect newly found and potentially sensitive data. This policy sets all columns with the New tag to NULL until a data owner reviews and validates their content. Using this workflow protects your data and avoids data leaks on new columns being automatically added.

  • Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.

  • Data source created: Immuta applies the New tag on the data source that has been newly created and sends a request to the data owner to validate if the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New tag from it and as a result any policy that targets the New data source tag no longer applies.

  • If Immuta finds a new table, then Immuta creates an Immuta data source for that table and tags it New.
  • If Immuta finds a table has been deleted, then Immuta disables that table's data source.

  • If Immuta finds a previously deleted table has been re-created, then Immuta restores that table's data source and tags it New.

  • If Immuta finds that the backing object type of a data source has been changed (for example, from a TABLE to a VIEW) in Snowflake or Databricks Unity Catalog, Immuta will reapply existing policies on the data source. Note that because of policy limitations on Unity Catalog views, changing a Databricks Unity Catalog object type from a table to a view could result in some types of data policies being removed. See the Databricks Unity Catalog integration reference guide for a list of data policies that are not supported for views.

  • If Immuta finds a new column within a table, then Immuta adds that column to the data dictionary and tags it New.

  • If Immuta finds a column has been deleted, then Immuta deletes that column from the data dictionary.

  • If Immuta finds a column type has changed, then Immuta updates the column type in the data dictionary and tags it New.

  • Active policies that target the New data source or column tag will be applied until a data owner validates the changes.
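The detection outcomes above amount to diffing the remote schema between two monitoring runs. The sketch below is illustrative only, assuming tables are represented as a mapping of table name to column list; it is not Immuta's implementation.

```python
def diff_schema(previous: dict, current: dict):
    """Compare a schema's tables/columns between two monitoring runs and
    decide what to register, disable, or tag New (illustrative sketch)."""
    changes = {"new_tables": [], "disabled_tables": [], "new_columns": []}
    for table, columns in current.items():
        if table not in previous:
            changes["new_tables"].append(table)        # register + tag New
        else:
            for col in columns:
                if col not in previous[table]:
                    changes["new_columns"].append((table, col))  # tag New
    for table in previous:
        if table not in current:
            changes["disabled_tables"].append(table)   # disable data source
    return changes

before = {"orders": ["id", "total"], "users": ["id"]}
after = {"orders": ["id", "total", "coupon"], "invoices": ["id"]}
```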

    This recommendation is unnecessary for users leveraging identification or using an external data catalog.

    Data Sources in Immuta

    Data owners expose their data across their organization to other users by registering that data in Immuta as a data source. Data sources are collections of metadata about your tables or data objects and allow for Immuta actions like the following:

    • Apply tags to data sources to enforce access controls

    • Apply data policies to a data source's columns

    • Restrict the users who can query a data source with subscription policies

    • Gather your data sources into various domains for delegation

    • Publish data products containing your data sources in the Request app

    When data is registered, Immuta does not affect existing policies on those tables in the remote system for non-Immuta users, so users who had access to a table before it was registered can still access that data without interruption. However, this behavior differs for Immuta users on an integration-by-integration basis, so see the integration reference guides for more details.

    hashtag
    Integrations

    For policies to properly apply to data sources, there must be an integration configured in Immuta, and that integration's connection details must match the data source's connection details. This allows Immuta to natively enforce policies on that table in your data platform.

    • Connections combine integration configuration and data source registration, ensuring details match.

    • For all other technologies, integration configuration and data source registration happen separately.

    hashtag
    Connections data platforms

    Use connections to create the integration and data objects with the same credentials. Then, to create the data sources, enable the data object for your tables, views, and other objects.

    hashtag
    Non-connection data platforms

    For all other technologies, configure your integration and then create data sources. Ensure that the host, port, and other integration details match the data source details so that policies will properly apply. These platforms include MariaDB, Oracle, PostgreSQL, Snowflake, SQL Server, Teradata, Databricks, Google BigQuery, Starburst, Amazon Redshift Spectrum, Amazon S3, and Azure Synapse Analytics.

    hashtag
    Data sources with nested columns

    You can create Databricks data sources with nested columns when you enable complex data types. When complex types are enabled, Databricks data sources can have columns that are arrays, maps, or structs that can be nested. These columns get parsed into a nested data dictionary.

    hashtag
    Data source user roles

    There are various roles users and groups can play relating to each data source. These roles are managed through the members tab of the data source. Roles include the following types:

    • Owners: Those who create and manage new data sources and their users, documentation, and data dictionaries.

    • Subscribers: Those who have access to the data source data. With the appropriate data accesses and attributes, these users and groups can view files, run queries, and generate analytics against the data source data. All users and groups granted access to a data source have subscriber status.

    • Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation, tags, and descriptions.

    See Manage data source members for a tutorial on modifying user roles.

    hashtag
    Data dictionary

    The data dictionary provides information about the columns within the data source, including column names and value types.

    Dictionary columns are automatically generated when the data source is created. However, data owners and experts can tag columns in the data dictionary and add descriptions to these entries.

    hashtag
    Data dictionary column icons

    The data dictionary displays icons on columns that have a masking policy applied to them. The appearance of these icons varies depending on the permission of the user.

    Governors and data owners

    If you have the GOVERNANCE permission or are the data source owner, the data dictionary column icons will appear in these ways:

    • No icon: No masking policy applies to the column.

    • Yellow eye: A masking policy applies to the column, but the column is unmasked for the current user because they meet the exception criteria for the policy.

    • Red eye: A policy on the column masks it for the current user.

    All other users

    The data dictionary column icons will appear in these ways for all other users:

    • No icon: Either no masking policy applies to the column or a masking policy applies to the column, but the column is unmasked for the current user because they meet the exception criteria for the policy.

    • Red eye: A policy on the column masks it for the current user.

    hashtag
    Audit

    The following events related to data sources are audited and can be found on the audit page in the UI:

    • DatasourceCreated: A data source is created.

    • DatasourceDeleted: A data source is deleted.

    • DatasourceDisabled: A data source is disabled.

  • DatasourceUpdated: A data source is updated.

  • DatasourceAppliedToProject: A data source is added to a project.

  • DatasourceRemovedFromProject: A data source is removed from a project.

  • DatasourceCatalogSynced: An external catalog is linked and synced for the data source.

  • DatasourceGlobalPolicyApplied: A global policy is applied to a data source.

  • DatasourceGlobalPolicyConflictResolved: A policy conflict between two global policies on a data source is resolved.

  • DatasourceGlobalPolicyDisabled: A global policy is disabled on a data source.

  • DatasourceGlobalPolicyRemoved: A global policy is removed from a data source.

  • LocalPolicyCreated: A local policy is created on a data source.

  • LocalPolicyUpdated: A local policy is updated on a data source.

  • SubscriptionCreated: A user is subscribed to a data source or project.

  • SubscriptionDeleted: A user's subscription to a data source or project is removed.

  • SubscriptionRequestApproved: A user's request to subscribe to a data source or project is approved.

  • SubscriptionRequestDenied: A user's request to subscribe to a data source or project is denied.

  • SubscriptionRequested: A user requests to subscribe to a data source or project.

  • SubscriptionUpdated: A user's subscription to a data source or project is updated.


    Manage Data Source Members

    In addition to creating and managing data sources, data owners can add and manage data source members manually. While this is supported, it is not recommended; managing user access through subscription policies is far more scalable.

    For other guides related to data source members and management, see the Related guides section.

    hashtag
    Add members to a data source

    1. Navigate to the data source and click the Members tab.

    2. Click Add Members and enter the group name or username.

    3. Select their Role:

      • Subscriber: The role can have read or write access to the table. This role is only available if there are read access policies on the data source.

      • Owner: The role can manage data source members and policies and have read or write access to the table.

    4. Select Read or Write from the Access Grant dropdown. This option is only available if write policies have been enabled.

    5. Click Add.

    hashtag
    Bulk add users to multiple data sources

    1. Navigate to the data sources list page.

    2. Select the data sources you want to add users to by clicking the checkbox next to the data source.

    3. Select Add Users.

    hashtag
    Set user access expiration date for a data source

    As a data owner, you can limit the amount of time a user or group has access to your data source by setting an access expiration date.

    1. Navigate to the Members tab.

    2. Adjust the number of days under the Expires column for the user or group whose access you want to limit. The limit counts from today: 0 days left means access will be revoked by the end of today, and 1 day left means access will be revoked by the end of tomorrow.

    3. Save your changes.

    To remove the limit (or set the limit to Never), delete the number from the field and save your changes.
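The counting rule above is simple but worth pinning down. This sketch assumes the behavior as documented (revocation at the end of the returned day); `access_revoked_on` is a hypothetical helper name.

```python
from datetime import date, timedelta

def access_revoked_on(days_remaining: int, today: date) -> date:
    """Return the day at whose end access is revoked: 0 days left means
    end of today, 1 day left means end of tomorrow."""
    return today + timedelta(days=days_remaining)

# With 1 day remaining on 2024-05-01, access lasts through 2024-05-02.
```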

    hashtag
    Modify user or group roles within a data source

    1. Navigate to the Members tab.

    2. Click the drop-down arrow under the Role column next to the user/group whose role you’d like to change.

    3. Select another role (subscriber, expert, owner, or ingest user, if applicable).

    Notifications about the change will be sent to the affected users and groups (as well as alternative Owners).

    hashtag
    View user or group subscription history

    1. Navigate to the Members tab.

    2. Click the Name of the user or group whose history you want to review.

    hashtag
    Remove users or groups from a data source

    As a data owner, you can deny access to any users or groups at any time.

    1. Navigate to the Members tab.

    2. To remove a user or group from a data source, click Deny in the Actions column next to the user or group you want to remove.

    3. Complete the Deny Access form, including a reason for revoking the access.

    This action will immediately update users' or groups' subscription status, and they will no longer have any access to the data source. Notifications will be sent to the affected users (as well as alternative data owners) informing them of the change in subscription status.

    hashtag
    Related guides

    hashtag
    Reference guide

    For information about data source members and subscriptions, see the data source user roles section.

    hashtag
    How-to guides

    In addition to adding and managing data source members as outlined above, data owners can manage data source column tags, data dictionaries, and settings.

    Expert: The role can manage the data dictionary descriptions and have read or write access to the table. This role is only available if there are read access policies on the data source.

    You can also opt to specify an expiration date for when the user's access should expire.

    In the modal, type the user name or group name and select the user or group you want to add from the dropdown menu.
  • Opt to set an Expiration for the users' subscriptions. Additionally, you can change the role from Subscriber to Expert or Owner for the users or groups using the dropdown menu in the Role column.

  • Click Add. All users and groups will be added to the data sources you selected.


    Disable Immuta from Sampling Raw Data

    If you want to disable the metadata collection that requires sampling data, you must

    1. Stop all data source health checks.

    2. Add the Skip Stats Job tag to all data sources.

    These steps will ensure that Immuta queries no data, under any circumstances. Without this sample data, some Immuta features will be unavailable. Identification cannot be used to automatically detect sensitive data in your data sources, and the following masking policies (which are only available in the Snowflake integration) will not work:

    • Masking with format preserving masking

    • Masking using randomized response

    hashtag
    Data Source Health Checks

    Reach out to your Immuta representative to disable health checks on all data sources.

    hashtag
    Skip Stats Job Tag

    Tag each data source with the seeded Skip Stats Job tag to stop Immuta from collecting a sample and running table stats on the sample. You can tag data sources as you create them in the UI or via the Immuta API.

    Note that data sources automatically skip the stats job upon registration, without the Skip Stats Job tag, as long as there are no active policies requiring them. The following policies require stats:

    • Column masking with randomized response

    • Column masking with format preserving masking

    • Column masking with rounding

    • Column masking with reversibility

    • Row minimization


    Create a Starburst (Trino) Data Source

    circle-exclamation

    Using OAuth authentication to create Starburst (Trino) data sources

    If you are using OAuth or asynchronous authentication to create Starburst (Trino) data sources and you encounter one of the scenarios described on the Starburst (Trino) reference guide, work with your Immuta representative to configure the globalAdminUsername property.

    hashtag
    Enter connection information

    1. Navigate to the Data Sources list page and click Register Data Source.

    2. Select the Starburst (Trino) tile in the Data Platform section.

    3. Complete these fields in the Connection Information box:

    circle-info

    Use SSL

    Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

    Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

    circle-info

    Considerations

    • Immuta pushes down joins to be processed on the remote database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
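The matching rule in the consideration above can be sketched as a simple comparison. This is illustrative only; `joins_push_down` is a hypothetical helper, and the exact fields Immuta compares may differ.

```python
def joins_push_down(conn_a: dict, conn_b: dict) -> bool:
    """Sketch: joins are only pushed down to the remote database when the
    two data sources' connection details line up on every key field."""
    keys = ("hostname", "port", "ssl", "username", "password")
    return all(conn_a.get(k) == conn_b.get(k) for k in keys)

a = {"hostname": "trino.example.com", "port": 443, "ssl": True,
     "username": "svc", "password": "s3cret"}
b = dict(a, username="other_user")  # mismatched credentials: no pushdown
```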

    hashtag
    Select virtual population

    Decide how to virtually populate the data source by selecting one of the options:

    • Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.

    • Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.

      1. Opt to Edit in the table selection box that appears.

    hashtag
    Enter basic information

    1. Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. It must include a schema macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

    2. Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. If you enter a name that already exists, the name will automatically be incremented. For example, if the schema project Customer table already exists and you enter that name in this field, the name for this second schema project will automatically become Customer table 2 when you create it.

    hashtag
    Enable or disable schema monitoring

    circle-info

    Schema monitoring best practices

    Schema monitoring is a powerful tool that ensures tables are all governed by Immuta.

    • Consider using schema monitoring later in your onboarding process, not during your initial setup and configuration when tables are not in a stable state.

    When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section.

    Note: This step will only appear if all tables within a server have been selected for creation.

    hashtag
    Opt to configure advanced settings

    Although not required, completing these steps will help maximize the utility of your data source. Otherwise, click Create to save the data source.

    hashtag
    Column detection

    This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

    To enable, select the checkbox in this section.

    See the schema monitoring page to learn more about column detection.

    hashtag
    Event time

    An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

    1. Click the Edit button in the Event Time section.

    2. Select the column(s).

    3. Click Apply.

    Selecting an Event Time column will enable

    • more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.

    • the creation of time-based restrictions in the policy builder.

    hashtag
    Latency

    1. Click Edit in the Latency section.

    2. Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.

    3. Click Apply.

    This setting impacts how often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.
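    As a rough sketch of this behavior, the latency setting can be read as the interval after which Immuta would next check for new policy-driving values. The function below is hypothetical and mirrors only the MINUTES/HOURS/DAYS dropdown described above:

    ```python
    from datetime import datetime, timedelta

    def next_policy_refresh(last_check: datetime, amount: int, unit: str) -> datetime:
        """Earliest time a newly added value (e.g., a new country) could be
        picked up by a row-level policy, given the latency setting (sketch)."""
        unit_map = {
            "MINUTES": timedelta(minutes=1),
            "HOURS": timedelta(hours=1),
            "DAYS": timedelta(days=1),
        }
        return last_check + amount * unit_map[unit]

    last = datetime(2024, 5, 1, 9, 0)
    print(next_policy_refresh(last, 6, "HOURS"))  # 2024-05-01 15:00:00
    ```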

    hashtag
    Sensitive data discovery

    Data owners can disable sensitive data discovery for their data sources in this section.

    1. Click Edit in this section.

    2. Select Enabled or Disabled in the window that appears, and then click Apply.

    hashtag
    Data source tags

    Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema monitoring is enabled, any tags added now will also be added to the tables that are detected.

    To add tags,

    1. Click the Edit button in the Data Source Tags section.

    2. Begin typing in the Search by Tag Name box to select your tag, and then click Add.

    Tags can also be added after you create your data source from the data source details page on the overview tab or the data dictionary tab.

    hashtag
    Create the data source

    Click Create to save the data source(s).

  • Server: hostname or IP address

  • Port: port configured for Starburst (Trino)

  • SSL: when enabled, ensures communication between Immuta and the remote database is encrypted

  • Catalog: the remote catalog

  • Username: the username to use to connect to the remote database and retrieve records for this data source

  • Password: the password to use with the above username to connect to the remote database

  • If you are using a proxy server with Starburst (Trino), specify it in the Additional Connection String Options. For example:

    UseProxy=1;ProxyHost=my.host.com;ProxyUID=your-username;ProxyPort=6789;ProxyPwd=your-password

  • Opt to Upload Certificates to connect to the database.

  • Click the Test Connection button.

  • If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section.

    By default, all schemas and tables are selected. Select and deselect by clicking the checkbox for the schemas in the Import Schemas/Tables modal. You can create multiple data sources at one time by selecting an entire schema or multiple tables.

  • After making your selection(s), click Apply.

  • When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro.

  • When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.

  • Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

    • <Tablename>: The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

    • <Schema><Tablename>: The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

    • Custom: Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

  • Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

  • Consider using Immuta’s API to either run the schema monitoring job when your ETL process adds new tables or to add new tables.

  • Activate the new column added templated global policy to protect potentially sensitive data. This policy nulls new columns until a data owner reviews them, preventing data leaks from newly added columns that have not yet been reviewed.

  • Schema Monitoring
    Schema projects overview
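    The macro-case behavior described in the naming bullets above can be sketched as follows. This is an illustrative approximation, not Immuta's implementation; the function name is invented for this example.

    ```python
    import re

    def apply_name_format(template: str, schema: str, table: str) -> str:
        """Expand <Schema>/<Tablename> macros, applying the macro's own case
        to the substituted value (sketch of the behavior described above)."""
        def convert(value: str, macro: str) -> str:
            if macro.isupper():      # <TABLENAME> -> "ORDERS"
                return value.upper()
            if macro.islower():      # <tablename> -> "orders"
                return value.lower()
            return value.title()     # <Tablename> -> "Orders"

        out = template
        for name, value in (("schema", schema), ("tablename", table)):
            for m in set(re.findall(rf"<({name})>", out, flags=re.IGNORECASE)):
                out = out.replace(f"<{m}>", convert(value, m))
        return out

    print(apply_name_format("<Schema> <Tablename>", "public", "orders"))  # Public Orders
    print(apply_name_format("<TABLENAME>", "public", "orders"))           # ORDERS
    ```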

    Manage Access Requests and Tasks

    Your outgoing and incoming requests are consolidated on the requests tab on your user profile page. Similar to notifications, a red dot displays on the request icon whenever you have pending requests. The sections below guide you through managing these requests.

    hashtag
    Manage access requests

    1. Navigate to your Profile page, and then click the Requests tab. The names of the users who have submitted requests are displayed in the Requests section. Once a user is selected, the corresponding Pending Requests are displayed.

    2. To view more information about the request, click the Details button in the Actions column of a request.

    3. Click the Approve or Deny button in the Actions column of the request.

    hashtag
    Bulk approvals

    To approve or deny multiple access requests simultaneously,

    1. Navigate to your Profile page, and then click the Requests tab.

    2. Select the checkbox next to each request you want to address, and then click the Approve Selected or Deny Selected button.

    hashtag
    Manage data source requests

    If a policy that includes the New tag is active and schema monitoring is enabled or you have registered a connection, Immuta applies a New tag to new data sources, new columns, or changed columns and sends data owners a request to validate those changes.

    1. Navigate to your Profile page, and then click the Requests tab.

    2. Click the approvals count in the Request Information column to view information about the change to the data source. The change will be one of the following:

      • Column added

      • Column changed

      • Column deleted

      • Data source created

    3. After verifying the change, click Validate.

    For more information about these requests, see the schema monitoring guide or the connections guide.

    hashtag
    Related guides

    hashtag
    Reference guides

    • Data sources in Immuta overview

    • Connections

    • Schema monitoring

    hashtag
    How-to guides

    In addition to managing data source requests as outlined above, data owners can manage data source column tags, data dictionaries, policies, members, and settings.