1 of 30

Connect and Query Data

A data source is how data owners expose their data across their organization to other Immuta users. Throughout this process, the data is not copied. Instead, Immuta uses metadata from the data source to determine how to expose the data. A data source is a virtual representation of data that exists in a remote data platform.

When a data source is exposed, policies (written by data owners and data governors) are dynamically enforced on the data, appropriately redacting and masking information depending on the attributes or groups of the user accessing the data. Once the data source is exposed and subscribed to, the data can be accessed in a consistent manner across analytics and visualization tools, allowing reproducibility and collaboration.

Best practices for connecting data

The best practices outlined below will also appear in callouts within relevant tutorials.

The two-way SSL configuration is highly recommended as it is the most secure configuration for a custom blob store handler endpoint.
Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.
It is recommended that path not be used in the resource restrictions. Additionally, single-bucket source data is the only tested configuration. Athena databases with source data in multiple buckets may work, but would require that additional resources be specified in the below policy anywhere your-source is referenced.

Section contents

This section includes reference and how-to guides for creating policies. Some of these guides are provided below. See the left navigation for a complete list of resources.

How-to guides

Create a data source
Create a Google BigQuery data source
Create an S3 data source
Bulk create Snowflake data sources
Manage data sources
Manage schema monitoring
Run schema monitoring jobs
Query data:

Reference Guides

Data Sources in Immuta

Data owners expose their data across their organization to other users by registering that data in Immuta as a data source.

By default, data owners can register data in Immuta without affecting existing policies on those tables in their remote system, so users who had access to a table before it was registered can still access that data without interruption. If this default behavior is disabled on the App Settings page, a subscription policy that requires data owners to manually add subscribers to data sources will automatically apply to new data sources (unless a global policy you create applies), blocking access to those tables.

For information about this default subscription policy and how to manage it, see the default subscription policy page.

Data Sources With Nested Columns

When data sources support nested columns, these columns get parsed into a nested Data Dictionary. Below is a list of data sources that support nested columns:

S3
Azure Blob
Databricks sources with complex data types enabled
- When complex types are enabled, Databricks data sources can have columns that are arrays, maps, or structs that can be nested.

Data Source Health Checks

When an Immuta data source is created, background jobs use the connection information provided to compute health checks dependent on the type of data source created and how it was configured. These data source health checks include the

blob crawl status: indicates whether the blob was successfully crawled. If this check fails, the overall health status of the data source will be Not Healthy.
column detection status: indicates whether the job run to determine if a column was added or removed from the remote table registered as an Immuta data source was successful.
external catalog link status: indicates whether or not the external catalog was successfully linked to the data source. If this check fails, the overall health status of the data source will be Not Healthy.
fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated.
global policy applied status: indicates whether global policies were successfully applied to the data source.
high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
native SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
native SQL view creation status (for Snowflake and Redshift data sources): indicates whether native views were properly created for Redshift and Snowflake tables registered in Immuta.
row count status: indicates whether the number of rows in the data source was successfully calculated.
schema detection status: indicates whether the job run to determine if a remote table was added or removed from the schema was successful.
sensitive data discovery status: indicates whether sensitive data discovery was successfully run on the data source.

After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.

These background jobs can be disabled during data source creation by adding a specific tag to prevent automatic table statistics. This prevent statistics tag can be set on the App Settings page by a System Administrator. However, with automatic table statistics disabled these policies will be unavailable until the Data Source Owner manually generates the fingerprint:

Masking with format preserving masking
Masking with K-Anonymization
Masking using randomized response

Unhealthy Databricks Data Sources

Unhealthy data sources may fail their row count queries if they run against a cluster that has the Databricks query watchdog enabled.

Limitations

Data sources with over 1600 columns will not have health checks run, but will still appear as healthy. The health check cannot be run automatically or manually.

Data Source User Roles

There are various roles users and groups can play relating to each data source. These roles are managed through the Members tab of the Data Source. They include

Owners: Those who create and manage new data sources and their users, documentation, Data Dictionaries, and queries. They are also capable of ingesting data into their data sources as well as adding ingest users (if their data source is object-backed).
Subscribers: Those who have access to the data source data. With the appropriate data accesses and attributes, these users/groups can view files, run SQL queries, and generate analytics against the data source data. All users/groups granted access to a data source (except for those with the ingest role) have subscriber status.
Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation and the Data Dictionary.
Ingest: Those who are responsible for ingesting data for the data source. This role only applies to object-backed data sources (since query-backed data sources are ingested automatically). Ingest users cannot access any data once it's inside Immuta, but they are able to verify if their data was successfully ingested or not.

See Manage data source members for a tutorial on modifying user roles.

Data Dictionary

The Data Dictionary provides information about the columns within the data source, including column names and value types. Users subscribed to the data source can post and reply to discussion threads by commenting on the Data Dictionary.

Dictionary columns are automatically generated when the data source is created. However, data owners and experts can tag columns in the data dictionary and add descriptions to these entries.

Domains Overview

Private preview

This feature is in preview and available to select accounts. Reach out to your Immuta representative for details.

Domains are containers of data sources that group data into user-defined sets, where specific users can be assigned a domain-specific permission to manage policies on only the data sources in those domains. Domains eliminate the problem of giving users too much governance over all data sources in an organization. Instead, you can control how much power governance users have over data sources by granting them privileges within domains (and only those domains) in Immuta.

Domains allow you to grant more users authority to manage policies, making Immuta easier to use and more secure.

Permissions

The table below outlines the global Immuta permissions and domain permissions necessary to manage domains.

Permission

User actions

Domains actions

Data source actions

Policy actions

USER_ADMIN (global)

Manage user permissions, including domain-specific permissions on ALL domains

None

GOVERNANCE (global)

None

Create domains
Manage domain description and name
Delete any empty domain

Add existing data sources to any domain
Remove data sources from any domain without adding it to another domain

Create global policies that apply to ANY data sources (inside or outside domains)

Manage Policies (domain)

None

Create policies that apply to the domain(s) they are granted to manage policies in

Domain data sources

Data sources can be assigned to domains to restrict the users who can manage policies on those data sources. Data sources could be assigned to domains based on business units in your organization or any other method that suits your business goals and policy management strategy. Data sources can belong to more than one domain.

Once a data source is assigned to a domain, only users with the global GOVERNANCE or domain-specific Manage Policies permission can create policies that will apply to that data source, allowing you to control who can manage data access.

When data sources are added to a domain, users do not have to be added to the domain to access data. Instead, they must meet the restrictions outlined in the policies on the data sources.

Managing domain data sources

Only users with the GOVERNANCE permission can change the domain that a data source belongs to or remove a data source from a domain. When a data source is removed from a domain, Immuta recomputes the policies. Any policies associated with a domain that were applied to the data source will be removed when the data source is removed from the domain.

Domain policies

When authorized users assign policies to a domain, those policies only apply to the data sources in that domain. Domains restrict who can write policies for data sources assigned to that domain, while Immuta policies are enforced as usual: users who meet the restrictions outlined in the policy of a data source may subscribe to that data source.

When data sources are added to or removed from a domain, Immuta recomputes the data source policies. Then, policies associated with the domain will be applied to the data source.

Managing domain policies

Users with the Manage Policies permission in a domain can set a global policy to apply to only the domains for which they have the Manage Policies permission. For example, if a user has Manage Policies on one domain, all global policies they write will be assigned to just that domain, and data sources within that domain will have those policies enforced on them. If users have the Manage Policies permission on a subset of domains in their organization, they can choose from all of the permutations of the domains for which they have the Manage Policies permission to assign policies to.

If users have the Manage Policies permission on all domains in an organization or the GOVERNANCE permission, they can set a global policy to apply to all data sources.

Deleting a domain

Users with the GOVERNANCE permission can delete any domain that has zero data sources assigned to it.

Migrating to domains

Existing data sources can be assigned to a domain by a user with the GOVERNANCE permission. Once added to a domain, domain policies will be enforced on the data sources.

Schema Monitoring

Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with the organization's data environment. This automated process helps organization keep compliant without the need to manually keep data sources up to date.

Schema monitoring is enabled while creating or editing a data source. It runs every night by default but can be configured to a different frequency. Data Owners or Governors can edit the naming convention for newly detected data sources and the Schema Detection Owner from the schema project page after it has been enabled.

See the create a data source tutorials for instructions on enabling schema monitoring or Manage Schema Monitoring for instructions on editing the schema monitoring settings.

Column Detection

Column detection is a part of schema monitoring, but can also be enabled on its own to detect the column changes of a select group of tables. Column detection monitors when columns are added or removed from a table and when column types are changed and updates those changes in the appropriate Immuta data source's data dictionary.

See one of the create a data source tutorials for instructions on enabling column detection.

Tracking New Data Sources and Columns

When new data sources and columns are detected and added to Immuta, they will automatically be tagged with the New tag. This allows Governors to use the seeded New Column Added Global Policy to mask the data sources and columns, since they could contain sensitive data. Data Owners can then review and approve these changes from the Requests tab of their profile page. Approving column changes removes the New tags from the data source.

The New Column Added Global Policy is active by default.

See Clone, Activate, or Stage a Global Policy to stage this seeded Global Policy if you do not want new columns automatically masked.

Workflow

Immuta user creates a data source with Schema Monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.
If Immuta detects a change, it will update the appropriate Immuta data source or column:
1. If Immuta detects a new table, then Immuta creates an Immuta data source for that table and tags it "New".
2. If Immuta detects a table has been deleted, then Immuta disables that table's data source.
3. If Immuta detects a previously deleted table has been re-created, then Immuta restores that table's data source and tags it "New".
4. If Immuta detects a new column within a table, then Immuta adds that column to the data dictionary and tags it "New".
5. If Immuta detects a column has been deleted, then Immuta deletes that column from the data dictionary.
6. If Immuta detects a column type has changed, then Immuta updates the column type in the data dictionary.
Data sources and columns tagged "New" will be masked by the seeded New Column Added Global Policy until a Governor or Data Owner approves the changes.

To run schema monitoring or column detection manually, see the Manually Run Jobs page.

Native Schema Monitoring for Snowflake

Immuta can monitor your data environment, detect when new tables or columns are created or deleted in Snowflake, and automatically register (or disable) those tables in Immuta for you. Those newly updated data sources will then have any global policies and tags that you have set up applied to them. The Immuta data dictionary will be updated with any new columns, and your Immuta environment will be in sync with your Snowflake tables. This automated process helps with scaling and keeping your organization compliant without the need to manually keep your data sources up to date.

Architecture

Once enabled on a data source, Immuta calls to Snowflake every 24 hours by default to find when each table within the registered schema was last altered. If the timestamp is after the last time native schema monitoring was run, then Immuta will update the table or columns that have been altered. This process works well when monitoring a large number of data sources because it only updates the recently altered tables and cuts down the amount of Snowflake computing required to run column detection, which specifically updates the columns of registered data sources.

If necessary, an Immuta admin can also manually run native schema monitoring through the API to run globally on all data sources.

If you have an Immuta environment with data sources other than Snowflake, the legacy schema monitoring feature will run on all non-Snowflake data sources. The native schema monitoring feature only works with Snowflake integrations and Snowflake data sources.

Automatic workflow

Immuta user creates a data source with schema monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta sends a query to Snowflake for the information_schema view asking for when each data source’s table was last altered.
If the table was altered after the last time native schema detection ran, Immuta updates the data source, columns, and data dictionary.
Immuta tags new data sources and columns with the tag “New” so that you can use the templated "New Column Added" global policy to mask all new data until it has been reviewed.

Limitations

This feature only works with Snowflake data sources. Any non-Snowflake data sources will run with the legacy schema monitoring described above.
Your organization will not see performance improvements if it is making changes to all tables consistently. This feature is intended to improve performance for organizations with a large number of tables and relatively few changes made within the ecosystem comparatively.

Migration

There is no migration required for this feature. Native schema monitoring will run on all Snowflake data sources with legacy schema monitoring previously enabled and will run on all new Snowflake data sources with schema monitoring enabled.

Configuration

There is no additional configuration required for this feature. You just need to enable schema monitoring when you create your Snowflake data sources.

Schema Projects

Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.

Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission to have the project auto-create because no data sources can be added by the owner. Instead, new data sources are managed by Immuta. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.

The schema settings, such as schema evolution and connection information, can be edited from the project overview tab. Note: Deleting the project will delete all of the data sources within it as well.

Schema Project Actions

Schema settings are edited from the project overview tab:

Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.
Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention will change how these newly detected data sources are named by Immuta.
Schema Detection Owner: When schema monitoring is enabled, a user is assigned to be the owner of any detected and Immuta created data source.
Disable or delete your schema project. Note: Deleting the project will delete all of the data sources within it as well.

How-to Guides

Registering Data

Create a Data Source

For a complete list of supported databases, see the Immuta Support Matrix.

This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.

Redshift data sources

Redshift Spectrum data sources must be registered via the Immuta CLI or V2 API using this payload.
Registering Redshift datashares as Immuta data sources is unsupported.

Requirements

CREATE_DATA_SOURCE Immuta permission
Snowflake data source requirements:
- USAGE Snowflake privilege on the schema and database
- REFERENCES Snowflake privilege on the tables
Databricks Spark integration requirements: Ensure that at least one of the traits below is true.
- The user exposing the tables has READ_METADATA and SELECT permissions on the target views/tables (specifically if Table ACLs are enabled).
- The user exposing the tables is listed in the immuta.spark.acl.whitelist configuration on the target cluster.
- The user exposing the tables is a Databricks workspace administrator.
Databricks Unity Catalog integration requirements: When exposing a table from Databricks Unity Catalog, be sure the credentials used to register the data sources have the Databricks privileges listed below.
- The following privileges on the parent catalogs and schemas of those tables:
  - SELECT
  - USE CATALOG
  - USE SCHEMA
- USE SCHEMA on system.information_schema

Snowflake imported databases

Immuta does not support Snowflake tables from imported databases. Instead, create a view of the table and register that view as a data source.

Enter connection information

Best Practice: Connections Use SSL

Although not required, it is recommended that all connections use SSL. Additional connection string arguments may also be provided.

Note: Only Immuta uses the connection you provide and injects all policy controls when users query the system. In other words, users always connect through Immuta with policies enforced and have no direct association with this connection.

Navigate to the My Data Sources page.
Click the New Data Source button in the top right corner.
Select the data platform containing the data you wish to expose by clicking a tile.
Input the connection parameters to the database you're exposing. Click the tabs below for guidance for select data platforms.

See the Create an Amazon S3 data source guide for instructions.

Required Google BigQuery roles for creating data sources

Ensure that the user creating the Google BigQuery data source has these roles:

roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project

See the Create a Google BigQuery data source guide for instructions.

Azure Databricks Unity Catalog limitation

Set all table-level ownership on your Unity Catalog data sources to an individual user or service principal instead of a Databricks group before proceeding. Otherwise, Immuta cannot apply data policies to the table in Unity Catalog. See the Azure Databricks Unity Catalog limitation for details.

Complete the first four fields in the Connection Information box:
- Server: hostname or IP address
- Port: port configured for Databricks, typically port 443
- SSL: when enabled, ensures communication between Immuta and the remote database is encrypted
- Database: the remote database
Enter your Databricks API Token. Use a non-expiring token so that access to the data source is not lost unexpectedly.
Enter the HTTP Path of your Databricks cluster or SQL warehouse.
If you are using a proxy server with Databricks, specify it in the Additional Connection String Options:
UseProxy=1;ProxyHost=my.host.com;ProxyPort=6789

Click the Test Connection button.

Further Considerations

Immuta pushes down joins to be processed on the native database when possible. To ensure this happens, make sure the connection information matches between data sources, including host, port, ssl, username, and password. You will see performance degradation on joins against the same database if this information doesn't match.
Some data platforms require different connection information than pictured in this section. Please refer to the tool-tips in the Immuta UI for this step if you need additional guidance.
If you are creating an Impala data source against a Kerberized instance of Impala, the username field locks down to your Immuta username unless you possess the IMPERSONATE_HDFS_USER permission.
If a client certificate is required to connect to the source database, you can add it in the Upload Certificates section at the bottom of the form.

Select virtual population

Decide how to virtually populate the data source by selecting Create sources for all tables in this database and monitor for changes or Schema/Table.
Complete the workflow for Create sources for all tables in this database and monitor for changes or Schema/Table selection, which are outlined on the tabs below:

Create sources for all tables in this database and monitor for changes

Selecting this option will create and keep in sync all data sources within this database. New schemas will be automatically detected and the corresponding data sources and schema projects will be created.

Select Create sources for all tables in this database and monitor for changes.

Schema/Table

Selecting this option will create and keep in sync all tables within the schema(s) selected. No new schemas will be detected.

If you choose Schema/Table, click Edit in the table selection box that appears.
By default, all schemas and tables are selected. Select and deselect by clicking the checkbox to the left of the name in the Import Schemas/Tables menu. You can create multiple data sources at one time by selecting an entire schema or multiple tables.
After making your selection(s), click Apply.

Enter basic information

Provide information about your source to make it discoverable to users.

Enter the SQL Schema Name Format to be the SQL name that the data source exists under in the Immuta Query Engine. It must include a schema macro but you may personalize it using lowercase letters, numbers, and underscores to personalize the format. It may have up to 255 characters.
Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This field is disabled if the schema project already exists within Immuta.
1. When selecting Create sources for all tables in this database and monitor for changes you may personalize this field as you wish, but it must include a schema macro.
2. When selecting Schema/Table this field is prepopulated with the recommended project name and you can edit freely.
Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.

<Tablename>

The data source name will be the name of the remote table, and the case of the data source name will match the case of the macro.

<Schema><Tablename>

The data source name will be the name of the remote schema followed by the name of the remote table, and the case of the data source name will match the cases of the macros.

Custom

Enter a custom template for the Data Source Name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").

Enter the SQL Table Name Format, which will be the format of the name of the table in the Immuta Query Engine. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.

Data source duplicates

Data source duplicates

In order to avoid two data sources referencing the same table, users can not create duplicate data sources. If you attempt to create a duplicate data source in the UI, you will encounter a warning stating "a data source with the same remote table already exists."

By default Immuta prevents users from creating data source duplicates. If you want to change this behavior,

Navigate to the App Settings page, and scroll to the Advanced Configuration section.

Copy and paste this YAML into the text box:

featureFlags:
  allowDuplicateDataSources: true

Click Save.

Enable or disable schema monitoring

When selecting the Schema/Table option you can opt to enable Schema Monitoring by selecting the checkbox in this section.

Note: This step will only appear if all tables within a server have been selected for creation.

Create a schema detection job in Databricks

In most cases, Immuta’s schema detection job runs automatically from the Immuta web service. For Databricks, that automatic job is disabled because of the ephemeral nature of Databricks clusters. In this case, Immuta requires users to download a schema detection job template (a Python script) and import that into their Databricks workspace.

Generate Your Immuta API Key

Before you can run the script referenced in this tutorial, generate your Immuta API Key from your user profile page. The Immuta API key used in the Databricks notebook job for schema detection must either belong to an Immuta Admin or the user who owns the schema detection groups that are being targeted.

Enable Schema Monitoring or Detect Column Changes on the Data Source creation page.
Click Download Schema Job Detection Template.
Click the Click Here To Download text.
Before you can run the script, follow the Databricks documentation to create the scope and secret using the Immuta API Key generated on your user profile page.
Import the Python script you downloaded into a Databricks workspace as a notebook. Note: The job template has commented out lines for specifying a particular database or table. With those two lines commented out, the schema detection job will run against ALL databases and tables in Databricks. Additionally, if you need to add proxy configuration to the job template, the template uses the Python requests library, which has a simple mechanism for configuring proxies for a request.
Schedule the script as part of a notebook job to run as often as required. Each time the job runs, it will make an API call to Immuta to trigger schema detection queries, and these queries will run on the cluster from which the request was made. Note: Use the api_immuta cluster for this job. The job in Databricks must use an Existing All-Purpose Cluster so that Immuta can connect to it over ODBC. Job clusters do not support ODBC connections.

Create the data source

Opt to configure settings in the Advanced Options section (outlined below), and then click Create to save the data source(s).

Advanced options

None of the following options are required. However, completing these steps will help maximize the utility of your data source.

Column Detection

This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies Data Owners of these changes.

To enable, select the checkbox in this section.

See Schema Projects Overview to learn more about Column Detection.

Event Time

An Event Time column denotes the time associated with records returned from this data source. For example, if your data source contains news articles, the time that the article was published would be an appropriate Event Time column.

Click the Edit button in the Event Time section.
Select the column(s).
Click Apply.

Selecting an Event Time column will enable

more statistics to be calculated for this data source including the most recent record time, which is used for determining the freshness of the data source.
the creation of time-based restrictions in the Policy Builder.

Latency

Click Edit in the Latency section.
Complete the Set Time field, and then select MINUTES, HOURS, or DAYS from the subsequent dropdown menu.
Click Apply.

This setting impacts the following behaviors:

How long Immuta waits to refresh data that is in cache by querying the native data source. For example, if you only load data once a day in the native source, this setting should be greater than 24 hours. If data is constantly loaded in the native source, you need to decide how much data latency is tolerable vs how much load you want on your data source; however this is only relevant to Immuta S3, since SQL will always interactively query the native database.
How often Immuta checks for new values in a column that is driving row-level redaction policies. For example, if you are redacting rows based on a country column in the data, and you add a new country, it will not be seen by the Immuta policy until this period expires.

Sensitive Data Discovery

Data Owners can disable Sensitive Data Discovery for their data sources in this section.

Click Edit in this section.
Select Enabled or Disabled in the window that appears, and then click Apply.

Data Source Tags

Adding tags to your data source allows users to search for the data source using the tags and Governors to apply Global policies to the data source. Note if Schema Detection is enabled, any tags added now will also be added to the tables that are detected.

To add tags,

Click the Edit button in the Data Source Tags section.
Begin typing in the Search by Tag Name box to select your tag, and then click Add.

Tags can also be added after you create your data source from the Data Source details page on the Overview tab or the Data Dictionary tab.

Create an S3 Data Source

Requirement

CREATE_S3_DATA_SOURCE Immuta permission

Prerequisite

Configure the Amazon S3 integration

Private preview

The Amazon S3 integration is available to select accounts. Reach out to your Immuta representative for details.

Register S3 data

Navigate to the My Data Sources page in Immuta.
Click New Data Source.
Select the Native S3 tile in the data platform section.
Select your AWS Account/Region from the dropdown menu.
Click Next.
The prefix field is populated with the base path. Add to this prefix to create a data source for a prefix, bucket, or object.
- If the data source prefix ends in a wildcard (*), it protects all items starting with that prefix. For example, a base location of s3:// and a data source prefix surveys/2024* would protect paths like s3://surveys/2024-internal/research-dept.txt or s3://surveys/2024-customer/april/us.csv.
- If the data source prefix ends without a wildcard (*), it protects a single object. For example, a base location path of s3:// and a data source prefix of research-data/demographics would only protect the object that exactly matches s3://research-data/demographics.
Click Add Prefix, and then click Next.
Verify that your prefixes are correct and click Complete Setup.

Create a Google BigQuery Data Source

Private preview

Google BigQuery is available to select accounts. Reach out to your Immuta representative for details.

Requirements

CREATE_DATA_SOURCE Immuta permission
Google BigQuery roles:
- roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
- roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
- roles/bigquery.jobUser on the project

Prerequisites

Create a Google Cloud service account for creating Google BigQuery data sources

Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. If you do not currently have a service account for the Google Cloud project separate from the Google Cloud service account you created when configuring the Google BigQuery integration, you must create a Google Cloud service account with privileges to view and run queries against the tables you are protecting.

You have two options to create the required Google Cloud service account:

Create a service account by using Google Cloud Console.
Create a service account by using gcloud.

Create a service account using the Google Cloud web console

Using the Google Cloud documentation, create a service account with the following roles:
- BigQuery User
- BigQuery Data Viewer
Using the Google Cloud documentation, generate a service account key for the account you just created.

Create a service account using gcloud

Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.
- SERVICE_ACCOUNT is the name for the new service account.
- PROJECT_ID is the project ID for the Google Cloud Project that is integrated with Immuta.
- IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.

Use the script below in the gcloud command line. This script is a template; change values as necessary:

# Fill these out
# Please use .json extension for key
export SERVICE_ACCOUNT=datasource-account
export PROJECT_ID=project123
export IMMUTA_GCP_KEY_FILE=~/GCP_${SERVICE_ACCOUNT}_key.json

# Create service account for creating data sources
gcloud iam service-accounts create ${SERVICE_ACCOUNT} --project ${PROJECT_ID}

# Generate keyfile
gcloud iam service-accounts keys create ${IMMUTA_GCP_KEY_FILE} --iam-account=${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com

# Allow account to execute queries
#gcloud projects add-iam-policy-binding ${PROJECT_ID} \
#--member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=projects/${PROJECT_ID}/roles/bigquery.user
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=roles/bigquery.user

# Allow account to view data
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" --role=roles/bigquery.dataViewer

echo if something went wrong and you want to delete the service account, run:
echo gcloud iam service-accounts delete ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --project ${PROJECT_ID}

Register data sources in Immuta

Required Google BigQuery roles

Ensure that the user creating the data source has these Google BigQuery roles:

roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
roles/bigquery.jobUser on the project

Click the + button in the top-left corner of the screen and select New Data Source.
Select the Google BigQuery tile in the Data Platform section.
Complete these fields in the Connection Information box:
- Account Email Address: Enter the email address of a user with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.
- Project: Enter the name of the project that has been integrated with Immuta.
- Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.
Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.
Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
Decide how to virtually populate the data source by selecting one of the options:
- Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
- Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
Provide basic information about your data source to make it discoverable to users.
- Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. For BigQuery the schema will be the BigQuery dataset. The format must include a schema macro but you may personalize it using lowercase letters, numbers, and underscores to personalize the format. It can have up to 255 characters.
- Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.
  - When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.
  - When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.
- Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
  - <Tablename>: The Immuta data source will have the same name as the original table.
  - <Schema><Tablename>: The Immuta data source will have both the dataset and original table name.
  - Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
- Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.
Optional Advanced Settings:
- Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.
- Data Source Tags: Adding tags to your data source allows users to search for the data source using the tags and governors to apply global policies to the data source. Note if schema detection is enabled, any tags added now will also be added to the tables that are detected.
  - Click the Edit button in the Data Source Tags section.
  - Begin typing in the Search by Tag Name box to select your tag, and then click Add.
Click Create to save the data source(s).

Next steps

With data sources registered in Immuta, your organization can now start

building global subscription and data policies to govern data.
creating projects to collaborate.

Bulk Create Snowflake Data Sources

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirements

Snowflake Enterprise Edition
Snowflake X-Large or Large warehouse is strongly recommended

Create Snowflake data sources

Set the default subscription policy to None for bulk data source creation. This will simplify the data source creation process by not automatically applying policies.
Make a request to the Immuta V2 API create data source endpoint, as the Immuta UI does not support creating more than 1000 data sources. The following options must be specified in your request to ensure the maximum performance benefits of bulk data source creation. The Skip Stats Job tag is only required if you are using specific policies that require stats; otherwise, Snowflake data sources automatically skip the stats job.
```
"options": {
    "disableSensitiveDataDiscovery": true,
    "tableTags": [
        "Skip Stats Job"
    ]
}
```

Specifying disableSensitiveDataDiscovery as true ensures that sensitive data discovery will not be applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.

Applying the Skip Stats Job tag using the tableTag value will ensure that some jobs that are not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.

When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.

Monitor progress

To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example_payload.json
    https://your-immuta-url.com/jobs?bulkId=<your-bulkId>

The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:

    {
      "total":"99893",
      "completed":"99892",
      "failed":"0",
      "pending":"1",
      "errors":null
    }

With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.

Getting Started with Domains

Private preview

This feature is in preview and available to select accounts. Reach out to your Immuta representative for details.

Domains are containers of data sources that allow you to assign data ownership and access management to specific business units, subject matter experts, or teams at the nexus of cross-functional groups. Instead of centralizing your data governance and giving users too much governance over all your data, you control how much power they have over data sources by granting them permission within domains in Immuta.

Requirement

Domains must be enabled in your account. Reach out to your Immuta representative for guidance.

Create a domain

Required Immuta permission: GOVERNANCE

Navigate to the Domains page.
Click Create Domain.
Enter a Name and Description for your domain.
Click Save.

To create a domain using the API, see the Domains API guide.

Assign domain permissions

Required Immuta permission: USER_ADMIN

Click Domains and navigate to the domain.
Click the Permissions tab.
Click + Grant Permissions.
Select a user and assign the Manage Policies permission to allow them to create policies that will apply to the data sources within the domain.
Click Grant Permissions to save your changes.

To assign permissions using the API, see the Domains API guide.

Protect your data

Required Immuta permission: GOVERNANCE or Manage Policies

Click Policies and select Data Policies or Subscription Policies.
Write your subscription policy or data policy as outlined in the policies how-to guide.
When building your policy, add your domain in the What domain of data should this policy be targeting section. This step will assign the policy to all data sources added to that domain.

Assign existing data sources to a domain

Required Immuta permission: GOVERNANCE

Navigate to the Domains page and select your domain.
Click the Data Sources tab, and then click +Add data sources.
Select the checkboxes for the data sources you want to add to your domain.
Click Add to domain.

To assign data sources using the API, see the Domains API guide.

Delete a domain

Required Immuta permission: GOVERNANCE

Navigate to the Domains page and select your domain.
Click Remove Domain.
Confirm your changes.

To delete a domain using the API, see the Domains API guide.

Schema Monitoring

Manage Schema Projects

Edit Schema Project Connection

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Connection.
Use the Connection Information modal to make any necessary changes.
Click Save.

Edit Schema Monitoring Naming Convention

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the Basic Information modal to make any necessary changes to naming formats.
Click Save.

Add New Schema Monitoring Owner

Requirement: Must be an owner of the schema project

Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the dropdown menu in the Schema Monitoring modal to select a new schema detection owner. The new owner must be an owner of one or more of the data sources belonging to that schema.
Click Save.

Change the Frequency of Schema Monitoring Jobs

Requirement: Immuta permission USER_ADMIN

Navigate to the App Settings page and scroll down to the Advanced Configuration.
Copy and paste this YAML into the text box:
```
schedule:
  schemaEvolutionCheck: 'your-new-frequency'
```
Replace your-new-frequency with the time you would like between schema jobs. For example use */30 * * * * for the queries to run every 30 minutes.
Click Save.

Change the Frequency of Column Detection Jobs

Requirement: Immuta permission USER_ADMIN

Navigate to the App Settings page and scroll down to the Advanced Configuration.
Copy and paste this YAML into the text box:
```
schedule:
  columnEvolutionCheck: 'your-new-frequency'
```
Replace your-new-frequency with the time you would like between column detection jobs. For example use */30 * * * * for the queries to run every 30 minutes.
Click Save.

Run Schema Monitoring Jobs

Manually Run Schema Monitoring Jobs

Manually Run Schema Monitoring Job for All Data Sources

Prerequisite: Immuta permission: USER_ADMIN

You can manually run a schema monitoring job globally using the /dataSource/detectRemoteChanges endpoint of the Immuta API with an empty payload.

Manually Run Schema Monitoring Job as a Data Owner

You can manually run a schema monitoring job for all data sources that you own using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data sources or their individual IDs.

Manually Run Schema Monitoring Job as a Data User

You can manually run a schema monitoring job for data sources you are subscribed to using the /dataSource/detectRemoteChanges endpoint of the Immuta API with a payload containing the hostname for your data source and the table name or data source ID.

Manually Run a Column Detection Job

Navigate to the data source overview page.
Click on the health check icon.
Scroll to Column Detection, and click Trigger Detection.

Manage Data Sources

Data Source Settings

As a data owner, you can edit your data source settings and disable, delete, and re-enable a data source.

For other guides related to data source members and management, see the Related guides section.

Edit a data source

Navigate to the Overview tab.
Click the menu icon in the upper right corner of the page and select Edit.
Change your settings in the data source workflow.
Note: Some settings cannot be changed once the data source has been created. In these cases, simply create a new data source with the new settings.
When completed, navigate to the end of the workflow and click Save.
Note: Some data sources may require the data owner to reconnect to the remote database before any changes to the data source can be saved.

For information on specific settings, see the Create a data source guide.

Bulk edit data sources

Data owners can bulk edit data sources that contain the same connection information.

Navigate to the Data Source Overview page and click the hyperlinked Parent Server text, or type the connection string in the search box in the top left of the UI and select your connection string from the list of auto-completed results.
All data sources created from this parent server will display in the center pane.
Select the data sources you would like to edit by clicking the checkbox next to each data source.
Click the Bulk Actions menu and select the change you would like to make from the dropdown menu.
Confirm your edits by following the prompts in the modals that appear.

Disable a data source

Disabling a data source hides it and its data from all users except the data owner. While in this state, the data source will display as disabled in the console for the data owner(s) and other users will not be able to see it at all.

Navigate to the Overview tab.
Click on the menu icon in the upper right corner and select Disable.

A label will appear next to the data source indicating it is now disabled, and a notification will be sent to all users of the data source informing them that the data source has been disabled.

Enable a disabled data source

Navigate to the Overview tab.
Click on the menu icon in the upper right corner and select Enable.

A notification will be sent out to all users of the data source informing them that the data source has been enabled.

Delete a data source

Deleting a data source permanently removes it from Immuta. Data sources must first be disabled before they can be deleted.

Disable the data source.
Navigate to the Overview tab and click the menu icon and select Delete.
Confirm that the data source should be deleted by clicking Delete.

A notification will be sent out to all users of the data source informing them that the data source has been deleted.

Reference guides

For information about data sources and policies, see the following guides:

How-to guides

In addition to adding and managing data source settings as outlined above, data owners can manage data source

column tags
data dictionaries
policies
members

Members

In addition to creating and managing data sources, data owners can add and manage data source members manually. While this is supported, it is not recommended and instead it is much more scalable to manage user access through subscription policies

For other guides related to data source members and management, see the Related guides section.

Add members to a data source

Navigate to the data source and click the Members tab.
Click Add Members and enter the group name or username.
Select their Role:
- Subscriber: The role can have read or write access to the table. This role is only available if there are read access policies on the data source.
- Owner: The role can manage data source members and policies and have read or write access to the table.
- Expert: The role can manage the data dictionary descriptions and have read or write access to the table. This role is only available if there are read access policies on the data source.
You can also opt to specify an expiration date for when the user’s access should expire.
Select Read or Write from the Access Grant dropdown. This option is only available if write policies have been enabled.
Click Add.

Bulk add users to multiple data sources

Search by tag name, column name, global policy, or connection string in the search box in the top left corner of the console. After selecting from the dropdown menu, you will automatically navigate to the Search page, where all relevant data sources will appear.
Select the data sources you want to add users to by clicking the checkbox next to the data source.
Click the dropdown menu in the top right corner of the results page and select Add Users.
In the modal, type the user name or group name in the Enter User Name or Group Name field and select the user or group you would like to add from the dropdown menu.
Opt to set an Expiration for the users' subscriptions. Additionally, you can change the role from Subscriber to Expert or Owner for the users or groups using the dropdown menu in the Actions column.
Click Add. All users and groups will be added to the data sources you selected.

Set user access expiration date for a data source

As a data owner, you can limit the amount of time a user or group has access to your data source by setting an access expiration date.

Navigate to the Members tab.
Adjust the number of days under the Expires column for the user/group whose access you want to limit (the limit is counting from today, so users/groups with 0 days left means their access will be revoked by the end of today and users with 1 day left means their access will be revoked by the end of tomorrow).
Save your changes.

To remove the limit (or set the limit to Never), delete the number from the field and save your changes.

Modify user or group roles within a data source

Navigate to the Members tab.
Click the drop-down arrow under the Role column next to the user/group whose role you’d like to change.
Select another role (subscribed, expert, owner or ingest user, if applicable).

Notifications about the change will be sent to the affected users and groups (as well as alternative Owners).

View user or group subscription history

Navigate to the Members tab.
Click the Name of the user or group whose history you want to review.

Remove users or groups from a data source

As a data owner, you can deny access to any users or groups at any time.

Navigate to the Members tab.
To remove a user or group from a data source, click Deny in the Actions column next to the user or group you want to remove.
Complete the Deny Access form, including a reason for revoking the access.

This action will immediately update users' or groups' subscription status, and they will no longer have any access to the data source. Notifications will be sent to the affected users (as well as alternative data owners) informing them of the change in subscription status.

Reference guide

For information about data source members and subscriptions, see the data source user roles section.

How-to guides

In addition to adding and managing data source members as outlined above, data owners can manage data source

column tags
data dictionaries
settings

Access Requests

Your outgoing and incoming requests for data source access are consolidated on the requests tab on your user profile page. Similar to notifications, a red dot also displays on the request icon whenever you have pending requests. The sections below guide you through managing these requests.

Manage requests

Navigate to your Profile page, and then click the Requests tab. The names of the users who have submitted requests are displayed in the left pane. Once a user is selected, the corresponding pending requests are displayed on the right.
To view more information about the request, click the Details button in the Actions column of a request.
Click the Approve or Deny button in the Actions column of the request.

Bulk approvals

To approve or deny multiple access requests simultaneously,

Navigate to your Profile page, and then click the Requests tab.
Select the checkbox next to each request you want to address, and then click the Approve Selected or Deny Selected button.

Manage Tasks

If users make an unmask request, a tasks tab will appear on the data source overview page for the user making the request and the user receiving the request. From this tab, users can view and manage two different task views:

Your Created Tasks: This page lists the status and information of the unmask requests you've submitted.
Tasks For You: This page lists the status and information of the unmask requests that have been submitted to you.

To complete a task,

Navigate to the Tasks tab from the Data Source Overview page, and then click the toggle at the top of the page to Tasks For You.
Click the Unmask Values icon in the Actions column of the task.
A dialog box will appear with the masked and unmasked value. Note: You can view information about this request, including the reason for the request and the date is was created, by clicking the Task Info button in the Actions column.

To delete a task,

Navigate to the Tasks tab from the Data Source Overview page, and then click the toggle at the top of the page to Tasks For You.
Click Delete Task in the Actions column of the relevant task.

Reference guide

For information about data sources see the Data sources in Immuta overview.

How-to guides

In addition to managing data source requests as outlined above, data owners can manage data source

column tags
data dictionaries
policies
members
settings

Data Dictionary

The data dictionary provides information about the columns within the data source, including column names and value types.

As a data owner, you can manage data dictionary descriptions, discussions, and column tags. For other guides related to the data dictionary, see the Related guides section.

Manage data dictionary descriptions

Navigate to the Data Dictionary tab.
To add or edit column descriptions, click the menu icon in the Actions column next to the entry you want to change and select Edit.
Complete the fields in the form that appears, and then click Save.

Manage dictionary discussions

Deprecation notice

Support for this feature has been deprecated.

Navigate to the Data Dictionary tab.
Click the talk bubble icon to the right of the definition.
View discussions on the far right side of the Data Dictionary page.
Click Resolved to review any resolved threads or Open to review all open threads.
- To reply to an existing thread, click on the comment, type in a reply, and then click the Save button.
- To start a new discussion, click New Discussion, type a new comment or question, and click the Save button.
- To resolve or delete a thread, click the menu icon and select Mark Resolved or Delete.

A notification will be sent to all subscribers of the data source.

Reference guide

For information about the data dictionary, see the Data sources in Immuta overview.

How-to guides

In addition to managing data dictionary descriptions and discussions as outlined above, you can interact with the data dictionary in the following ways:

Data owners or experts can manage column tags.
Users subscribed to the data source can post and reply to discussion threads by commenting on the data dictionary.

Subscribe to Data Sources

Audience: Data Users
Content Summary: Immuta provides a one-stop shop for all data across an organization through data sources, which are virtual exposures of data. This data is protected by policy rules written and applied to the data sources by Data Owners, so Data Users can view details about, comment on, and subscribe to these data sources without violating privacy regulations.
This guide delineates the processes of accessing and using data source features in the Immuta UI.

View My Data Sources

Immuta provides the My Data Sources view for users to quickly view data sources they are affiliated with.

Click on the Data Source icon in the panel on the left.
By default, all data sources will be displayed.
To view only data sources you have access to, click on the My Data Sources tab.
Note: If no data sources are displayed, you are not subscribed to, an expert of, or an owner of any data sources.
To view the details of a data source, simply click on the Data Source.

Click the Get Access button from either this data sources view or the Data Source Overview tab, which can be accessed by clicking on the Data Source.
If prompted, fill out the custom request form set up by the system admin and click Request Access.
A notification will be sent to the Data Owner(s) informing them of your request.
Once reviewed, you will receive a notification with a response indicating if your request was accepted or denied.
If accepted, the status displayed next to that data source will be updated to "Subscribed" and you will have access to the data source via your personal SQL connection. If not accepted, a reason will be provided in the notification details.

Bulk Access Requests

To request access to multiple data sources simultaneously,

Filter data sources by connection string in the search bar.
Click the connection string from the auto-completed results to navigate to the Search Results page.
Select the data sources you would like to subscribe to by clicking the checkbox next to each relevant data source.
Click the dropdown menu button in the top right corner of the page, and then select Request Access.
Describe how you plan to use the data in the dialog box that appears, and then click Subscribe.

Unsubscribe from a Data Source

If you no longer need access to a data source, click Unsubscribe in the upper right corner of the Data Source Overview tab.

Manually Run Health Jobs

If a data source health check fails and needs to be re-generated,

Click the health indicator in the upper right-hand corner.
Select Re-run on the job you want to run.

Note: To generate a fingerprint, the row count must be up-to-date.

View the Data Dictionary

To view the Data Dictionary,

Navigate to the Data Dictionary tab.
When provided, the Data Dictionary will display in the center pane and include the column’s name, type, and comments. Masked columns will display a symbol next to their names.

View and Comment on Data Source Discussions

Deprecation notice

Support for this feature has been deprecated.

Data Users have the ability to comment on or ask questions about the Data Dictionary columns and definitions, public queries, and the data source in general.

Comment on General Data Source Discussions

To comment on general data source discussions,

Navigate to the Discussions tab.
Click Open to review open discussions or Resolved to review resolved discussions.
Click a discussion to view comments.
Click New Discussion to create a new discussion.
Type in your comment or question and click Save.

Comment on Dictionary Discussions

To comment on Data Dictionary discussions,

Navigate to the Data Dictionary tab.
Click the talk bubble icon to the right of the definition.
View discussions on the far right side of the Data Dictionary page.
Click Resolved to review any resolved threads or Open to review all open threads.
To reply to an existing thread, click on the comment, type in a reply, and then click the Save button.
To start a new discussion, click New Discussion, type a new comment or question, and click the Save button.
A notification will be sent to all subscribers of the data source, including the Data Owner and Experts so that they can review the thread and reply.
Once created, Data Dictionary discussions can be continued under the Discussions tab.

View Data Source Contact Information

Contact information for Data Owners is provided for each data source, which allows other users to ask them questions about accessibility and attributes required for viewing the data.

To view this contact information, click the Contacts tab.

Make an Unmasking Request

To request to unmask a value in a data source,

Navigate to the Data Source Overview tab, click the dropdown menu in the top right corner, and select Make Unmasking Request.
In the modal that appears, select the column from the first dropdown menu, and then complete the Values to Unmask and the Reason for Unmasking fields.
Select the user to unmask the value from the final dropdown menu, and then click Submit.

A Tasks tab will then appear for your data source that details the task type and the status of your request. You may also view task information or delete the task from this page.

View and Manage Tasks

After sending an Unmask request, a Tasks tab will appear on the Data Source Overview page listing the target and requesting users, the task type, and the state of the task.

To view information about a task,

Navigate to the Tasks tab from the Data Source Overview page.
Click the Task Info icon in the Actions column of the relevant task.

To delete a task,

Navigate to the Tasks tab from the Data Source Overview page.
Click Delete Task.

Query Data

Snowflake

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:

Query Data with Snowflake Governance Controls

With

Query the data exactly as you would have before Immuta.

Without

Prerequisite: Users have been granted SELECT privileges on all or relevant Snowflake tables

Query the data exactly as you would have before Immuta.

Query Data without Snowflake Governance Controls

Prerequisite: REVOKE users' access to backing tables

Create a new worksheet.
Select the appropriate role/database/schema to query the views:
1. Role: any or just PUBLIC.
2. Warehouse: any.
3. Database: The Immuta database name used when (default is IMMUTA).
4. Schema: The schema that houses the backing tables of your Immuta data sources.
Run the following query in the worksheet (replace IMMUTA with the name of your database and replace TABLE.NAME with the name of the backing Snowflake table):
```
SELECT * FROM IMMUTA.TABLE.NAME LIMIT 100
```

Databricks

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:

Query Data with Python

Create a new workspace.
Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query, it should look something like:

df = spark.sql('select * from database.table_name')
df.show()

Query Data with SQL

Create a new workspace.
Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.
Run your query. It should look something like this:
```
select * from database.table_name;
```

Query Data with SparkR

Establish the User's Identity

Create a new workspace.
Run:
```
library(SparkR)
```

Run a Query

In the same workspace, but a different cell, query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query. It should look something like this:

df <- SparkR::sql("select * from database.table_name")
SparkR::head(df)

Query Data with Scala

Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query. It should look something like this:

val sqlDF = spark.sql("select * from database.tablename")
sqlDF.show()

Databricks SQL

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:
REVOKE users' access to raw tables

Query Data

Select SQL from the upper left menu in Databricks.
Click Create → Query.
Select the Immuta database name used when .
Query the Immuta-protected data, which takes the form of immuta_database.backing_database_table_name:
1. Immuta Database: The Immuta database name used when .
2. Backing Database: The database that houses the backing tables of your Immuta data sources.
3. Table Name: The name of the table backing your Immuta data sources.
Run your query (you must include the database). It should look something like this:
```
select * from immuta_database.backing_database_table_name limit 100
```

Starburst (Trino)

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:
REVOKE users' access to raw tables

Query Data

Use your tool of choice to connect to Starburst (Trino).
Select the Immuta catalog name used when .
Query the Immuta-protected data, which takes the form of immuta_catalog.backing_schema.table_name:
1. Immuta Catalog: The Immuta catalog name used when .
2. Backing Schema: The schema that houses the backing tables of your Immuta data sources.
3. Table Name: The name of the table backing your Immuta data sources.
Run your query (it is recommended that you use the catalog in the query). It should look something like this:
```
select * from immuta_catalog.backing_schema.table_name limit 100
```

Redshift

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:
REVOKE users' access to raw tables

Query Data

Use your tool of choice to connect to Redshift.
Select the Immuta database name used when .
Query the Immuta-protected data, which takes the form of immuta_database.backing_schema.table_name:
1. Immuta Database: The Immuta database name used when .
2. Backing Schema: The schema that houses the backing tables of your Immuta data sources.
3. Table Name: The name of the table backing your Immuta data sources.
Run your query (it is recommended that you use the catalog in the query). It should look something like this:
```
select * from immuta_database.backing_schema.table_name limit 100
```

Azure Synapse Analytics

Audience: Data Users
Content Summary: This page offers a tutorial on how to query data within the .
Prerequisites:
REVOKE users' access to raw tables
GRANT users' access to the Immuta schema

Query Data

From Synapse Studio click on the Data menu on the left.
Click on the Workspace tab.
Expand databases and you should see the dedicated pool you specified when .
Expand the dedicated pool and you should see the Immuta schema you created when .
Select that schema.
Select New SQL script and then Empty script.
Run your query (note that Synapse does not support LIMIT and the SQL is case sensitive). It should look something like this:
```
SELECT TOP 100 * FROM immuta_schema.backing_database_backing_table;
```