Private preview
This feature is in preview and available to select accounts. Reach out to your Immuta representative for details.
Domains are containers of data sources that group data into user-defined sets, where specific users can be assigned a domain-specific permission to manage policies on only the data sources in those domains. Domains eliminate the problem of giving users too much governance over all data sources in an organization. Instead, you can control how much power governance users have over data sources by granting them privileges within domains (and only those domains) in Immuta.
Domains allow you to grant more users authority to manage policies, making Immuta easier to use and more secure.
The table at the end of this section outlines the global Immuta permissions and domain permissions necessary to manage domains.
Data sources can be assigned to domains to restrict the users who can manage policies on those data sources. Data sources could be assigned to domains based on business units in your organization or any other method that suits your business goals and policy management strategy. However, data sources can belong to only one domain.
Once a data source is assigned to a domain, only users with the global GOVERNANCE permission or the domain-specific Manage Policies permission can create policies that will apply to that data source, allowing you to control who can manage data access.
When data sources are added to a domain, users do not have to be added to the domain to access data. Instead, they must meet the restrictions outlined in the policies on the data sources.
Only users with the GOVERNANCE permission can change the domain that a data source belongs to or remove a data source from a domain. When a data source is removed from a domain, Immuta recomputes the policies: any policies associated with the domain that were applied to the data source are removed.
When authorized users assign policies to a domain, those policies only apply to the data sources in that domain. Domains restrict who can write policies for data sources assigned to that domain, while Immuta policies are enforced as usual: users who meet the restrictions outlined in the policy of a data source may subscribe to that data source.
When data sources are added to or removed from a domain, Immuta recomputes the data source policies. Then, policies associated with the domain will be applied to the data source.
Users with the Manage Policies permission in a domain can set a global policy to apply only to the domains for which they have that permission. For example, if a user has Manage Policies on one domain, all global policies they write will be assigned to just that domain, and those policies will be enforced on the data sources within it. If users have the Manage Policies permission on a subset of domains in their organization, they can assign policies to any combination of those domains.
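To make that scoping concrete, below is a minimal sketch of the rule just described, assuming a simple in-memory model. The class, field, and function names are illustrative only and are not Immuta's API.

```python
# Illustrative sketch only: these classes and names are hypothetical, not Immuta's API.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    global_permissions: set[str] = field(default_factory=set)       # e.g. {"GOVERNANCE"}
    manage_policies_domains: set[str] = field(default_factory=set)  # domains where the user holds Manage Policies

def assignable_domains(user: User, all_domains: set[str]) -> set[str]:
    """Return the domains this user can assign a global policy to."""
    if "GOVERNANCE" in user.global_permissions:
        return set(all_domains)                  # governors can target any data source
    return set(user.manage_policies_domains)     # otherwise, only their Manage Policies domains

# A user with Manage Policies on only the "finance" domain:
analyst = User("dana", manage_policies_domains={"finance"})
print(assignable_domains(analyst, {"finance", "hr", "marketing"}))   # {'finance'}
```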
If users have the Manage Policies permission on all domains in an organization or the GOVERNANCE permission, they can set a global policy to apply to all data sources.
Users with the GOVERNANCE permission can delete any domain that has zero data sources assigned to it.
Existing data sources can be assigned to a domain by a user with the GOVERNANCE permission. Once a data source is added to a domain, the domain's policies are enforced on it.
Permission | User actions | Domain actions | Data source actions | Policy actions |
---|---|---|---|---|
USER_ADMIN (global) | Manage user permissions, including domain-specific permissions on ALL domains | None | None | None |
GOVERNANCE (global) | None | Create domains; manage domain description and name; delete any empty domain | Add existing data sources to any domain; remove data sources from any domain without adding them to another domain | Create global policies that apply to ANY data sources (inside or outside domains) |
Manage Policies (domain) | None | None | None | Create policies that apply to the domain(s) they are granted to manage policies in |
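As a rough summary of the table, the check for who may manage policies on a given data source could be modeled as follows. The data structures are hypothetical and only mirror the permissions described above; data owner local policies are out of scope here.

```python
# Hypothetical model of the authorization rule summarized in the table above.
def can_manage_policies_on(user_global_permissions: set[str],
                           user_manage_policies_domains: set[str],
                           data_source_domain: str | None) -> bool:
    """True if the user may create or change policies that apply to the data source."""
    if "GOVERNANCE" in user_global_permissions:
        return True                                   # global GOVERNANCE covers every data source
    if data_source_domain is None:
        return False                                  # outside a domain, only GOVERNANCE applies
    return data_source_domain in user_manage_policies_domains  # domain-scoped Manage Policies

print(can_manage_policies_on({"GOVERNANCE"}, set(), None))       # True
print(can_manage_policies_on(set(), {"finance"}, "finance"))     # True
print(can_manage_policies_on(set(), {"finance"}, "hr"))          # False
```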
Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.
Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission for the project to be created automatically, because the owner cannot add data sources to it; instead, new data sources are managed by Immuta. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.
The schema settings, such as schema evolution and connection information, are edited from the project overview tab:
Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.
Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention changes how these newly detected data sources are named by Immuta (see the sketch after this list).
Schema Detection Owner: When schema monitoring is enabled, this user is assigned as the owner of any data source Immuta detects and creates.
Disable or delete your schema project. Note: Deleting the project will delete all of the data sources within it as well.
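A naming convention is essentially a template applied to each newly detected table. Below is a minimal sketch of that idea; the `<schema>` and `<tablename>` placeholders are illustrative and are not necessarily Immuta's exact template syntax.

```python
# Hypothetical naming-convention helper; the placeholder syntax is illustrative only.
def apply_naming_convention(template: str, schema: str, table: str) -> str:
    """Render a display name for a newly detected table."""
    return template.replace("<schema>", schema).replace("<tablename>", table)

# With a template of "<schema> <tablename>", a newly detected analytics.orders table
# would be registered under the name "analytics orders".
print(apply_naming_convention("<schema> <tablename>", "analytics", "orders"))
```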
Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with the organization's data environment. This automated process helps organizations stay compliant without the need to manually keep data sources up to date.
Schema monitoring is enabled while creating or editing a data source. It runs every night by default but can be configured to run at a different frequency. Data Owners or Governors can edit the naming convention for newly detected data sources and the Schema Detection Owner from the schema project page after schema monitoring has been enabled.
See the create a data source tutorials for instructions on enabling schema monitoring or Manage Schema Monitoring for instructions on editing the schema monitoring settings.
Column detection is a part of schema monitoring, but can also be enabled on its own to detect the column changes of a select group of tables. Column detection monitors when columns are added or removed from a table and when column types are changed and updates those changes in the appropriate Immuta data source's data dictionary.
See one of the create a data source tutorials for instructions on enabling column detection.
When new data sources and columns are detected and added to Immuta, they will automatically be tagged with the New tag. This allows Governors to use the seeded New Column Added Global Policy to mask the data sources and columns, since they could contain sensitive data. Data Owners can then review and approve these changes from the Requests tab of their profile page. Approving column changes removes the New tags from the data source.
The New Column Added Global Policy is active by default.
See Clone, Activate, or Stage a Global Policy to stage this seeded Global Policy if you do not want new columns automatically masked.
Immuta user creates a data source with Schema Monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.
If Immuta detects a change, it will update the appropriate Immuta data source or column:
If Immuta detects a new table, then Immuta creates an Immuta data source for that table and tags it "New".
If Immuta detects a table has been deleted, then Immuta disables that table's data source.
If Immuta detects a previously deleted table has been re-created, then Immuta restores that table's data source and tags it "New".
If Immuta detects a new column within a table, then Immuta adds that column to the data dictionary and tags it "New".
If Immuta detects a column has been deleted, then Immuta deletes that column from the data dictionary.
If Immuta detects a column type has changed, then Immuta updates the column type in the data dictionary.
Data sources and columns tagged "New" will be masked by the seeded New Column Added Global Policy until a Governor or Data Owner approves the changes.
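The steps above amount to a reconciliation between the remote schema and the registered data sources. The sketch below illustrates that logic in a simplified form; the data structures and helper names are illustrative, not Immuta internals, and column-level "New" tags are simplified to a tag on the data source.

```python
# Simplified sketch of the reconciliation schema monitoring performs each run.
def sync_schema(remote_tables: dict[str, set[str]], registered: dict[str, dict]) -> None:
    """remote_tables maps table name -> set of column names found in the remote schema;
    registered maps table name -> {'enabled': bool, 'columns': set[str], 'tags': set[str]}."""
    for table, columns in remote_tables.items():
        if table not in registered:
            # New table: register a data source and tag it "New".
            registered[table] = {"enabled": True, "columns": set(columns), "tags": {"New"}}
            continue
        ds = registered[table]
        if not ds["enabled"]:
            # Previously deleted table re-created: restore the data source and tag it "New".
            ds["enabled"] = True
            ds["tags"].add("New")
        added = columns - ds["columns"]
        if added:
            ds["tags"].add("New")            # new columns are flagged for review
        ds["columns"] = set(columns)         # dictionary reflects added and removed columns
    for table, ds in registered.items():
        if table not in remote_tables and ds["enabled"]:
            ds["enabled"] = False            # remote table deleted: disable the data source
```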
To run schema monitoring or column detection manually, see the Manually Run Jobs page.
Immuta can monitor your data environment, detect when new tables or columns are created or deleted in Snowflake, and automatically register (or disable) those tables in Immuta for you. Those newly updated data sources will then have any global policies and tags that you have set up applied to them. The Immuta data dictionary will be updated with any new columns, and your Immuta environment will be in sync with your Snowflake tables. This automated process helps with scaling and keeping your organization compliant without the need to manually keep your data sources up to date.
Once enabled on a data source, Immuta calls to Snowflake every 24 hours by default to find when each table within the registered schema was last altered. If the timestamp is after the last time native schema monitoring was run, then Immuta will update the table or columns that have been altered. This process works well when monitoring a large number of data sources because it only updates the recently altered tables and cuts down the amount of Snowflake computing required to run column detection, which specifically updates the columns of registered data sources.
If you have an Immuta environment with data sources other than Snowflake, the legacy schema monitoring feature will run on all non-Snowflake data sources. The native schema monitoring feature only works with Snowflake integrations and Snowflake data sources.
Immuta user creates a data source with schema monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta sends a query to the Snowflake information_schema view asking when each data source's table was last altered.
If the table was altered after the last time native schema monitoring ran, Immuta updates the data source, columns, and data dictionary.
Immuta tags new data sources and columns with the tag “New” so that you can use the templated "New Column Added" global policy to mask all new data until it has been reviewed.
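To illustrate the kind of check involved, here is a sketch using the Snowflake Python connector that asks information_schema for recently altered tables. The connection details, schema name, and watermark handling are placeholders, and the exact query Immuta runs is not shown here.

```python
# Sketch of the freshness check described above. Connection details and the
# "last run" watermark are illustrative; this is not Immuta's actual query.
from datetime import datetime, timezone
import snowflake.connector

last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)   # when schema monitoring last ran

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", database="MY_DB"
)
try:
    cur = conn.cursor()
    # information_schema.tables exposes LAST_ALTERED, so only recently changed
    # tables need their columns re-crawled.
    cur.execute(
        """
        SELECT table_schema, table_name, last_altered
        FROM information_schema.tables
        WHERE table_schema = %s AND last_altered > %s
        """,
        ("ANALYTICS", last_run),
    )
    for schema, table, altered in cur:
        print(f"{schema}.{table} changed at {altered}; re-crawl its columns")
finally:
    conn.close()
```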
This feature only works with Snowflake data sources. Any non-Snowflake data sources will run with the legacy schema monitoring described above.
Your organization will not see performance improvements if it consistently changes all of its tables. This feature is intended to improve performance for organizations with a large number of tables and comparatively few changes to them.
There is no migration required for this feature. Native schema monitoring will run on all Snowflake data sources with legacy schema monitoring previously enabled and will run on all new Snowflake data sources with schema monitoring enabled.
There is no additional configuration required for this feature. You just need to enable schema monitoring when you create your Snowflake data sources.
Data owners expose their data across their organization to other users by registering that data in Immuta as a data source.
By default, data owners can register data in Immuta without affecting existing policies on those tables in their remote system, so users who had access to a table before it was registered can still access that data without interruption. If this default behavior is disabled on the App Settings page, a subscription policy that requires data owners to manually add subscribers to data sources will automatically apply to new data sources (unless a global policy you create applies), blocking access to those tables.
For information about this default subscription policy and how to manage it, see the default subscription policy page.
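A minimal sketch of that default behavior, with hypothetical names, might look like this:

```python
# Hypothetical decision logic for the registration default described above.
def initial_subscription_policy(default_behavior_enabled: bool,
                                matching_global_policy: str | None) -> str:
    """What governs access to a newly registered data source."""
    if matching_global_policy is not None:
        return matching_global_policy            # a global policy you created takes effect
    if default_behavior_enabled:
        return "none applied (existing remote access is unaffected)"
    return "manual subscription policy (owners must add subscribers)"

print(initial_subscription_policy(True, None))                 # existing access unaffected
print(initial_subscription_policy(False, None))                # owners must add subscribers
print(initial_subscription_policy(False, "Mask PII policy"))   # the matching global policy applies
```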
When data sources support nested columns, these columns get parsed into a nested Data Dictionary. Below is a list of data sources that support nested columns:
S3
Azure Blob
Databricks sources with complex data types enabled
When complex types are enabled, Databricks data sources can have columns that are arrays, maps, or structs that can be nested.
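As a rough illustration of how a complex type could expand into nested dictionary entries, consider the sketch below; the entry format is hypothetical and is not Immuta's data dictionary schema.

```python
# Illustrative only: flattening a nested (struct/array) column into nested
# data dictionary entries with dotted child names.
def dictionary_entries(name: str, dtype) -> list[dict]:
    """Recursively expand a column type into dictionary entries."""
    if isinstance(dtype, dict):          # struct: {field_name: field_type}
        entries = [{"column": name, "type": "struct"}]
        for child, child_type in dtype.items():
            entries.extend(dictionary_entries(f"{name}.{child}", child_type))
        return entries
    if isinstance(dtype, list):          # array: [element_type]
        return [{"column": name, "type": "array"}] + dictionary_entries(f"{name}.element", dtype[0])
    return [{"column": name, "type": dtype}]   # primitive type name as a string

# A Databricks-style struct column: address {street string, geo {lat double, lon double}}
print(dictionary_entries("address", {"street": "string", "geo": {"lat": "double", "lon": "double"}}))
```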
When an Immuta data source is created, background jobs use the connection information provided to compute health checks, which depend on the type of data source created and how it was configured. These data source health checks include the following:
blob crawl status: indicates whether the blob was successfully crawled. If this check fails, the overall health status of the data source will be Not Healthy.
column detection status: indicates whether the job that determines if a column was added to or removed from the remote table registered as an Immuta data source ran successfully.
external catalog link status: indicates whether or not the external catalog was successfully linked to the data source. If this check fails, the overall health status of the data source will be Not Healthy.
fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated.
global policy applied status: indicates whether global policies were successfully applied to the data source.
high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
native SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
native SQL view creation status (for Snowflake and Redshift data sources): indicates whether native views were properly created for Redshift and Snowflake tables registered in Immuta.
row count status: indicates whether the number of rows in the data source was successfully calculated.
schema detection status: indicates whether the job that determines if a remote table was added to or removed from the schema ran successfully.
sensitive data discovery status: indicates whether sensitive data discovery was successfully run on the data source.
After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.
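Below is a sketch of how per-check results could roll up into an overall health status. The status names follow the checks listed above, but the aggregation rule shown (only the blob crawl and external catalog link checks force Not Healthy) is an assumption drawn from those descriptions, not Immuta's exact logic.

```python
# Hypothetical roll-up of individual health checks into an overall status.
CRITICAL_CHECKS = {"blob crawl", "external catalog link"}

def overall_health(results: dict[str, str]) -> str:
    """results maps check name -> 'passed' | 'skipped' | 'unknown' | 'failed'."""
    for check, status in results.items():
        if status == "failed" and check in CRITICAL_CHECKS:
            return "Not Healthy"
    return "Healthy"

print(overall_health({"blob crawl": "passed", "row count": "failed"}))   # Healthy
print(overall_health({"blob crawl": "failed", "row count": "passed"}))   # Not Healthy
```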
These background jobs can be disabled during data source creation by adding a specific tag that prevents automatic table statistics. This prevent statistics tag can be set on the App Settings page by a System Administrator. However, with automatic table statistics disabled, these policies will be unavailable until the Data Source Owner manually generates the fingerprint:
Masking with format preserving masking
Masking with K-Anonymization
Masking using randomized response
Unhealthy data sources may fail their row count queries if they run against a cluster that has the Databricks query watchdog enabled.
Data sources with over 1600 columns will not have health checks run, but will still appear as healthy. The health check cannot be run automatically or manually.
There are various roles users and groups can play relating to each data source. These roles are managed through the Members tab of the Data Source. They include:
Owners: Those who create and manage new data sources and their users, documentation, Data Dictionaries, and queries. They are also capable of ingesting data into their data sources as well as adding ingest users (if their data source is object-backed).
Subscribers: Those who have access to the data source data. With the appropriate access and attributes, these users/groups can view files, run SQL queries, and generate analytics against the data source data. All users/groups granted access to a data source (except for those with the ingest role) have subscriber status.
Experts: Those who are knowledgeable about the data source data and can elaborate on it. They are responsible for managing the data source's documentation and the Data Dictionary.
Ingest: Those who are responsible for ingesting data for the data source. This role only applies to object-backed data sources (since query-backed data sources are ingested automatically). Ingest users cannot access any data once it's inside Immuta, but they are able to verify if their data was successfully ingested or not.
See Manage data source members for a tutorial on modifying user roles.
The Data Dictionary provides information about the columns within the data source, including column names and value types. Users subscribed to the data source can post and reply to discussion threads by commenting on the Data Dictionary.
Dictionary columns are automatically generated when the data source is created. However, data owners and experts can tag columns in the data dictionary and add descriptions to these entries.