Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Schema projects are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema monitoring, they are automatically added to the schema project. They work as a tool to organize all the data sources within a schema, which is particularly helpful with schema monitoring enabled.
Schema projects are created when tables are registered as data sources in Immuta. The user creating the data source does not need the CREATE_PROJECT permission to have the project auto-create because no data sources can be added by the owner. Instead, new data sources are managed by Immuta. The user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.
The schema settings, such as schema evolution and connection information, can be edited from the project overview tab. Note: Deleting the project will delete all of the data sources within it as well.
Schema settings are edited from the project overview tab:
Schema Project Connection Details: Editing these details will update them for all the data sources within the schema project.
Data Source Naming Convention: When schema monitoring is enabled, new data sources will be automatically detected and added to the schema project. Updating the naming convention will change how these newly detected data sources are named by Immuta.
Schema Detection Owner: When schema monitoring is enabled, a user is assigned to be the owner of any detected and Immuta created data source.
Disable or delete your schema project: Deleting the project will delete all of the data sources within it as well.
Navigate to the Project Overview tab.
Click Edit Connection.
Use the Connection Information modal to make any necessary changes.
Click Save.
Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the Basic Information modal to make any necessary changes to naming formats.
Click Save.
Navigate to the Project Overview tab.
Click Edit Schema Monitoring.
Use the dropdown menu in the Schema Monitoring modal to select a new schema detection owner. The new owner must be an owner of one or more of the data sources belonging to that schema.
Click Save.
With schema monitoring enabled, Immuta monitors your organization's servers to find when new tables or columns are created or deleted and automatically registers (or disables) those tables in Immuta.
Manage schema monitoring: Edit connection information, schema project owner, or the naming conventions of data registered in the schema.
Run schema monitoring jobs: Manually trigger schema monitoring.
Schema monitoring: This reference guide describes the design and components of schema monitoring.
Schema projects: This reference guide describes schema projects, which group all the data sources of a schema.
Why use schema monitoring?: This explanatory guide provides a conceptual overview of schema monitoring. It offers a discussion of the benefits of the feature, context for why it was developed, and insights into the features schema monitoring pairs with. This guide is designed to deepen your understanding of schema monitoring's purpose as you implement it.
Schema monitoring allows organizations to monitor their data environments. When it is enabled, Immuta monitors the organization's servers to detect when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. These newly updated data sources will then have any global policies and tags that are set in Immuta applied to them. The Immuta data dictionary will be updated with any column changes, and the Immuta environment will be in sync with the organization's data environment. This automated process helps organizations keep compliant without the need to manually keep data sources up to date.
Schema monitoring is enabled while creating or editing a data source and only registers new tables and columns within known schemas. It does not register new schemas. Data owners or governors can edit the naming convention for newly detected data sources and the schema detection owner from the schema project page after it has been enabled.
See the Register a data source guides for instructions on enabling schema monitoring or Manage schema monitoring for instructions on editing the schema monitoring settings.
Column detection is a part of schema monitoring, but can also be enabled on its own to detect the column changes of a select group of tables. Column detection monitors when columns are added or removed from a table and when column types are changed and updates those changes in the appropriate Immuta data source's data dictionary.
See one of the Register a data source guides for instructions on enabling column detection.
When new data sources and columns are detected and added to Immuta, or when column types have changed, they will always automatically be tagged with the New
tag. This allows governors to use the seeded New Column Added
global policy to mask columns with the New
tag, since they could contain sensitive data.
The New Column Added
global policy is staged (inactive) by default.
See the Clone, activate, or stage a global policy guide to activate this seeded global policy if you want any columns with the New
tag to be automatically masked.
When schema monitoring is enabled and there is an active policy that targets the New
tag, Immuta sends validation requests to data owners for the following changes made in the remote data platform:
Column added: Immuta applies the New
tag on the column that has been added and sends a request to the data owner to validate if the new column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New
tag from it and as a result any policy that targets the New
column tag no longer applies.
Column data type changed: Immuta applies the New
tag on the column where the data type has been changed and sends a request to the data owner to validate if the column contains sensitive data. Once the data owner confirms they have validated the content of the column, Immuta removes the New
tag from it and as a result any policy that targets the New
column tag no longer applies.
Column deleted: Immuta deletes the column from the data source's data dictionary in Immuta. Then, Immuta sends a request to the data owner to validate the deleted column.
Data source created: Immuta applies the New
tag on the data source that has been newly created and sends a request to the data owner to validate if the new data source contains sensitive data. Once the data owner confirms they have validated the content of the data source, Immuta removes the New
tag from it and as a result any policy that targets the New
data source tag no longer applies.
For instructions on how to view and manage your assigned tasks in the Immuta UI, see the Manage data source requests guide. To view and manage your assigned tasks via the Immuta API, see the Manage data source requests section of the API documentation.
Immuta user registers a data source with schema monitoring enabled.
Every 24 hours, at 12:30 a.m. UTC by default, Immuta checks the servers for any changes to tables and columns.
If Immuta finds a change, it will update the appropriate Immuta data source or column:
If Immuta finds a new table, then Immuta creates an Immuta data source for that table and tags it New
.
If Immuta finds a table has been deleted, then Immuta disables that table's data source.
If Immuta finds a previously deleted table has been re-created, then Immuta restores that table's data source and tags it New
.
If Immuta finds that the backing object type of a data source has been changed (for example, from a TABLE
to a VIEW
) in Snowflake or Databricks Unity Catalog, Immuta will reapply existing policies on the data source. Note that because of policy limitations on Unity Catalog views, changing a Databricks Unity Catalog object type from a table to a view could result in some types of data policies being removed. See the Databricks Unity Catalog integration reference guide for a list of data policies that are not supported for views.
If Immuta finds a new column within a table, then Immuta adds that column to the data dictionary and tags it New
.
If Immuta finds a column has been deleted, then Immuta deletes that column from the data dictionary.
If Immuta finds a column type has changed, then Immuta updates the column type in the data dictionary and tags it New
.
Active policies that target the New
data source or column tag will be applied until a data owner validates the changes.
To run schema monitoring or column detection manually, see the Run schema monitoring and column detection jobs page.
The default schedule for schema monitoring to run is every 24 hours. Some organizations may need to schedule it to run more often; however, this needs careful consideration as it can impact performance and compute costs.
Manually trigger schema monitoring (filtered down to the database) after your dbt or other transform workflows run. For more information, see the dbt and transform workflow for limited policy downtime guide.
When manually triggering schema monitoring, specify a table or database for maximum performance efficiency and to reduce data or policy downtime. For more information on triggering schema monitoring, see the Manually run schema monitoring guide.
If you are manually managing data tags, activate the "New Column Added" global policy to protect newly found and potentially sensitive data. This policy sets all columns with the tag New
to NULL until a data owner reviews and validates their content. Using this workflow protects your data and avoids data leaks on new columns getting automatically added. This recommendation is unnecessary for users leveraging sensitive data discovery (SDD) or using an external data catalog.
Requirement: Immuta permission USER_ADMIN
You can manually run a schema monitoring job globally using the /dataSource/detectRemoteChanges
endpoint of the Immuta API with an empty payload.
You can manually run a schema monitoring job for all data sources that you own using the /dataSource/detectRemoteChanges
endpoint of the Immuta API with a payload containing the hostname for your data sources or their individual IDs.
You can manually run a schema monitoring job for data sources you are subscribed to using the /dataSource/detectRemoteChanges
endpoint of the Immuta API with a payload containing the hostname for your data source and the table name or data source ID.
Navigate to the data source overview page.
Click on the health check icon.
Scroll to Column Detection, and click Trigger Detection.
Immuta is a live metadata aggregator - metadata about your data and your users. With data metadata specifically, Immuta can monitor changes in your database and reflect those changes in your Immuta tenant through schema monitoring.
When schema monitoring is enabled, Immuta monitors your organization's servers to identify when new tables or columns are created or deleted, and automatically registers (or disables) those tables in Immuta. The newly updated data sources then have global policies and tags applied to them, and the Immuta data dictionary is updated with column changes.
Schema monitoring keeps Immuta in sync with your data environment, helping you remain compliant without having to manually update individual data sources.
Without schema monitoring, data owners have to manually add and remove Immuta data sources when users add or remove tables from databases in their data platforms. At worst, data owners are not aware of these changes; at best they are aware of the changes and have to manually update Immuta with those changes, which is a time-consuming, error-prone process.
Beyond draining data owners' time, manually updating data sources to reflect the state of the data platform also complicates the process: not only must they understand when a new table is present, but they then must remember to tag it and protect it appropriately. This leaves organizations ripe for data leaks as new data is created across the business, perhaps daily.
Schema monitoring, by contrast, is scalable and accounts for the evolution of your schemas and policies. Instead of manually managing access to these tables or adding and removing data sources, you are empowered to register a schema, create policies, and allow Immuta to manage those policies and changes to your schema for you to keep your data in sync and restrict access appropriately.
Both monitoring for new data and discovering and tagging sensitive data align with the concepts of scalability and evolvability, removing redundant and arduous work. Once tables are registered and tagged, policies can immediately be applied - this means humans can be completely removed from the process by creating tag-based policies that dynamically apply themselves to new tables.
Then, your business reaps the following benefits:
Increased revenue: Accelerate data access and time-to-data access because where sensitive data lives is well understood.
Decreased cost: Operate efficiently and move with agility at scale.
Decreased risk: Discover and protect sensitive data immediately.
Schema monitoring pairs with the following features:
Column detection: Column detection identifies when a column has been added to or removed from a table and adds or removes that column from the data source in Immuta.
New column added templated global policy: When paired with column detection or schema monitoring, this policy locks down access to those newly added columns and tables to prevent data leaks.
Sensitive data discovery: When the tables are discovered through the registration process, Immuta evaluates the table data for sensitive information and tags it as such. These tags are critical for scaling tag-based policies.
Global data and subscription policies: Global data and subscription policies can be created using tags so that they immediately enforce appropriate access restrictions on tables and columns when they are added.