Private preview: This feature is only available to select accounts.
Sensitive data discovery (SDD) runs identifiers against data sources to discover data. Identifiers are added to domains that contain data sources. Each identifier contains a single criterion and the tags that are applied when that criterion's conditions are met.
There are two types of identifiers in Immuta:
Reference identifiers: These identifiers form a library of identifiers that can be added to domains. When a reference identifier is added to a domain, it is copied over and the copy becomes a domain-specific identifier.
Immuta comes with built-in identifiers to discover common categories of data. These cannot be modified or deleted.
Data governors can create their own reference identifiers for use within their organization.
Domain-specific identifiers: These identifiers exist only within a specific domain and are checked against the data sources in that domain when SDD runs.
Users with the Manage Identifiers permission can create these identifiers or add them to a domain from a reference identifier.
If a domain-specific identifier was copied from a reference identifier, no lineage is kept between the two: edits to the reference identifier are not reflected in the domain-specific copy.
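The copy-without-lineage behavior described above can be illustrated with a minimal sketch. The `Identifier` class, its field names, the `add_to_domain` helper, and the sample tag names are all hypothetical and are not part of Immuta's API; the sketch only shows what it means for a domain-specific copy to be independent of its reference identifier.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Identifier:
    """Hypothetical model: one criterion plus the tags it applies."""
    name: str
    criteria: str  # a single criterion, here a regex pattern
    tags: list = field(default_factory=list)


def add_to_domain(reference: Identifier) -> Identifier:
    """Return an independent copy; no link back to the reference is kept."""
    return copy.deepcopy(reference)


# Hypothetical reference identifier and its domain-specific copy.
reference = Identifier("ACCOUNT_ID", r"^ACC-\d{6}$", ["Discovered.Account"])
domain_copy = add_to_domain(reference)

# Later edits to the reference identifier do not reach the copy.
reference.criteria = r"^ACC-\d{8}$"
reference.tags.append("Discovered.Sensitive")

print(domain_copy.criteria)  # still the original pattern
print(domain_copy.tags)      # still the original tag list
```

In practice this means that a fix made to a reference identifier has to be repeated in each domain that already holds a copy of it.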
Criteria are the conditions in an identifier that need to be met for resulting tags to be applied to data.
SDD only supports regular expressions (regex) written in RE2 syntax.
Competitive criteria analysis: This process reviews all the regex and dictionary criteria within the domain's identifiers and searches for the identifier with the best fit. Each competitive criteria analysis identifier in the domain competes against the others to find the best and most specific identifier that fits the data. The resulting tags of the best identifier are then applied to the column. Only one competitive criteria analysis identifier per domain will apply to a given column. Competitive criteria identifiers, both built-in and custom, must match at least 90% of the data sampled. To learn more about how identifiers compete, see the How competitive criteria analysis works guide.
Regex: This criterion contains a case-insensitive regular expression that is matched against column values.
Dictionary: This criterion contains a list of words and phrases that are matched against column values.
Column name: This criterion contains a case-insensitive regular expression that is matched against column names, not against the values in the column. The identifier's tags are applied to each column whose name matches. Multiple column name identifiers can match the same column and all of them are applied.
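The criteria types above can be sketched roughly as follows. This is an illustration under stated assumptions, not Immuta's implementation: it uses Python's `re` module rather than an RE2 engine (the sample patterns are valid in both), it assumes a regex must match a whole column value, and it ranks competing identifiers by match rate alone as a stand-in for the actual best-fit logic. All function names and sample identifiers are hypothetical. Only the 90% threshold and the case-insensitive matching come from the text.

```python
import re
from typing import Dict, List, Optional

MATCH_THRESHOLD = 0.90  # from the text: at least 90% of sampled data must match


def match_rate(pattern: str, sample: List[str]) -> float:
    """Fraction of sampled values matched by a case-insensitive regex."""
    if not sample:
        return 0.0
    regex = re.compile(pattern, re.IGNORECASE)
    # Assumption: the pattern must match the entire value.
    hits = sum(1 for value in sample if regex.fullmatch(value))
    return hits / len(sample)


def best_identifier(identifiers: Dict[str, str], sample: List[str]) -> Optional[str]:
    """Pick at most one identifier per column among those meeting the threshold.

    Simplification: the highest match rate wins. The real analysis also
    weighs how specific each identifier is.
    """
    rates = {name: match_rate(p, sample) for name, p in identifiers.items()}
    qualifying = {name: r for name, r in rates.items() if r >= MATCH_THRESHOLD}
    if not qualifying:
        return None
    return max(qualifying, key=qualifying.get)


def column_name_matches(pattern: str, column_name: str) -> bool:
    """Column name criteria test the column's name, not its values."""
    return re.search(pattern, column_name, re.IGNORECASE) is not None


# Hypothetical sample: nine of the ten values fit the "CODE" pattern.
sample = ["ab-123", "AB-456", "cd-789", "ef-000", "gh-111",
          "ij-222", "kl-333", "mn-444", "op-555", "unknown"]
identifiers = {"CODE": r"[a-z]{2}-\d{3}", "DIGITS_ONLY": r"\d+"}

print(best_identifier(identifiers, sample))
print(column_name_matches(r"email", "Customer_EMAIL"))
```

Because column name identifiers do not compete, a check like `column_name_matches` would be evaluated for every column name identifier in the domain, and every match would contribute its tags.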
Create a new identifier in the Immuta UI or with the sdd/classifier endpoint.
If you used SDD before this feature was released in January 2025, note the following differences:
There are now two types of identifiers:
Reference identifiers
Domain-specific identifiers
See information about these in the Identifiers section.
There is a new permission to manage identifiers within domains: Manage Identifiers. The permission allows you to do the following:
Create an identifier within your domain
View the reference identifiers in Immuta
Add, edit, and delete identifiers within your domain
The following have been removed:
Identification frameworks: Previously, all identifiers had to be contained within a framework and that framework had to be assigned to a data source to run. Now, identifiers are added to domains with data sources.
Global framework: Previously, a global framework could be set to run SDD automatically on all new data sources. This behavior cannot be achieved with identifiers in domains.
The table below compares when SDD runs under the previous framework-based feature and when it runs with identifiers in domains.
| When SDD runs | Before (identification frameworks) | After (identifiers in domains) |
| --- | --- | --- |
| SDD runs automatically on all new data sources | Yes, if a global framework is set | No |
| SDD runs automatically on new data sources found from schema monitoring | Yes, if a global framework is set | No |
| SDD runs automatically on new columns found from column detection in a data source where SDD has already run | Yes | Yes |
| SDD runs when a user manually triggers it from the data source health check menu | Yes | Yes |
| SDD runs when a user manually triggers it from the domain's page | No | Yes |
| SDD runs when a user manually triggers it from the identification framework page | Yes | No |
| SDD runs when a user manually triggers it through the API | Yes | Yes |