Sensitive Data Discovery Pre-Configuration Details
Only application admins can enable sensitive data discovery (SDD) on the Immuta app settings page. Then, data source creators can disable SDD on a data-source-by-data-source basis. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary to prevent them from being used and auto-tagged on that data source in the future.
Configurable global settings
When SDD is triggered on a data source, the job runs for the identifiers in the template set on that data source. If no template is set, the identifiers used by the SDD job are defined by the global setting. By default, the global setting runs all identifiers in the system. However, a system administrator can configure Immuta to use a custom global template instead.
An active global template cannot be deleted.
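The fallback order described above can be sketched in a few lines of Python. This is an illustrative model only, not Immuta's implementation or API: the `Template` and `SddSettings` classes, the `resolve_identifiers` function, and the identifier names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Template:
    # Hypothetical model of an SDD template: a named group of identifiers.
    name: str
    identifiers: tuple[str, ...]


@dataclass
class SddSettings:
    # Every identifier known to the system (built-in and custom).
    all_identifiers: tuple[str, ...]
    # Optional custom global template configured by an administrator.
    global_template: Optional[Template] = None
    templates: dict[str, Template] = field(default_factory=dict)

    def delete_template(self, name: str) -> None:
        # An active global template cannot be deleted.
        if self.global_template is not None and self.global_template.name == name:
            raise ValueError(f"template {name!r} is the active global template")
        del self.templates[name]


def resolve_identifiers(
    settings: SddSettings, data_source_template: Optional[Template] = None
) -> tuple[str, ...]:
    # 1. A template set on the data source takes precedence.
    if data_source_template is not None:
        return data_source_template.identifiers
    # 2. Otherwise fall back to the custom global template, if one is configured.
    if settings.global_template is not None:
        return settings.global_template.identifiers
    # 3. Otherwise the default global setting runs every identifier.
    return settings.all_identifiers
```

In this model, a data source with no template of its own picks up the global template as soon as an administrator sets one, and goes back to running all identifiers if the global template is unset.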
SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured identifiers.
The default for SDD is to sample 1000 records (the sample size) during this process. However, administrators can configure the sample size taken by SDD on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled during SDD may be necessary to meet some organizations' compliance requirements.
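To make the role of the sample size concrete, the sketch below estimates how well a column fits a pattern from a random sample of its values. It assumes a simple match-fraction heuristic over a regular expression. The source does not describe how Immuta actually scores columns, so the `match_fraction` function and its parameters are hypothetical.

```python
import random
import re


def match_fraction(values, pattern, sample_size=1000, seed=0):
    """Return the share of sampled values that fully match `pattern`.

    Assumed heuristic for illustration; not Immuta's scoring method.
    """
    rng = random.Random(seed)
    # Sample without replacement; use every value when the column is small.
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = rng.sample(values, sample_size)
    if not sample:
        return 0.0
    regex = re.compile(pattern)
    hits = sum(1 for v in sample if v is not None and regex.fullmatch(str(v)))
    return hits / len(sample)
```

A larger `sample_size` makes the estimate less sensitive to unrepresentative rows, at the cost of reading more records. That is the trade-off between accuracy and compliance constraints described above.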
When SDD is triggered by a data owner, all column tags that were previously applied by SDD are removed and the tags prescribed by the latest run are applied. However, if SDD is triggered because a new column is detected by schema monitoring, tags will only be applied to the new column, and no tags will be modified on existing columns.
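The difference between the two triggers can be modeled as follows. This is a minimal sketch with a hypothetical in-memory representation: each column keeps its SDD-applied tags separately from manually applied tags. That manual tags are left untouched is an assumption inferred from the statement that only SDD-applied tags are removed. The function name and trigger labels are illustrative.

```python
def apply_sdd_tags(current, predicted, trigger, new_columns=()):
    """Update column tags according to how SDD was triggered.

    current:   {column: {"sdd": set, "manual": set}}  (hypothetical model)
    predicted: {column: set of tags from the latest SDD run}
    """
    if trigger == "owner":
        # Full refresh: earlier SDD tags are removed on every column and
        # replaced by the tags prescribed by the latest run.
        for column, tags in current.items():
            tags["sdd"] = set(predicted.get(column, ()))
    elif trigger == "schema_monitoring":
        # Only newly detected columns are tagged; existing columns are untouched.
        for column in new_columns:
            current[column] = {"sdd": set(predicted.get(column, ())), "manual": set()}
    else:
        raise ValueError(f"unknown trigger: {trigger!r}")
    return current
```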
Users can also configure SDD to do a dryRun, which allows them to see what tags would be applied to a data source without actually applying them. See the Run sensitive data discovery on data sources page.
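Conceptually, a dry run computes the same result as a normal run but skips the write step. The sketch below illustrates that idea with a hypothetical `run_sdd` function and a `dry_run` flag. It does not reflect the actual Immuta request format, which is documented on the page referenced above.

```python
def run_sdd(column_tags, predicted, dry_run=False):
    """Return the tags SDD would apply to known columns.

    `column_tags` is modified only when `dry_run` is False.
    Hypothetical model for illustration; not the Immuta API.
    """
    proposed = {col: set(tags) for col, tags in predicted.items() if col in column_tags}
    if not dry_run:
        for col, tags in proposed.items():
            column_tags[col] = set(tags)
    return proposed
```

Calling `run_sdd(tags, predicted, dry_run=True)` returns the proposed tags for review and leaves `tags` unchanged. Repeating the call without the flag applies them.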
Two common workflows for using SDD are outlined below. The first illustrates how to apply a single global template to all data sources, while the second outlines how users can create and apply templates to data sources they own.
Workflow 1: Apply a global template to all data sources
- Data governor creates a template using one or more built-in or custom identifiers.
- System administrator adds this template to the global settings so that it applies to all data sources.
- Users trigger SDD on data sources.
Workflow 2: Apply a template to a specific data source
- Data governor creates one or more custom identifiers.
- Data owner creates a template containing one or more identifiers.
- Data owner applies their template to one or more data sources.
- Data owner triggers SDD on one or more data sources, and tags are applied to columns where identifiers were recognized.