Data Discovery Customization Details
Only application admins can enable sensitive data discovery (SDD), which they do on the Immuta app settings page. Once SDD is enabled, data source creators can disable it for individual data sources. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary. This prevents those tags from being used and auto-tagged on that data source in the future.
Users can also configure SDD to do a dryRun, which shows which tags would be applied to a data source without actually applying them. See the Run sensitive data discovery on data sources page or Trigger an SDD job through the API or CLI for instructions.
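To make the dry-run idea concrete, here is a small, hypothetical Python model. It is not Immuta's API. The `Rule` and `DataSource` classes, the `run_sdd` function and its `dry_run` flag, the regex-based matching, and the tag name are all assumptions made for illustration. It shows only that a dry run computes the same proposed tags as a real run but leaves the data source's tags unchanged.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    # Hypothetical rule: apply `tag` when enough sampled values match `pattern`.
    tag: str
    pattern: str
    min_confidence: float  # fraction of sampled values, 0.0-1.0


@dataclass
class DataSource:
    name: str
    columns: dict[str, list[str]]  # column name -> sampled values
    applied_tags: dict[str, set[str]] = field(default_factory=dict)


def run_sdd(source: DataSource, rules: list[Rule],
            dry_run: bool = False) -> dict[str, set[str]]:
    """Return the tags SDD would apply; only mutate `source` when dry_run is False."""
    proposed: dict[str, set[str]] = {}
    for column, values in source.columns.items():
        if not values:
            continue
        for rule in rules:
            matches = sum(1 for v in values if re.fullmatch(rule.pattern, v))
            if matches / len(values) >= rule.min_confidence:
                proposed.setdefault(column, set()).add(rule.tag)
    if not dry_run:
        # A real (non-dry) run persists the proposed tags on the data source.
        for column, tags in proposed.items():
            source.applied_tags.setdefault(column, set()).update(tags)
    return proposed
```

With `dry_run=True`, the caller can inspect the returned tags and then rerun with `dry_run=False` to apply them.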
Data governors can create frameworks with overrides. When an override is set, it is used instead of the corresponding specification within the framework's rules.
- tags is an optional override for the tags applied by the rules.
- minConfidence is an optional override for the minConfidence established in the rules. Tags are applied when the match confidence is at least the percentage defined in minConfidence.
- sampleSize is an optional override for how many records to sample from the data source.
Native SDD does not support sampleSize because its sample size is optimized for each data source.
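The override semantics above can be sketched in Python as "use the framework's value if present, otherwise the rule's value," followed by the minConfidence check. This is an illustrative model, not Immuta's implementation. The `RuleSpec` and `FrameworkOverrides` classes and the `effective_spec` and `tags_to_apply` functions are hypothetical, and confidence is represented as a fraction (0.7 for 70%) by assumption. Native SDD is not modeled, since it does not accept sampleSize.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleSpec:
    # Hypothetical view of the specifications defined in a rule.
    tags: tuple[str, ...]
    min_confidence: float  # 0.0-1.0
    sample_size: int


@dataclass(frozen=True)
class FrameworkOverrides:
    # Every override is optional; None means "keep the rule's value".
    tags: Optional[tuple[str, ...]] = None
    min_confidence: Optional[float] = None
    sample_size: Optional[int] = None


def effective_spec(rule: RuleSpec, overrides: FrameworkOverrides) -> RuleSpec:
    """Combine a rule with framework overrides; overrides win when set."""
    return RuleSpec(
        tags=overrides.tags if overrides.tags is not None else rule.tags,
        min_confidence=(overrides.min_confidence
                        if overrides.min_confidence is not None
                        else rule.min_confidence),
        sample_size=(overrides.sample_size
                     if overrides.sample_size is not None
                     else rule.sample_size),
    )


def tags_to_apply(spec: RuleSpec, match_confidence: float) -> tuple[str, ...]:
    # Tags are applied only when the match confidence reaches minConfidence.
    return spec.tags if match_confidence >= spec.min_confidence else ()
```

For example, lowering minConfidence to 0.7 in the framework lets a column with 75% matching values be tagged even though the rule alone requires 90%.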
SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured rules.
By default, non-native SDD samples 1000 records (the sample size) during this process. However, administrators can configure the sample size on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but some organizations may need to decrease the number of records sampled to meet their compliance requirements. Sample size can also be configured as a framework override.
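As a rough illustration of how a sample feeds a confidence estimate, the following Python sketch draws up to a fixed number of records and measures the fraction of values that match a pattern. The function names, the regex matching, and the uniform random sampling are assumptions for illustration only; Immuta's actual sampling strategy is not described here. A larger sample gives a less variable estimate of the true match rate, which is why increasing the sample size tends to improve accuracy.

```python
import random
import re

DEFAULT_SAMPLE_SIZE = 1000  # default for non-native SDD, per the text above


def sample_records(records, sample_size=DEFAULT_SAMPLE_SIZE, seed=None):
    """Return up to `sample_size` records chosen uniformly at random (hypothetical)."""
    if len(records) <= sample_size:
        return list(records)  # small tables are read in full
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


def match_confidence(values, pattern):
    """Fraction of sampled values that fully match `pattern` (0.0 if empty)."""
    if not values:
        return 0.0
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.fullmatch(v)) / len(values)
```

Lowering `sample_size` reduces how many records are read, at the cost of a noisier `match_confidence` estimate.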
Native SDD's sample size is not configurable because it is optimized for each data source.
Three common workflows for using SDD are outlined below. The first walks through the SDD customization that can be completed entirely within the UI. The second illustrates how to apply a single global framework to all data sources, and the third outlines how users can create frameworks and apply them to data sources they own. Both the second and third workflows use the API.
Workflow 1: Adjust the global framework for all data sources
Workflow 2: Apply a global framework to all data sources
- Data governor creates a framework using one or more rules.
- System administrator adds this framework as the global framework, so that it applies to all data sources.
- Users trigger SDD on data sources.
Workflow 3: Apply a framework to a specific data source
- Data governor creates one or more rules with patterns.
- Data owner creates a framework containing one or more rules.
- Data owner applies their framework to one or more data sources.
- Data owner triggers SDD on one or more data sources, and resulting tags are applied to columns where criteria were met and patterns were recognized.
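The steps in Workflows 2 and 3 can be modeled with a small, hypothetical Python sketch: rules with patterns are grouped into a framework, the framework is applied either globally or to specific data sources, and triggering SDD returns tags for columns that meet a rule's criteria. None of these classes or functions are Immuta APIs. The regex matching, the fractional confidence threshold, the tag name, and the assumption that a source-specific framework takes precedence over the global framework are illustrative choices, not confirmed product behavior.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    # Hypothetical rule created by a data governor: a regex pattern plus the
    # tags to apply when enough sampled values in a column match it.
    name: str
    pattern: str
    tags: list[str]
    min_confidence: float  # fraction of sampled values, 0.0-1.0


@dataclass
class Framework:
    # A framework groups one or more rules.
    name: str
    rules: list[Rule]


@dataclass
class FrameworkRegistry:
    global_framework: Optional[Framework] = None
    by_source: dict[str, Framework] = field(default_factory=dict)

    def apply(self, framework: Framework, *sources: str) -> None:
        # Workflow 3: a data owner applies a framework to specific data sources.
        for source in sources:
            self.by_source[source] = framework

    def framework_for(self, source: str) -> Optional[Framework]:
        # Assumption: a source-specific framework takes precedence over the
        # global framework (Workflow 2), which is used otherwise.
        return self.by_source.get(source, self.global_framework)


def trigger_sdd(registry: FrameworkRegistry, source: str,
                columns: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return column -> tags for columns whose sampled values meet a rule's criteria."""
    framework = registry.framework_for(source)
    if framework is None:
        return {}
    result: dict[str, set[str]] = {}
    for column, values in columns.items():
        if not values:
            continue
        for rule in framework.rules:
            hits = sum(1 for v in values if re.fullmatch(rule.pattern, v))
            if hits / len(values) >= rule.min_confidence:
                result.setdefault(column, set()).update(rule.tags)
    return result
```

Setting `global_framework` corresponds to Workflow 2, while calling `apply` for selected sources corresponds to Workflow 3.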