Data Discovery Customization Details

Only application admins can enable sensitive data discovery (SDD), which they do on the Immuta app settings page. Once SDD is enabled, data source creators can disable it for individual data sources. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary so that those tags are no longer applied automatically to that data source.

Dry run

Users can also run SDD as a dry run (dryRun), which shows which tags would be applied to a data source without actually applying them. See the Run sensitive data discovery on data sources page for details.

Trigger an SDD job through the API or CLI

Native and non-native SDD can also be triggered through the Immuta CLI or through the API.

Framework overrides

Data governors can create frameworks with overrides. When an override is set, it is used instead of the corresponding value specified in the rules.

  • tags is an optional override for the tags applied by the rules.
  • minConfidence is an optional override for the minConfidence established in the rules. When the match confidence is at least the percentage defined in minConfidence, tags are applied.
  • sampleSize is an optional override for how many records to sample from the data source.

Native SDD does not support the minConfidence or sampleSize overrides because these values are optimized for each data source.
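
The precedence described above can be sketched in a few lines of Python. This is an illustration only, not Immuta's implementation or API: the Rule and Overrides classes and the effective_settings function are hypothetical names, and the fallback sample size of 1000 is the documented non-native default.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for an SDD rule."""
    name: str
    tags: Tuple[str, ...]
    min_confidence: float  # percentage, 0-100


@dataclass(frozen=True)
class Overrides:
    """Hypothetical stand-in for framework overrides; None means 'not set'."""
    tags: Optional[Tuple[str, ...]] = None
    min_confidence: Optional[float] = None
    sample_size: Optional[int] = None


def effective_settings(rule: Rule, overrides: Overrides,
                       default_sample_size: int = 1000) -> dict:
    """Return the values that apply: an override wins whenever it is set."""
    return {
        "tags": overrides.tags if overrides.tags is not None else rule.tags,
        "minConfidence": (overrides.min_confidence
                          if overrides.min_confidence is not None
                          else rule.min_confidence),
        "sampleSize": (overrides.sample_size
                       if overrides.sample_size is not None
                       else default_sample_size),
    }
```

With no overrides set, the rule's own tags and minConfidence apply; setting only minConfidence in the overrides changes that one value and leaves the rest untouched.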

Sample size

SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured rules.

By default, non-native SDD samples 1000 records (the sample size) during this process. However, administrators can configure the sample size on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled during SDD may be necessary to meet some organizations' compliance requirements. Sample size can also be configured as a framework override.

Native SDD's sample size is not configurable because it is optimized for each data source.
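
To make the relationship between sample size and minConfidence concrete, here is a minimal Python sketch. It assumes that match confidence is the percentage of sampled values that fully match a rule's regular expression; how Immuta actually samples and scores columns is not described on this page, so match_confidence and should_tag are hypothetical helpers.

```python
import random
import re
from typing import Iterable


def match_confidence(values: Iterable, pattern: str,
                     sample_size: int = 1000, seed: int = 0) -> float:
    """Percentage (0-100) of sampled values that fully match the pattern."""
    values = list(values)
    # Use every value when the column is smaller than the sample size;
    # otherwise draw a random sample without replacement.
    if len(values) <= sample_size:
        sample = values
    else:
        sample = random.Random(seed).sample(values, sample_size)
    if not sample:
        return 0.0
    regex = re.compile(pattern)
    hits = sum(1 for value in sample if regex.fullmatch(str(value)))
    return 100.0 * hits / len(sample)


def should_tag(confidence: float, min_confidence: float) -> bool:
    """Tags are applied when confidence is at least minConfidence."""
    return confidence >= min_confidence


column = ["123-45-6789", "987-65-4321", "n/a", "111-22-3333"]
confidence = match_confidence(column, r"\d{3}-\d{2}-\d{4}")  # 3 of 4 match: 75.0
```

In this example should_tag(confidence, 75) is true and should_tag(confidence, 80) is false. A larger sample gives a more stable estimate of the percentage, which is the trade-off against the compliance reasons for sampling fewer records.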

SDD workflow

Three common workflows for using SDD are outlined below. The first walks through the SDD customization that can be completed entirely within the UI. The second illustrates how to apply a single global framework to all data sources, and the third outlines how users can create and apply frameworks to data sources they own. The second and third workflows both use the API.

Workflow 1: Adjust the global framework for all data sources

Data governor enables and disables rules within the default global framework.

Workflow 2: Apply a global framework to all data sources

  1. Data governor creates a framework using one or more rules.
  2. System administrator adds this framework as the global framework, so that it applies to all data sources.
  3. Users trigger SDD on data sources.

Workflow 3: Apply a framework to a specific data source

  1. Data governor creates one or more rules with patterns.
  2. Data owner creates a framework containing one or more rules.
  3. Data owner applies their framework to one or more data sources.
  4. Data owner triggers SDD on one or more data sources, and resulting tags are applied to columns where criteria were met and patterns were recognized.
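
The steps in Workflow 3 can be modeled with a small in-memory Python sketch. It does not call the Immuta API or CLI: the dictionaries, the run_sdd function, the rule name US_SSN, and the tag name Discovered.SSN are all hypothetical and only illustrate how rules, a framework, and a dry run relate to one another.

```python
import re

# Step 1: a governor defines a rule with a pattern (hypothetical values).
ssn_rule = {
    "name": "US_SSN",
    "pattern": r"\d{3}-\d{2}-\d{4}",
    "minConfidence": 80,
    "tags": ["Discovered.SSN"],
}

# Step 2: a data owner groups one or more rules into a framework.
framework = {"name": "owner-framework", "rules": [ssn_rule]}

# Step 3: the framework is applied to a data source the owner controls.
data_source = {
    "columns": {
        "ssn": ["123-45-6789", "987-65-4321"],
        "name": ["Ann", "Bob"],
    },
    "tags": {},
}


def run_sdd(source: dict, fw: dict, dry_run: bool = False) -> dict:
    """Step 4: return {column: [tags]}; write the tags unless dry_run is set."""
    proposed = {}
    for column, values in source["columns"].items():
        for rule in fw["rules"]:
            regex = re.compile(rule["pattern"])
            hits = sum(1 for value in values if regex.fullmatch(str(value)))
            confidence = 100.0 * hits / len(values) if values else 0.0
            if confidence >= rule["minConfidence"]:
                proposed.setdefault(column, []).extend(rule["tags"])
    if not dry_run:
        for column, tags in proposed.items():
            source["tags"].setdefault(column, []).extend(tags)
    return proposed


preview = run_sdd(data_source, framework, dry_run=True)  # nothing is written
```

Calling run_sdd with dry_run=True returns the tags that would be applied while leaving data_source["tags"] empty; calling it again without dry_run records the same tags on the data source.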