SDD Pre-Configuration Details
Only application admins can enable SDD on the Immuta app settings page. Once it is enabled, data source creators can disable SDD for individual data sources. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary to prevent them from being used and auto-tagged on that data source in the future.
When SDD is triggered on a data source, the job runs for the identifiers in the template set on that data source. If no template is set, the identifiers and template used by the SDD job are defined by the global setting. By default, the global setting runs all identifiers in the system. However, a system administrator can set a global template instead.
An active global template cannot be deleted.
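The precedence described above can be sketched as a small lookup. This is an illustrative model only, not Immuta's implementation: the function name `resolve_identifiers` and its parameters are hypothetical, and a template is represented simply as a list of identifier names.

```python
from typing import Optional


def resolve_identifiers(
    data_source_template: Optional[list[str]],
    global_template: Optional[list[str]],
    all_identifiers: list[str],
) -> list[str]:
    """Return the identifiers an SDD job would run, in order of precedence.

    Hypothetical model: None means "no template is set" at that level.
    """
    # 1. A template set on the data source itself takes priority.
    if data_source_template is not None:
        return data_source_template
    # 2. Otherwise fall back to the global template, if one is configured.
    if global_template is not None:
        return global_template
    # 3. Default global setting: run every identifier in the system.
    return all_identifiers
```

For example, with no data source template and no global template, the call returns the full identifier list, which mirrors the default behavior described above.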
SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured identifiers.
By default, SDD samples 1000 records (the sample size) during this process. However, administrators can change the sample size taken by SDD on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled during SDD may be necessary to meet some organizations' compliance requirements.
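To make the role of the sample size concrete, the sketch below estimates how often a column's values match a pattern by checking a random sample. It is a simplified illustration, not Immuta's algorithm: the function `match_rate`, the use of a single regular expression as the pattern, and the fixed random seed are all assumptions made for the example.

```python
import random
import re


def match_rate(values, pattern, sample_size=1000, seed=0):
    """Fraction of sampled non-null values that fully match `pattern`.

    Hypothetical helper: samples at most `sample_size` values without
    replacement and tests each one against the regular expression.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    k = min(sample_size, len(non_null))
    sample = rng.sample(non_null, k)
    regex = re.compile(pattern)
    hits = sum(1 for v in sample if regex.fullmatch(str(v)))
    return hits / k


# A column where 90 of 100 values look like US Social Security numbers.
column = ["123-45-6789"] * 90 + ["n/a"] * 10
rate = match_rate(column, r"\d{3}-\d{2}-\d{4}")
```

A smaller `sample_size` reads fewer records but gives a noisier estimate of the true match rate, which is the trade-off between accuracy and compliance constraints noted above.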
When SDD is triggered by a data owner, all column tags that were previously applied by SDD are removed and the tags prescribed by the latest run are applied. However, if SDD is triggered because a new column is detected by schema monitoring, tags will only be applied to the new column, and no tags will be modified on existing columns.
Users can also configure SDD to do a dryRun, which allows them to see what tags would be applied to a data source without actually applying them. See the relevant Immuta documentation for details.
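The tag-application rules and the dry run described above can be modeled in a few lines. This is a sketch under stated assumptions, not Immuta's API: `apply_sdd_tags`, its parameters, and the tag names are hypothetical, and tags are modeled as a dictionary mapping each column to the set of tags SDD applied to it.

```python
def apply_sdd_tags(current, predicted, new_columns=None, dry_run=False):
    """Return the SDD tags that result from a run.

    current:     {column: set of tags previously applied by SDD}
    predicted:   {column: set of tags prescribed by the latest run}
    new_columns: None for an owner-triggered run; otherwise the columns
                 newly detected by schema monitoring.
    dry_run:     when True, `current` is left untouched.
    """
    if new_columns is None:
        # Owner-triggered: drop all earlier SDD tags, apply the latest run.
        result = {col: set(tags) for col, tags in predicted.items()}
    else:
        # Schema monitoring: keep existing columns as they are and tag
        # only the newly detected columns.
        result = {col: set(tags) for col, tags in current.items()}
        for col in new_columns:
            result[col] = set(predicted.get(col, set()))
    if not dry_run:
        current.clear()
        current.update(result)
    return result


# Hypothetical tag names, used only for illustration.
tags = {"email": {"Discovered.Email"}, "id": {"Discovered.SSN"}}
latest = {"email": {"Discovered.Email"}, "phone": {"Discovered.Phone"}}
preview = apply_sdd_tags(tags, latest, dry_run=True)
```

Here `preview` shows what an owner-triggered run would apply, while `tags` still holds the tags from before the call.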
Two common workflows for using SDD are outlined below. The first illustrates how to apply a single global template to all data sources, while the second outlines how users can create and apply templates to data sources they own.
Data governor creates one or more custom identifiers.
Data governor creates a template using one or more built-in or custom identifiers.