Enable Sensitive Data Discovery (SDD)
Requirements:
Immuta permission
GOVERNANCE
Registered Snowflake, Databricks, Redshift, or Starburst (Trino) data sources
This how-to guide is for enabling sensitive data discovery (SDD) for the first time. For additional information on sensitive data discovery, see the Data discovery page.
Turn on SDD
Requirement: Immuta permission APPLICATION_ADMIN
Navigate to the App Settings page and scroll to the Sensitive Data Discovery section.
Select the Enable Sensitive Data Discovery (SDD) checkbox to enable SDD.
Click Save and then click Confirm to apply your changes. Note that the Immuta tenant will have a system restart.
Note that the global framework is not set by default, so SDD will not run automatically on any data sources. Set a global framework to have identification automatically run on all new data sources.
Configure the global framework
Requirement: Immuta permission APPLICATION_ADMIN
Navigate to the App Settings page and scroll to the Sensitive Data Discovery section.
Enter the request-friendly name of your global identification framework in the Global SDD Template Name field. This name can be found in the URL when you navigate to the identification framework's page.
Click Save, and then Confirm your changes.
Create a new framework with identifiers
Once SDD is enabled on your tenant, SDD will automatically run when new data sources are added, but it must be manually run for all existing data sources. This allows you to test out SDD with a select few data sources without worrying that it will add tags throughout all your data sources.
For this step, you will pick the identifiers to match the data that matters to your organization. For example, for international data, you may want to enable many different identifiers for many countries, like the "Australia Passport" identifier and the "Finland National ID Number" identifier. However, if you are dealing with United States domestic financial data, those identifiers would be irrelevant. In that case, it would be better to identify the data likely to appear, like Bitcoin or US Bank Routing MICR.
First, create an empty framework,
Navigate to Discover and Identification.
Select Create New.
Enter a Name and Description for your new identification framework.
Select Create empty framework.
Then, add a new identifier to that framework,
Navigate to Discover and Identifiers.
Use the checkboxes to select all the identifiers relevant to your data. Tip: From the overview page you can see the name and the tags that will be applied by the identifier. To better understand the data it will match, click the name to read the description.
Once you have checked the identifiers you want in your framework, click Add to Framework.
Type the framework name in the text box.
Click Add to Framework.
Run identification on your data sources
Once you have created a framework relevant to your data, it is time to test it on your data and customize it. Run identification on a select number of data sources where you understand the data to assess and adjust the tags to reflect what you expect to see.
Add those select data sources to your new framework,
Navigate to Discover and Identification.
Click your new framework name.
Navigate to the Data Sources tab.
Click Add Data Sources.
Check the checkboxes for the select data sources you want to try SDD on.
Click Add Data Source(s).
Then, run identification on those data sources,
Navigate to Discover and Identification.
Click the action menu for your new framework.
Click Run Identification.
View the identification results
After identification runs, you will receive a notification that the job is complete. Then, you can view the results from the data source dictionary.
Navigate to the data source overview page of the data source you added to the framework.
Click the Data Dictionary tab.
Assess whether the Discovered tags are applied as expected.
If you are happy with the Discovered tags, follow the Assign data sources to frameworks guide to add the rest of your data sources to the framework and follow the Run identification guide to run identification on all your data sources.
If you want additional tags, follow the Create an identifier guide to create identifiers that matter to your data.
Last updated