This guide provides information and best practices for migrating from the deprecated legacy sensitive data discovery (SDD) option to the improved native SDD. This guide is for users who have already enabled SDD on their tenant and have Discovered tags on their data sources.
Legacy SDD is deprecated. It will be removed and replaced by native SDD. Native SDD discovers and tags your data more accurately than legacy SDD because of upgrades to the built-in identifiers. Its greatest benefit is that it respects data residency: native SDD doesn't move any of your data when running. The discovery is done in your data platform, and the platform returns only the matching identifiers and column names to Immuta.
See the Sensitive data discovery reference page for more information on native SDD.
Native SDD requires Snowflake, Databricks, Starburst (Trino), or Redshift data sources
Legacy SDD enabled on your tenant
Legacy SDD tags applied to your data sources: To find out if you have legacy SDD tags applied, create a governance report as described in the understand the context of your tags section.
Contact your Immuta representative to enable native SDD on your Immuta tenant. Many users already have native SDD enabled, so before you reach out to the representative, you can proceed to understand the context of your tags to check for yourself whether native SDD is already running and tagging your data.
This action will not change anything immediately on your tenant; however, anytime identification runs in the future, it will be native SDD instead of the legacy version.
To assess native SDD for your data, proceed with the steps below. If you do not review native SDD, the legacy SDD tags will all remain on your data source columns. However, when identification automatically runs on new data sources and columns, it will apply native SDD tags, and because of the improvements to SDD, it may tag different data than legacy SDD.
Requirement: Immuta permission GOVERNANCE
Manually run identification globally to run native identification on your data sources.
To check the tags on an individual data source, navigate to the data source data dictionary and select a Discovered tag. On the tag side sheet, you can determine the context of the tag. When identifiers match data, native SDD will apply tags, and their tag context will be Sensitive Data Discovery. Any tags with the context Legacy Sensitive Data Discovery were not matched by native SDD but will remain on the data source.
To check your tags globally, navigate to the governance reports page and build a report for sensitive data discovery. This report will present the legacy tags on your data sources' columns and native SDD tags that are also on those columns. Use this report to assess the context of the Discovered tags and understand if native SDD is matching the data you want it to.
These actions will allow you to understand the differences between how native SDD and legacy SDD tag your data and whether your data is recognized as expected by native SDD or if legacy SDD was over-tagging your data. This way you can better tune SDD to your data.
If there are any legacy SDD tags that you want native SDD to catch, you need to tune native SDD so that this type of data is discovered in future tables and columns; see guidance on that in the next section.
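The comparison described above can also be scripted once the report data is available as rows. The following is a minimal sketch, assuming a hypothetical export in which each row carries a data source, column, tag, and tag context. The field names and sample tags are illustrative only and are not Immuta's report format. It lists the columns where a legacy tag has no matching native SDD tag, which are the candidates for tuning.

```python
from collections import defaultdict

# Tag contexts as they appear on the tag side sheet.
LEGACY = "Legacy Sensitive Data Discovery"
NATIVE = "Sensitive Data Discovery"


def legacy_only_tags(rows):
    """Map (data_source, column) to the legacy tags that native SDD did not apply."""
    by_column = defaultdict(lambda: {LEGACY: set(), NATIVE: set()})
    for row in rows:
        if row["context"] in (LEGACY, NATIVE):
            key = (row["data_source"], row["column"])
            by_column[key][row["context"]].add(row["tag"])

    result = {}
    for key, contexts in by_column.items():
        missing = contexts[LEGACY] - contexts[NATIVE]
        if missing:
            result[key] = sorted(missing)
    return result


# Hypothetical report rows; the keys and tag names are assumptions.
SAMPLE_ROWS = [
    {"data_source": "customers", "column": "email", "tag": "Discovered.Email", "context": NATIVE},
    {"data_source": "customers", "column": "email", "tag": "Discovered.Email", "context": LEGACY},
    {"data_source": "customers", "column": "phone", "tag": "Discovered.Phone", "context": LEGACY},
    {"data_source": "orders", "column": "notes", "tag": "Discovered.Name", "context": LEGACY},
    {"data_source": "orders", "column": "notes", "tag": "Discovered.Address", "context": NATIVE},
]

if __name__ == "__main__":
    for (source, column), tags in legacy_only_tags(SAMPLE_ROWS).items():
        print(f"{source}.{column}: legacy-only tags {tags}")
```

Each entry in the result is either a case where legacy SDD was over-tagging or a gap that a new identifier should close; deciding which still requires reviewing the data.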
Requirement: Immuta permission GOVERNANCE
Using the report you built above, complete these actions to tune SDD:
Focus on a legacy SDD tag that is properly applied to your data. Assess whether native SDD tagged the same column as accurately as, or more accurately than, the legacy tag. If native SDD tagged the column incorrectly or not at all, proceed to the next step.
Create a new regex or dictionary identifier in the framework to discover this data with the tag you want applied. Ensure it is specific and matches at least 90% of the contents of the column.
Complete the steps above for all legacy SDD tags.
Retest your updated identifiers by re-running identification on the selected data sources and continue refining to the level of accuracy you want.
Completing the actions above will create parity between the tags legacy SDD applied to your data and the tags native SDD will apply in the future.
Requirements:
Immuta permission GOVERNANCE
This how-to guide is for enabling sensitive data discovery (SDD) for the first time. For additional information on sensitive data discovery, see the Data discovery page.
Navigate to the App Settings page and scroll to the Sensitive Data Discovery section.
Select the Enable Sensitive Data Discovery (SDD) checkbox to enable SDD.
Click Save and then click Confirm to apply your changes. Note that this triggers a system restart of the Immuta tenant.
Once SDD is enabled on your tenant, SDD will automatically run when new data sources are added, but it must be manually run for all existing data sources. This allows you to test out SDD with a select few data sources without worrying that it will add tags throughout all your data sources.
For this step, you will pick the identifiers to match the data that matters to your organization. For example, for international data, you may want to enable many different identifiers for many countries, like the "Australia Passport" identifier and the "Finland National ID Number" identifier. However, if you are dealing with United States domestic financial data, those identifiers would be irrelevant. In that case, it would be better to identify the data likely to appear, like Bitcoin or US Bank Routing MICR.
First, create an empty framework:
Navigate to Discover and Identification.
Select Create New.
Enter a Name and Description for your new identification framework.
Select Create empty framework.
Then, add identifiers to that framework:
Navigate to Discover and Identifiers.
Use the checkboxes to select all the identifiers relevant to your data. Tip: From the overview page you can see the name and the tags that will be applied by the identifier. To better understand the data it will match, click the name to read the description.
Once you have checked the identifiers you want in your framework, click Add to Framework.
Type the framework name in the text box.
Click Add to Framework.
Once you have created a framework relevant to your data, it is time to test it on your data and customize it. Run identification on a small number of data sources whose data you understand well so that you can assess the tags and adjust them to reflect what you expect to see.
Add those select data sources to your new framework:
Navigate to Discover and Identification.
Click your new framework name.
Navigate to the Data Sources tab.
Click Add Data Sources.
Check the checkboxes for the select data sources you want to try SDD on.
Click Add Data Source(s).
Then, run identification on those data sources:
Navigate to Discover and Identification.
Click the action menu for your new framework.
Click Run Identification.
After identification runs, you will receive a notification that the job is complete. Then, you can view the results from the data source dictionary.
Navigate to the data source overview page of the data source you added to the framework.
Click the Data Dictionary tab.
Assess whether the Discovered tags are applied as expected.
If you are happy with the Discovered tags, follow the Assign data sources to frameworks guide to add the rest of your data sources to the framework and follow the Run identification guide to run identification on all your data sources.
If you want additional tags, follow the Create an identifier guide to create identifiers that matter to your data.
Requirements:
Registered
Immuta permission GOVERNANCE
Identification (or sensitive data discovery (SDD)) runs automatically. If you want to re-run identification after a new global framework is set or after new identifiers have been added to a framework, you can do so from the UI by following one of the how-tos below.
Click the Discover icon and the Identification tab in the navigation menu.
Select the more actions icon.
Select Run Identification and then select it again in the modal.
Navigate to the data source overview page.
Click the health status.
Select Re-run next to Sensitive Data Discovery (SDD).
Verify discovered tags
If sensitive data discovery has been enabled, then manually adding tags to columns in the data dictionary will be unnecessary in most cases. The data owner will just need to verify that the Discovered tags are correct.
If a governor, data owner, or data source expert disables a Discovered tag from the data dictionary, the column will not be re-tagged next time identification (or SDD) runs. When a Discovered tag is disabled, it will not completely disappear, and it can be manually enabled through the tag side sheet.
To disable a Discovered tag:
Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to remove the tag from and click the tag you want to remove.
Click Disable in the side sheet and then click Confirm.
Requirements:
Immuta permission GOVERNANCE
Click the Discover icon in the navigation menu and select the Identification tab.
Click Create New.
Enter a Name and Description for the identification framework.
Select the option to Create empty framework.
Click Create.
After you create the identification framework, you can create new identifiers.
Click the Discover icon in the navigation menu and select the Identification tab.
Click Create New.
Enter a Name and Description for the identification framework.
Select the option to Create identifiers from an existing framework.
Select the checkbox for the framework you want to copy. You can only copy a single framework. For more information about a framework, click the framework name to open a new tab with details about the framework.
Click Create.
To add an identifier to a framework:
Click the Discover icon in the navigation menu and select the Identification tab.
Select the framework name for the identification framework you want to edit.
Click Add Identifier.
Use the dropdown to either add an identifier that already exists in Immuta or create a new identifier for the framework.
For existing identifiers: Optionally, edit the tags. Then click Add Identifier.
For new identifiers:
Fill out a Name and Description.
Enter criteria: Select the Type of criteria.
For regex, enter a regex to be matched against column values. By default, the criteria are case-sensitive. You can change this using the regex criteria. The regex must use RE2 syntax.
For column name regex, enter a regex to be matched against column names. By default, the criteria are case-insensitive. You can change this using the regex criteria. The regex must use RE2 syntax.
For a dictionary, enter the values in a comma-separated list to match against column values. Optionally, toggle the Case insensitive switch to on if you want the dictionary to be case-insensitive.
Select the tags to apply: Use the text box to search for a tag under the "Discovered" hierarchy or type a tag name to create a new tag under the "Discovered" hierarchy to apply to columns that match your identifier.
Click Next to review your new identifier and click Create Identifier to create it.
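To get a feel for the case-sensitivity options in the criteria steps above, it can help to try candidate patterns locally before creating the identifier. The following is a rough sketch, not Immuta's matching logic: it uses Python's `re` module as a stand-in for RE2, so the patterns avoid lookarounds and backreferences, which RE2 does not support. It also assumes that changing case sensitivity "using the regex criteria" refers to an inline flag such as `(?i)`, which RE2 syntax accepts. The employee ID format, function names, and dictionary entries are hypothetical.

```python
import re

# Hypothetical value regex for an internal ID such as "EMP-004217".
EMPLOYEE_ID = r"^EMP-[0-9]{6}$"

# Same pattern with the inline (?i) flag to make it case-insensitive.
EMPLOYEE_ID_ANY_CASE = r"(?i)^EMP-[0-9]{6}$"


def value_matches(pattern: str, value: str) -> bool:
    """Case-sensitive unless the pattern itself says otherwise."""
    return re.search(pattern, value) is not None


def column_name_matches(pattern: str, column_name: str) -> bool:
    """Case-insensitive by default, mirroring the column name regex criteria."""
    return re.search(pattern, column_name, re.IGNORECASE) is not None


def in_dictionary(value: str, entries: list[str], case_insensitive: bool = False) -> bool:
    """Exact lookup against dictionary entries, optionally ignoring case."""
    if case_insensitive:
        return value.casefold() in {entry.casefold() for entry in entries}
    return value in entries


if __name__ == "__main__":
    print(value_matches(EMPLOYEE_ID, "EMP-004217"))           # True
    print(value_matches(EMPLOYEE_ID, "emp-004217"))           # False
    print(value_matches(EMPLOYEE_ID_ANY_CASE, "emp-004217"))  # True
    print(column_name_matches(r"employee_?id", "EMPLOYEE_ID"))
    print(in_dictionary("gold", ["Gold", "Silver"], case_insensitive=True))
```

Whether Immuta anchors patterns or searches anywhere in the value is not stated here, so explicit `^` and `$` anchors keep the intent unambiguous.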
Only tags can be edited within a framework. Edits made to an identifier within a framework will only impact that specific identifier. To fully edit an identifier (including the name, description, or criteria) for all frameworks, use the Edit an identifier how-to guide.
To edit the tags applied by an identifier for a framework:
Click the Discover icon in the navigation menu and select the Identification tab.
Select the framework name for the identification framework you want to edit.
Click the more actions icon for an identifier and select Edit tags.
Remove the tags or type a tag name to add tags.
Click Save.
Click the Discover icon in the navigation menu and select the Identification tab.
Select the framework name for the identification framework you want to edit.
Click the more actions icon for an identifier and select Delete.
Click Delete again in the modal.
To assign a framework to run on specific data sources:
Click the Discover icon in the navigation menu and select the Identification tab.
Select the framework you want to assign and navigate to the Data Sources tab.
Click Add Data Sources.
Select the checkbox for the data source you want this framework to run on. You may select more than one.
Click Add Data Source(s).
After a data source is removed from a framework, it will use the global framework for any SDD scans and the tags applied by the removed framework will be replaced. The global framework is signified by the globe icon.
To remove data sources from a framework:
Click the Discover icon in the navigation menu and select the Identification tab.
Select the framework you want to remove data sources from and navigate to the Data Sources tab.
Select the checkbox for the data source you want to remove from the framework. You may select more than one.
Select Remove and click Remove again in the modal.
Requirement: No data sources assigned to the framework
To delete a framework:
Click the Discover icon in the navigation menu and select the Identification tab.
Click the more actions icon in the Action column for the framework you want to delete. The global framework cannot be deleted. If you want to delete it, first configure a different framework as the global framework.
Select Delete and click Delete again in the modal.
Requirement: Immuta permission APPLICATION_ADMIN
Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the request-friendly name of your global identification framework in the Global SDD Template Name field. This name can be found in the URL when you navigate to the identification framework's page.
Click Save, and then Confirm your changes.
Requirements:
Immuta permission GOVERNANCE
Click the Discover icon in the navigation menu and select the Identifiers tab.
Click Create New.
Enter a Name and Description for the new identifier.
Enter criteria: Select the Type of criteria.
For regex, enter a regex to be matched against column values. By default, the criteria are case-sensitive. You can change this using the regex criteria. The regex must use RE2 syntax.
For column name regex, enter a regex to be matched against column names. By default, the criteria are case-insensitive. You can change this using the regex criteria. The regex must use RE2 syntax.
For a dictionary, enter the values in a comma-separated list to match against column values. Optionally, toggle the Case insensitive switch to on if you want the dictionary to be case-insensitive.
Select the tags to apply: Use the text box to search for a tag under the "Discovered" hierarchy or type a tag name to create a new tag under the "Discovered" hierarchy to apply to columns that match your identifier.
Click Next to review your new identifier and click Create Identifier to create it.
See the Manage identification frameworks page to add your new identifier to a framework.
Note that all user-created identifiers must be a 90% match or greater for the contents of the column to be tagged.
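The 90% rule can be approximated locally on a sample of column values before you commit to a pattern. This is a minimal sketch under stated assumptions: Python's `re` stands in for RE2, the order number format is hypothetical, and how Immuta samples values or treats nulls is not described here, so the sketch simply ignores `None` values.

```python
import re

# Threshold from the note above: user-created identifiers need a 90% match.
MATCH_THRESHOLD = 0.90


def match_fraction(pattern: str, values: list) -> float:
    """Fraction of non-null sample values that the pattern matches."""
    compiled = re.compile(pattern)
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    hits = sum(1 for v in non_null if compiled.search(v))
    return hits / len(non_null)


def would_tag(pattern: str, values: list) -> bool:
    """True when the sample meets the 90% match threshold."""
    return match_fraction(pattern, values) >= MATCH_THRESHOLD


# Hypothetical pattern for order numbers such as "ORD-00000042".
ORDER_NUMBER = r"^ORD-[0-9]{8}$"

if __name__ == "__main__":
    mostly_clean = [f"ORD-{i:08d}" for i in range(9)] + ["n/a"]
    noisier = [f"ORD-{i:08d}" for i in range(8)] + ["n/a", "unknown"]
    print(match_fraction(ORDER_NUMBER, mostly_clean), would_tag(ORDER_NUMBER, mostly_clean))
    print(match_fraction(ORDER_NUMBER, noisier), would_tag(ORDER_NUMBER, noisier))
```

A sample that lands just under the threshold usually points to either a pattern that is too strict or a column that mixes formats, and the non-matching values show which.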
Editing the details or criteria of an identifier from the identifiers menu will affect any framework with that identifier throughout Immuta. Editing the tags will only affect new frameworks the identifier is added to.
To edit an identifier:
Click the Discover icon in the navigation menu and select the Identifiers tab.
Click the name of the identifier you want to edit.
Click Edit.
Edit the field you want to change.
Click Save.
Built-in identifiers cannot be edited.
Deleting an identifier will remove it from all the frameworks it is in throughout Immuta.
To delete an identifier:
Click the Discover icon in the navigation menu and select the Identifiers tab.
Click the more actions icon in the Action column for the identifier you want to delete.
Select Delete and click Delete again in the modal.
Built-in identifiers cannot be deleted.