
Sensitive Data Discovery

Deprecation notice

Support for this feature has been deprecated.

Sensitive data discovery (SDD) is an Immuta feature that uses sensitive data patterns to determine what type of data your column represents. Using identification rules and data samples from your tables, Immuta matches your data and can assign the appropriate tags to your data dictionary. This saves the time of identifying your data manually and provides the benefit of a standard taxonomy across all your data sources in Immuta.

Architecture

SDD works by sampling data from each table and checking that sample against a template made up of built-in or custom identifiers. If an identifier's pattern matches a column of the sampled data with sufficient confidence, the corresponding tag is applied to that column to indicate the type of data it contains.

SDD queries a small sample of data for each data source in Immuta. This sample is temporarily held in memory to check for identifier matches. Then Immuta applies the relevant tags to those columns where matches were found.

This sampling and tagging process happens any time SDD is run. SDD can be triggered through the Immuta CLI, through the HTTP API, or in the Immuta UI on the data sources overview page. SDD also runs automatically when one of the following events occurs:

A new data source is created.
Schema detection is enabled and a new data source is detected.
Column detection is enabled and new columns are detected. Here, SDD will only run on new columns and no existing tags will be removed or changed.

Components

Sensitive data discovery (SDD) comprises two major elements: identifiers and templates.

Identifier

The identifier is the basic building block of SDD. Each identifier in Immuta is a unique pattern (e.g., a regex or a list of values) and a list of tags to apply to data that matches the pattern. When Immuta recognizes that pattern, it can understand the type of data and tag the data to describe the type. For example, Immuta has the built-in identifier US_SOCIAL_SECURITY_NUMBER. Immuta will use a regex to look for strings of exactly nine digits, with or without hyphens after the third and fifth digits, with a leading digit between 0 and 8. SDD then scores columns by the percentage of values that match the pattern defined. This score determines whether or not the configured tags will be applied to a column. Once it finds a column that fits the expected pattern of US_SOCIAL_SECURITY_NUMBER with a reasonable match score, it will know how to tag it.
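
The scoring idea described above can be sketched in a few lines of Python. This is only an illustration: the regex is a hypothetical approximation of the description (nine digits, optional hyphens after the third and fifth digits, leading digit 0 to 8), not Immuta's actual pattern, and skipping null values is an assumption.

```python
import re

# Hypothetical pattern approximating the prose description; not Immuta's real regex.
SSN_LIKE = re.compile(r"^[0-8]\d{2}-?\d{2}-?\d{4}$")


def match_score(values, pattern):
    """Return the fraction of non-null sampled values that match the pattern."""
    checked = [v for v in values if v is not None]  # assumption: nulls are ignored
    if not checked:
        return 0.0
    hits = sum(1 for v in checked if pattern.match(str(v)))
    return hits / len(checked)


def should_tag(values, pattern, min_confidence):
    """Tags are applied when the score is at least the configured minimum."""
    return match_score(values, pattern) >= min_confidence


sample = ["123-45-6789", "078051120", "987-65-4321", "n/a"]
print(match_score(sample, SSN_LIKE))
print(should_tag(sample, SSN_LIKE, 0.5))
```

Here two of the four sampled values fit the pattern, so the column would be tagged only if the configured minimum confidence were 0.5 or lower.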

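There are two types of identifiers:

Built-in identifier: Built-in identifiers, such as US_SOCIAL_SECURITY_NUMBER, ship with Immuta and can be listed through the API.
Custom identifier: Custom identifiers allow data governors to create their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.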

Supported identifier types

The three types of custom identifiers (regex, column name regex, and dictionary) are described in the Create a Custom Identifier section.

Templates

A template is a collection of identifiers and settings that drive the configuration of SDD runs. The settings users can apply through templates include the following:

classifiers (identifiers) lists the identifiers that are applied to data sources in the SDD run.
tags is an optional override for the tags applied by the identifiers.
minConfidence is an optional override for the minConfidence established in the identifier(s). When the detection confidence is at least the percentage defined in minConfidence, tags are applied.
sampleSize is an optional override for how many records to sample from the data source.
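
As a rough sketch of how these optional overrides might resolve, the Python below falls back to the identifier's own values when a template supplies no override, and to the default sample size of 1000 when the template has no sampleSize. The dictionary shapes and function names are assumptions made for illustration, not the Immuta API.

```python
def resolve_identifier(identifier_config, template_override=None):
    """Use the template's tags/minConfidence when present, else the identifier's."""
    override = template_override or {}
    return {
        "tags": override.get("tags", identifier_config["tags"]),
        "minConfidence": override.get(
            "minConfidence", identifier_config["minConfidence"]
        ),
    }


def resolve_sample_size(template, default_sample_size=1000):
    """Use the template's sampleSize when present, else the global default."""
    return template.get("sampleSize", default_sample_size)


# Identifier values taken from the account number example in this documentation.
account_number = {"tags": ["Discovered.account-number"], "minConfidence": 0.5}

print(resolve_identifier(account_number))
print(resolve_identifier(account_number, {"minConfidence": 0.8}))
print(resolve_sample_size({}))
print(resolve_sample_size({"sampleSize": 500}))
```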

Considerations

SDD does not run on data sources with over 1600 columns.
Deleting the built-in Discovered tags is not recommended. If you delete a built-in Discovered tag and then run SDD, columns where the corresponding identifier is detected will not be tagged. Instead, disable tags on a column-by-column basis from the data dictionary, or turn off SDD on a data-source-by-data-source basis when creating a data source.

Configure and customize SDD

SDD Pre-Configuration Details

Only application admins can enable sensitive data discovery (SDD) on the Immuta app settings page. Then, data source creators can disable SDD on a data-source-by-data-source basis. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary to prevent them from being used and auto-tagged on that data source in the future.

Configurable global settings

Global template

When SDD is triggered on a data source, the job is run for the identifiers within the set template. If a template is not set, the identifier and template within the SDD job are defined by the global setting. By default, the global setting will run for all identifiers in the system. However, a system administrator can configure Immuta to use a custom global template instead.

An active global template cannot be deleted.
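
The fallback order described above (data source template, then global template, then every identifier in the system) could be modeled as follows. This is a minimal sketch: the function name and the template shape, a dictionary with a classifiers list of identifier names, are hypothetical.

```python
def identifiers_for_run(source_template, global_template, all_identifiers):
    """Pick the identifiers an SDD job would use, following the fallback order."""
    if source_template is not None:
        return list(source_template["classifiers"])
    if global_template is not None:
        return list(global_template["classifiers"])
    # No template anywhere: the default global setting runs all identifiers.
    return list(all_identifiers)


all_ids = ["AGE", "PERSON_NAME", "US_SOCIAL_SECURITY_NUMBER"]
custom_global = {"classifiers": ["PERSON_NAME", "US_SOCIAL_SECURITY_NUMBER"]}
source_specific = {"classifiers": ["AGE"]}

print(identifiers_for_run(source_specific, custom_global, all_ids))
print(identifiers_for_run(None, custom_global, all_ids))
print(identifiers_for_run(None, None, all_ids))
```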

Sample size

SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured identifiers.

The default for SDD is to sample 1000 records (the sample size) during this process. However, administrators can configure the sample size taken by SDD on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled during SDD may be necessary to meet some organizations' compliance requirements.

Tag mutability

When SDD is triggered by a data owner, all column tags that were previously applied by SDD are removed and the tags prescribed by the latest run are applied. However, if SDD is triggered because a new column is detected by schema monitoring, tags will only be applied to the new column, and no tags will be modified on existing columns.
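
The two behaviors can be contrasted with a small Python model. It tracks only SDD-applied tags per column and is an illustrative sketch with hypothetical function names, not a description of Immuta's internals.

```python
def rerun_by_owner(previous_sdd_tags, latest_result):
    """Owner-triggered run: previous SDD tags are discarded and replaced."""
    return {column: list(tags) for column, tags in latest_result.items()}


def run_on_new_columns(previous_sdd_tags, latest_result, new_columns):
    """Schema-monitoring run: only new columns receive tags; others are untouched."""
    updated = {column: list(tags) for column, tags in previous_sdd_tags.items()}
    for column in new_columns:
        if column in latest_result:
            updated[column] = list(latest_result[column])
    return updated


previous = {"ssn": ["Discovered.Country.US"], "dob": ["Discovered.Entity.Date"]}
latest = {"ssn": ["Discovered.PII"], "zip": ["Discovered.Entity.Location"]}

print(rerun_by_owner(previous, latest))
print(run_on_new_columns(previous, latest, ["zip"]))
```

In the first call the dob column loses its SDD tags because the latest run did not prescribe any; in the second call the existing ssn and dob tags are left exactly as they were.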

Dry run

Users can also configure SDD to do a dryRun, which allows them to see what tags would be applied to a data source without actually applying them. See the Run sensitive data discovery on data sources page for details.

SDD workflow

Two common workflows for using SDD are outlined below. The first illustrates how to apply a single global template to all data sources, while the second outlines how users can create and apply templates to data sources they own.

Workflow 1: Apply a global template to all data sources

Data governor creates a template using one or more built-in or custom identifiers.
System administrator adds this template to the global settings so that it applies to all data sources.
Users trigger SDD on data sources.

Workflow 2: Apply a template to a specific data source

Data governor creates one or more custom identifiers.
Data owner creates a template containing one or more identifiers.
Data owner applies their template to one or more data sources.
Data owner triggers SDD on one or more data sources, and tags are applied to columns where identifiers were recognized.

Customize Sensitive Data Discovery

Enable and Manage Global SDD Settings

Enable Sensitive Data Discovery

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Select the checkbox to enable SDD, and then click Save and Confirm to apply your changes.

Configure a Global Template

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the name of your global template in the Global SDD Template Name field.
Click Save, and then Confirm your changes.

Configure a Default Sample Size

When a sample size is not specified in a template, SDD will use the default sample size of 1000 records. To adjust the sample size,

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the number of rows in a data source you would like sampled when running SDD in the Default SDD Sample Size field.
Click Save, and then Confirm your changes.

Run Sensitive Data Discovery on Data Sources

In previous documentation, an identifier is referred to as a classifier. The term is being updated to identifier to be more accurate and to avoid confusion with the Immuta data classification and frameworks feature.

Trigger SDD in the Immuta UI

Select a data source from your My Data Sources page.
Click the Health Check dropdown menu.
In the Sensitive Data Discovery (SDD) section, click Re-run.

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Create and Apply a Template to a Data Source

Create a template

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API.
Find identifiers to include in your template using one of these methods:

Immuta CLI

HTTP API

If the request was successful, you will receive a list of available identifiers.

Save the template payload in a .json file. Use the tabs below to see different examples of templates.

Create the template:

Immuta CLI

HTTP API

If the request is successful, you will receive a response that contains details about the template. Use the tabs below to see different responses for different templates.

After the template is applied to data sources and sensitive data discovery is run, the Discovered.account-number tag will be applied to columns that Immuta identifies with 50% confidence, as configured in the identifier.

After the template is applied to data sources and sensitive data discovery is run, the Discovered.desk-location tag will be applied to columns when Immuta detects the values Research Lab, Blue Room or Purple Room with 60% confidence, as configured in the identifier.

After the template is applied to data sources and sensitive data discovery is run, the Discovered.residence-hall tag will be applied to columns when Immuta detects values that match those listed in the Residence Halls data source with 70% confidence, as configured in the identifier.

Apply a template to data sources

Find templates to apply to your data sources:

Immuta CLI

HTTP API

If the request was successful, you will receive a list of available templates.

Select an appropriate template to apply to your data sources, and save the payload in a .json file:

Apply the template to your data source(s):

Immuta CLI

HTTP API

You will receive a response that indicates whether or not the template was successfully applied to your data sources.

Additional tutorials

Clone a template

Users cannot modify templates created by other data owners, but they can clone templates and make changes to the clone.

Get a list of templates to determine the template you want to clone using one of these methods:

Immuta CLI

HTTP API

Save the template clone name and details in a .json file.

Clone the template:

Immuta CLI

HTTP API

If the request was successful, you will receive a response that provides details about the template clone.

You can now modify the template, such as changing the identifiers (classifiers) included and the sampleSize.

Configure entity tags and confidence

To prevent entity tags from being set, you can create a template that configures the identifier that contains that tag.

For example, the built-in PERSON_NAME identifier contains the following tags: Discovered.PHI, Discovered.PII, Discovered.Entity.Person Name, and Discovered.Identifier Indirect. However, your organization doesn't have any health data, so you don't want the PHI tag to be applied to your data sources but you do want all the other tags within that identifier.

To override the Discovered.PHI tag, you would create a template that includes the PERSON_NAME identifier and removes the Discovered.PHI from the list of tags in the template payload.
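
As a small illustration of preparing that override, the Python below starts from the four PERSON_NAME tags listed above and drops Discovered.PHI. The helper name is hypothetical, and how the resulting list is placed in a template payload is not shown here.

```python
# Tags listed above for the built-in PERSON_NAME identifier.
PERSON_NAME_TAGS = [
    "Discovered.PHI",
    "Discovered.PII",
    "Discovered.Entity.Person Name",
    "Discovered.Identifier Indirect",
]


def drop_tags(tags, unwanted):
    """Return the tags in their original order, minus any unwanted ones."""
    unwanted = set(unwanted)
    return [tag for tag in tags if tag not in unwanted]


override_tags = drop_tags(PERSON_NAME_TAGS, ["Discovered.PHI"])
print(override_tags)
```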

View the details about the PERSON_NAME identifier so you know what to include in your template using one of these methods:

Immuta CLI

HTTP API

If the request was successful, the response will include details about the PERSON_NAME identifier.

Remove the Discovered.PHI tag from the list of tags in the identifier config, and save the template payload in a .json file.

Create the template:

Immuta CLI

HTTP API

If the request is successful, you will receive a response that details the new template:

What's next

Now that you've created a template, continue to one of the following tutorials:

Create a Custom Identifier

Create a Regex Identifier

Use case: Custom regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically identify and tag columns that contain account numbers in your database.

A regular expression (regex) custom identifier allows you to create your own rules that enable Immuta's sensitive data discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom identifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom regex identifier payload in a .json file.

```
{
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5,
    "tags": ["Discovered.account-number"]
  }
}
```
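
Before creating the identifier, it can help to try the pattern against a few representative values. The sketch below uses Python's re module as a stand-in; whether Immuta's regex engine behaves identically is an assumption, and the sample values are made up.

```python
import re

# Pattern copied from the payload above.
ACCOUNT_NUMBER = re.compile(r"^[0-9]{9}-[0-9]{3}-[0-9]{1}$")

# Made-up sample values: two well-formed account numbers and two malformed ones.
values = [
    "123456789-123-4",
    "987654321-000-9",
    "12345678-123-4",   # only eight leading digits
    "123456789-1234",   # missing the second hyphen
]

matched = [v for v in values if ACCOUNT_NUMBER.match(v)]
score = len(matched) / len(values)

print(matched)
print(score)
print(score >= 0.5)  # compared against the payload's minConfidence
```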

Create the identifier using one of these methods:

Immuta CLI

```
immuta api sdd/classifier -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier
```

If the request is successful, you will receive a response that contains details about the identifier.

```
{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "tags": [
      "Discovered.account-number"
    ],
    "regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}",
    "minConfidence": 0.5
  },
  "id": 1,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}
```

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Column Name Regex Identifier

Use case: Custom column name regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically detect and tag columns that contain account numbers in your database.

A custom column name regular expression (regex) identifier allows you to create your own detectors that enable Immuta's sensitive data discovery to find column name matches based on a regex pattern. For example, if your database contains tables with social security numbers, you could define a regex pattern to match against the names of the column instead of the values within the column. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom column name regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom column name regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom column name regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
Save the custom column name regex identifier payload in a .json file. The regex ^ssn|social ?security$ looks for column names that match ssn, socialsecurity, or social security.
Create the identifier using one of these methods:

Immuta CLI

HTTP API

If the request is successful, you will receive a response that contains details about the identifier.
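
One detail worth checking when writing a column name regex is operator precedence: in ^ssn|social ?security$, the anchors bind more tightly than the alternation, so the pattern reads as "starts with ssn" or "ends with social security". The Python sketch below compares it with a grouped variant. It assumes unanchored search semantics like Python's re.search and uses made-up column names; Immuta's matching behavior may differ.

```python
import re

documented = re.compile(r"^ssn|social ?security$")
grouped = re.compile(r"^(ssn|social ?security)$")  # hypothetical stricter variant

# Made-up column names for illustration.
names = [
    "ssn",
    "socialsecurity",
    "social security",
    "ssn_last4",
    "my social security",
    "us_ssn",
]


def matching(pattern, candidates):
    """Return the candidate names in which the pattern is found."""
    return [name for name in candidates if pattern.search(name)]


print(matching(documented, names))
print(matching(grouped, names))
```

Under these assumptions the documented pattern also accepts ssn_last4 and my social security, while the grouped variant accepts only the three exact names.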

What's Next

Continue to one of the following tutorials:

Create a Dictionary Identifier

Use case: Custom dictionary identifier

Scenario: You have data that includes the names of the rooms employees' desks are in across your organization. Although these locations may be considered sensitive in particular datasets, they would not be recognized by Immuta's built-in identifiers.

A custom dictionary identifier allows you to create your own rules that enable Immuta's sensitive data discovery to match a list of room names to values in the dataset. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom dictionary identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom dictionary identifier are outlined in the table below.

Attribute

Description

Create a custom dictionary identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
Save the custom dictionary identifier payload in a .json file. The dictionary below contains the words Research Lab, Blue Room, and Purple Room.
Create the identifier using one of these methods:

Immuta CLI

HTTP API

If the request is successful, you will receive a response that contains details about the identifier.

What's next

Continue to one of the following tutorials:

List Built-In Identifier

Attributes

Attributes of identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to listing identifiers are outlined in the table below.

Attribute

Description

Response details

The response lists all built-in identifiers that are currently supported in Immuta SDD and their details, including their name and description. For example,

```
{
  "count": 67,
  "hits": [
    {
      "createdBy": {
        "id": 21,
        "name": "Immuta System Account",
        "email": "immuta_system@immuta.com"
      },
      "name": "AGE",
      "displayName": "Age",
      "description": "Detects numeric strings between 10 and 199, provided the column header contains text such as `age`, `year`, `years`, `yr`, or `yrs`.",
      "type": "builtIn",
      "config": {
        "minConfidence": 0.7,
        "tags": [
          "Discovered.PII",
          "Discovered.Identifier Indirect",
          "Discovered.PHI",
          "Discovered.Entity.Age"
        ],
        "conditionalTags": {}
      },
      "id": 3,
      "createdAt": "2021-10-28T07:34:58.761Z",
      "updatedAt": "2021-10-28T07:34:58.761Z"
    }
  ]
}
```
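
A response in this shape is straightforward to filter once parsed. The Python sketch below works on an abbreviated copy of the example above and finds the identifiers that apply a given tag; the helper name is hypothetical.

```python
# Abbreviated copy of the example response above (one hit, selected fields).
listing = {
    "count": 67,
    "hits": [
        {
            "name": "AGE",
            "type": "builtIn",
            "config": {
                "minConfidence": 0.7,
                "tags": [
                    "Discovered.PII",
                    "Discovered.Identifier Indirect",
                    "Discovered.PHI",
                    "Discovered.Entity.Age",
                ],
            },
        }
    ],
}


def names_with_tag(response, tag):
    """Return the names of identifiers whose config applies the given tag."""
    return [
        hit["name"]
        for hit in response.get("hits", [])
        if tag in hit.get("config", {}).get("tags", [])
    ]


print(names_with_tag(listing, "Discovered.PHI"))
print(names_with_tag(listing, "Discovered.Country.US"))
```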

List built-in identifiers

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
List built-in identifiers using one of these methods:

Immuta CLI

```
immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"
```

HTTP API

```
curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"
```

If the request was successful, you will receive a list of built-in identifiers.

What's next

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Run Sensitive Data Discovery on Data Sources

Attributes overview

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined below.

Attribute

Description

Run SDD on data sources

Specify the data sources you would like to run SDD on, and save the payload in a .json file.
```
{
  "sources": [
    "Insurance Data"
  ]
}
```
Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.
```
{
  "all": true
}
```
Trigger SDD using one of these methods:

Immuta CLI

```
immuta api sdd/run -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
```

If sensitive data discovery was successfully run, you will receive a response similar to this:

```
{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
```

Additional tutorials

Test SDD on a data source

Users can test how SDD will apply tags to their data sources by completing a dryRun, which allows users to test templates and tags:

test templates: If a template is specified in the payload when the dryRun is true, SDD will use this template instead of the template applied to the data source. Note: SDD will error if a template is specified here when dryRun is false.
test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not identifiers or templates are applying tags correctly without updating the data source.

After evaluating whether or not the tags have been applied appropriately, users can then make necessary changes to a template before triggering SDD again.

To complete a dryRun,

Specify the data sources you would like to run sensitive data discovery on and set dryRun to true in the payload in a .json file. Note: You can also apply a template to a data source as a dryRun, like in the example below. However, when dryRun is false, a template cannot be included in the payload. Instead, the template must be added to the data source before running SDD.

```
{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": true,
  "template": "PII_REVISION"
}
```
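
If you generate these payloads from a script, a small helper can enforce the rule stated above that a template may only be included when dryRun is true. The Python below is a sketch: the function name is hypothetical, and rejecting a request that sets neither or both of sources and all is a client-side assumption rather than documented API behavior.

```python
import json


def build_run_payload(sources=None, all_sources=False, dry_run=None, template=None):
    """Build an sdd/run request body using the fields shown in this section."""
    if all_sources == bool(sources):
        # Assumption: exactly one of "sources" or "all" should be provided.
        raise ValueError("Specify either a list of sources or all_sources=True")
    if template is not None and dry_run is not True:
        raise ValueError("A template may only be included when dryRun is true")
    payload = {"all": True} if all_sources else {"sources": list(sources)}
    if dry_run is not None:
        payload["dryRun"] = dry_run
    if template is not None:
        payload["template"] = template
    return payload


payload = build_run_payload(
    sources=["Medical Claims"], dry_run=True, template="PII_REVISION"
)
print(json.dumps(payload, indent=2))
```

The printed JSON matches the example payload above and could be saved to the .json file passed to the CLI or curl command.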

Trigger SDD using one of these methods:

Immuta CLI

```
immuta api sdd/run -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
```

You will receive a response that illustrates tags that will be added, tags that will be removed, and the final SDD result:

{
  "Medical Claims": {
    "id": "86fc4f70-380f-11ec-a432-81748c911385",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
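A response like the one above can be long, so a short script may help when reviewing a dry run. The sketch below assumes only the fields visible in the example response (state, output.diff.addedTags, output.diff.removedTags, and output.sddTagResult); the function name `summarize_dry_run` is hypothetical, and the input is an abbreviated copy of the example.

```python
def summarize_dry_run(response):
    """Count added and removed tags per column for each data source."""
    summary = {}
    for source_name, result in response.items():
        output = result.get("output", {})
        diff = output.get("diff", {})
        summary[source_name] = {
            "state": result.get("state"),
            "added": {col: len(tags) for col, tags in diff.get("addedTags", {}).items()},
            "removed": {col: len(tags) for col, tags in diff.get("removedTags", {}).items()},
            "final": output.get("sddTagResult", {}),
        }
    return summary


# Abbreviated copy of the example response above (two of the five columns).
example = {
    "Medical Claims": {
        "id": "86fc4f70-380f-11ec-a432-81748c911385",
        "state": "completed",
        "output": {
            "diff": {
                "addedTags": {},
                "removedTags": {
                    "ssn": [
                        "Discovered.Country.US",
                        "Discovered.Entity.Social Security Number",
                        "Discovered.Identifier Direct",
                        "Discovered.PHI",
                    ],
                    "gender": [
                        "Discovered.Entity.Gender",
                        "Discovered.Identifier Indirect",
                        "Discovered.PHI",
                        "Discovered.PII",
                    ],
                },
            },
            "sddTagResult": {"ssn": ["Discovered.PII"]},
        },
    }
}

summary = summarize_dry_run(example)
print(summary["Medical Claims"]["removed"])
```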

Once you are satisfied with how tags are applied by SDD, set dryRun to false (or omit it from the payload).

{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": false
}

Trigger SDD again:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

If the request was successful, you will receive a response similar to this one:

{
  "Medical Claims": {
    "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}

Trigger SDD in the Immuta UI

Select a data source from your My Data Sources page.
Click the Health Check dropdown menu.
In the Sensitive Data Discovery (SDD) section, click Re-run.

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Create and Apply a Template to a Data Source

Create a template

Generate your API key and save it somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API.
Find identifiers to include in your template using one of these methods:

Immuta CLI

HTTP API

If the request was successful, you will receive a list of available identifiers.

Save the template payload in a .json file. The examples below show payloads for four different templates.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "classifiers": [
    {
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "classifiers": [
    {
      "name": "STUDENT_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
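The four payloads above share one shape: a name, a display name, a description, a list of identifiers under classifiers, and a sampleSize. As a sketch, they could be generated and saved with a small Python helper. The helper name `build_template_payload` is hypothetical, and the default sample size of 100 simply mirrors the examples rather than any documented default.

```python
import json
import tempfile
from pathlib import Path


def build_template_payload(name, display_name, description, identifiers, sample_size=100):
    """Build a template payload with the same fields as the examples above."""
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [{"name": identifier} for identifier in identifiers],
        "sampleSize": sample_size,
    }


payload = build_template_payload(
    "ACCOUNT_NUMBERS_TEMPLATE",
    "Account Numbers Template",
    "This template contains the identifier that recognizes account numbers.",
    ["ACCOUNT_NUMBER_IDENTIFIER"],
)

# Save the payload to a .json file (a temporary directory is used for the demo).
with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir) / "example-payload.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    saved = json.loads(path.read_text(encoding="utf-8"))

print(saved["classifiers"])
```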

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that contains details about the template. The responses below correspond to the example templates above.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 2,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "overrides": {}
    }
  ]
}

After the template is applied to data sources and sensitive data discovery is run, the Discovered.social-security-number tag will be applied to columns whose names match the ssn|social ?security regex, such as ssn, socialsecurity, or social security.
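To see how the ssn|social ?security pattern behaves on column names, it can be tried locally. The sketch below uses Python's re module with an unanchored, case-sensitive search; whether Immuta applies the pattern with the same anchoring and case rules is an assumption here, so treat this as an illustration of the regex itself rather than of SDD's matching.

```python
import re

# Pattern quoted in the text above; "?" makes the space optional.
COLUMN_NAME_PATTERN = re.compile(r"ssn|social ?security")


def column_name_matches(column_name):
    """Return True when the pattern occurs anywhere in the column name."""
    return COLUMN_NAME_PATTERN.search(column_name) is not None


columns = ["ssn", "socialsecurity", "social security", "dob", "state"]
matched = [name for name in columns if column_name_matches(name)]
print(matched)  # ['ssn', 'socialsecurity', 'social security']
```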

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "STUDENT_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}

Apply a template to data sources

Attributes of all custom identifiers and templates are provided on the . However, attributes specific to this section are outlined in the table below.

Attribute

Description

Find templates to apply to your data sources:

Immuta CLI

immuta api sdd/template

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    https://your-immuta-url.immuta.com/sdd/template

If the request was successful, you will receive a list of available templates.

{
  "count": 3,
  "hits": [
    {
      "name": "ACCOUNT_NUMBERS_TEMPLATE",
      "displayName": "Account Numbers Template",
      "description": "This template contains the identifier that recognizes account numbers.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 2,
      "createdAt": "2021-10-20T19:13:35.319Z",
      "updatedAt": "2021-10-20T19:13:35.319Z",
      "classifiers": [
        {
          "name": "ACCOUNT_NUMBER_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
      "displayName": "Employee Desk Location Template",
      "description": "Contains identifier that detects when the name of a room a desk is in appears in a dataset.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 1,
      "createdAt": "2021-10-20T18:03:58.967Z",
      "updatedAt": "2021-10-20T18:03:58.967Z",
      "classifiers": [
        {
          "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
      "displayName": "Social Security Numbers Template",
      "description": "Contains identifier that matches ssn column names with the defined regex.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 3,
      "createdAt": "2021-10-20T19:13:58.359Z",
      "updatedAt": "2021-10-20T19:13:58.359Z",
      "classifiers": [
        {
          "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
          "overrides": {}
        }
      ]
    }
  ]
}
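When many templates are returned, it can help to reduce the listing to template names and the identifiers each one contains. This sketch assumes only the hits, name, and classifiers fields shown in the response above; the function name `identifiers_by_template` is hypothetical, and the input is an abbreviated copy of that response.

```python
def identifiers_by_template(listing):
    """Map each template name to the names of the identifiers it contains."""
    return {
        hit["name"]: [classifier["name"] for classifier in hit.get("classifiers", [])]
        for hit in listing.get("hits", [])
    }


# Abbreviated copy of the listing above (fields not needed here are omitted).
listing = {
    "count": 3,
    "hits": [
        {
            "name": "ACCOUNT_NUMBERS_TEMPLATE",
            "classifiers": [{"name": "ACCOUNT_NUMBER_IDENTIFIER", "overrides": {}}],
        },
        {
            "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
            "classifiers": [{"name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER", "overrides": {}}],
        },
        {
            "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
            "classifiers": [{"name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER", "overrides": {}}],
        },
    ],
}

index = identifiers_by_template(listing)
for template_name, identifier_names in index.items():
    print(template_name, identifier_names)
```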

Select an appropriate template to apply to your data sources, and save the payload in a .json file:

{
  "template": "ACCOUNT_NUMBERS_TEMPLATE",
  "sources": [
    "Insurance Data"
  ]
}

Apply the template to your data source(s):

Immuta CLI

immuta api sdd/template/apply -X PUT --input ./example-payload.json

HTTP API

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/apply

You will receive a response that indicates whether or not the template was successfully applied to your data sources.

{
  "success": true
}

Additional tutorials

Clone a template

Users cannot modify templates created by other data owners, but they can clone templates and make changes to the clone.

Get a list of templates to determine the template you want to clone using one of these methods:

Immuta CLI

immuta api "sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"
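The query string in these requests carries four parameters: sortField, sortOrder, offset, and limit. A sketch of building that URL in Python follows, which avoids hand-writing the ampersands. The function name `template_list_url` is hypothetical, and the reading of offset as "results to skip" and limit as "results to return" is the conventional one, assumed here rather than confirmed by this section.

```python
from urllib.parse import urlencode


def template_list_url(base_url, sort_field="name", sort_order="asc", offset=0, limit=5):
    """Build the template listing URL with sorting and paging parameters."""
    query = urlencode(
        {
            "sortField": sort_field,
            "sortOrder": sort_order,
            "offset": offset,
            "limit": limit,
        }
    )
    return f"{base_url.rstrip('/')}/sdd/template?{query}"


url = template_list_url("https://your-immuta-url.immuta.com", offset=5, limit=5)
print(url)
```

When running the equivalent command in a shell, wrap the URL in quotes so the shell does not interpret each `&` as a background operator.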

Save the template clone name and details in a .json file.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts."
}

Clone the template:

Immuta CLI

immuta api sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone

If the request was successful, you will receive a response that provides details about the template clone.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 4,
  "createdAt": "2021-10-20T20:48:37.805Z",
  "updatedAt": "2021-10-20T20:48:37.805Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

You can now modify the cloned template, for example by changing the identifiers (classifiers) it includes or its sampleSize.

Configure entity tags and confidence

To prevent an entity tag from being set, you can create a template that configures the identifier that contains that tag.

For example, to override the Discovered.PHI tag, you would create a template that includes the PERSON_NAME identifier and removes the Discovered.PHI tag from the list of tags in the template payload.

View the details about the PERSON_NAME identifier so you know what to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

If the request was successful, the response will include details about the PERSON_NAME identifier.

{
  "createdBy": {
    "id": 21,
    "name": "Immuta System Account",
    "email": "immuta_system@immuta.com"
  },
  "name": "PERSON_NAME",
  "displayName": "Person Name",
  "description": "Detects strings consistent with a dictionary of people's names.",
  "type": "builtIn",
  "config": {
    "tags": [
      "Discovered.PHI",
      "Discovered.PII",
      "Discovered.Entity.Person Name",
      "Discovered.Identifier Indirect"
    ],
    "minConfidence": 0.3
  },
  "id": 54,
  "createdAt": "2021-10-21T07:35:14.416Z",
  "updatedAt": "2021-10-21T12:57:43.919Z"
}

Remove the Discovered.PHI tag from the list of tags in the identifier config, and save the template payload in a .json file.

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ],
  "sampleSize": 100
}
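The classifiers entry in this payload can be derived from the identifier details returned earlier instead of being typed by hand. The sketch below takes an identifier object shaped like the PERSON_NAME response (name plus config.tags) and produces an entry whose overrides.tags omits the unwanted tags. The function name `override_without_tags` is hypothetical.

```python
def override_without_tags(identifier, tags_to_drop):
    """Build a template classifier entry that keeps every tag except those dropped."""
    dropped = set(tags_to_drop)
    kept = [tag for tag in identifier["config"]["tags"] if tag not in dropped]
    return {"name": identifier["name"], "overrides": {"tags": kept}}


# Relevant fields copied from the PERSON_NAME identifier details above.
person_name = {
    "name": "PERSON_NAME",
    "config": {
        "tags": [
            "Discovered.PHI",
            "Discovered.PII",
            "Discovered.Entity.Person Name",
            "Discovered.Identifier Indirect",
        ],
        "minConfidence": 0.3,
    },
}

entry = override_without_tags(person_name, ["Discovered.PHI"])
print(entry)
```

Deriving the list this way keeps the remaining tags in the identifier's original order and avoids accidentally dropping a tag you meant to keep.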

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that details the new template:

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T17:11:18.057Z",
  "updatedAt": "2021-10-21T17:11:18.057Z",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ]
}

What's next

Now that you've created a template, continue to one of the following tutorials:

Add your template to the SDD global settings so that Immuta uses this template to run SDD for all data sources.