
Customize Sensitive Data Discovery

Enable and Manage Global SDD Settings

Enable Sensitive Data Discovery

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Select the checkbox to enable SDD, and then click Save and Confirm to apply your changes.

Configure a Global Template

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the name of your global template in the Global SDD Template Name field.
Click Save, and then Confirm your changes.

Configure a Default Sample Size

When a template does not specify a sample size, SDD uses the default sample size of 1000 records. To adjust the default sample size:

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
In the Default SDD Sample Size field, enter the number of rows you would like SDD to sample from a data source.
Click Save, and then Confirm your changes.

Run Sensitive Data Discovery on Data Sources

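Previous documentation referred to an identifier as a classifier. The term is being updated to identifier to be more accurate and to avoid confusion with the Immuta data classification and frameworks feature. The endpoints and payload fields shown in these tutorials, such as sdd/classifier and classifiers, still use the older term.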

Attributes overview

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined below.

Attribute

Description

Run SDD on data sources

Specify the data sources you would like to run SDD on, and save the payload in a .json file.
```
{
  "sources": [
    "Insurance Data"
  ]
}
```
Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.
```
{
  "all": true
}
```
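
The two payloads above can also be generated programmatically. The following is a minimal Python sketch, not part of the Immuta tooling: the helper names build_run_payload and save_payload are hypothetical, and the only fields used are the sources and all keys shown above.

```python
import json
from pathlib import Path


def build_run_payload(sources=None, run_all=False):
    """Return an sdd/run payload for named data sources or for all of them."""
    if run_all:
        # Mirrors the {"all": true} payload shown above.
        return {"all": True}
    if not sources:
        raise ValueError("Provide at least one data source name or set run_all=True")
    # Mirrors the {"sources": [...]} payload shown above.
    return {"sources": list(sources)}


def save_payload(payload, path):
    """Write the payload as JSON so the CLI or curl can read it from disk."""
    Path(path).write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Writes the same file name that the commands in this tutorial read.
    save_payload(build_run_payload(["Insurance Data"]), "example-payload.json")
```

Running the script writes example-payload.json to the current directory, which is the file name used in the commands in this tutorial.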
Trigger SDD using one of these methods:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

If sensitive data discovery was successfully run, you will receive a response similar to this:

{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
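
To review a response like the one above, it can help to line up the added and removed tags per column. The sketch below is one possible way to do that in Python. It assumes only the response shape shown above (output, diff, addedTags, removedTags); the function name summarize_diff is hypothetical and the sample is a trimmed copy of that response.

```python
import json

# Trimmed copy of the sample response above (only the "diff" section is kept).
SAMPLE_RESPONSE = json.loads("""
{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": ["Discovered.PII"],
          "email": ["Discovered.PII"]
        },
        "removedTags": {
          "ssn": ["Discovered.Country.US"]
        }
      }
    }
  }
}
""")


def summarize_diff(response):
    """Map each data source to its per-column added and removed tag lists."""
    summary = {}
    for source_name, result in response.items():
        diff = result.get("output", {}).get("diff", {})
        added = diff.get("addedTags", {})
        removed = diff.get("removedTags", {})
        # A column may appear in only one of the two maps.
        columns = sorted(set(added) | set(removed))
        summary[source_name] = {
            column: {
                "added": added.get(column, []),
                "removed": removed.get(column, []),
            }
            for column in columns
        }
    return summary


if __name__ == "__main__":
    print(json.dumps(summarize_diff(SAMPLE_RESPONSE), indent=2))
```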

Additional tutorials

Test SDD on a data source

Users can test how SDD will apply tags to their data sources by completing a dryRun, which allows users to test templates and tags:

test templates: If a template is specified in the payload when the dryRun is true, SDD will use this template instead of the template applied to the data source. Note: SDD will error if a template is specified here when dryRun is false.
test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not identifiers or templates are applying tags correctly without updating the data source.

After evaluating whether or not the tags have been applied appropriately, users can then make necessary changes to a template before triggering SDD again.

To complete a dryRun,

Specify the data sources you would like to run sensitive data discovery on and set dryRun to true in the payload in a .json file. Note: You can also apply a template to a data source as a dryRun, like in the example below. However, when dryRun is false, a template cannot be included in the payload. Instead, the template must be added to the data source before running SDD.

{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": true,
  "template": "PII_REVISION"
}
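
Because SDD errors when a template is sent without dryRun set to true, a small client-side check before sending the payload can catch the mistake early. This is a hedged sketch only: validate_run_payload is a hypothetical helper, and it encodes just the rule described above, treating an omitted dryRun as false.

```python
def validate_run_payload(payload):
    """Raise ValueError if a template is supplied without dryRun set to true."""
    # An omitted dryRun is treated the same as false, as described in this tutorial.
    dry_run = payload.get("dryRun", False)
    if "template" in payload and dry_run is not True:
        raise ValueError(
            "A template can only be included in the payload when dryRun is true; "
            "apply the template to the data source before running SDD instead."
        )
    return payload


if __name__ == "__main__":
    # The dry-run payload from the example above passes the check.
    validate_run_payload(
        {"sources": ["Medical Claims"], "dryRun": True, "template": "PII_REVISION"}
    )
    print("payload is valid for a dry run")
```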

Trigger SDD using one of these methods:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

You will receive a response that illustrates tags that will be added, tags that will be removed, and the final SDD result:

{
  "Medical Claims": {
    "id": "86fc4f70-380f-11ec-a432-81748c911385",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}

Once you are satisfied with how tags are applied by SDD, set dryRun to false (or omit it from the payload).

{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": false
}

Trigger SDD again:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

If the request was successful, you will receive a response similar to this one:

{
  "Medical Claims": {
    "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}

Trigger SDD in the Immuta UI

Select a data source from your My Data Sources page.
Click the Health Check dropdown menu.
In the Sensitive Data Discovery (SDD) section, click Re-run.

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Create a Custom Identifier

Create a Regex Identifier

Use case: Custom regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically identify and tag columns that contain account numbers in your database.

A regular expression (regex) custom identifier allows you to create your own rules that enable Immuta's sensitive data discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom identifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom regex identifier payload in a .json file.

{
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5,
    "tags": ["Discovered.account-number"]
  }
}
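
Before creating the identifier, you may want to try the pattern against a few sample values. The Python sketch below does this locally with the standard re module. It is an illustration only: the sample values and the match_share helper are hypothetical, Python's regex dialect may differ from the engine Immuta uses, and the share it reports is a simple diagnostic rather than Immuta's confidence score.

```python
import re

# The pattern from the payload above: 9 digits, 3 digits, 1 digit, hyphen-separated.
ACCOUNT_NUMBER_PATTERN = re.compile(r"^[0-9]{9}-[0-9]{3}-[0-9]{1}$")

# Hypothetical sample values, two valid and two invalid.
SAMPLE_VALUES = ["123456789-123-4", "987654321-000-1", "12345-123-4", "n/a"]


def match_share(values, pattern=ACCOUNT_NUMBER_PATTERN):
    """Return the fraction of values that match the pattern."""
    values = list(values)
    if not values:
        return 0.0
    matches = sum(1 for value in values if pattern.search(value))
    return matches / len(values)


if __name__ == "__main__":
    for value in SAMPLE_VALUES:
        verdict = "match" if ACCOUNT_NUMBER_PATTERN.search(value) else "no match"
        print(f"{value!r}: {verdict}")
    print("share of matching values:", match_share(SAMPLE_VALUES))
```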

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "tags": [
      "Discovered.account-number"
    ],
    "regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}",
    "minConfidence": 0.5
  },
  "id": 1,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Column Name Regex Identifier

Use case: Custom column name regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically detect and tag columns that contain account numbers in your database.

A custom column name regular expression (regex) identifier allows you to create your own detectors that enable Immuta's sensitive data discovery to find column name matches based on a regex pattern. For example, if your database contains tables with social security numbers, you could define a regex pattern to match against the names of the column instead of the values within the column. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom column name regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom column name regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom column name regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom column name regex identifier payload in a .json file. The regex ^ssn|social ?security$ is meant to match column names such as ssn, socialsecurity, or social security.

{
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "columnNameRegex": "^ssn|social ?security$",
    "tags": ["Discovered.Social Security Numbers"]
  }
}
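
One detail of the pattern above is worth checking before you rely on it: in standard regex syntax, alternation binds more loosely than the anchors, so ^ssn|social ?security$ reads as "starts with ssn" or "ends with social security". The Python sketch below compares it with a grouped variant. The column names and the GROUPED pattern are hypothetical, and whether Immuta applies the pattern as a search or a full match is not stated here, so treat this as a local check only.

```python
import re

ORIGINAL = r"^ssn|social ?security$"     # pattern from the payload above
GROUPED = r"^(ssn|social ?security)$"    # hypothetical stricter variant

# Hypothetical column names to test against.
COLUMN_NAMES = [
    "ssn",
    "ssn_backup",
    "socialsecurity",
    "social security",
    "no social security",
    "id",
]


def matching_names(pattern, names):
    """Return the names in which the pattern is found anywhere (re.search)."""
    return [name for name in names if re.search(pattern, name)]


if __name__ == "__main__":
    print("original:", matching_names(ORIGINAL, COLUMN_NAMES))
    print("grouped: ", matching_names(GROUPED, COLUMN_NAMES))
```

With a search-style match, the original pattern also accepts names such as ssn_backup, while the grouped variant accepts only the three intended names.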

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "tags": [
      "Discovered.Social Security Numbers"
    ],
    "columnNameRegex": "^ssn|social ?security$"
  },
  "id": 2,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Dictionary Identifier

Use case: Custom dictionary identifier

Scenario: You have data that includes the names of the rooms employees' desks are in across your organization. Although these locations may be considered sensitive in particular datasets, they would not be recognized by Immuta's built-in identifiers.

A custom dictionary identifier allows you to create your own rules that enable Immuta's sensitive data discovery to match a list of room names to values in the dataset. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom dictionary identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom dictionary identifier are outlined in the table below.

Attribute

Description

Create a custom dictionary identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom dictionary identifier payload in a .json file. The dictionary below contains the words Research Lab, Blue Room, and Purple Room.

{
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "values": ["Research Lab", "Blue Room", "Purple Room"],
    "caseSensitive": false,
    "minConfidence": 0.6,
    "tags": ["Discovered.desk-location"]
  }
}
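
To get a feel for how the values and caseSensitive settings interact, the following Python sketch checks sample column values against the dictionary above. It is a simplified illustration under stated assumptions: it compares whole values exactly, which may not be how Immuta matches, and the sample column and the dictionary_match_share helper are hypothetical.

```python
DICTIONARY = ["Research Lab", "Blue Room", "Purple Room"]

# Hypothetical column values with mixed casing.
SAMPLE_COLUMN = ["blue room", "Purple Room", "Lobby", "RESEARCH LAB", "Cafeteria"]


def dictionary_match_share(values, dictionary, case_sensitive=False):
    """Return the fraction of values that equal a dictionary entry."""

    def normalize(text):
        # casefold() gives a case-insensitive comparison key.
        return text if case_sensitive else text.casefold()

    known = {normalize(entry) for entry in dictionary}
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for value in values if normalize(value) in known)
    return hits / len(values)


if __name__ == "__main__":
    print("case-insensitive:", dictionary_match_share(SAMPLE_COLUMN, DICTIONARY))
    print("case-sensitive:  ", dictionary_match_share(SAMPLE_COLUMN, DICTIONARY, True))
```

In this sample, three of five values match when case is ignored, but only one matches when the comparison is case sensitive.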

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "tags": [
      "Discovered.desk-location"
    ],
    "values": [
      "Research Lab",
      "Blue Room",
      "Purple Room"
    ],
    "caseSensitive": false,
    "minConfidence": 0.6
  },
  "id": 68,
  "createdAt": "2021-10-20T17:57:51.696Z",
  "updatedAt": "2021-10-20T17:57:51.696Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

List Built-In Identifiers

Attributes

Attributes of identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to listing identifiers are outlined in the table below.

Attribute

Description

Response details

The response lists all built-in identifiers that are currently supported in Immuta SDD and their details, including their name and description. For example,

{
  "count": 67,
  "hits": [
    {
      "createdBy": {
        "id": 21,
        "name": "Immuta System Account",
        "email": "immuta_system@immuta.com"
      },
      "name": "AGE",
      "displayName": "Age",
      "description": "Detects numeric strings between 10 and 199, provided the column header contains text such as `age`, `year`, `years`, `yr`, or `yrs`.",
      "type": "builtIn",
      "config": {
        "minConfidence": 0.7,
        "tags": [
          "Discovered.PII",
          "Discovered.Identifier Indirect",
          "Discovered.PHI",
          "Discovered.Entity.Age"
        ],
        "conditionalTags": {}
      },
      "id": 3,
      "createdAt": "2021-10-28T07:34:58.761Z",
      "updatedAt": "2021-10-28T07:34:58.761Z"
    }
  ]
}
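
Once you have the listing, you can filter it locally, for example to find every identifier that applies a given tag. The Python sketch below assumes only the hits, name, and config.tags fields shown in the response above; the identifiers_with_tag helper is hypothetical and the sample is a trimmed copy of that response.

```python
import json

# Trimmed copy of the listing above (one hit, only the fields used below).
SAMPLE_LISTING = json.loads("""
{
  "count": 67,
  "hits": [
    {
      "name": "AGE",
      "displayName": "Age",
      "type": "builtIn",
      "config": {
        "minConfidence": 0.7,
        "tags": [
          "Discovered.PII",
          "Discovered.Identifier Indirect",
          "Discovered.PHI",
          "Discovered.Entity.Age"
        ]
      }
    }
  ]
}
""")


def identifiers_with_tag(listing, tag):
    """Return the sorted names of identifiers whose config lists the tag."""
    return sorted(
        hit["name"]
        for hit in listing.get("hits", [])
        if tag in hit.get("config", {}).get("tags", [])
    )


if __name__ == "__main__":
    print(identifiers_with_tag(SAMPLE_LISTING, "Discovered.PHI"))
```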

List built-in identifiers

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.
List built-in identifiers using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"

If the request was successful, you will receive a list of built-in identifiers.
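
If you build the request URL in a script instead of typing it by hand, encoding the query parameters programmatically avoids quoting mistakes with the & characters. The following Python sketch uses only the standard library and the query parameters shown above; build_list_url is a hypothetical helper, and the host name is the placeholder used throughout this tutorial.

```python
from urllib.parse import urlencode


def build_list_url(base_url, **params):
    """Return the sdd/classifier listing URL with encoded query parameters."""
    # urlencode keeps the keyword order and escapes any special characters.
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/sdd/classifier?{query}"


if __name__ == "__main__":
    print(
        build_list_url(
            "https://your-immuta-url.immuta.com",
            sortField="name",
            sortOrder="asc",
            limit=100,
            type="builtIn",
        )
    )
```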

What's next

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Create and Apply a Template to a Data Source

Create a template

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API.
Find identifiers to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"

If the request was successful, you will receive a list of available identifiers.

{
  "count": 3,
  "hits": [
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "displayName": "Account Number Identifier",
      "description": "This identifier recognizes account numbers using a regex",
      "type": "regex",
      "config": {
        "tags": [
          "Discovered.account-number"
        ],
        "regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}",
        "minConfidence": 0.5
      },
      "id": 104,
      "createdAt": "2021-10-20T19:12:24.889Z",
      "updatedAt": "2021-10-20T19:12:24.889Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
      "displayName": "Employee Desk Location Identifier",
      "description": "This identifier detects when an employee's desk location appears in a dataset.",
      "type": "dictionary",
      "config": {
        "tags": [
          "Discovered.desk-location"
        ],
        "values": [
          "Research Lab",
          "Blue Room",
          "Purple Room"
        ],
        "caseSensitive": false,
        "minConfidence": 0.6
      },
      "id": 68,
      "createdAt": "2021-10-20T17:57:51.696Z",
      "updatedAt": "2021-10-20T17:57:51.696Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "displayName": "Social Security Number Columns Identifier",
      "description": "This identifier recognizes column names that match the defined regex pattern.",
      "type": "columnNameRegex",
      "config": {
        "tags": [
          "Discovered.Social Security Numbers"
        ],
        "columnNameRegex": "ssn|social ?security"
      },
      "id": 67,
      "createdAt": "2021-10-20T17:57:17.930Z",
      "updatedAt": "2021-10-20T17:57:17.930Z"
    }
  ]
}

Save the template payload in a .json file. The examples below show payloads for different templates.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "classifiers": [
    {
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "classifiers": [
    {
      "name": "STUDENT_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
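
The template payloads above share one shape, so they can be generated from a list of identifier names. The Python sketch below is a minimal illustration that uses only the fields shown in those payloads; build_template_payload is a hypothetical helper, not part of the Immuta CLI or API.

```python
import json


def build_template_payload(name, display_name, description, identifier_names,
                           sample_size=None):
    """Return a template payload in the shape shown in the examples above."""
    payload = {
        "name": name,
        "displayName": display_name,
        "description": description,
        # The API field is still called "classifiers".
        "classifiers": [{"name": identifier} for identifier in identifier_names],
    }
    if sample_size is not None:
        payload["sampleSize"] = sample_size
    return payload


if __name__ == "__main__":
    template = build_template_payload(
        "ACCOUNT_NUMBERS_TEMPLATE",
        "Account Numbers Template",
        "This template contains the identifier that recognizes account numbers.",
        ["ACCOUNT_NUMBER_IDENTIFIER"],
        sample_size=100,
    )
    print(json.dumps(template, indent=2))
```

Leaving sample_size unset omits sampleSize from the payload, in which case the default SDD sample size described earlier would apply.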

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that contains details about the template. The examples below show the responses for the different templates.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

After the template is applied to data sources and sensitive data discovery is run, the Discovered.account-number tag will be applied to columns that Immuta identifies with at least 50% confidence, matching the minConfidence of 0.5 configured in the identifier.

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}

After the template is applied to data sources and sensitive data discovery is run, the Discovered.desk-location tag will be applied to columns when Immuta detects the values Research Lab, Blue Room, or Purple Room with at least 60% confidence, matching the minConfidence of 0.6 configured in the identifier.

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 2,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "overrides": {}
    }
  ]
}

After the template is applied to data sources and sensitive data discovery is run, the Discovered.Social Security Numbers tag will be applied to columns whose names match the identifier's columnNameRegex, such as ssn, socialsecurity, or social security.

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "STUDENT_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}

After the template is applied to data sources and sensitive data discovery is run, the Discovered.residence-hall tag will be applied to columns when Immuta detects values that match those listed in the Residence Halls data source with 70% confidence, as configured in the identifier.

Apply a template to data sources

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined in the table below.

Attribute

Description

Find templates to apply to your data sources:

Immuta CLI

immuta api sdd/template

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    https://your-immuta-url.immuta.com/sdd/template

If the request was successful, you will receive a list of available templates.

{
  "count": 3,
  "hits": [
    {
      "name": "ACCOUNT_NUMBERS_TEMPLATE",
      "displayName": "Account Numbers Template",
      "description": "This template contains the identifier that recognizes account numbers.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 2,
      "createdAt": "2021-10-20T19:13:35.319Z",
      "updatedAt": "2021-10-20T19:13:35.319Z",
      "classifiers": [
        {
          "name": "ACCOUNT_NUMBER_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
      "displayName": "Employee Desk Location Template",
      "description": "Contains identifier that detects when the name of a room a desk is in appears in a dataset.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 1,
      "createdAt": "2021-10-20T18:03:58.967Z",
      "updatedAt": "2021-10-20T18:03:58.967Z",
      "classifiers": [
        {
          "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
      "displayName": "Social Security Numbers Template",
      "description": "Contains identifier that matches ssn column names with the defined regex.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 3,
      "createdAt": "2021-10-20T19:13:58.359Z",
      "updatedAt": "2021-10-20T19:13:58.359Z",
      "classifiers": [
        {
          "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
          "overrides": {}
        }
      ]
    }
  ]
}
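
With a listing like the one above, you can pick a template by the identifier it contains and then assemble the payload used to apply it. The Python sketch below assumes only the hits, name, and classifiers fields shown in the response; templates_using and build_apply_payload are hypothetical helpers, and the sample is a trimmed copy of that response.

```python
# Trimmed copy of the template listing above (only the fields used below).
SAMPLE_TEMPLATES = {
    "count": 3,
    "hits": [
        {
            "name": "ACCOUNT_NUMBERS_TEMPLATE",
            "classifiers": [{"name": "ACCOUNT_NUMBER_IDENTIFIER", "overrides": {}}],
        },
        {
            "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
            "classifiers": [{"name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER", "overrides": {}}],
        },
        {
            "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
            "classifiers": [{"name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER", "overrides": {}}],
        },
    ],
}


def templates_using(listing, identifier_name):
    """Return the names of templates that include the given identifier."""
    names = []
    for template in listing.get("hits", []):
        identifiers = {entry["name"] for entry in template.get("classifiers", [])}
        if identifier_name in identifiers:
            names.append(template["name"])
    return names


def build_apply_payload(template_name, sources):
    """Return a payload with the template and sources fields for sdd/template/apply."""
    return {"template": template_name, "sources": list(sources)}


if __name__ == "__main__":
    chosen = templates_using(SAMPLE_TEMPLATES, "ACCOUNT_NUMBER_IDENTIFIER")[0]
    print(build_apply_payload(chosen, ["Insurance Data"]))
```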

Select an appropriate template to apply to your data sources, and save the payload in a .json file:

{
  "template": "ACCOUNT_NUMBERS_TEMPLATE",
  "sources": [
    "Insurance Data"
  ]
}

Apply the template to your data source(s):

Immuta CLI

immuta api sdd/template/apply -X PUT --input ./example-payload.json

HTTP API

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/apply

You will receive a response that indicates whether or not the template was successfully applied to your data sources.

{
  "success": true
}

Additional tutorials

Clone a template

Users cannot modify templates created by other data owners, but they can clone templates and make changes to the clone.

Get a list of templates to determine the template you want to clone using one of these methods:

Immuta CLI

immuta api "sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"
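If you page through a long list of templates, it can help to generate the query strings rather than type them by hand. The sketch below uses only Python's standard library. `template_list_path` and `page_paths` are hypothetical helpers, and the sketch assumes that offset is the number of records to skip and limit is the page size, which the example request suggests but does not state.

```python
from urllib.parse import urlencode


def template_list_path(sort_field="name", sort_order="asc", offset=0, limit=5):
    """Return the request path and query string for one page of templates."""
    query = urlencode(
        {
            "sortField": sort_field,
            "sortOrder": sort_order,
            "offset": offset,  # assumption: number of records to skip
            "limit": limit,    # assumption: maximum records per page
        }
    )
    return f"/sdd/template?{query}"


def page_paths(total, limit=5):
    """Return one path per page needed to cover `total` templates."""
    return [
        template_list_path(offset=start, limit=limit)
        for start in range(0, total, limit)
    ]


# Hypothetical usage: three pages are needed to cover 12 templates.
for path in page_paths(12):
    print(path)
```

When you paste a generated path into a shell command, wrap it in quotes, because an unquoted `&` is interpreted by the shell as a control operator.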

Save the template clone name and details in a .json file.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts."
}

Clone the template:

Immuta CLI

immuta api sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone

If the request was successful, you will receive a response that provides details about the template clone.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 4,
  "createdAt": "2021-10-20T20:48:37.805Z",
  "updatedAt": "2021-10-20T20:48:37.805Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

You can now modify the cloned template; for example, you can change the identifiers (classifiers) it includes or its sampleSize.

Configure entity tags and confidence

To prevent specific entity tags from being set, you can create a template that configures the identifier that contains those tags.

For example, the built-in PERSON_NAME identifier contains the following tags: Discovered.PHI, Discovered.PII, Discovered.Entity.Person Name, and Discovered.Identifier Indirect. If your organization doesn't have any health data, you may not want the PHI tag applied to your data sources, even though you still want all the other tags within that identifier.

To override the Discovered.PHI tag, create a template that includes the PERSON_NAME identifier and omits the Discovered.PHI tag from the list of tags in the template payload.

View the details about the PERSON_NAME identifier so you know what to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

If the request was successful, the response will include details about the PERSON_NAME identifier.

{
  "createdBy": {
    "id": 21,
    "name": "Immuta System Account",
    "email": "immuta_system@immuta.com"
  },
  "name": "PERSON_NAME",
  "displayName": "Person Name",
  "description": "Detects strings consistent with a dictionary of people's names.",
  "type": "builtIn",
  "config": {
    "tags": [
      "Discovered.PHI",
      "Discovered.PII",
      "Discovered.Entity.Person Name",
      "Discovered.Identifier Indirect"
    ],
    "minConfidence": 0.3
  },
  "id": 54,
  "createdAt": "2021-10-21T07:35:14.416Z",
  "updatedAt": "2021-10-21T12:57:43.919Z"
}

Remove the Discovered.PHI tag from the list of tags in the identifier config, and save the template payload in a .json file.

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ],
  "sampleSize": 100
}
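Rather than editing the tag list by hand, you could derive the override payload from the identifier details returned in the previous step. The following is a minimal sketch, not an Immuta utility: `build_override_template` is a hypothetical helper, and it assumes the identifier response has the `name` and `config.tags` fields shown in the example above.

```python
import json

# Abbreviated copy of the PERSON_NAME identifier details shown earlier.
person_name = {
    "name": "PERSON_NAME",
    "config": {
        "tags": [
            "Discovered.PHI",
            "Discovered.PII",
            "Discovered.Entity.Person Name",
            "Discovered.Identifier Indirect",
        ],
        "minConfidence": 0.3,
    },
}


def build_override_template(identifier, drop_tags, name, display_name,
                            description, sample_size=100):
    """Build a template payload that keeps every tag except those in drop_tags."""
    kept = [tag for tag in identifier["config"]["tags"] if tag not in drop_tags]
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [
            {"name": identifier["name"], "overrides": {"tags": kept}}
        ],
        "sampleSize": sample_size,
    }


template = build_override_template(
    person_name,
    drop_tags={"Discovered.PHI"},
    name="PERSON_NAME_OVERRIDE",
    display_name="Person Name Override",
    description="This template removes the PHI tag from the PERSON_NAME identifier.",
)
print(json.dumps(template, indent=2))
```

Filtering the returned tag list, instead of retyping it, keeps the remaining tags spelled exactly as the identifier defines them.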

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that details the new template:

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T17:11:18.057Z",
  "updatedAt": "2021-10-21T17:11:18.057Z",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ]
}

What's next

Now that you've created a template, continue to one of the following tutorials:

SDD global settings: Opt to add your template to the SDD global settings so that Immuta will use this template to run SDD for all data sources.
Run sensitive data discovery on a data source

Run Sensitive Data Discovery on Data Sources

Attributes overview

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined below.

Attribute

Description

Run SDD on data sources

Specify the data sources you would like to run SDD on, and save the payload in a .json file.
```
{
  "sources": [
    "Insurance Data"
  ]
}
```
Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.
```
{
  "all": true
}
```
Trigger SDD using one of these methods:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

If sensitive data discovery was successfully run, you will receive a response similar to this:

{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
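A response like the one above can be flattened into one line per tag change, which is easier to scan when many columns are involved. This sketch is illustrative only: `tag_changes` is a hypothetical helper, and it assumes the response keeps the `output.diff.addedTags` and `output.diff.removedTags` structure shown in the example.

```python
def tag_changes(response):
    """Flatten an SDD run response into (source, column, sign, tag) tuples."""
    changes = []
    for source, job in response.items():
        diff = job.get("output", {}).get("diff", {})
        for sign, key in (("+", "addedTags"), ("-", "removedTags")):
            for column, tags in diff.get(key, {}).items():
                for tag in tags:
                    changes.append((source, column, sign, tag))
    return changes


# Abbreviated copy of the example response (sddTagResult omitted).
response = {
    "Insurance Data": {
        "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
        "state": "completed",
        "output": {
            "diff": {
                "addedTags": {
                    "ssn": ["Discovered.PII"],
                    "email": ["Discovered.PII"],
                },
                "removedTags": {"ssn": ["Discovered.Country.US"]},
            }
        },
    }
}

for source, column, sign, tag in tag_changes(response):
    print(f"{source}: {column} {sign}{tag}")
```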

Additional tutorials

Test SDD on a data source

Users can test how SDD will apply tags to their data sources by completing a dryRun, which allows users to test templates and tags:

test templates: If a template is specified in the payload when dryRun is true, SDD will use this template instead of the template applied to the data source. Note: SDD returns an error if a template is specified in the payload when dryRun is false.
test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not identifiers or templates are applying tags correctly without updating the data source.

After evaluating whether or not the tags have been applied appropriately, users can then make necessary changes to a template before triggering SDD again.

To complete a dryRun,

Specify the data sources you would like to run sensitive data discovery on, set dryRun to true, and save the payload in a .json file. Note: You can also test a template against a data source as part of a dryRun, as in the example below. However, when dryRun is false, a template cannot be included in the payload. Instead, the template must be applied to the data source before running SDD.

{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": true,
  "template": "PII_REVISION"
}
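The rule that a template may only accompany a dry run is easy to check before the request is sent. The sketch below encodes that rule in a hypothetical `build_run_payload` helper. It is an assumption-laden illustration of the payload shape shown above, not Immuta's own validation logic.

```python
import json


def build_run_payload(sources, dry_run=False, template=None):
    """Build an sdd/run payload, rejecting a template outside of a dry run."""
    if not sources:
        raise ValueError("at least one data source name is required")
    if template is not None and not dry_run:
        # Mirrors the documented behavior: SDD errors in this case.
        raise ValueError("a template can only be included when dryRun is true")
    payload = {"sources": list(sources), "dryRun": dry_run}
    if template is not None:
        payload["template"] = template
    return payload


dry_run_payload = build_run_payload(
    ["Medical Claims"], dry_run=True, template="PII_REVISION"
)
print(json.dumps(dry_run_payload, indent=2))
```

Calling `build_run_payload(["Medical Claims"], template="PII_REVISION")` without `dry_run=True` raises a `ValueError` locally, so the mistake surfaces before SDD is triggered.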

Trigger SDD using one of these methods:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

You will receive a response that shows the tags that would be added, the tags that would be removed, and the final SDD result:

{
  "Medical Claims": {
    "id": "86fc4f70-380f-11ec-a432-81748c911385",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
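When reviewing a dry run like the one above, one useful check is which columns would end up with no discovered tags at all. The sketch below is a hypothetical helper (`columns_losing_all_tags`) that assumes a column absent from `sddTagResult` would carry no SDD tags after the run. That reading matches the example response but is not confirmed by it. The sample data is abbreviated to one tag per column.

```python
def columns_losing_all_tags(job):
    """Return columns that have removed tags and no tags in the final result."""
    output = job["output"]
    removed = output["diff"]["removedTags"]
    final = output["sddTagResult"]
    # Assumption: a column missing from sddTagResult ends up untagged.
    return [column for column in removed if not final.get(column)]


# Abbreviated copy of the "Medical Claims" dry-run result.
job = {
    "state": "completed",
    "output": {
        "diff": {
            "addedTags": {},
            "removedTags": {
                "dob": ["Discovered.Entity.Date of Birth"],
                "ssn": ["Discovered.Entity.Social Security Number"],
                "state": ["Discovered.Entity.State"],
                "gender": ["Discovered.Entity.Gender"],
                "date_of_service": ["Discovered.Entity.Date"],
            },
        },
        "sddTagResult": {"ssn": ["Discovered.PII"]},
    },
}

print(columns_losing_all_tags(job))
```

If the list contains columns you expected to stay tagged, adjust the template and repeat the dry run before setting dryRun to false.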

Once you are satisfied with how tags are applied by SDD, set dryRun to false (or omit it from the payload).

{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": false
}

Trigger SDD again:

Immuta CLI

immuta api sdd/run -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run

If the request was successful, you will receive a response similar to this one:

{
  "Medical Claims": {
    "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}

Trigger SDD in the Immuta UI

Select a data source from your My Data Sources page.
Click the Health Check dropdown menu.
In the Sensitive Data Discovery (SDD) section, click Re-run.

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.