Manage Tags

Create Tags

Create tags

Click the Governance icon in the navigation menu and select the Tags tab.
Click Add Tags in the top right corner.
Complete the Enter tag name field.
Additional nested tags are optional. These nested tags follow a tree structure. There are parent, sibling, and child tags. Click Remove Tag to remove a nested tag.
Click Save.
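
The parent, sibling, and child relationships described above can be pictured as a tree keyed by tag name. The Python sketch below is illustrative only. It assumes nested tags are written as dotted paths (as in tag names like Discovered.Entity.Date that appear later in this document). The Environment tags and the helper names add_tag and children are hypothetical and not part of Immuta.

```python
from typing import Dict, List

# Illustrative model only: each key is a tag name and each value holds
# that tag's child tags.
TagTree = Dict[str, "TagTree"]


def add_tag(tree: TagTree, dotted_name: str) -> None:
    """Insert a tag given as a dotted path, creating missing parent tags."""
    node = tree
    for part in dotted_name.split("."):
        node = node.setdefault(part, {})


def children(tree: TagTree, dotted_name: str) -> List[str]:
    """Return the sorted child tag names of the tag at the given dotted path."""
    node = tree
    for part in dotted_name.split("."):
        node = node[part]
    return sorted(node)


tags: TagTree = {}
for name in ("Environment.Dev", "Environment.Test", "Environment.Prod"):
    add_tag(tags, name)

print(sorted(tags))                   # top-level (parent) tags
print(children(tags, "Environment"))  # sibling tags under the same parent
```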

Deleting tags from the governance page will not remove them from data sources

Deleting a tag from the governance page only prevents it from being applied to data sources in the future. To remove a tag from a data source, delete it from the data source directly. This design prevents the deletion of a single tag from exposing data across many data sources at once.

View data source tags

Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Navigate to the Data Dictionary tab.
Hover over tags for metadata or click on a tag to open the side sheet with information about the tag.

View all tags

Click the Governance icon in the navigation menu and select the Tags tab.
A list of all top-level tags will be displayed. Click the expand arrow to view nested tags.
Click the tag itself or the icon in the Actions column to edit tags, generate tag reports, or delete tags.

Import tags from an external catalog

You can pull in external tags that you previously defined in an external catalog (e.g., Collibra or Snowflake).

Click the Governance icon in the navigation menu and select the Tags tab.
Click Refresh External Tags.

Link an external catalog to a data source

External tags will be automatically detected when you create a new data source that originates in an external catalog, or they can be linked directly from the data source overview page.

Custom REST catalog

When using custom REST catalogs, the GET /dataSource/page/{id} endpoint returns a human-readable information page from the REST catalog for the data source associated with {id}. Immuta provides this as a mechanism for the REST catalog to supply additional information about the data source that may not be directly ingested by or visible within Immuta. Users reach this page in the Immuta UI by clicking the catalog logo associated with the data source on the data source overview page.
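
As a rough illustration of what a custom REST catalog might do behind this endpoint, the sketch below routes a request path to an HTML page. It is a minimal example under stated assumptions: the in-memory CATALOG, its fields, and the handle_get function are hypothetical, and a real catalog would serve this over HTTP from its own data store.

```python
import html
import re
from typing import Dict, Tuple

# Hypothetical in-memory catalog keyed by data source id.
CATALOG: Dict[str, Dict[str, str]] = {
    "42": {"name": "Insurance Data", "steward": "Data Office", "notes": "Refreshed nightly."},
}

# Matches the path shape described above: /dataSource/page/{id}
PAGE_ROUTE = re.compile(r"^/dataSource/page/(?P<id>[^/]+)$")


def handle_get(path: str) -> Tuple[int, str]:
    """Return (HTTP status, HTML body) for a GET request path."""
    match = PAGE_ROUTE.match(path)
    if match is None:
        return 404, "<h1>Not found</h1>"
    entry = CATALOG.get(match.group("id"))
    if entry is None:
        return 404, "<h1>Unknown data source</h1>"
    # Escape every value so catalog text cannot inject markup into the page.
    rows = "".join(
        f"<dt>{html.escape(key)}</dt><dd>{html.escape(value)}</dd>"
        for key, value in entry.items()
    )
    title = html.escape(entry["name"])
    return 200, f"<html><body><h1>{title}</h1><dl>{rows}</dl></body></html>"


status, body = handle_get("/dataSource/page/42")
print(status)
```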

Add Tags to Data Sources and Projects

Use case

Compliance Requirement: Users can only interact with Dev data, and all personal information should be redacted for everyone, except for queries run in Test and Prod.

For this requirement, data owners need to tag data sources with the corresponding environment tag and verify the accuracy of the Discovered tags to ensure that the global policies written by data governors are enforced on the appropriate data sources.

Add tags to data sources

Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Click the Add Tags button at the bottom of the Overview tab.
Begin typing a tag name in the Search by Name field and select the tag from the dropdown list.
Click Add. A list of the applied tags will populate at the bottom of the Overview tab.
Repeat as necessary for other data sources and tags.

Verify discovered tags

If sensitive data discovery is enabled by an application admin on the app settings page, Immuta will automatically discover sensitive data and tag those columns with Discovered tags when the data source is created. For more information about this feature, see the Sensitive Data Discovery section.

Best practice: Verify discovered tags

If sensitive data discovery has been enabled, then manually adding tags to columns in the data dictionary will be unnecessary in most cases. The data owner will just need to verify that the Discovered tags are correct.

Disable discovered tags from the data dictionary

If a governor, data owner, or data source expert disables a Discovered tag from the data dictionary, the column will not be re-tagged when the data source's fingerprint is recalculated or SDD is re-run. When a Discovered tag is disabled, the tag will not completely disappear, so it can be manually enabled from the tag side sheet.

To disable a Discovered tag:

Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to remove the tag from and click the tag you want to remove.
Click Disable in the side sheet and then click Confirm.

Remove tags from data sources

Click the Data icon in the navigation menu and select the Data Sources tab.
Select a data source.
Scroll to the Tags section at the bottom of the Overview tab, and click on the tag you want to remove.
Click Delete in the side sheet and then click Confirm.

Manage data dictionary tags

The data dictionary lists the columns within the data source and the value type of the data within each column. From this page, governors can add tags to or remove them from specific columns in a data source.

Add tags to the data dictionary

Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to add a tag to and click Add Tags.
Begin typing in the Search by Name field and select the tag from the dropdown list.
Click Add. The applied tag will appear below the column name in the data dictionary.

Remove tags from the data dictionary

Navigate to a data source and click the Data Dictionary tab.
Scroll to the column you want to remove the tag from and click on the tag you want to delete.
Click Delete in the side sheet and then click Confirm.

Manage project tags

Add tags to projects

Click the Data icon and select Projects in the left sidebar.
Select a project.
Click the Add Tags button at the bottom of the Project Overview tab.
Begin typing in the Search by Name field that appears, and then select the tag from the dropdown list.
Click Add. A list of the applied tags will populate at the bottom of the project overview.

Remove tags from projects

Click the Data icon and select Projects in the left sidebar.
Select a project.
Scroll to the Tags section at the bottom of the Overview tab, and then click the tag you want to delete.
Click Delete in the side sheet and then click Confirm.

Sensitive Data Discovery

Deprecation notice

Support for this feature has been deprecated.

Sensitive data discovery (SDD) is an Immuta feature that uses sensitive data patterns to determine what type of data your column represents. Using identification rules and data samples from your tables, Immuta matches your data and can assign the appropriate tags to your data dictionary. This saves the time of identifying your data manually and provides the benefit of a standard taxonomy across all your data sources in Immuta.

Architecture

SDD works by checking a sample of data from each table against templates composed of built-in or custom identifiers. If an identifier's pattern matches a column of the sampled data with sufficient confidence, the corresponding tag is applied to that column to indicate the type of data it contains.

SDD queries a small sample of data for each data source in Immuta. This sample is temporarily held in memory to check for identifier matches. Then Immuta applies the relevant tags to those columns where matches were found.

This sampling and tagging process will happen anytime SDD is run. SDD can be triggered through the Immuta CLI, through the API, or in the Immuta UI on the data sources overview page. SDD will also run automatically anytime one of the following events occurs:

A new data source is created.
Schema detection is enabled and a new data source is detected.
Column detection is enabled and new columns are detected. Here, SDD will only run on new columns and no existing tags will be removed or changed.

Components

Sensitive data discovery (SDD) comprises two major elements: identifiers and templates.

Identifier

The identifier is the basic building block of SDD. Each identifier in Immuta is a unique pattern (e.g., a regex or a list of values) and a list of tags to apply to data that matches the pattern. When Immuta recognizes that pattern, it can understand the type of data and tag the data to describe the type. For example, Immuta has the built-in identifier US_SOCIAL_SECURITY_NUMBER. Immuta will use a regex to look for strings of exactly nine digits, with or without hyphens after the third and fifth digits, with a leading digit between 0 and 8. SDD then scores columns by the percentage of values that match the pattern defined. This score determines whether or not the configured tags will be applied to a column. Once it finds a column that fits the expected pattern of US_SOCIAL_SECURITY_NUMBER with a reasonable match score, it will know how to tag it.
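
The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not Immuta's implementation: the regex is an assumed approximation of the pattern described for US_SOCIAL_SECURITY_NUMBER (the built-in regex itself is not shown in this document), and the function names, the handling of null values, and the threshold values are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed approximation of the described pattern: nine digits, optional hyphens
# after the third and fifth digits, and a leading digit between 0 and 8.
SSN_LIKE = re.compile(r"[0-8]\d{2}-?\d{2}-?\d{4}")


def match_score(values: Iterable[Optional[str]], pattern: re.Pattern) -> float:
    """Return the fraction of non-null sampled values that fully match the pattern."""
    present = [value for value in values if value is not None]
    if not present:
        return 0.0
    hits = sum(1 for value in present if pattern.fullmatch(value))
    return hits / len(present)


def should_tag(values: Iterable[Optional[str]], pattern: re.Pattern, min_confidence: float) -> bool:
    """Apply the identifier's tags only when the score reaches the threshold."""
    return match_score(values, pattern) >= min_confidence


sample = ["123-45-6789", "123456789", "923-45-6789", "not an ssn"]
score = match_score(sample, SSN_LIKE)
print(f"score={score:.2f}, tag at 0.5: {should_tag(sample, SSN_LIKE, 0.5)}")
```

In this sample, two of the four values fit the pattern, so the column would be tagged at a threshold of 0.5 but not at 0.6.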

There are two types of identifiers:

Built-in identifier: These identifiers are included with Immuta and discover common categories of data (such as social security numbers, zip codes, and routing numbers). They cannot be modified. Users can list built-in identifiers through the Immuta API or view the Built-in identifiers reference page.
Custom identifier: Custom identifiers allow data governors to create their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

By default, all identifiers are matched against data sources when SDD is triggered, unless a template is applied to a data source.

Supported identifier types

Identifiers use one of three matching methods, described below:

Regex identifier: This identifier contains a case-insensitive regular expression that allows users to match a custom regex against column values.
Column name regex identifier: This identifier includes a case-insensitive regular expression that is only matched against column names, not against the values in the column.
Dictionary identifier: This identifier contains a list of words and phrases to match against column values.
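
To make the difference between the three methods concrete, the sketch below evaluates one column against an identifier of each kind. It borrows the type and config field names from the example API responses later in this document. The function name column_matches, the use of full-string matching for values, and substring search for column names are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List


def column_matches(identifier: Dict[str, Any], column_name: str, values: List[str]) -> bool:
    """Decide whether an identifier would tag a column, given sampled values."""
    kind = identifier["type"]
    config = identifier["config"]

    if kind == "columnNameRegex":
        # Matched against the column name only; sampled values are ignored.
        return re.search(config["columnNameRegex"], column_name, re.IGNORECASE) is not None

    if kind == "regex":
        pattern = re.compile(config["regex"], re.IGNORECASE)
        hits = [pattern.fullmatch(value) is not None for value in values]
    elif kind == "dictionary":
        if config.get("caseSensitive", False):
            allowed = set(config["values"])
            hits = [value in allowed for value in values]
        else:
            allowed = {word.lower() for word in config["values"]}
            hits = [value.lower() in allowed for value in values]
    else:
        raise ValueError(f"unsupported identifier type: {kind}")

    # Value-based identifiers are scored by the share of matching values.
    score = sum(hits) / len(hits) if hits else 0.0
    return score >= config["minConfidence"]


ssn_columns = {"type": "columnNameRegex", "config": {"columnNameRegex": "ssn|social ?security"}}
desk = {
    "type": "dictionary",
    "config": {
        "values": ["Research Lab", "Blue Room", "Purple Room"],
        "caseSensitive": False,
        "minConfidence": 0.6,
    },
}
account = {"type": "regex", "config": {"regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}", "minConfidence": 0.5}}

print(column_matches(ssn_columns, "social security", []))
print(column_matches(desk, "location", ["blue room", "Research Lab", "Cafeteria"]))
print(column_matches(account, "acct", ["123456789-123-1", "n/a"]))
```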

Templates

A template is a collection of identifiers and settings that drive the configuration of SDD runs. The settings users can apply through templates include the following:

classifiers is the list of identifiers applied to data sources in the SDD run.
tags is an optional override for the tags applied by the identifiers.
minConfidence is an optional override for the minConfidence established in the identifier(s). When the detection confidence is at least the percentage defined in minConfidence, tags are applied.
sampleSize is an optional override for how many records to sample from the data source.

Users may apply a template globally or to a specific set of data sources. When SDD is triggered on a data source, it will use the identifiers and settings in its configured template to run the detection job. If no template has been configured, SDD will use the global settings. By default, the global settings will use all identifiers in the system to run the detection.
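
The way template settings take precedence over identifier settings can be sketched as follows. This is an assumption-heavy illustration: it supposes that per-identifier overrides live in the overrides object shown in the example template responses, that sampleSize is set on the template itself, and that the fallback sample size is the default of 1000 records. The function name effective_settings is hypothetical.

```python
from typing import Any, Dict

DEFAULT_SAMPLE_SIZE = 1000  # default sample size stated in this document


def effective_settings(
    identifier: Dict[str, Any],
    template_entry: Dict[str, Any],
    template: Dict[str, Any],
) -> Dict[str, Any]:
    """Resolve the settings used for one identifier within one template."""
    overrides = template_entry.get("overrides", {})
    config = identifier["config"]
    return {
        # A template override wins; otherwise the identifier's own value is used.
        "tags": overrides.get("tags", config["tags"]),
        "minConfidence": overrides.get("minConfidence", config.get("minConfidence")),
        "sampleSize": template.get("sampleSize", DEFAULT_SAMPLE_SIZE),
    }


identifier = {
    "name": "ACCOUNT_NUMBER_IDENTIFIER",
    "config": {"tags": ["Discovered.account-number"], "minConfidence": 0.5},
}
template = {
    "name": "ACCOUNT_NUMBERS_TEMPLATE",
    "sampleSize": 100,
    "classifiers": [{"name": "ACCOUNT_NUMBER_IDENTIFIER", "overrides": {"minConfidence": 0.8}}],
}

print(effective_settings(identifier, template["classifiers"][0], template))
```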

Considerations

SDD does not run on data sources with over 1600 columns.
Deleting the built-in Discovered tags is not recommended: If you do delete built-in Discovered tags and use SDD, when the identifier is detected, the column will not be tagged. Tags can be disabled on a column-by-column basis from the data dictionary, or SDD can be turned off on a data-source-by-data-source basis when creating a data source.

Configure and customize SDD

To configure settings and customize SDD, see the SDD pre-configuration page.

SDD Pre-Configuration Details

Only application admins can enable SDD on the Immuta app settings page. Then, data source creators can disable SDD on a data-source-by-data-source basis. Additionally, governors, data source owners, and data source experts can disable any unwanted Discovered tags in the data dictionary to prevent them from being used and auto-tagged on that data source in the future.

Configurable global settings

Global template

When SDD is triggered on a data source, the job runs with the identifiers in the template set on that data source. If no template is set, the identifiers and template used by the SDD job are defined by the global setting. By default, the global setting runs all identifiers in the system. However, a system administrator can configure a global template instead.

An active global template cannot be deleted.

Sample size

SDD uses a sample of data to assess the likelihood that a column contains data that fits the pattern specified in the configured identifiers.

By default, SDD samples 1000 records (the sample size) during this process. However, administrators can change the sample size taken by SDD on the Immuta app settings page. In general, increasing the sample size increases the accuracy of SDD predictions, but decreasing the number of records sampled during SDD may be necessary to meet some organizations' compliance requirements.

Tag mutability

When SDD is triggered by a data owner, all column tags that were previously applied by SDD are removed and the tags prescribed by the latest run are applied. However, if SDD is triggered because a new column is detected by schema monitoring, tags will only be applied to the new column, and no tags will be modified on existing columns.
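
The two behaviors above differ only in which columns are rewritten. The sketch below models just the tags applied by SDD (not manually added tags) as a mapping from column name to a set of tags. The function names are hypothetical and the code is an illustration of the rule, not Immuta's implementation.

```python
from typing import Dict, Iterable, Set

ColumnTags = Dict[str, Set[str]]


def owner_triggered_run(previous_sdd_tags: ColumnTags, latest_result: ColumnTags) -> ColumnTags:
    """Replace every SDD-applied tag with the tags prescribed by the latest run.

    previous_sdd_tags is intentionally unused: earlier SDD tags are discarded.
    """
    return {column: set(tags) for column, tags in latest_result.items()}


def new_column_run(
    previous_sdd_tags: ColumnTags,
    latest_result: ColumnTags,
    new_columns: Iterable[str],
) -> ColumnTags:
    """Tag only newly detected columns and leave existing columns untouched."""
    updated = {column: set(tags) for column, tags in previous_sdd_tags.items()}
    for column in new_columns:
        updated[column] = set(latest_result.get(column, set()))
    return updated


previous = {"ssn": {"Discovered.Country.US"}}
latest = {"ssn": {"Discovered.PII"}, "email": {"Discovered.PII"}}

print(owner_triggered_run(previous, latest))
print(new_column_run(previous, latest, ["email"]))
```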

Dry run

Users can also configure SDD to do a dryRun, which allows them to see what tags would be applied to a data source without actually applying them. See the Test SDD on a data source section for details.

SDD workflow

Two common workflows for using SDD are outlined below. The first illustrates how to apply a single global template to all data sources, while the second outlines how users can create and apply templates to data sources they own.

Workflow 1: Apply a global template to all data sources

Workflow 2: Apply a template to a specific data source

Data governor creates one or more custom identifiers:

Customize Sensitive Data Discovery

Enable and Manage Global SDD Settings

Enable Sensitive Data Discovery

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Select the checkbox to enable SDD, and then click Save and Confirm to apply your changes.

Configure a Global Template

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the name of your global template in the Global SDD Template Name field.
Click Save, and then Confirm your changes.

Configure a Default Sample Size

When a sample size is not specified in a template, SDD will use the default sample size of 1000 records. To adjust the sample size,

Click the App Settings icon in the left sidebar.
Click Sensitive Data Discovery in the left panel to navigate to that section.
Enter the number of rows in a data source you would like sampled when running SDD in the Default SDD Sample Size field.
Click Save, and then Confirm your changes.

Run Sensitive Data Discovery on Data Sources

Earlier documentation refers to identifiers as classifiers. The term is being updated to identifier to be more accurate and to avoid confusion with the Immuta data classification and frameworks feature. The API paths and payload attributes in this section still use classifier.

Attributes overview

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined below.

| Attribute | Description |
| --- | --- |
| sources | string The names of the data sources to run SDD on. |
| all | boolean If true, SDD will run on all Immuta data sources. The default is false. |
| wait | integer The number of seconds to wait for the SDD jobs to finish. The value -1 will wait until the jobs complete. The default is -1. |
| dryRun | boolean If true, SDD returns the tags that would be applied to the data source without applying them. If omitted, dryRun is treated as false. |
| template | string If passed, Immuta will run SDD with this template instead of the applied template on the data source(s). Passing template when dryRun is false will cause an error. |
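
The attribute rules above can be checked before a payload is ever sent. The sketch below is a hypothetical client-side helper (build_sdd_run_payload is not part of Immuta) that assembles the request body and rejects the one combination this document says will fail: a template without dryRun set to true. Requiring either sources or all is an additional assumption of the sketch.

```python
import json
from typing import Any, Dict, List, Optional


def build_sdd_run_payload(
    sources: Optional[List[str]] = None,
    all_sources: bool = False,
    wait: int = -1,
    dry_run: bool = False,
    template: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body for the sdd/run endpoint."""
    if template is not None and not dry_run:
        # Documented behavior: passing template when dryRun is false causes an error.
        raise ValueError("template may only be passed when dryRun is true")
    if not all_sources and not sources:
        # Assumption of this sketch: a run needs explicit sources or all=true.
        raise ValueError("provide sources or set all_sources=True")

    payload: Dict[str, Any] = {"wait": wait, "dryRun": dry_run}
    if all_sources:
        payload["all"] = True
    else:
        payload["sources"] = list(sources or [])
    if template is not None:
        payload["template"] = template
    return payload


payload = build_sdd_run_payload(sources=["Medical Claims"], dry_run=True, template="PII_REVISION")
print(json.dumps(payload, indent=2))
```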

Run SDD on data sources

Specify the data sources you would like to run SDD on, and save the payload in a .json file.
```
{
  "sources": [
    "Insurance Data"
  ]
}
```
Or choose to run SDD on all the data sources in Immuta, and save the payload in a .json file.
```
{
  "all": true
}
```
Trigger SDD using one of these methods:

Immuta CLI

```
immuta api sdd/run -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
```

If sensitive data discovery was successfully run, you will receive a response similar to this:

```
{
  "Insurance Data": {
    "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {
          "ssn": [
            "Discovered.PII"
          ],
          "email": [
            "Discovered.PII"
          ]
        },
        "removedTags": {
          "ssn": [
            "Discovered.Country.US"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.Entity.Social Security Number",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ],
        "email": [
          "Discovered.Entity.Electronic Mail Address",
          "Discovered.Identifier Direct",
          "Discovered.PHI",
          "Discovered.PII"
        ]
      }
    }
  }
}
```
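
A response like the one above can be condensed into a per-column summary of what changed. The sketch below is a hypothetical helper (summarize_diff is not part of Immuta) that relies only on the keys visible in the example response: state, output.diff.addedTags, and output.diff.removedTags. The sample data is a trimmed copy of that response.

```python
from typing import Any, Dict


def summarize_diff(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Collect the job state and the added and removed tags for each data source."""
    summary: Dict[str, Dict[str, Any]] = {}
    for source, job in response.items():
        diff = job.get("output", {}).get("diff", {})
        summary[source] = {
            "state": job.get("state"),
            "added": {col: sorted(tags) for col, tags in diff.get("addedTags", {}).items()},
            "removed": {col: sorted(tags) for col, tags in diff.get("removedTags", {}).items()},
        }
    return summary


# Trimmed copy of the example response shown above.
response = {
    "Insurance Data": {
        "id": "d2edc1d0-328c-11ec-9d5a-6793988ccf95",
        "state": "completed",
        "output": {
            "diff": {
                "addedTags": {"ssn": ["Discovered.PII"], "email": ["Discovered.PII"]},
                "removedTags": {"ssn": ["Discovered.Country.US"]},
            }
        },
    }
}

for source, changes in summarize_diff(response).items():
    print(source, changes["state"], changes["added"], changes["removed"])
```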

Additional tutorials

Test SDD on a data source

Users can test how SDD will apply tags to their data sources by completing a dryRun, which allows users to test templates and tags:

test templates: If a template is specified in the payload when the dryRun is true, SDD will use this template instead of the template applied to the data source. Note: SDD will error if a template is specified here when dryRun is false.
test tags: Instead of applying tags, SDD just returns the tags that would be applied to the data source. This allows users to evaluate whether or not identifiers or templates are applying tags correctly without updating the data source.

After evaluating whether or not the tags have been applied appropriately, users can then make necessary changes to a template before triggering SDD again.

To complete a dryRun:

Specify the data sources you would like to run sensitive data discovery on and set dryRun to true in the payload in a .json file. Note: You can also apply a template to a data source as a dryRun, like in the example below. However, when dryRun is false, a template cannot be included in the payload. Instead, the template must be added to the data source before running SDD.

```
{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": true,
  "template": "PII_REVISION"
}
```

Trigger SDD using one of these methods:

Immuta CLI

```
immuta api sdd/run -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
```

You will receive a response that illustrates tags that will be added, tags that will be removed, and the final SDD result:

```
{
  "Medical Claims": {
    "id": "86fc4f70-380f-11ec-a432-81748c911385",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
```

Once you are satisfied with how tags are applied by SDD, set dryRun to false (or omit it from the payload).

```
{
  "sources": [
    "Medical Claims"
  ],
  "dryRun": false
}
```

Trigger SDD again:

Immuta CLI

```
immuta api sdd/run -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/run
```

If the request was successful, you will receive a response similar to this one:

```
{
  "Medical Claims": {
    "id": "2afcfe00-3813-11ec-b171-9331e3d3aa04",
    "state": "completed",
    "output": {
      "diff": {
        "addedTags": {},
        "removedTags": {
          "dob": [
            "Discovered.Entity.Date",
            "Discovered.Entity.Date of Birth",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "ssn": [
            "Discovered.Country.US",
            "Discovered.Entity.Social Security Number",
            "Discovered.Identifier Direct",
            "Discovered.PHI"
          ],
          "state": [
            "Discovered.Country.US",
            "Discovered.Entity.Location",
            "Discovered.Entity.State",
            "Discovered.Identifier Indirect"
          ],
          "gender": [
            "Discovered.Entity.Gender",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ],
          "date_of_service": [
            "Discovered.Entity.Date",
            "Discovered.Identifier Indirect",
            "Discovered.PHI",
            "Discovered.PII"
          ]
        }
      },
      "sddTagResult": {
        "ssn": [
          "Discovered.PII"
        ]
      }
    }
  }
}
```

Trigger SDD in the Immuta UI

Select a data source from your My Data Sources page.
Click the Health Check dropdown menu.
In the Sensitive Data Discovery (SDD) section, click Re-run.

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Create and Apply a Template to a Data Source

Create a template

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API.
Find identifiers to include in your template using one of these methods:

Immuta CLI

```
immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"
```

HTTP API

```
curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"
```

The path and URL are quoted so that the shell does not interpret the ? and & characters in the query string.

If the request was successful, you will receive a list of available identifiers.

```
{
  "count": 3,
  "hits": [
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "displayName": "Account Number Identifier",
      "description": "This identifier recognizes account numbers using a regex",
      "type": "regex",
      "config": {
        "tags": [
          "Discovered.account-number"
        ],
        "regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}",
        "minConfidence": 0.5
      },
      "id": 104,
      "createdAt": "2021-10-20T19:12:24.889Z",
      "updatedAt": "2021-10-20T19:12:24.889Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
      "displayName": "Employee Desk Location Identifier",
      "description": "This identifier detects when an employee's desk location appears in a dataset.",
      "type": "dictionary",
      "config": {
        "tags": [
          "Discovered.desk-location"
        ],
        "values": [
          "Research Lab",
          "Blue Room",
          "Purple Room"
        ],
        "caseSensitive": false,
        "minConfidence": 0.6
      },
      "id": 68,
      "createdAt": "2021-10-20T17:57:51.696Z",
      "updatedAt": "2021-10-20T17:57:51.696Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "displayName": "Social Security Number Columns Identifier",
      "description": "This identifier recognizes column names that match the defined regex pattern.",
      "type": "columnNameRegex",
      "config": {
        "tags": [
          "Discovered.Social Security Numbers"
        ],
        "columnNameRegex": "ssn|social ?security"
      },
      "id": 67,
      "createdAt": "2021-10-20T17:57:17.930Z",
      "updatedAt": "2021-10-20T17:57:17.930Z"
    }
  ]
}
```

Save the template payload in a .json file. The examples below show different templates.

```
{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
```

```
{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "classifiers": [
    {
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
```

```
{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
```

```
{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "classifiers": [
    {
      "name": "STUDENT_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}
```
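
Template payloads like these can also be assembled programmatically from the identifier list returned in the previous step. The sketch below is a hypothetical helper (build_template is not part of Immuta). It uses only the field names that appear in the example payloads and refuses to reference an identifier that was not in the search results.

```python
import json
from typing import Any, Dict, List


def build_template(
    name: str,
    display_name: str,
    description: str,
    hits: List[Dict[str, Any]],
    wanted: List[str],
    sample_size: int,
) -> Dict[str, Any]:
    """Build a template payload that references identifiers found in the hits list."""
    available = {hit["name"] for hit in hits}
    missing = [identifier for identifier in wanted if identifier not in available]
    if missing:
        raise ValueError(f"unknown identifiers: {missing}")
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [{"name": identifier} for identifier in wanted],
        "sampleSize": sample_size,
    }


# Only the name field of each hit is needed here.
hits = [
    {"name": "ACCOUNT_NUMBER_IDENTIFIER"},
    {"name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER"},
    {"name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER"},
]

template = build_template(
    "ACCOUNT_NUMBERS_TEMPLATE",
    "Account Numbers Template",
    "This template contains the identifier that recognizes account numbers.",
    hits,
    ["ACCOUNT_NUMBER_IDENTIFIER"],
    100,
)
print(json.dumps(template, indent=2))
```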

Create the template:

Immuta CLI

```
immuta api sdd/template -X POST --input ./example-payload.json
```

HTTP API

```
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template
```

If the request is successful, you will receive a response that contains details about the template. The responses below correspond to the different example templates.

```
{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}
```

After the template is applied to data sources and sensitive data discovery is run, the Discovered.account-number tag will be applied to columns that Immuta identifies with 50% confidence, as configured in the identifier.

```
{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}
```

After the template is applied to data sources and sensitive data discovery is run, the Discovered.desk-location tag will be applied to columns when Immuta detects the values Research Lab, Blue Room or Purple Room with 60% confidence, as configured in the identifier.

```
{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 2,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "overrides": {}
    }
  ]
}
```

After the template is applied to data sources and sensitive data discovery is run, the Discovered.Social Security Numbers tag (the tag configured in the identifier) will be applied to columns whose names match the ssn|social ?security regex, such as ssn, socialsecurity, or social security.

```
{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "STUDENT_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}
```

After the template is applied to data sources and sensitive data discovery is run, the Discovered.residence-hall tag will be applied to columns when Immuta detects values that match those listed in the Residence Halls data source with 70% confidence, as configured in the identifier.

Apply a template to data sources

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined in the table below.

| Attribute | Description |
| --- | --- |
| template | string The name of the template to apply to the data sources; null clears the current template. |
| sources | string The names of the data sources to apply the template to. |

Find templates to apply to your data sources:

Immuta CLI

```
immuta api sdd/template
```

HTTP API

```
curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    https://your-immuta-url.immuta.com/sdd/template
```

If the request was successful, you will receive a list of available templates.

{
  "count": 3,
  "hits": [
    {
      "name": "ACCOUNT_NUMBERS_TEMPLATE",
      "displayName": "Account Numbers Template",
      "description": "This template contains the identifier that recognizes account numbers.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 2,
      "createdAt": "2021-10-20T19:13:35.319Z",
      "updatedAt": "2021-10-20T19:13:35.319Z",
      "classifiers": [
        {
          "name": "ACCOUNT_NUMBER_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
      "displayName": "Employee Desk Location Template",
      "description": "Contains identifier that detects when the name of a room a desk is in appears in a dataset.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 1,
      "createdAt": "2021-10-20T18:03:58.967Z",
      "updatedAt": "2021-10-20T18:03:58.967Z",
      "classifiers": [
        {
          "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
      "displayName": "Social Security Numbers Template",
      "description": "Contains identifier that matches ssn column names with the defined regex.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 3,
      "createdAt": "2021-10-20T19:13:58.359Z",
      "updatedAt": "2021-10-20T19:13:58.359Z",
      "classifiers": [
        {
          "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
          "overrides": {}
        }
      ]
    }
  ]
}
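To choose a template from a response like the one above, a short script can filter the hits by the identifiers they contain. The sketch below is illustrative only: it assumes the response has already been parsed from JSON, it uses a trimmed copy of the example response, and the helper name templates_using is hypothetical rather than part of any Immuta tooling.

```python
import json

# Trimmed copy of the example response above (only the fields used here).
response_text = """
{
  "count": 3,
  "hits": [
    {"name": "ACCOUNT_NUMBERS_TEMPLATE",
     "classifiers": [{"name": "ACCOUNT_NUMBER_IDENTIFIER", "overrides": {}}]},
    {"name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
     "classifiers": [{"name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER", "overrides": {}}]},
    {"name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
     "classifiers": [{"name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER", "overrides": {}}]}
  ]
}
"""


def templates_using(response, identifier_name):
    """Return the names of templates whose classifiers include identifier_name."""
    return [
        hit["name"]
        for hit in response["hits"]
        if any(c["name"] == identifier_name for c in hit.get("classifiers", []))
    ]


response = json.loads(response_text)
matching = templates_using(response, "ACCOUNT_NUMBER_IDENTIFIER")
print(matching)
```

The name printed here is the value you would place in the template field of the payload in the next step.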

Select an appropriate template to apply to your data sources, and save the payload in a .json file:

{
  "template": "ACCOUNT_NUMBERS_TEMPLATE",
  "sources": [
    "Insurance Data"
  ]
}

Apply the template to your data source(s):

Immuta CLI

immuta api sdd/template/apply -X PUT --input ./example-payload.json

HTTP API

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/apply

You will receive a response that indicates whether or not the template was successfully applied to your data sources.

{
  "success": true
}
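If you script this step instead of using curl, the same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the function name build_apply_request is hypothetical, the URL and token are the placeholders from the curl example above, and the Bearer scheme is copied from that example. The request is built but not sent.

```python
import json
import urllib.request


def build_apply_request(base_url, token, template, sources):
    """Build (but do not send) the PUT request for sdd/template/apply."""
    payload = {"template": template, "sources": list(sources)}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdd/template/apply",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_apply_request(
    "https://your-immuta-url.immuta.com",   # placeholder host from the curl example
    "dea464c07bd07300095caa8",              # placeholder token from the curl example
    "ACCOUNT_NUMBERS_TEMPLATE",
    ["Insurance Data"],
)
print(request.get_method(), request.full_url)
# To send it against a real instance: urllib.request.urlopen(request)
```

Passing None as the template would serialize to null, which the attribute table above describes as clearing the current template.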

Additional tutorials

Clone a template

Users cannot modify templates created by other data owners, but they can clone templates and make changes to the clone.

Get a list of templates to determine the template you want to clone using one of these methods:

Immuta CLI

immuta api "sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"
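The list request carries its paging and sorting options in the query string. Because most shells treat & as a control operator, a URL like this needs quoting when pasted into a command line. The sketch below builds the same URL with Python's standard library and shows the shell-quoted form; the helper name template_list_url is hypothetical, and the parameter names are the ones used in the request above.

```python
import shlex
from urllib.parse import urlencode


def template_list_url(base_url, **params):
    """Return the sdd/template list URL with the given query parameters."""
    return f"{base_url.rstrip('/')}/sdd/template?{urlencode(params)}"


url = template_list_url(
    "https://your-immuta-url.immuta.com",  # placeholder host from the example
    sortField="name",
    sortOrder="asc",
    offset=5,
    limit=5,
)
print(url)

# shlex.quote wraps the URL in single quotes because it contains ? and &.
print(shlex.quote(url))
```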

Save the template clone name and details in a .json file.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts."
}

Clone the template:

Immuta CLI

immuta api sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone

If the request was successful, you will receive a response that provides details about the template clone.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 4,
  "createdAt": "2021-10-20T20:48:37.805Z",
  "updatedAt": "2021-10-20T20:48:37.805Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

You can now modify the template, such as changing the identifiers (classifiers) included and the sampleSize.

Configure entity tags and confidence

To prevent entity tags from being set, you can create a template that configures the identifier that contains that tag.

For example, the built-in PERSON_NAME identifier contains the following tags: Discovered.PHI, Discovered.PII, Discovered.Entity.Person Name, and Discovered.Identifier Indirect. However, your organization doesn't have any health data, so you don't want the PHI tag to be applied to your data sources but you do want all the other tags within that identifier.

To override the Discovered.PHI tag, you would create a template that includes the PERSON_NAME identifier and removes the Discovered.PHI tag from the list of tags in the template payload.

View the details about the PERSON_NAME identifier so you know what to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

If the request was successful, the response will include details about the PERSON_NAME identifier.

{
  "createdBy": {
    "id": 21,
    "name": "Immuta System Account",
    "email": "immuta_system@immuta.com"
  },
  "name": "PERSON_NAME",
  "displayName": "Person Name",
  "description": "Detects strings consistent with a dictionary of people's names.",
  "type": "builtIn",
  "config": {
    "tags": [
      "Discovered.PHI",
      "Discovered.PII",
      "Discovered.Entity.Person Name",
      "Discovered.Identifier Indirect"
    ],
    "minConfidence": 0.3
  },
  "id": 54,
  "createdAt": "2021-10-21T07:35:14.416Z",
  "updatedAt": "2021-10-21T12:57:43.919Z"
}

Remove the Discovered.PHI tag from the list of tags in the identifier config, and save the template payload in a .json file.

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ],
  "sampleSize": 100
}
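The override payload can also be derived from the identifier details rather than edited by hand, which avoids accidentally dropping a tag you meant to keep. The following is a sketch under stated assumptions: the function build_override_template is hypothetical, and the identifier dictionary is a trimmed copy of the PERSON_NAME response shown earlier.

```python
import json

# Trimmed copy of the PERSON_NAME identifier details from the response above.
person_name = {
    "name": "PERSON_NAME",
    "config": {
        "tags": [
            "Discovered.PHI",
            "Discovered.PII",
            "Discovered.Entity.Person Name",
            "Discovered.Identifier Indirect",
        ],
        "minConfidence": 0.3,
    },
}


def build_override_template(identifier, drop_tags, name, display_name,
                            description, sample_size=100):
    """Build a template payload that overrides an identifier's tags."""
    dropped = set(drop_tags)
    # Keep the original tag order, minus the tags being removed.
    kept = [tag for tag in identifier["config"]["tags"] if tag not in dropped]
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [
            {"name": identifier["name"], "overrides": {"tags": kept}}
        ],
        "sampleSize": sample_size,
    }


payload = build_override_template(
    person_name,
    drop_tags=["Discovered.PHI"],
    name="PERSON_NAME_OVERRIDE",
    display_name="Person Name Override",
    description="This template removes the PHI tag from the PERSON_NAME identifier.",
)
print(json.dumps(payload, indent=2))
```

The printed JSON has the same shape as the payload above and can be saved to the .json file used in the next step.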

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that details the new template:

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T17:11:18.057Z",
  "updatedAt": "2021-10-21T17:11:18.057Z",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ]
}

What's next

Now that you've created a template, continue to one of the following tutorials:

SDD global settings: Opt to add your template to the SDD global settings so that Immuta will use this template to run SDD for all data sources.
Run sensitive data discovery on a data source

Create a Custom Identifier

Create a Regex Identifier

Use case: Custom regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically identify and tag columns that contain account numbers in your database.

A regular expression (regex) custom identifier allows you to create your own rules that enable Immuta's sensitive data discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom identifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom regex identifier payload in a .json file.

{
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5,
    "tags": ["Discovered.account-number"]
  }
}
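Before creating the identifier, it can help to check the pattern against a few sample values. The sketch below uses Python's re module, which is an assumption made for illustration: the regex engine Immuta uses is not stated here and may differ in edge cases. The sample values are invented.

```python
import re

# The pattern from the payload above: nine digits, three digits, one digit.
ACCOUNT_NUMBER = re.compile(r"^[0-9]{9}-[0-9]{3}-[0-9]{1}$")

samples = [
    "123456789-123-1",    # expected form xxxxxxxxx-xxx-x
    "12345678-123-1",     # only eight leading digits
    "123456789-123-12",   # extra trailing digit
    "abcdefghi-123-1",    # letters instead of digits
]

results = {value: bool(ACCOUNT_NUMBER.search(value)) for value in samples}
for value, matched in results.items():
    print(f"{value!r}: {matched}")
```

Only the first sample matches, because the ^ and $ anchors require the whole value to fit the pattern.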

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "tags": [
      "Discovered.account-number"
    ],
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5
  },
  "id": 1,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}

What's next

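Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.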

Create a Column Name Regex Identifier

Use case: Custom column name regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery, but you discover there is no identifier that can automatically detect and tag columns that contain account numbers in your database.

A custom column name regular expression (regex) identifier allows you to create your own detectors that enable Immuta's sensitive data discovery to find column name matches based on a regex pattern. For example, if your database contains tables with social security numbers, you could define a regex pattern to match against the names of the columns instead of the values within the columns. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom column name regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom column name regex identifier are outlined in the table below.

Attribute

Description

Required

name

string Unique, request-friendly identifier name.

Yes

displayName

string Unique, human-readable identifier name.

Yes

description

string The identifier description.

Yes

type

string The type of identifier: columnNameRegex.

Yes

config

object Includes config.columnNameRegex and config.tags. *See descriptions for these below.

Yes

tags*

array[string] The names of the tags to apply to the data source. Note: All tags must start with the Discovered. prefix.

Yes

columnNameRegex*

string A case-insensitive regular expression to match against column names.

Yes

Create a custom column name regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom column name regex identifier payload in a .json file. The regex ^ssn|social ?security$ looks for column names that match ssn, socialsecurity, or social security.

{
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "columnNameRegex": "^ssn|social ?security$",
    "tags": ["Discovered.Social Security Numbers"]
  }
}
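One detail worth checking with any column name pattern is how alternation interacts with anchors. In most regex engines, | binds more loosely than ^ and $, so ^ssn|social ?security$ reads as "starts with ssn" or "ends with social security". The sketch below demonstrates this with Python's re module using search semantics. Whether Immuta applies search or full-match semantics is not stated here, so treat this as an illustration of regex behavior rather than of Immuta's matching. The column names and the grouped variant are invented for comparison.

```python
import re

# Pattern exactly as written in the payload above (case-insensitive).
as_written = re.compile(r"^ssn|social ?security$", re.IGNORECASE)
# Hypothetical variant that groups the alternatives inside the anchors.
grouped = re.compile(r"^(?:ssn|social ?security)$", re.IGNORECASE)

columns = [
    "ssn",
    "SSN",
    "socialsecurity",
    "Social Security",
    "ssn_last4",            # starts with ssn
    "no_social security",   # ends with social security
]

comparison = {
    name: (bool(as_written.search(name)), bool(grouped.search(name)))
    for name in columns
}
for name, (loose, strict) in comparison.items():
    print(f"{name!r}: as written={loose}, grouped={strict}")
```

Under search semantics the pattern as written also accepts ssn_last4 and no_social security, while the grouped variant accepts only the three intended names.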

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "tags": [
      "Discovered.Social Security Numbers"
    ],
    "columnNameRegex": "^ssn|social ?security$"
  },
  "id": 2,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Dictionary Identifier

Use case: Custom dictionary identifier

Scenario: You have data that includes the names of the rooms employees' desks are in across your organization. Although these locations may be considered sensitive in particular datasets, they would not be recognized by Immuta's built-in identifiers.

A custom dictionary identifier allows you to create your own rules that enable Immuta's sensitive data discovery to match a list of room names to values in the dataset. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom dictionary identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom dictionary identifier are outlined in the table below.

Attribute

Description

Create a custom dictionary identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom dictionary identifier payload in a .json file. The dictionary below contains the values Research Lab, Blue Room, and Purple Room.

{
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "values": ["Research Lab", "Blue Room", "Purple Room"],
    "caseSensitive": false,
    "minConfidence": 0.6,
    "tags": ["Discovered.desk-location"]
  }
}
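To build intuition for how values, caseSensitive, and minConfidence might interact, the sketch below treats confidence as the fraction of sampled column values that appear in the dictionary. That interpretation is an assumption made for illustration; how Immuta actually computes confidence is not described here. The function name and the sample column are hypothetical.

```python
def dictionary_match_fraction(column_values, dictionary, case_sensitive=False):
    """Return the fraction of column_values found in dictionary."""
    if not column_values:
        return 0.0
    if case_sensitive:
        lookup = set(dictionary)
        hits = sum(1 for value in column_values if value in lookup)
    else:
        # Compare lowercase forms when matching is case-insensitive.
        lookup = {entry.lower() for entry in dictionary}
        hits = sum(1 for value in column_values if value.lower() in lookup)
    return hits / len(column_values)


# Settings copied from the payload above.
values = ["Research Lab", "Blue Room", "Purple Room"]
min_confidence = 0.6

# Invented sample of a column's values.
sampled_column = ["Blue Room", "blue room", "PURPLE ROOM", "Research Lab", "Cafeteria"]

fraction = dictionary_match_fraction(sampled_column, values, case_sensitive=False)
would_tag = fraction >= min_confidence
print(fraction, would_tag)

# With case-sensitive matching, only the exactly cased values count.
strict_fraction = dictionary_match_fraction(sampled_column, values, case_sensitive=True)
print(strict_fraction, strict_fraction >= min_confidence)
```

In this toy model, four of the five sampled values match when case is ignored, which clears the 0.6 threshold, while only two match when case must agree exactly.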

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "tags": [
      "Discovered.desk-location"
    ],
    "values": [
      "Research Lab",
      "Blue Room",
      "Purple Room"
    ],
    "caseSensitive": false,
    "minConfidence": 0.6
  },
  "id": 68,
  "createdAt": "2021-10-20T17:57:51.696Z",
  "updatedAt": "2021-10-20T17:57:51.696Z"
}

What's next

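Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.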

List Built-In Identifiers

Attributes

Attributes of identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to listing identifiers are outlined in the table below.

Attribute

Description

Response details

The response lists all built-in identifiers that are currently supported in Immuta SDD and their details, including their name and description. For example:

{
  "count": 67,
  "hits": [
    {
      "createdBy": {
        "id": 21,
        "name": "Immuta System Account",
        "email": "immuta_system@immuta.com"
      },
      "name": "AGE",
      "displayName": "Age",
      "description": "Detects numeric strings between 10 and 199, provided the column header contains text such as `age`, `year`, `years`, `yr`, or `yrs`.",
      "type": "builtIn",
      "config": {
        "minConfidence": 0.7,
        "tags": [
          "Discovered.PII",
          "Discovered.Identifier Indirect",
          "Discovered.PHI",
          "Discovered.Entity.Age"
        ],
        "conditionalTags": {}
      },
      "id": 3,
      "createdAt": "2021-10-28T07:34:58.761Z",
      "updatedAt": "2021-10-28T07:34:58.761Z"
    }
  ]
}
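Once the list has been retrieved, a small script can answer questions such as "which built-in identifiers apply a given tag?" The sketch below is illustrative: it works on a trimmed copy of the example response above, and the helper name identifiers_with_tag is hypothetical.

```python
import json

# Trimmed copy of the example response above (one hit shown, as in the example).
response_text = """
{
  "count": 67,
  "hits": [
    {
      "name": "AGE",
      "type": "builtIn",
      "config": {
        "minConfidence": 0.7,
        "tags": [
          "Discovered.PII",
          "Discovered.Identifier Indirect",
          "Discovered.PHI",
          "Discovered.Entity.Age"
        ]
      }
    }
  ]
}
"""


def identifiers_with_tag(response, tag):
    """Return the sorted names of identifiers whose config lists the tag."""
    return sorted(
        hit["name"]
        for hit in response["hits"]
        if tag in hit.get("config", {}).get("tags", [])
    )


response = json.loads(response_text)
phi_identifiers = identifiers_with_tag(response, "Discovered.PHI")
print(phi_identifiers)
```

A list like this is a convenient starting point when deciding which identifiers to override in a template, as in the PERSON_NAME example.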

List built-in identifiers

List built-in identifiers using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=100&type=builtIn"

What's next

Create a custom identifier: Data governors can create custom identifiers to define their own regular expressions, dictionaries, and tags that SDD will use to discover and tag data.

Reference Guides

Built-In Discovered Tags

Immuta is pre-configured with a set of tags that can be used to write global policies before data sources even exist. See a list of the built-in Discovered tags below and the Built-in identifier reference page for information about where these tags will be applied.

Country tags

All the tags below belong to the Country parent. For example, the full tag name will appear as Discovered . Country . Argentina.

Child tag name

Description

Argentina

This tag is applied to data recognized as specific to Argentina (e.g., an Argentina National Identity Number).

Australia

This tag is applied to data recognized as specific to Australia (e.g., an Australian Medicare number, Australian passport number, or Australian Tax File number).

Belgium

This tag is applied to data recognized as specific to Belgium (e.g., a Belgium National ID card).

Brazil

This tag is applied to data recognized as specific to Brazil (e.g., a Brazil CPF number).

Canada

This tag is applied to data recognized as specific to Canada (e.g., a British Columbia PHN, Canadian driver's license number, OHIP string, Canadian passport number, Quebec's HIN, or Canadian Social Insurance number).

Chile

This tag is for data specific to Chile.

China

This tag is for data specific to China.

Colombia

This tag is for data specific to Colombia.

Denmark

This tag is applied to data recognized as specific to Denmark (e.g., a Denmark CPR or Person-number).

Finland

This tag is applied to data recognized as specific to Finland (e.g., a Finland National ID number).

France

This tag is applied to data recognized as specific to France (e.g., a French National ID card number, France National ID number, or French passport number).

Germany

This tag is applied to data recognized as specific to Germany (e.g., a German driver's license number or a Germany Identity Card number).

Hong Kong

This tag is for data specific to Hong Kong.

India

This tag is for data specific to India.

Indonesia

This tag is for data specific to Indonesia.

Japan

This tag is for data specific to Japan.

Korea

This tag is for data specific to Korea.

Mexico

This tag is for data specific to Mexico.

Netherlands

This tag is for data specific to Netherlands.

Norway

This tag is for data specific to Norway.

Paraguay

This tag is for data specific to Paraguay.

Peru

This tag is for data specific to Peru.

Poland

This tag is for data specific to Poland.

Singapore

This tag is for data specific to Singapore.

Spain

This tag is applied to data recognized as specific to Spain (e.g., a Spanish driver's license number, Spain Foreigner Identification number, Spain Tax Identification number, or Spanish passport number).

Sweden

This tag is applied to data recognized as specific to Sweden (e.g., a Sweden National ID number or Swedish passport number).

Taiwan

This tag is for data specific to Taiwan.

Thailand

This tag is applied to data recognized as specific to Thailand (e.g., a Thailand National ID number).

Turkey

This tag is for data specific to Turkey.

UK

This tag is applied to data recognized as specific to the United Kingdom (e.g., a United Kingdom driver's license number, United Kingdom National Insurance number, United Kingdom passport number, or United Kingdom Taxpayer Reference number).

Uruguay

This tag is for data specific to Uruguay.

US

This tag is applied to data recognized as specific to the U.S. (e.g., an FDA code, United States ATIN, ABA routing number, DEA number, United States driver's license number, United States EIN, United States NPI number, United States ITIN, United States passport number, United States Preparer Taxpayer ID number, United States SSN, United States territory or state, or United States toll-free phone number).

Venezuela

This tag is for data specific to Venezuela.

Entity tags

All the tags below belong to the Entity parent. For example, the full tag name will appear as Discovered . Entity . Aadhaar Individual.

Child tag name

Description

Aadhaar Individual

This tag is for Aadhaar Individual numbers.

Adoption Taxpayer ID Number

This tag is applied to data recognized as a United States Adoption Taxpayer Identification number.

Age

This tag is applied to data recognized as an age.

Bank Account

This tag is for bank account numbers.

Bank Routing MICR

This tag is applied to data recognized as an American Bankers Association routing number.

Bankers CUSIP ID

This tag is for CUSIP identification numbers for stocks and bonds.

British Columbia Health Network Number

This tag is applied to data recognized as British Columbia's Personal Health Number.

BSN Number

This tag is for Netherlands citizen service numbers.

CDC Number

This tag is for CDC numbers.

CDI Number

This tag is for CDI numbers.

CIC Number

This tag is for CIC numbers.

CNI

This tag is applied to data recognized as a French National ID card number.

CPF Number

This tag is applied to data recognized as Brazil's CPF number.

CPR Number

This tag is applied to data recognized as Denmark's Personal Identification number.

Credit Card Number

This tag is applied to data recognized as a credit card number.

CURP Number

This tag is for Mexican CURP numbers.

CRYPTO

This tag is applied to data recognized as a Bitcoin Invoice Address.

Date

This tag is applied to data recognized as a date.

Date of Birth

This tag is applied to data recognized as a date of birth.

DEA Number

This tag is applied to data recognized as the DEA number of a healthcare provider.

DNI Number

This tag is applied to data recognized as an Argentina National Identity number.

Domain Name

This tag is applied to data recognized as a domain.

Driver's License Number

This tag is applied to data recognized as driver's license numbers from Canada, Germany, Spain, United Kingdom, or the United States.

Electronic Mail Address

This tag is applied to data recognized as an email address.

Employer ID Number

This tag is applied to data recognized as an Employer Identification number from the United States.

Ethnic Group

This tag is applied to data recognized as an ethnic group.

FDA Code

This tag is applied to data recognized as the code of a drug or ingredient registered with the FDA.

Gender

This tag is applied to data recognized as a gender.

GST Individual

This tag is for Indian GST individual numbers.

Healthcare NPI

This tag is applied to data recognized as a United States National Provider Identifier number.

IBAN Code

This tag is applied to data recognized as an International Bank Account number.

ICD10 Code

This tag is applied to data recognized as an ICD10 code from the International Statistical Classification of Diseases and Related Health Problems.

ICD9 Code

This tag is for ICD9 codes from the International Statistical Classification of Diseases and Related Health Problems.

ID Number

This tag is for any ID number.

Identity Card Number

This tag is applied to data recognized as an identity card number from Germany.

IMEI

This tag is applied to data recognized as an International Mobile Equipment Identity number.

Individual Number

This tag is for any individual number.

Individual Taxpayer ID Number

This tag is applied to data recognized as a United States Individual Taxpayer Identification Number.

IP Address

This tag is applied to data recognized as an IP address.

Location

This tag is applied to data recognized as a country, state, address, or municipality.

MAC Address

This tag is applied to data recognized as a Media Access Control address.

MAC Address Local

This tag is applied to data recognized as a local Media Access Control address.

Medicare Number

This tag is applied to data recognized as a Medicare number from Australia.

National Health Service Number

This tag is for national health service numbers.

National ID Card Number

This tag is applied to data recognized as a national ID card number from Belgium.

National ID Number

This tag is applied to data recognized as a national ID number from Finland, Sweden, and Thailand.

National Insurance Number

This tag is applied to data recognized as a United Kingdom national insurance number.

National Registration ID Number

This tag is for national registration ID numbers.

NI Number

This tag is for Norway NI numbers.

NIE Number

This tag is applied to data recognized as a Spanish Foreigner Identification number.

NIF Number

This tag is applied to data recognized as a Spanish Tax Identification number.

NIK Number

This tag is applied to data recognized as an Indonesian personal identification number (NIK).

NIR

This tag is applied to data recognized as France's National ID number.

Ontario Health Insurance Number

This tag is applied to data recognized as part of an Ontario Health Insurance Plan string.

PAN Individual

This tag is for PAN Individual numbers.

Passport

This tag is applied to data recognized as a passport number from Australia, Canada, France, Spain, Sweden, United Kingdom, and United States.

Person Name

This tag is applied to data recognized as people's names.

PESEL Number

This tag is for Poland PESEL numbers.

Postal Code

This tag is applied to data recognized as a United States zip code.

Preparer Taxpayer ID Number

This tag is applied to data recognized as a Preparer Taxpayer ID number.

Quebec Health Insurance Number

This tag is applied to data recognized as a Quebec Health Insurance Number.

Resident ID Number

This tag is for China Resident ID numbers.

RRN

This tag is for Korea Resident Registration numbers.

Social Insurance Number

This tag is applied to data recognized as a Canadian Social Insurance number.

Social Security Number

This tag is applied to data recognized as a United States Social Security Number.

State

This tag is applied to data recognized as a state of the United States.

Swift Code

This tag is applied to data recognized as a SWIFT code.

Tax File Number

This tag is applied to data recognized as an Australian Tax File number.

Taxpayer ID Number

This tag is applied to data recognized as Taxpayer ID numbers from the United States.

Taxpayer Reference

This tag is applied to data recognized as United Kingdom Taxpayer Reference numbers.

Telephone Number

This tag is applied to data recognized as a phone number.

Tollfree Telephone Number

This tag is applied to data recognized as a United States toll-free phone number.

URL

This tag is applied to data recognized as a URL.

Vehicle Identifier or Serial Number

This tag is applied to data recognized as a VIN.

Identifier tags

None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . Identifier Direct.

Tag name

Description

Identifier Direct

This tag is applied to data recognized as a direct identifier that can be uniquely associated with an individual. Examples of direct identifiers include: name, username, email, official individual identification numbers such as passport or identity card numbers, or privately issued individual identification numbers such as a student ID.

Identifier Indirect

This tag is applied to data recognized as an indirect identifier that is not uniquely associated with an individual. However, this indirect identifier could become distinguishable when combined with other attributes. Examples of indirect identifiers include age and affinity.

Identifier Undetermined

This tag is applied to data which could be an identifier associated with an individual.

Personal information tags

None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . PCI.

Tag name

Description

PCI

This tag is applied to data recognized as payment card information.

PHI

This tag is applied to data recognized as personal health data.

PII

This tag is applied to data recognized as personally identifiable information.

Built-In Identifiers

Identifier

Description

AGE

Matches numeric strings between 10 and 199, provided the column header contains text such as age, year, years, yr, or yrs. Tags include Discovered.PII, Discovered.Identifier Indirect, Discovered.PHI, Discovered.Entity.Age.

ARGENTINA_DNI_NUMBER

Matches strings consistent with Argentina National Identity (DNI) Number. Requires an eight-digit number with optional periods between the second and third and fifth and sixth digit. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Argentina, Discovered.PHI, Discovered.Entity.DNI Number.

AUSTRALIA_MEDICARE_NUMBER

Matches numeric strings consistent with Australian Medicare number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Optional spaces can be placed between the fourth and fifth and ninth and tenth digit. The optional 11th digit separated by a / can be present. A checksum is required. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Australia, Discovered.PHI, Discovered.Entity.Medicare Number.

AUSTRALIA_PASSPORT

Matches strings consistent with Australian Passport number. An 8- or 9-character string is required, with a starting upper case character (N, E, D, F, A, C, U, X) or a two-character prefix (P followed by A, B, C, D, E, F, U, W, X, or Z) followed by seven digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Australia, Discovered.PHI, Discovered.Entity.Passport.

AUSTRALIA_TAX_FILE_NUMBER

Matches strings consistent with Australian Tax File number. Requires a nine-digit number with optional spaces between the third and fourth and sixth and seventh digits. A checksum is required. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Australia, Discovered.PHI, Discovered.Entity.Tax File Number.

BELGIUM_NATIONAL_ID_CARD_NUMBER

Matches numeric strings consistent with Belgium's National ID card. Requires a twelve-digit number with a hyphen (-) between the third and fourth digits and between the tenth and eleventh digits. A two-digit checksum is required. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Belgium, Discovered.PHI, Discovered.Entity.National ID Card Number.

BITCOIN_INVOICE_ADDRESS

Matches strings consistent with the following Bitcoin Invoice Address formats: P2PKH, P2SH, and Bech32. P2PKH and P2SH must start with a 1 or a 3, respectively, followed by 25 - 34 alphanumeric characters, excluding l, I, O, and 0. Bech32 formats must begin with bc1 and be followed by 39 characters. To be identified, any addresses must have a valid checksum. Tags include Discovered.Entity.CRYPTO, Discovered.PCI.

BRAZIL_CPF_NUMBER

Matches a numeric string consistent with Brazil's CPF (Cadastro de Pessoas Físicas) number. Requires an eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two-digit checksum is required. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Brazil, Discovered.PHI, Discovered.Entity.CPF Number.

CANADA_BC_PHN

Matches numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphen (-) or spaces after the fourth and seventh digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.British Columbia Health Network Number.

CANADA_DRIVERS_LICENSE_NUMBER

Matches strings consistent with Canadian driver's license numbers from each province. Looks for strings to be consistent with at least one of seven patterns. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.Drivers License Number.

CANADA_OHIP

Matches alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-digit alphanumeric code. Optional hyphens (-) or spaces can appear after the fourth, seventh, and tenth digits. The final two characters are a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.Ontario Health Insurance Number.

CANADA_PASSPORT

CANADA_QUEBEC_HIN

Matches alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-), and then eight digits with an optional hyphen or space after the fourth digit. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.Quebec Health Insurance Number.

CANADA_SOCIAL_INSURANCE_NUMBER

Matches numeric strings consistent with the Canadian Social Insurance number format. Requires a nine-digit numeric string with optional hyphens or spaces after the third and sixth digit. The last digit is a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.Social Insurance Number.

CREDIT_CARD_NUMBER

Matches strings consistent with a credit card number. Must include a valid checksum. Tags include Discovered.PCI, Discovered.Entity.Credit Card Number.

DATE

Matches strings consistent with dates. These can include days of the week, dates, and date times. Tags include Discovered.Entity.Date.

DATE_OF_BIRTH

Matches date strings as Date of Birth if the column heading is dob, birth, etc. Tags include Discovered.PII, Discovered.Identifier Indirect, Discovered.PHI, Discovered.Entity.Date of Birth.

DENMARK_CPR_NUMBER

Matches numeric strings consistent with Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number with either a DDMMYY-SSSS or DDMMYYSSSS format. The first six digits are an individual's birth date in Day, Month, Year format. The final four digits comprise the sequence number. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Denmark, Discovered.PHI, Discovered.Entity.CPR Number.

DOMAIN_NAME

Matches domain names using a very broad pattern. Tags include Discovered.Entity.Domain Name.

EMAIL_ADDRESS

Matches strings consistent with an email address. Usernames are required to be fewer than 255 characters, followed by @, a domain of fewer than 255 characters, and a top-level domain of between 2 and 20 characters. Tags include Discovered.PHI, Discovered.Entity.Electronic Mail Address, Discovered.Identifier Direct.

ETHNIC_GROUP

Matches strings consistent with the US Census race designations. Tags include Discovered.PII, Discovered.Entity.Ethnic Group.

FDA_CODE

Matches a string consistent with a drug or ingredient registered with the Food and Drug Administration (FDA). Must start with 4 to 6 digits, followed by a hyphen, followed by 3 to 4 digits, followed by a hyphen, and finishing with one to two digits. Tags include Discovered.Country.US, Discovered.Entity.FDA Code.

FINLAND_NATIONAL_ID_NUMBER

Matches a string consistent with Finland's National ID number. Requires an eleven-character string in a DDMMYYCZZZQ format. The first six digits are an individual's birth date in Day, Month, Year format. The C character is a century of birth indicator (+ for the years 1800-1899, - for years 1900-1999, and A for years 2000-2099). ZZZ is an individual ID number, and Q is a required checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Finland, Discovered.PHI, Discovered.Entity.National ID Number.

FRANCE_CNI

Matches numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.France, Discovered.PHI, Discovered.Entity.CNI.

FRANCE_NIR

Matches numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.France, Discovered.PHI, Discovered.Entity.NIR.

FRANCE_PASSPORT

Matches alphanumeric strings consistent with the French Passport number. Requires two numbers followed by two upper case letters and ends with 5 digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.France, Discovered.PHI, Discovered.Entity.Passport.

GENDER

Matches strings consistent with gender. Tags include Discovered.PII, Discovered.Identifier Indirect, Discovered.PHI, Discovered.Entity.Gender.

GERMANY_DRIVERS_LICENSE_NUMBER

Matches alphanumeric strings consistent with Germany's Driver's License number. Requires an eleven-element string, with a digit or a letter followed by two digits, 6 digits or letters, one digit, and one digit or letter. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Germany, Discovered.PHI, Discovered.Entity.Drivers License Number.

GERMANY_IDENTITY_CARD_NUMBER

Matches alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Germany, Discovered.PHI, Discovered.Entity.Identity Card Number.

IBAN_CODE

Matches strings consistent with an International Bank Account Number (IBAN). Must contain a valid country code. Tags include Discovered.Entity.IBAN Code.

ICD10_CODE

Matches strings consistent with codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), as drawn from the Clinical Modification lexicon from the year 2020. Tags include Discovered.Entity.ICD10 Code.

IMEI_HARDWARE_ID

Matches strings consistent with an International Mobile Equipment Identity (IMEI) number. Must contain 15 digits with optional hyphens or spaces after the second, 8th, and 14th digits. Tags include Discovered.Entity.IMEI.

IP_ADDRESS

Matches IP Addresses in the V4 and V6 formats. Tags include Discovered.Entity.IP Address.

LOCATION

Matches strings consistent with Countries, States, Addresses, or Municipalities. By default focuses on locations in the United States. Tags include Discovered.Entity.Location.

MAC_ADDRESS

Matches strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with every two digits separated by a colon. Tags include Discovered.Entity.MAC Address.

MAC_ADDRESS_LOCAL

Matches strings consistent with a local Media Access Control (MAC) address. Tags include Discovered.Entity.MAC Address Local.

PERSON_NAME

Matches strings consistent with a dictionary of people's names. US person names are drawn from the US Social Security database. Tags include Discovered.PII, Discovered.PHI, Discovered.Entity.Person Name, Discovered.Identifier Indirect.

PHONE_NUMBER

Matches strings consistent with telephone numbers. Primarily looks for strings consistent with the United States telephone numbers naming convention. Tags include Discovered.Entity.Telephone Number.

POSTAL_CODE

Matches strings consistent with a valid US zip code with an optional +4. Only valid 5 digit zip codes are matched. Tags include Discovered.Entity.Postal Code.

SPAIN_DRIVERS_LICENSE_NUMBER

Matches alphanumeric strings consistent with Spain's Driver's License number. Requires eight digits followed by a single letter or digit. The final digit acts as a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Spain, Discovered.PHI, Discovered.Entity.Drivers License Number.

SPAIN_NIE_NUMBER

Matches strings consistent with Spain's Foreigner Identification number. Requires an eight-character string. The initial character must be X, Y, or Z, followed by seven digits, then by an optional hyphen or space and a single checksum character. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Spain, Discovered.PHI, Discovered.Entity.NIE Number.

SPAIN_NIF_NUMBER

Matches strings consistent with Spain's Tax Identification number. Requires an eight-character string. Requires eight digits followed by an optional hyphen or space and a single checksum character. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Spain, Discovered.PHI, Discovered.Entity.NIF Number.

SPAIN_PASSPORT

Matches strings consistent with Spain's Passport number. Requires an eight- or nine-character string, starting with either two or three letters followed by six digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Spain, Discovered.PHI, Discovered.Entity.Passport.

STREET_ADDRESS

Matches strings consistent with street addresses. Primarily looks for strings consistent with the United States street naming convention. Tags include Discovered.Entity.Location.

SWEDEN_NATIONAL_ID_NUMBER

Matches numeric strings consistent with Sweden's National ID number. Requires a ten- or twelve-digit string that must start with a date in either the YYMMDD or YYYYMMDD format. An optional - or + character then separates four ending digits. The final digit is a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Sweden, Discovered.PHI, Discovered.Entity.National ID Number.

SWEDEN_PASSPORT

Matches numeric strings consistent with Sweden's Passport number. Requires an 8-digit number. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Sweden, Discovered.PHI, Discovered.Entity.Passport.

SWIFT_CODE

Matches alphanumeric strings consistent with a SWIFT code (or Bank Identifier Code (BIC)) format. Tags include Discovered.Entity.Swift Code.

THAILAND_NATIONAL_ID_NUMBER

Matches strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (-) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Thailand, Discovered.PHI, Discovered.Entity.National ID Number.

TIME

Matches strings consistent with times. Can contain both date and time pieces. Tags include Discovered.Entity.Date.

UK_DRIVERS_LICENSE_NUMBER

Matches alphanumeric strings consistent with the United Kingdom's Driver's License number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.UK, Discovered.PHI, Discovered.Entity.Drivers License Number.

UK_NATIONAL_INSURANCE_NUMBER

Matches alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two characters must be letters, followed by an optional space, then six digits with optional spaces or hyphens (-) every two digits, ending with a letter. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.UK, Discovered.PHI, Discovered.Entity.National Insurance Number.

UK_PASSPORT

Matches numeric strings consistent with the United Kingdom's passport number. Requires a nine-digit numeric string. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.UK, Discovered.PHI, Discovered.Entity.Passport.

UK_TAXPAYER_REFERENCE

Matches ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.UK, Discovered.PHI, Discovered.Entity.Taxpayer Reference.

URL

Matches strings consistent with a Uniform Resource Locator (URL). The string must begin with http://, https://, ftp://, file:///, or mailto:, followed by a string and ending with a top-level domain of no more than 128 characters. Tags include Discovered.Entity.URL.

US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER

Matches a numeric string consistent with a United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having 93 as an allowed Group Number. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.PHI, Discovered.Entity.Adoption Taxpayer ID Number.

US_BANK_ROUTING_MICR

Matches numeric strings consistent with an American Bankers Association (ABA) Routing Number. Must be a nine-digit number starting with 0, 1, 2, 3, 6, or 7, followed by eight digits. The final digit is a checksum. Tags include Discovered.Country.US, Discovered.Entity.Bank Routing MICR.

US_DEA_NUMBER

Matches alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number that is assigned to a health care provider. Must be nine characters long. The first two characters must be alphanumeric, and the last seven characters must be digits. The final digit is a checksum. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.Entity.DEA Number.

US_DRIVERS_LICENSE_NUMBER

Matches strings consistent with some US Driver's license numbers. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.PHI, Discovered.Entity.Drivers License Number.

US_EMPLOYER_IDENTIFICATION_NUMBER

Matches numeric strings consistent with a United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit. Tags include Discovered.Country.US, Discovered.Entity.Employer ID Number.

US_HEALTHCARE_NPI

Matches numeric strings consistent with US National Provider Identifier (NPI). Strings must be either 10 or 15 digits with the final digit being a valid checksum. Tags include Discovered.PII, Discovered.Country.US, Discovered.Entity.Healthcare NPI, Discovered.Identifier Undetermined.

US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER

Matches a numeric string consistent with a United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having a limited set of allowed Group Numbers. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.PHI, Discovered.Entity.Individual Taxpayer ID Number.

US_PASSPORT

Matches numeric strings consistent with United States Passport number. Strings must contain nine digits. Columns should have a name or label consistent with a passport. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.PHI, Discovered.Entity.Passport.

US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER

Matches strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P that is followed by 8 digits. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.Entity.Preparer Taxpayer ID Number.

US_SOCIAL_SECURITY_NUMBER

Matches strings consistent with a US Social Security Number. Strings must contain nine digits and comprise three parts: the three left-most digits designating the area number, the middle two digits designating the group number, and the four right-most digits designating the serial number. For a column to be tagged, none of these parts can contain all zeroes, and area numbers must not be 666 or in the range of 900-999. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.US, Discovered.PHI, Discovered.Entity.Social Security Number.

US_STATE

Matches strings consistent with either a full name or two-letter abbreviation of a US state or territory. Tags include Discovered.Country.US, Discovered.Entity.State.

US_TOLLFREE_PHONE_NUMBER

Matches strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88+any digit, or 899. Tags include Discovered.Country.US, Discovered.Entity.Tollfree Telephone Number.

VEHICLE_IDENTIFICATION_NUMBER

Matches strings consistent with Vehicle Identification Numbers. A checksum is required as well as a valid World Manufacturer Identifier. Tags include Discovered.Country.US, Discovered.Entity.Vehicle Identifier or Serial Number.
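Several descriptions in the table above mention a required checksum or structural rule without naming it. The Python sketch below illustrates three such checks under stated assumptions: the Luhn algorithm (the standard check for payment card numbers and Canadian Social Insurance Numbers), the 3-7-1 weighted checksum used by ABA routing numbers, and the area, group, and serial rules listed for US_SOCIAL_SECURITY_NUMBER. The function names are hypothetical, the optional hyphens in the SSN pattern are an assumption, and none of this is Immuta's own implementation.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, hyphens) are ignored.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def aba_routing_valid(number: str) -> bool:
    """Check the length, leading digit, and 3-7-1 weighted checksum."""
    # Nine digits, starting with 0, 1, 2, 3, 6, or 7 (as in the table).
    if not re.fullmatch(r"[012367][0-9]{8}", number):
        return False
    weights = (3, 7, 1) * 3
    return sum(w * int(d) for w, d in zip(weights, number)) % 10 == 0


def ssn_structure_valid(value: str) -> bool:
    """Apply the area, group, and serial rules from the table.

    Assumption: hyphens between the three parts are optional.
    """
    match = re.fullmatch(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})", value)
    if match is None:
        return False
    area, group, serial = (int(part) for part in match.groups())
    if area == 0 or area == 666 or area >= 900:
        return False
    return group != 0 and serial != 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(aba_routing_valid("021000021"))
    print(ssn_structure_valid("078-05-1120"))
```

A pattern match alone would accept any string of the right length, so checks like these are what make a nine- or sixteen-digit number plausible as a specific identifier rather than an arbitrary number.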

Create and Apply a Template to a Data Source

Create a template

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API.
Find identifiers to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=IDENTIFIER"

If the request was successful, you will receive a list of available identifiers.

{
  "count": 3,
  "hits": [
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "displayName": "Account Number Identifier",
      "description": "This identifier recognizes account numbers using a regex",
      "type": "regex",
      "config": {
        "tags": [
          "Discovered.account-number"
        ],
        "regex": "[0-9]{9}-[0-9]{3}-[0-9]{1}",
        "minConfidence": 0.5
      },
      "id": 104,
      "createdAt": "2021-10-20T19:12:24.889Z",
      "updatedAt": "2021-10-20T19:12:24.889Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
      "displayName": "Employee Desk Location Identifier",
      "description": "This identifier detects when an employee's desk location appears in a dataset.",
      "type": "dictionary",
      "config": {
        "tags": [
          "Discovered.desk-location"
        ],
        "values": [
          "Research Lab",
          "Blue Room",
          "Purple Room"
        ],
        "caseSensitive": false,
        "minConfidence": 0.6
      },
      "id": 68,
      "createdAt": "2021-10-20T17:57:51.696Z",
      "updatedAt": "2021-10-20T17:57:51.696Z"
    },
    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "displayName": "Social Security Number Columns Identifier",
      "description": "This identifier recognizes column names that match the defined regex pattern.",
      "type": "columnNameRegex",
      "config": {
        "tags": [
          "Discovered.Social Security Numbers"
        ],
        "columnNameRegex": "ssn|social ?security"
      },
      "id": 67,
      "createdAt": "2021-10-20T17:57:17.930Z",
      "updatedAt": "2021-10-20T17:57:17.930Z"
    }
  ]
}

Save the template payload in a .json file. The payloads below are examples of different templates.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "classifiers": [
    {
      "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "classifiers": [
    {
      "name": "STUDENT_LOCATION_IDENTIFIER"
    }
  ],
  "sampleSize": 100
}

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that contains details about the template. The responses below are examples for different templates.

{
  "name": "ACCOUNT_NUMBERS_TEMPLATE",
  "displayName": "Account Numbers Template",
  "description": "This template contains the identifier that recognizes account numbers.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

{
  "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
  "displayName": "Employee Desk Location Template",
  "description": "This template contains the identifier that detects when the name of the room an employee's desk is in appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}

{
  "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
  "displayName": "Social Security Numbers Template",
  "description": "This template contains the identifier that matches social security number column names with the defined regex.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 2,
  "createdAt": "2021-10-21T19:12:22.092Z",
  "updatedAt": "2021-10-21T19:12:22.092Z",
  "classifiers": [
    {
      "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
      "overrides": {}
    }
  ]
}

{
  "name": "STUDENT_LOCATION_TEMPLATE",
  "displayName": "Student Location Template",
  "description": "This template contains the identifier that detects when a student's residence hall, floor, or room appears in a dataset.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T18:03:58.967Z",
  "updatedAt": "2021-10-21T18:03:58.967Z",
  "classifiers": [{
    "name": "STUDENT_LOCATION_IDENTIFIER",
    "overrides": {}
  }]
}
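The template payloads in this section all share the same shape: a name, a display name, a description, a list of classifiers referenced by name, and a sampleSize. As a minimal sketch, the hypothetical helper below (build_template is not part of any Immuta tooling) assembles that structure in Python and serializes it to JSON that could be saved to a .json file.

```python
import json


def build_template(name, display_name, description, identifier_names, sample_size=100):
    """Assemble a template payload with the fields used in this section.

    Each identifier is referenced by name inside the "classifiers" list.
    """
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [{"name": identifier} for identifier in identifier_names],
        "sampleSize": sample_size,
    }


payload = build_template(
    "ACCOUNT_NUMBERS_TEMPLATE",
    "Account Numbers Template",
    "This template contains the identifier that recognizes account numbers.",
    ["ACCOUNT_NUMBER_IDENTIFIER"],
)

# Serialize the payload; the result can be written to a .json file
# and passed to the CLI with --input or to curl with --data.
payload_json = json.dumps(payload, indent=2)
print(payload_json)
```

Generating the payload this way keeps the identifier names in one list, which can help when a template references several identifiers.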

Apply a template to data sources

Attributes of all custom identifiers and templates are provided on the Sensitive data discovery API page. However, attributes specific to this section are outlined in the table below.

Attribute

Description

template

string The name of the template to apply to the data sources; null clears the current template.

sources

array[string] The names of the data sources to apply the template to.

Find templates to apply to your data sources:

Immuta CLI

immuta api sdd/template

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    https://your-immuta-url.immuta.com/sdd/template

If the request was successful, you will receive a list of available templates.

{
  "count": 3,
  "hits": [
    {
      "name": "ACCOUNT_NUMBERS_TEMPLATE",
      "displayName": "Account Numbers Template",
      "description": "This template contains the identifier that recognizes account numbers.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 2,
      "createdAt": "2021-10-20T19:13:35.319Z",
      "updatedAt": "2021-10-20T19:13:35.319Z",
      "classifiers": [
        {
          "name": "ACCOUNT_NUMBER_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "EMPLOYEE_DESK_LOCATION_TEMPLATE",
      "displayName": "Employee Desk Location Template",
      "description": "Contains identifier that detects when the name of a room a desk is in appears in a dataset.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 1,
      "createdAt": "2021-10-20T18:03:58.967Z",
      "updatedAt": "2021-10-20T18:03:58.967Z",
      "classifiers": [
        {
          "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
          "overrides": {}
        }
      ]
    },
    {
      "name": "SOCIAL_SECURITY_NUMBERS_TEMPLATE",
      "displayName": "Social Security Numbers Template",
      "description": "Contains identifier that matches ssn column names with the defined regex.",
      "sampleSize": 100,
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "id": 3,
      "createdAt": "2021-10-20T19:13:58.359Z",
      "updatedAt": "2021-10-20T19:13:58.359Z",
      "classifiers": [
        {
          "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
          "overrides": {}
        }
      ]
    }
  ]
}

Select an appropriate template to apply to your data sources, and save the payload in a .json file:

{
  "template": "ACCOUNT_NUMBERS_TEMPLATE",
  "sources": [
    "Insurance Data"
  ]
}

Apply the template to your data source(s):

Immuta CLI

immuta api sdd/template/apply -X PUT --input ./example-payload.json

HTTP API

curl \
    --request PUT \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/apply

You will receive a response that indicates whether or not the template was successfully applied to your data sources.

{
  "success": true
}
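For readers who prefer scripting the apply step, the following Python sketch builds the same PUT request with the standard library instead of curl. It only constructs the request and does not send it. The helper name build_apply_request is hypothetical, and the Authorization value is passed through unchanged, so use whatever header format your Immuta instance expects.

```python
import json
import urllib.request


def build_apply_request(base_url, authorization, template, sources):
    """Build (but do not send) a PUT request to sdd/template/apply.

    Passing template=None serializes as JSON null, which the attribute
    table above describes as clearing the current template.
    """
    body = json.dumps({"template": template, "sources": list(sources)})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdd/template/apply",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
        method="PUT",
    )


request = build_apply_request(
    "https://your-immuta-url.immuta.com",
    "Bearer dea464c07bd07300095caa8",  # placeholder value from the curl example
    "ACCOUNT_NUMBERS_TEMPLATE",
    ["Insurance Data"],
)
print(request.get_method(), request.full_url)

# To send the request against a real instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```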

Additional tutorials

Clone a template

Users cannot modify templates created by other data owners, but they can clone templates and make changes to the clone.

Get a list of templates to determine the template you want to clone using one of these methods:

Immuta CLI

immuta api "sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/template?sortField=name&sortOrder=asc&offset=5&limit=5"

Save the template clone name and details in a .json file.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts."
}

Clone the template:

Immuta CLI

immuta api sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template/ACCOUNT_NUMBERS_TEMPLATE/clone

If the request was successful, you will receive a response that provides details about the template clone.

{
  "name": "INSURANCE_ACCOUNT_NUMBERS",
  "displayName": "Insurance Account Numbers",
  "description": "This template is specific to insurance accounts.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 4,
  "createdAt": "2021-10-20T20:48:37.805Z",
  "updatedAt": "2021-10-20T20:48:37.805Z",
  "classifiers": [
    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "overrides": {}
    }
  ]
}

You can now modify the template, such as changing the identifiers (classifiers) included and the sampleSize.
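The template list request at the start of this tutorial passes its sorting and paging options (sortField, sortOrder, offset, limit) as query parameters. As a small sketch, the hypothetical helper below builds that URL with Python's standard library, which avoids hand-assembling the query string and makes it easy to step through pages by changing offset.

```python
from urllib.parse import urlencode


def template_list_url(base_url, sort_field="name", sort_order="asc", offset=0, limit=5):
    """Return the sdd/template list URL with sorting and paging parameters."""
    query = urlencode(
        {
            "sortField": sort_field,
            "sortOrder": sort_order,
            "offset": offset,
            "limit": limit,
        }
    )
    return f"{base_url.rstrip('/')}/sdd/template?{query}"


# The second page of five templates, matching the request in this tutorial.
print(template_list_url("https://your-immuta-url.immuta.com", offset=5, limit=5))
```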

Configure entity tags and confidence

To prevent an entity tag from being set, you can create a template that configures the identifier that contains that tag.

For example, to remove the Discovered.PHI tag, create a template that includes the PERSON_NAME identifier and omits Discovered.PHI from the list of tags in the template payload.

View the details about the PERSON_NAME identifier so you know what to include in your template using one of these methods:

Immuta CLI

immuta api "sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

HTTP API

curl \
    --request GET \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    "https://your-immuta-url.immuta.com/sdd/classifier?sortField=name&sortOrder=asc&limit=25&searchText=PERSON_NAME"

If the request was successful, the response will include details about the PERSON_NAME identifier.

{
  "createdBy": {
    "id": 21,
    "name": "Immuta System Account",
    "email": "immuta_system@immuta.com"
  },
  "name": "PERSON_NAME",
  "displayName": "Person Name",
  "description": "Detects strings consistent with a dictionary of people's names.",
  "type": "builtIn",
  "config": {
    "tags": [
      "Discovered.PHI",
      "Discovered.PII",
      "Discovered.Entity.Person Name",
      "Discovered.Identifier Indirect"
    ],
    "minConfidence": 0.3
  },
  "id": 54,
  "createdAt": "2021-10-21T07:35:14.416Z",
  "updatedAt": "2021-10-21T12:57:43.919Z"
}

Remove the Discovered.PHI tag from the list of tags in the identifier config, and save the template payload in a .json file.

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ],
  "sampleSize": 100
}
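
The payload above can also be derived from the identifier details rather than edited by hand. The following is a sketch only: build_override_template is a hypothetical helper, not part of the Immuta CLI or API, and the identifier dictionary is a trimmed copy of the PERSON_NAME response shown earlier.

```python
import json


def build_override_template(identifier, remove_tags, name, display_name,
                            description, sample_size):
    """Build a template payload that overrides an identifier's tags.

    Every tag in the identifier's config is kept except those in remove_tags.
    """
    removed = set(remove_tags)
    kept = [tag for tag in identifier["config"]["tags"] if tag not in removed]
    return {
        "name": name,
        "displayName": display_name,
        "description": description,
        "classifiers": [
            {"name": identifier["name"], "overrides": {"tags": kept}}
        ],
        "sampleSize": sample_size,
    }


# Trimmed copy of the PERSON_NAME identifier details shown earlier.
person_name = {
    "name": "PERSON_NAME",
    "config": {
        "tags": [
            "Discovered.PHI",
            "Discovered.PII",
            "Discovered.Entity.Person Name",
            "Discovered.Identifier Indirect",
        ],
        "minConfidence": 0.3,
    },
}

template_payload = build_override_template(
    person_name,
    remove_tags=["Discovered.PHI"],
    name="PERSON_NAME_OVERRIDE",
    display_name="Person Name Override",
    description="This template removes the PHI tag from the PERSON_NAME identifier.",
    sample_size=100,
)
print(json.dumps(template_payload, indent=2))
```

Writing the printed JSON to a file gives the same payload as the hand-edited version, and the tag order from the identifier config is preserved.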

Create the template:

Immuta CLI

immuta api sdd/template -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/template

If the request is successful, you will receive a response that details the new template:

{
  "name": "PERSON_NAME_OVERRIDE",
  "displayName": "Person Name Override",
  "description": "This template removes the PHI tag from the PERSON_NAME identifier.",
  "sampleSize": 100,
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "id": 1,
  "createdAt": "2021-10-21T17:11:18.057Z",
  "updatedAt": "2021-10-21T17:11:18.057Z",
  "classifiers": [
    {
      "name": "PERSON_NAME",
      "overrides": {
        "tags": [
          "Discovered.PII",
          "Discovered.Entity.Person Name",
          "Discovered.Identifier Indirect"
        ]
      }
    }
  ]
}
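
To confirm that the override took effect, the response can be checked for the removed tag. This is a sketch under the assumption that the response keeps the shape shown above; tag_override_applied is a hypothetical helper, and the dictionary is a trimmed copy of that response.

```python
def tag_override_applied(template, identifier_name, removed_tag):
    """Return True if the template overrides the identifier's tags
    and the override list does not contain removed_tag."""
    for classifier in template.get("classifiers", []):
        if classifier.get("name") == identifier_name:
            tags = classifier.get("overrides", {}).get("tags")
            # Without a "tags" override, the identifier's default tags still apply.
            return tags is not None and removed_tag not in tags
    return False


# Trimmed copy of the response shown above.
created = {
    "name": "PERSON_NAME_OVERRIDE",
    "sampleSize": 100,
    "classifiers": [
        {
            "name": "PERSON_NAME",
            "overrides": {
                "tags": [
                    "Discovered.PII",
                    "Discovered.Entity.Person Name",
                    "Discovered.Identifier Indirect",
                ]
            },
        }
    ],
}

print(tag_override_applied(created, "PERSON_NAME", "Discovered.PHI"))  # prints True
```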

What's next

Now that you've created a template, continue to one of the following tutorials:

SDD global settings: Opt to add your template to the SDD global settings so that Immuta will use this template to run SDD for all data sources.
Run sensitive data discovery on a data source

Built-In Identifiers

Identifier

Description

AGE

ARGENTINA_DNI_NUMBER

AUSTRALIA_MEDICARE_NUMBER

AUSTRALIA_PASSPORT

AUSTRALIA_TAX_FILE_NUMBER

BELGIUM_NATIONAL_ID_CARD_NUMBER

BITCOIN_INVOICE_ADDRESS

BRAZIL_CPF_NUMBER

CANADA_BC_PHN

CANADA_DRIVERS_LICENSE_NUMBER

CANADA_OHIP

CANADA_PASSPORT

Matches strings consistent with the Canadian Passport Number format. Tags include Discovered.PII, Discovered.Identifier Direct, Discovered.Country.Canada, Discovered.PHI, Discovered.Entity.Passport.

CANADA_QUEBEC_HIN

CANADA_SOCIAL_INSURANCE_NUMBER

CREDIT_CARD_NUMBER

Matches strings consistent with a credit card number. Must include a valid checksum. Tags include Discovered.PCI, Discovered.Entity.Credit Card Number.

DATE

Matches strings consistent with dates. These can include days of the week, dates, and date times. Tags include Discovered.Entity.Date.

DATE_OF_BIRTH

DENMARK_CPR_NUMBER

DOMAIN_NAME

Matches domain names using a very broad pattern. Tags include Discovered.Entity.Domain Name.

EMAIL_ADDRESS

ETHNIC_GROUP

Matches strings consistent with the US Census race designations. Tags include Discovered.PII, Discovered.Entity.Ethnic Group.

FDA_CODE

FINLAND_NATIONAL_ID_NUMBER

FRANCE_CNI

FRANCE_NIR

FRANCE_PASSPORT

GENDER

Matches strings consistent with gender. Tags include Discovered.PII, Discovered.Identifier Indirect, Discovered.PHI, Discovered.Entity.Gender.

GERMANY_DRIVERS_LICENSE_NUMBER

GERMANY_IDENTITY_CARD_NUMBER

IBAN_CODE

Matches strings consistent with an International Bank Account Number (IBAN). Must contain a valid country code. Tags include Discovered.Entity.IBAN Code.

ICD10_CODE

IMEI_HARDWARE_ID

IP_ADDRESS

Matches IP Addresses in the V4 and V6 formats. Tags include Discovered.Entity.IP Address.

LOCATION

Matches strings consistent with Countries, States, Addresses, or Municipalities. By default focuses on locations in the United States. Tags include Discovered.Entity.Location.

MAC_ADDRESS

Matches strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with every two digits separated by a colon. Tags include Discovered.Entity.MAC Address.

MAC_ADDRESS_LOCAL

Matches strings consistent with a local Media Access Control (MAC) address. Tags include Discovered.Entity.MAC Address Local.

PERSON_NAME

PHONE_NUMBER

POSTAL_CODE

Matches strings consistent with a valid US zip code with an optional +4. Only valid 5 digit zip codes are matched. Tags include Discovered.Entity.Postal Code.

SPAIN_DRIVERS_LICENSE_NUMBER

SPAIN_NIE_NUMBER

SPAIN_NIF_NUMBER

SPAIN_PASSPORT

STREET_ADDRESS

Matches strings consistent with street addresses. Primarily looks for strings consistent with the United States street naming convention. Tags include Discovered.Entity.Location.

SWEDEN_NATIONAL_ID_NUMBER

SWEDEN_PASSPORT

SWIFT_CODE

Matches alphanumeric strings consistent with a SWIFT code (or Bank Identifier Code (BIC)) format. Tags include Discovered.Entity.Swift Code.

THAILAND_NATIONAL_ID_NUMBER

TIME

Matches strings consistent with times. Can contain both date and time pieces. Tags include Discovered.Entity.Date.

UK_DRIVERS_LICENSE_NUMBER

UK_NATIONAL_INSURANCE_NUMBER

UK_PASSPORT

UK_TAXPAYER_REFERENCE

URL

US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER

US_BANK_ROUTING_MICR

US_DEA_NUMBER

US_DRIVERS_LICENSE_NUMBER

US_EMPLOYER_IDENTIFICATION_NUMBER

US_HEALTHCARE_NPI

US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER

US_PASSPORT

US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER

US_SOCIAL_SECURITY_NUMBER

US_STATE

Matches strings consistent with either a full name or two-letter abbreviation of a US state or territory. Tags include Discovered.Country.US, Discovered.Entity.State.

US_TOLLFREE_PHONE_NUMBER

Matches strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88+any digit, or 899. Tags include Discovered.Country.US, Discovered.Entity.Tollfree Telephone Number.

VEHICLE_IDENTIFICATION_NUMBER
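
Two of the descriptions above state rules precise enough to sketch in code: CREDIT_CARD_NUMBER requires a valid checksum, and MAC_ADDRESS requires twelve hexadecimal digits with every two digits separated by a colon. The Python below illustrates those rules only. It is not Immuta's implementation, and it assumes the checksum is the Luhn algorithm, which is the checksum commonly used for payment card numbers. The function names are hypothetical.

```python
import re

# Six pairs of hexadecimal digits separated by colons (twelve digits in total).
MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def looks_like_mac_address(value: str) -> bool:
    """Return True if the whole string is a colon-separated MAC address."""
    return MAC_PATTERN.fullmatch(value) is not None


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum.

    Non-digit characters such as spaces or hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Starting from the rightmost digit, double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(looks_like_mac_address("00:1A:2B:3C:4D:5E"))  # prints True
print(looks_like_mac_address("001A2B3C4D5E"))       # prints False
print(luhn_valid("4111 1111 1111 1111"))            # prints True
print(luhn_valid("4111 1111 1111 1112"))            # prints False
```

A pattern match alone says nothing about confidence; in SDD that is governed by settings such as minConfidence on the identifier.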
