Create a Custom Identifier

Create a Regex Identifier

Earlier documentation refers to identifiers as classifiers. The term identifier is now used because it is more accurate and avoids confusion with Immuta's data classification and frameworks feature. The API endpoints used in this tutorial still contain classifier in their path (sdd/classifier).

Use case: Custom regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery and found that none of them can automatically identify and tag columns that contain account numbers in your database.

A regular expression (regex) custom identifier allows you to create your own rules that enable Immuta's sensitive data discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom identifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this identifier.
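Before putting a pattern into an identifier, it can help to try it against a few sample values. The following is a minimal sketch that uses Python's re module and the account number pattern from this tutorial. The sample values are hypothetical, and Python's regex engine is an assumption here: it may not behave identically to the engine Immuta uses for sensitive data discovery.

```python
import re

# Pattern for the xxxxxxxxx-xxx-x format: nine digits, three digits, one digit.
ACCOUNT_NUMBER_PATTERN = r"^[0-9]{9}-[0-9]{3}-[0-9]{1}$"


def is_account_number(value, pattern=ACCOUNT_NUMBER_PATTERN):
    """Return True if the value matches the account number pattern."""
    return re.search(pattern, value) is not None


# Hypothetical column values, for illustration only.
sample_values = [
    "123456789-123-1",   # well-formed account number
    "987654321-000-9",   # well-formed account number
    "12345-123-1",       # first group is too short
    "123456789-123-12",  # last group is too long
]

for value in sample_values:
    print(f"{value!r}: {is_account_number(value)}")
```

Checking both values that should match and values that should not makes it easier to spot a pattern that is too loose or too strict.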

Attributes of the custom regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

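Save the custom regex identifier payload in a .json file, such as the example-payload.json file used in the commands below. The regex ^[0-9]{9}-[0-9]{3}-[0-9]{1}$ matches values made up of nine digits, three digits, and one digit, separated by hyphens.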

{
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5,
    "tags": ["Discovered.account-number"]
  }
}

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "ACCOUNT_NUMBER_IDENTIFIER",
  "displayName": "Account Number Identifier",
  "description": "This identifier recognizes account numbers using a regex",
  "type": "regex",
  "config": {
    "tags": [
      "Discovered.account-number"
    ],
    "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
    "minConfidence": 0.5
  },
  "id": 1,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}
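The same request can also be made from a script. The sketch below mirrors the curl command above using only Python's standard library; the endpoint path, headers, and placeholder values are taken from this tutorial, while the function names build_create_identifier_request and create_identifier are hypothetical helpers introduced for illustration. It is one possible approach, not an official Immuta client.

```python
import json
import urllib.request


def build_create_identifier_request(base_url, api_key, payload):
    """Build (but do not send) the POST request that creates an identifier."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdd/classifier",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,
        },
        method="POST",
    )


def create_identifier(base_url, api_key, payload):
    """Send the request and return the parsed JSON response body."""
    request = build_create_identifier_request(base_url, api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same payload as the example .json file in this tutorial.
payload = {
    "name": "ACCOUNT_NUMBER_IDENTIFIER",
    "displayName": "Account Number Identifier",
    "description": "This identifier recognizes account numbers using a regex",
    "type": "regex",
    "config": {
        "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
        "minConfidence": 0.5,
        "tags": ["Discovered.account-number"],
    },
}

# Placeholder URL and API key from the curl example; replace with your own.
request = build_create_identifier_request(
    "https://your-immuta-url.immuta.com", "12345678900000", payload
)
print(request.get_method(), request.full_url)
```

Calling create_identifier with a real instance URL and API key sends the request; urllib raises urllib.error.HTTPError if the server responds with an error status.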

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Column Name Regex Identifier

Use case: Custom column name regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery and found that none of them can automatically detect and tag columns that contain social security numbers based on the names of those columns.

A custom column name regular expression (regex) identifier allows you to create your own rules that enable Immuta's sensitive data discovery to find column name matches based on a regex pattern. For example, if your database contains tables with social security numbers, you could define a regex pattern to match against the names of the columns instead of the values within the columns. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom column name regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom column name regex identifier are outlined in the table below.

Attribute

Description

Required

Create a custom column name regex identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

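Save the custom column name regex identifier payload in a .json file. The regex ^ssn|social ?security$ looks for column names such as ssn, socialsecurity, or social security. In standard regex syntax the alternation here is not grouped, so ^ applies only to ssn and $ applies only to social ?security; to require the whole column name to be one of these alternatives, use ^(ssn|social ?security)$ instead.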

{
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "columnNameRegex": "^ssn|social ?security$",
    "tags": ["Discovered.Social Security Numbers"]
  }
}
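To see which column names the pattern in the payload above would accept, you can try it locally before creating the identifier. The sketch below uses Python's re module with search semantics, which is an assumption: Immuta's matching behavior (for example, case sensitivity or full-match versus search) may differ. The column names and the grouped variant of the pattern are hypothetical and included only for comparison.

```python
import re

# Pattern from the payload: "^" binds only to "ssn", "$" only to "social ?security".
UNGROUPED_PATTERN = r"^ssn|social ?security$"
# Hypothetical stricter variant that anchors both alternatives at both ends.
GROUPED_PATTERN = r"^(?:ssn|social ?security)$"

# Hypothetical column names, for illustration only.
column_names = [
    "ssn",
    "socialsecurity",
    "social security",
    "ssn_backup_date",
    "notes_on_social security",
    "customer_ssn",
    "social_security",
]


def matching_names(pattern, names):
    """Return the names in which the pattern is found."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


print("ungrouped:", matching_names(UNGROUPED_PATTERN, column_names))
print("grouped:  ", matching_names(GROUPED_PATTERN, column_names))
```

Under these assumptions, the ungrouped pattern also accepts names that merely start with ssn or end with social security, while neither pattern accepts customer_ssn or social_security. Adjust the pattern if your naming conventions use prefixes or underscores.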

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "SOCIAL_SECURITY_NUMBER_COLUMNS_IDENTIFIER",
  "displayName": "Social Security Number Columns Identifier",
  "description": "This identifier recognizes column names that match the defined regex pattern.",
  "type": "columnNameRegex",
  "config": {
    "tags": [
      "Discovered.Social Security Numbers"
    ],
    "columnNameRegex": "^ssn|social ?security$"
  },
  "id": 2,
  "createdAt": "2021-10-14T18:48:56.289Z",
  "updatedAt": "2021-10-14T18:48:56.289Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.

Create a Dictionary Identifier

Use case: Custom dictionary identifier

Scenario: You have data that includes the names of the rooms where employees' desks are located across your organization. Although these locations may be considered sensitive in particular datasets, they would not be recognized by Immuta's built-in identifiers.

A custom dictionary identifier allows you to create your own rules that enable Immuta's sensitive data discovery to match a list of room names to values in the dataset. The tutorial below uses this scenario to illustrate creating this identifier.

Attributes of the custom dictionary identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom dictionary identifier are outlined in the table below.

Attribute

Description

Create a custom dictionary identifier

Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

Save the custom dictionary identifier payload in a .json file. The dictionary below contains the values Research Lab, Blue Room, and Purple Room.

{
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "values": ["Research Lab", "Blue Room", "Purple Room"],
    "caseSensitive": false,
    "minConfidence": 0.6,
    "tags": ["Discovered.desk-location"]
  }
}
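The payload above can be explored locally with a small sketch of dictionary matching. This is not Immuta's implementation. It assumes that caseSensitive set to false means values are compared without regard to case, and that minConfidence can be read as the minimum fraction of column values that must match for the column to be tagged; both readings are assumptions made for illustration. The sample column and the function name dictionary_match_fraction are hypothetical.

```python
def dictionary_match_fraction(column_values, dictionary_values, case_sensitive=False):
    """Return the fraction of column values found in the dictionary."""
    if not column_values:
        return 0.0
    # Compare values as-is, or case-folded when matching ignores case.
    normalize = (lambda text: text) if case_sensitive else str.casefold
    lookup = {normalize(value) for value in dictionary_values}
    hits = sum(1 for value in column_values if normalize(value) in lookup)
    return hits / len(column_values)


# The "config" object from the payload above.
config = {
    "values": ["Research Lab", "Blue Room", "Purple Room"],
    "caseSensitive": False,
    "minConfidence": 0.6,
}

# Hypothetical column values, for illustration only.
sample_column = ["blue room", "Research Lab", "PURPLE ROOM", "Cafeteria", "Blue Room"]

fraction = dictionary_match_fraction(
    sample_column, config["values"], case_sensitive=config["caseSensitive"]
)
# Assumed interpretation: tag the column when the fraction reaches minConfidence.
would_tag = fraction >= config["minConfidence"]
print(f"match fraction: {fraction}, would tag: {would_tag}")
```

With the sample column, four of five values match when case is ignored, so the assumed threshold of 0.6 is met. With case-sensitive matching only two of five would match, which illustrates why the caseSensitive setting can change whether a column is tagged.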

Create the identifier using one of these methods:

Immuta CLI

immuta api sdd/classifier -X POST --input ./example-payload.json

HTTP API

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: 12345678900000" \
    --data @example-payload.json \
    https://your-immuta-url.immuta.com/sdd/classifier

If the request is successful, you will receive a response that contains details about the identifier.

{
  "createdBy": {
    "id": 1,
    "name": "John",
    "email": "john@example.com"
  },
  "name": "EMPLOYEE_DESK_LOCATION_IDENTIFIER",
  "displayName": "Employee Desk Location Identifier",
  "description": "This identifier detects when an employee's desk location appears in a dataset.",
  "type": "dictionary",
  "config": {
    "tags": [
      "Discovered.desk-location"
    ],
    "values": [
      "Research Lab",
      "Blue Room",
      "Purple Room"
    ],
    "caseSensitive": false,
    "minConfidence": 0.6
  },
  "id": 68,
  "createdAt": "2021-10-20T17:57:51.696Z",
  "updatedAt": "2021-10-20T17:57:51.696Z"
}

What's next

Continue to one of the following tutorials:

Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
Create a template: Although only data governors can create identifiers, data owners can add identifiers to templates, which they then apply to their data sources to override minConfidence or tags for identifiers within the template.
