You are viewing documentation for Immuta version 2023.1.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Create a Regex Identifier

Note

Previous documentation referred to identifiers as classifiers. The term is being updated to identifier to be more accurate and to avoid confusion with the Immuta data classification and frameworks feature. The API paths shown on this page still use classifier.

Use case: Custom regex identifier

Scenario: You've listed Immuta's built-in identifiers for sensitive data discovery and found that none of them can automatically detect and tag columns that contain account numbers in your database.

A regular expression (regex) custom identifier allows you to create your own detectors that enable Immuta's sensitive data discovery to find matches based on a regex pattern. For example, if a table contains account numbers in the form of xxxxxxxxx-xxx-x, you could define a regex pattern in a custom identifier to identify and tag these columns. The tutorial below uses this scenario to illustrate creating this identifier.
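Before registering a pattern, it can help to try it locally against a few sample values. The sketch below does this with Python's `re` module, which may not behave identically to the regex engine Immuta uses. The helper name `is_account_number` and the sample values are made up for illustration.

```python
import re

# Pattern for account numbers shaped like xxxxxxxxx-xxx-x
# (nine digits, three digits, one digit, separated by hyphens).
# re.IGNORECASE mirrors the documented case-insensitive matching;
# it has no effect on a digits-only pattern like this one.
ACCOUNT_NUMBER = re.compile(r"^[0-9]{9}-[0-9]{3}-[0-9]{1}$", re.IGNORECASE)


def is_account_number(value: str) -> bool:
    """Return True when the value has the account-number shape."""
    return ACCOUNT_NUMBER.match(value) is not None


# Hypothetical column values, not real data.
samples = ["123456789-012-3", "12345-012-3", "123456789-012-34", "abc"]
matches = [s for s in samples if is_account_number(s)]
print(matches)
```

Only the first sample has the full nine-three-one digit shape, so it is the only value kept in `matches`.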

Attributes of the custom regex identifier

Attributes of all custom identifiers are provided on the Sensitive data discovery API page. However, attributes specific to the custom regex identifier are outlined in the table below.

| Attribute | Type | Description | Required |
| --- | --- | --- | --- |
| name | string | Unique, request-friendly identifier name. | Yes |
| displayName | string | Unique, human-readable identifier name. | Yes |
| description | string | The identifier description. | Yes |
| type | string | The type of identifier: regex. | Yes |
| config | object | Includes config.minConfidence, config.tags, and config.regex. Attributes marked with * below belong to this object. | Yes |
| minConfidence* | number | Tags are applied when the detection confidence is at least this threshold (the example payload below uses 0.5). | Yes |
| tags* | array[string] | The names of the tags to apply to the data source. Note: All tags must start with the prefix Discovered. (including the trailing period). | Yes |
| regex* | string | A case-insensitive regular expression to match against column values. | Yes |
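The attribute requirements above can be checked locally before sending a payload. The following is a minimal sketch, not part of Immuta's tooling: the function name `validate_regex_identifier` and its messages are hypothetical, and the regex is compiled with Python's `re` module, which may accept or reject patterns differently than Immuta does.

```python
import re

REQUIRED_TOP_LEVEL = ("name", "displayName", "description", "type", "config")
REQUIRED_CONFIG = ("minConfidence", "tags", "regex")


def validate_regex_identifier(payload: dict) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in payload:
            problems.append(f"missing required attribute: {key}")
    if "type" in payload and payload["type"] != "regex":
        problems.append("type must be 'regex'")

    config = payload.get("config")
    if not isinstance(config, dict):
        if "config" in payload:
            problems.append("config must be an object")
        return problems

    for key in REQUIRED_CONFIG:
        if key not in config:
            problems.append(f"missing required attribute: config.{key}")

    # Every tag must carry the documented "Discovered." prefix.
    tags = config.get("tags", [])
    if not isinstance(tags, list):
        problems.append("config.tags must be an array of strings")
    else:
        for tag in tags:
            if not (isinstance(tag, str) and tag.startswith("Discovered.")):
                problems.append(f"tag does not start with 'Discovered.': {tag!r}")

    # minConfidence must be numeric (bool is excluded explicitly).
    if "minConfidence" in config:
        value = config["minConfidence"]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append("config.minConfidence must be a number")

    # The pattern should at least compile (checked with Python's engine only).
    if "regex" in config:
        if not isinstance(config["regex"], str):
            problems.append("config.regex must be a string")
        else:
            try:
                re.compile(config["regex"])
            except re.error as exc:
                problems.append(f"config.regex does not compile: {exc}")
    return problems


payload = {
    "name": "ACCOUNT_NUMBER_IDENTIFIER",
    "displayName": "Account Number Identifier",
    "description": "This identifier recognizes account numbers using a regex",
    "type": "regex",
    "config": {
        "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
        "minConfidence": 0.5,
        "tags": ["Discovered.account-number"],
    },
}
print(validate_regex_identifier(payload))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a payload in a single pass.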

Create a custom regex identifier

  1. Generate your API key on the API Keys tab on your profile page and save the API key somewhere secure. You will include this API key in the authorization header when you make a request to the Immuta API or use it to configure your instance with the Immuta CLI.

  2. Save the custom regex identifier payload in a .json file. The commands in the next step assume the file is named example-payload.json.

    {
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "displayName": "Account Number Identifier",
      "description": "This identifier recognizes account numbers using a regex",
      "type": "regex",
      "config": {
        "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
        "minConfidence": 0.5,
        "tags": ["Discovered.account-number"]
      }
    }
    
  3. Create the identifier using one of these methods:

    Immuta CLI

    immuta api sdd/classifier -X POST --input ./example-payload.json
    

    HTTP API

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: 12345678900000" \
        --data @example-payload.json \
        https://your-immuta-url.immuta.com/sdd/classifier
    
  4. If the request is successful, you will receive a response that contains details about the identifier.

    {
      "createdBy": {
        "id": 1,
        "name": "John",
        "email": "john@example.com"
      },
      "name": "ACCOUNT_NUMBER_IDENTIFIER",
      "displayName": "Account Number Identifier",
      "description": "This identifier recognizes account numbers using a regex",
      "type": "regex",
      "config": {
        "tags": [
          "Discovered.account-number"
        ],
        "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
        "minConfidence": 0.5
      },
      "id": 1,
      "createdAt": "2021-10-14T18:48:56.289Z",
      "updatedAt": "2021-10-14T18:48:56.289Z"
    }
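The HTTP request from step 3 can also be assembled in a script. The sketch below mirrors the curl example using Python's standard `urllib.request`; the function name `build_create_request` is hypothetical, and the URL and API key are the same placeholders used above. It only builds the request, so it runs without a live Immuta instance.

```python
import json
import urllib.request


def build_create_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates the identifier."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdd/classifier",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,  # placeholder; use your own API key
        },
        method="POST",
    )


payload = {
    "name": "ACCOUNT_NUMBER_IDENTIFIER",
    "displayName": "Account Number Identifier",
    "description": "This identifier recognizes account numbers using a regex",
    "type": "regex",
    "config": {
        "regex": "^[0-9]{9}-[0-9]{3}-[0-9]{1}$",
        "minConfidence": 0.5,
        "tags": ["Discovered.account-number"],
    },
}

request = build_create_request(
    "https://your-immuta-url.immuta.com", "12345678900000", payload
)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch runs offline:
# with urllib.request.urlopen(request) as response:
#     created = json.loads(response.read())
#     print(created["id"], created["name"])
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before anything reaches the server.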
    

What's next

Continue to one of the following tutorials:

  • Run sensitive data discovery on data sources: Trigger SDD to run on specified data sources.
  • Create a template: Only data governors can create identifiers, but data owners can add identifiers to templates and apply those templates to their data sources to override minConfidence or tags for the identifiers within the template.