Classifier Reference

Audience: Data Owners and Governors

Content Summary: Immuta's Sensitive Data Discovery comes with built-in classifiers that detect sensitive data and apply tags to it. This page describes these built-in classifiers, which you can view by country or by data type. Note: Many, but not all, classifiers appear in both sections.

Classifier Descriptions: By Country

This section organizes Immuta's built-in classifiers by the country they're specific to.

Argentina

Classifier Description
ARGENTINA_DNI_NUMBER Detects strings consistent with an Argentina National Identity (DNI) number. Requires an eight-digit number with optional periods between the second and third digits and between the fifth and sixth digits.

Australia

Classifier Description
AUSTRALIA_MEDICARE_NUMBER Detects numeric strings consistent with an Australian Medicare number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Optional spaces can be placed between the fourth and fifth digits and between the ninth and tenth digits. An optional eleventh digit, separated by a /, can be present. A checksum is required.
AUSTRALIA_PASSPORT Detects strings consistent with an Australian Passport number. Requires an 8- or 9-character string that starts with either a single uppercase letter (N, E, D, F, A, C, U, or X) or a two-letter prefix (P followed by A, B, C, D, E, F, U, W, X, or Z), followed by seven digits.
AUSTRALIA_TAX_FILE_NUMBER Detects strings consistent with Australia Tax File number. Requires a nine-digit number with optional spaces between the third and fourth and sixth and seventh digits. A checksum is required.
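
The Tax File number rule above (nine digits, optional spaces, a checksum) can be sketched as follows. This is a minimal illustration, not Immuta's implementation: the function name is hypothetical, and the weights are the commonly documented TFN weighting, assumed here because this page does not state the checksum algorithm.

```python
import re

# Assumption: commonly documented TFN weights; not taken from this page.
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def looks_like_tfn(value: str) -> bool:
    """Return True if value has the TFN layout and a passing weighted checksum."""
    # Nine digits with optional spaces after the third and sixth digits.
    m = re.fullmatch(r"(\d{3}) ?(\d{3}) ?(\d{3})", value)
    if not m:
        return False
    digits = [int(c) for c in "".join(m.groups())]
    # The weighted sum must be divisible by 11.
    return sum(w * d for w, d in zip(TFN_WEIGHTS, digits)) % 11 == 0


print(looks_like_tfn("123 456 782"))
print(looks_like_tfn("123456789"))
```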

Belgium

Classifier Description
BELGIUM_NATIONAL_ID_CARD_NUMBER Detects numeric strings consistent with Belgium's National ID card number. Requires a twelve-digit number with a hyphen (-) between the third and fourth digits and between the tenth and eleventh digits. A two-digit checksum is required.

Brazil

Classifier Description
BRAZIL_CPF_NUMBER Detects numeric strings consistent with Brazil's CPF (Cadastro de Pessoas Físicas) number. Requires an eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two-digit checksum is required.
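
To illustrate what a two-digit CPF checksum involves, the sketch below uses the publicly documented modulo-11 scheme. The function names are hypothetical and the algorithm is an assumption about what the classifier checks; this page only states that a two-digit checksum is required.

```python
import re


def cpf_check_digits(base9: str) -> str:
    """Compute the two CPF check digits for the first nine digits."""
    digits = [int(c) for c in base9]
    for _ in range(2):
        n = len(digits)
        # Weights run from n + 1 down to 2 across the digits so far.
        total = sum(d * (n + 1 - i) for i, d in enumerate(digits))
        # A remainder of 10 maps to check digit 0.
        digits.append((total * 10) % 11 % 10)
    return f"{digits[9]}{digits[10]}"


def looks_like_cpf(value: str) -> bool:
    """Match the separated layout described above and verify both check digits."""
    m = re.fullmatch(r"(\d{3})\D(\d{3})\D(\d{3})\D(\d{2})", value)
    if not m:
        return False
    base = m.group(1) + m.group(2) + m.group(3)
    return cpf_check_digits(base) == m.group(4)


print(looks_like_cpf("529.982.247-25"))
```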

Canada

Classifier Description
CANADA_BC_PHN Detects numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphen (-) or spaces after the fourth and seventh digits.
CANADA_DRIVERS_LICENSE_NUMBER Detects strings consistent with Canadian driver's license numbers from each province. Looks for strings to be consistent with at least one of seven patterns.
CANADA_OHIP Detects alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-character alphanumeric code. Optional hyphens (-) or spaces can appear after the fourth, seventh, and tenth characters. The final two characters are a checksum.
CANADA_PASSPORT Detects strings consistent with the Canadian Passport Number format.
CANADA_SOCIAL_INSURANCE_NUMBER Detects numeric strings consistent with the Canadian Social Insurance number format. Requires a nine-digit numeric string with optional hyphens or spaces after the third and sixth digits. The last digit is a checksum.
CANADA_QUEBEC_HIN Detects alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-), and then eight digits with an optional hyphen or space after the fourth digit.
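
As an illustration of the CANADA_SOCIAL_INSURANCE_NUMBER description, the sketch below combines the stated layout with a Luhn check. The helper names are hypothetical, and the use of the Luhn algorithm is an assumption based on how Social Insurance numbers are publicly documented, not on Immuta's code.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_sin(value: str) -> bool:
    """Nine digits, optional hyphen or space after digits three and six."""
    m = re.fullmatch(r"(\d{3})[- ]?(\d{3})[- ]?(\d{3})", value)
    return bool(m) and luhn_valid("".join(m.groups()))


print(looks_like_sin("046 454 286"))
```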

Denmark

Classifier Description
DENMARK_CPR_NUMBER Detects numeric strings consistent with Denmark's Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number with either a DDMMYY-SSSS or DDMMYYSSSS format. The first six digits are an individual's birth date in Day, Month, Year format. The final four digits comprise the sequence number.
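
The two accepted CPR layouts can be expressed as a single pattern. In this sketch the function name is hypothetical, and rejecting impossible calendar dates is an added assumption; the description above only says the first six digits are a birth date.

```python
import re
from datetime import datetime


def looks_like_cpr(value: str) -> bool:
    """Match DDMMYY-SSSS or DDMMYYSSSS and require a plausible date."""
    m = re.fullmatch(r"(\d{6})-?(\d{4})", value)
    if not m:
        return False
    try:
        # Assumption: reject dates such as 31 February.
        datetime.strptime(m.group(1), "%d%m%y")
    except ValueError:
        return False
    return True


print(looks_like_cpr("070761-4285"))
```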

Finland

Classifier Description
FINLAND_NATIONAL_ID_NUMBER Detects a string consistent with Finland's National ID number. Requires an eleven-character string in a DDMMYYCZZZQ format. The first six digits are an individual's birth date in Day, Month, Year format. The C character is a century of birth indicator (+ for the years 1800-1899, - for years 1900-1999, and A for years 2000-2099). ZZZ is an individual ID number, and Q is a required checksum.
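
The DDMMYYCZZZQ layout and its checksum can be sketched as below. The function name is hypothetical, and the modulo-31 lookup is the publicly documented scheme for Finnish identity codes, assumed here rather than taken from Immuta.

```python
import re

# Assumption: the standard 31-character check table for Finnish identity codes.
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"


def looks_like_finnish_id(value: str) -> bool:
    """Match DDMMYYCZZZQ and verify Q against the nine-digit number DDMMYYZZZ."""
    m = re.fullmatch(r"(\d{6})([+\-A])(\d{3})([0-9A-Y])", value)
    if not m:
        return False
    birth, _century, individual, check = m.groups()
    return CHECK_CHARS[int(birth + individual) % 31] == check


print(looks_like_finnish_id("131052-308T"))
```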

France

Classifier Description
FRANCE_CNI Detects numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string.
FRANCE_NIR Detects numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum.
FRANCE_PASSPORT Detects alphanumeric strings consistent with the French Passport number. Requires two digits followed by two uppercase letters and ending with five digits.
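
For FRANCE_NIR, the role of the 14th and 15th digits can be illustrated with the publicly documented key formula, 97 minus the first thirteen digits modulo 97. The function name is hypothetical and the formula is an assumption about the check the classifier performs; the sketch handles only the all-numeric form described above.

```python
import re


def looks_like_nir(value: str) -> bool:
    """Thirteen digits, optional hyphen or space, then a two-digit key."""
    m = re.fullmatch(r"(\d{13})[- ]?(\d{2})", value)
    if not m:
        return False
    body, key = m.groups()
    # Assumed key rule: key = 97 - (body mod 97).
    return 97 - int(body) % 97 == int(key)


print(looks_like_nir("1841276451089 46"))
```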

Germany

Classifier Description
GERMANY_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with Germany's Driver's License number. Requires an eleven-element string, with a digit or a letter followed by two digits, 6 digits or letters, one digit, and one digit or letter.
GERMANY_IDENTITY_CARD_NUMBER Detects alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits.

Spain

Classifier Description
SPAIN_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with Spain's Driver's License number. Requires eight digits followed by a single letter or digit. The final character acts as a checksum.
SPAIN_NIE_NUMBER Detects strings consistent with Spain's Foreigner Identification number. Requires nine characters, not counting an optional separator: an initial X, Y, or Z, followed by seven digits, then an optional hyphen or space and a single checksum character.
SPAIN_NIF_NUMBER Detects strings consistent with Spain's Tax Identification number. Requires nine characters, not counting an optional separator: eight digits followed by an optional hyphen or space and a single checksum character.
SPAIN_PASSPORT Detects strings consistent with Spain's Passport number. Requires an eight- or nine-character string, starting with either two or three letters followed by six digits.
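
The checksum character used by SPAIN_NIF_NUMBER and SPAIN_NIE_NUMBER can be sketched with the publicly documented modulo-23 letter table. The function names are hypothetical, and the table and the X, Y, Z to 0, 1, 2 substitution are assumptions drawn from public descriptions of these identifiers, not from Immuta.

```python
import re

# Assumption: the standard 23-letter control table for Spanish NIF/NIE numbers.
CONTROL_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def looks_like_nif(value: str) -> bool:
    """Eight digits, optional hyphen or space, then the control letter."""
    m = re.fullmatch(r"(\d{8})[- ]?([A-Z])", value)
    return bool(m) and CONTROL_LETTERS[int(m.group(1)) % 23] == m.group(2)


def looks_like_nie(value: str) -> bool:
    """X, Y, or Z, seven digits, optional separator, then the control letter."""
    m = re.fullmatch(r"([XYZ])(\d{7})[- ]?([A-Z])", value)
    if not m:
        return False
    prefix, digits, check = m.groups()
    # The leading letter is replaced by 0, 1, or 2 before taking the remainder.
    number = int(str("XYZ".index(prefix)) + digits)
    return CONTROL_LETTERS[number % 23] == check


print(looks_like_nif("12345678Z"), looks_like_nie("X1234567L"))
```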

Sweden

Classifier Description
SWEDEN_NATIONAL_ID_NUMBER Detects numeric strings consistent with Sweden's National ID number. Requires a ten- or twelve-digit string that must start with a date in either the YYMMDD or YYYYMMDD formats. An optional - or + character then separates four ending digits. The final digit is a checksum.
SWEDEN_PASSPORT Detects numeric strings consistent with Sweden's Passport number. Requires an 8-digit number.
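
For SWEDEN_NATIONAL_ID_NUMBER, the sketch below accepts both date formats and checks the final digit. The function name is hypothetical; computing the checksum with the Luhn algorithm over the ten-digit form (century dropped) is an assumption based on public documentation, and the date itself is not validated here.

```python
import re


def looks_like_swedish_id(value: str) -> bool:
    """YYMMDD or YYYYMMDD, optional - or +, four digits; Luhn over ten digits."""
    m = re.fullmatch(r"(?:\d{2})?(\d{6})[-+]?(\d{4})", value)
    if not m:
        return False
    digits = m.group(1) + m.group(2)  # century digits, if any, are dropped
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch) * (2 if i % 2 == 1 else 1)
        total += d - 9 if d > 9 else d
    return total % 10 == 0


print(looks_like_swedish_id("811228-9874"))
```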

Thailand

Classifier Description
THAILAND_NATIONAL_ID_NUMBER Detects strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (-) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum.
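
The 13-digit layout and trailing check digit described above can be sketched as follows. The function name is hypothetical, and the weighted modulo-11 formula is the publicly documented one for Thai ID numbers, assumed here because this page does not spell out the algorithm.

```python
import re


def looks_like_thai_id(value: str) -> bool:
    """13 digits with optional separators after digits 1, 5, 10, and 12."""
    m = re.fullmatch(r"(\d)[- ]?(\d{4})[- ]?(\d{5})[- ]?(\d{2})[- ]?(\d)", value)
    if not m:
        return False
    digits = [int(c) for c in "".join(m.groups())]
    # Assumed rule: weights 13 down to 2 over the first twelve digits.
    total = sum(d * (13 - i) for i, d in enumerate(digits[:12]))
    return (11 - total % 11) % 10 == digits[12]


print(looks_like_thai_id("1-2345-67890-12-1"))
```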

United Kingdom

Classifier Description
UK_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with the United Kingdom's Driver's License number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance.
UK_NATIONAL_INSURANCE_NUMBER Detects alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two characters must be letters, followed by an optional space, then six digits with optional spaces or hyphens (-) every two digits, ending with a letter.
UK_PASSPORT Detects numeric strings consistent with the United Kingdom's passport number. Requires a nine-digit numeric string.
UK_TAXPAYER_REFERENCE Detects ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum.
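
The UK_NATIONAL_INSURANCE_NUMBER layout reads naturally as a regular expression. This is a rough sketch with a hypothetical function name; allowing a separator before the final letter is an interpretation of "every two digits", and real National Insurance numbers restrict which prefix letters are valid, which is not modeled here.

```python
import re

# Two letters, optional space, three digit pairs with optional separators, one letter.
NINO_PATTERN = re.compile(r"[A-Z]{2} ?\d{2}[- ]?\d{2}[- ]?\d{2}[- ]?[A-Z]")


def looks_like_nino(value: str) -> bool:
    """Return True if value matches the layout described above."""
    return NINO_PATTERN.fullmatch(value) is not None


print(looks_like_nino("AB 12 34 56 C"))
```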

United States

Classifier Description
US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having 93 as an allowed Group Number.
US_BANK_ROUTING_MICR Detects numeric strings consistent with an American Bankers Association (ABA) Routing Number. Must be a nine-digit number starting with 0, 1, 2, 3, 6, or 7, followed by eight digits. The final digit is a checksum.
US_DEA_NUMBER Detects alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number that is assigned to a health care provider. Must be nine characters long. The first two characters must be alphanumeric, and the last seven characters must be digits. The final digit is a checksum.
US_DRIVERS_LICENSE_NUMBER Detects strings consistent with some US Driver's license numbers.
US_EMPLOYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit.
US_HEALTHCARE_NPI Detects numeric strings consistent with US National Provider Identifier (NPI). Strings must be either 10 or 15 digits with the final digit being a valid checksum.
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having a limited set of allowed Group Numbers.
US_PASSPORT Detects numeric strings consistent with United States Passport number. Strings must contain nine digits. Columns should have a name or label consistent with a passport.
US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER Detects strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P that is followed by 8 digits.
US_SOCIAL_SECURITY_NUMBER Detects strings consistent with a US Social Security Number.
US_STATE Detects strings consistent with either a full name or two-letter abbreviation of a US state or territory.
US_TOLLFREE_PHONE_NUMBER Detects strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88+any digit, or 899.
US_VEHICLE_IDENTIFICATION_NUMBER Detects strings consistent with Vehicle Identification Numbers. A checksum is required as well as a valid World Manufacturer Identifier.
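
As an example of a checksum-based rule from this table, the US_BANK_ROUTING_MICR description can be sketched with the standard ABA 3-7-1 weighting. The function name is hypothetical, and the weighting is the publicly documented routing number check, assumed rather than confirmed to be what Immuta applies.

```python
import re


def looks_like_aba_routing(value: str) -> bool:
    """Nine digits, first digit in {0, 1, 2, 3, 6, 7}, 3-7-1 weighted checksum."""
    if not re.fullmatch(r"[012367]\d{8}", value):
        return False
    d = [int(c) for c in value]
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0


print(looks_like_aba_routing("021000021"))
```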

Classifier Descriptions: By Data Type

This section organizes Immuta's built-in classifiers by the type of data they represent.

Address and Phone Number

Classifier Description
US_STATE Detects strings consistent with either a full name or two-letter abbreviation of a US state or territory.
US_TOLLFREE_PHONE_NUMBER Detects strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88+any digit, or 899.

Banking and Taxes

Classifier Description
AUSTRALIA_TAX_FILE_NUMBER Detects strings consistent with Australia Tax File number. Requires a nine-digit number with optional spaces between the third and fourth and sixth and seventh digits. A checksum is required.
UK_TAXPAYER_REFERENCE Detects ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum.
US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having 93 as an allowed Group Number.
US_BANK_ROUTING_MICR Detects numeric strings consistent with an American Bankers Association (ABA) Routing Number. Must be a nine-digit number starting with 0, 1, 2, 3, 6, or 7, followed by eight digits. The final digit is a checksum.
US_EMPLOYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit.
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER Detects numeric strings consistent with a United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having a limited set of allowed Group Numbers.
US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER Detects strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P that is followed by 8 digits.

Crypto

Classifier Description
BITCOIN_INVOICE_ADDRESS Detects strings consistent with the following Bitcoin Invoice Address formats: P2PKH, P2SH, and Bech32. P2PKH and P2SH addresses must start with a 1 or a 3, respectively, followed by 25-34 alphanumeric characters, excluding l, I, O, and 0. Bech32 addresses must begin with bc1 and be followed by 39 characters. To be identified, an address must have a valid checksum.
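
The checksum requirement for P2PKH and P2SH addresses can be illustrated with a Base58Check verification. This is a minimal sketch with a hypothetical function name; it assumes the standard Base58Check encoding (a version byte, a 20-byte hash, and the first four bytes of a double SHA-256 as the checksum) and does not cover Bech32 addresses.

```python
import hashlib

# Base58 alphabet: alphanumerics without 0, O, I, and l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_valid(address: str) -> bool:
    """Verify a legacy (1... or 3...) address against its embedded checksum."""
    if not address or address[0] not in "13":
        return False
    n = 0
    for ch in address:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            return False
        n = n * 58 + idx
    try:
        raw = n.to_bytes(25, "big")  # 1 version byte + 20 hash bytes + 4 checksum bytes
    except OverflowError:
        return False
    payload, checksum = raw[:21], raw[21:]
    if payload[0] not in (0x00, 0x05):  # P2PKH and P2SH version bytes
        return False
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == checksum


print(base58check_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
```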

Driver's License and Vehicle Identification

Classifier Description
CANADA_DRIVERS_LICENSE_NUMBER Detects strings consistent with Canadian driver's license numbers from each province. Looks for strings to be consistent with at least one of seven patterns.
GERMANY_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with Germany's Driver's License number. Requires an eleven-element string, with a digit or a letter followed by two digits, 6 digits or letters, one digit, and one digit or letter.
SPAIN_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with Spain's Driver's License number. Requires eight digits followed by a single letter or digit. The final character acts as a checksum.
UK_DRIVERS_LICENSE_NUMBER Detects alphanumeric strings consistent with the United Kingdom's Driver's License number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance.
US_DRIVERS_LICENSE_NUMBER Detects strings consistent with some US Driver's license numbers.
US_VEHICLE_IDENTIFICATION_NUMBER Detects strings consistent with Vehicle Identification Numbers. A checksum is required as well as a valid World Manufacturer Identifier.

Healthcare

Classifier Description
AUSTRALIA_MEDICARE_NUMBER Detects numeric strings consistent with an Australian Medicare number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Optional spaces can be placed between the fourth and fifth digits and between the ninth and tenth digits. An optional eleventh digit, separated by a /, can be present. A checksum is required.
CANADA_BC_PHN Detects numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphen (-) or spaces after the fourth and seventh digits.
CANADA_OHIP Detects alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-character alphanumeric code. Optional hyphens (-) or spaces can appear after the fourth, seventh, and tenth characters. The final two characters are a checksum.
CANADA_QUEBEC_HIN Detects alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-), and then eight digits with an optional hyphen or space after the fourth digit.
US_DEA_NUMBER Detects alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number that is assigned to a health care provider. Must be nine characters long. The first two characters must be alphanumeric, and the last seven characters must be digits. The final digit is a checksum.
US_HEALTHCARE_NPI Detects numeric strings consistent with US National Provider Identifier (NPI). Strings must be either 10 or 15 digits with the final digit being a valid checksum.

Insurance

Classifier Description
CANADA_SOCIAL_INSURANCE_NUMBER Detects numeric strings consistent with the Canadian Social Insurance number format. Requires a nine-digit numeric string with optional hyphens or spaces after the third and sixth digits. The last digit is a checksum.
UK_NATIONAL_INSURANCE_NUMBER Detects alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two characters must be letters, followed by an optional space, then six digits with optional spaces or hyphens (-) every two digits, ending with a letter.

Passport

Classifier Description
AUSTRALIA_PASSPORT Detects strings consistent with an Australian Passport number. Requires an 8- or 9-character string that starts with either a single uppercase letter (N, E, D, F, A, C, U, or X) or a two-letter prefix (P followed by A, B, C, D, E, F, U, W, X, or Z), followed by seven digits.
CANADA_PASSPORT Detects strings consistent with the Canadian Passport Number format.
FRANCE_PASSPORT Detects alphanumeric strings consistent with the French Passport number. Requires two digits followed by two uppercase letters and ending with five digits.
SPAIN_PASSPORT Detects strings consistent with Spain's Passport number. Requires an eight- or nine-character string, starting with either two or three letters followed by six digits.
SWEDEN_PASSPORT Detects numeric strings consistent with Sweden's Passport number. Requires an 8-digit number.
UK_PASSPORT Detects numeric strings consistent with the United Kingdom's passport number. Requires a nine-digit numeric string.
US_PASSPORT Detects numeric strings consistent with United States Passport number. Strings must contain nine digits. Columns should have a name or label consistent with a passport.

Personal Identification Number

Classifier Description
ARGENTINA_DNI_NUMBER Detects strings consistent with an Argentina National Identity (DNI) number. Requires an eight-digit number with optional periods between the second and third digits and between the fifth and sixth digits.
BELGIUM_NATIONAL_ID_CARD_NUMBER Detects numeric strings consistent with Belgium's National ID card number. Requires a twelve-digit number with a hyphen (-) between the third and fourth digits and between the tenth and eleventh digits. A two-digit checksum is required.
BRAZIL_CPF_NUMBER Detects numeric strings consistent with Brazil's CPF (Cadastro de Pessoas Físicas) number. Requires an eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two-digit checksum is required.
DENMARK_CPR_NUMBER Detects numeric strings consistent with Denmark's Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number with either a DDMMYY-SSSS or DDMMYYSSSS format. The first six digits are an individual's birth date in Day, Month, Year format. The final four digits comprise the sequence number.
FINLAND_NATIONAL_ID_NUMBER Detects a string consistent with Finland's National ID number. Requires an eleven-character string in a DDMMYYCZZZQ format. The first six digits are an individual's birth date in Day, Month, Year format. The C character is a century of birth indicator (+ for the years 1800-1899, - for years 1900-1999, and A for years 2000-2099). ZZZ is an individual ID number, and Q is a required checksum.
FRANCE_CNI Detects numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string.
FRANCE_NIR Detects numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum.
GERMANY_IDENTITY_CARD_NUMBER Detects alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits.
SPAIN_NIE_NUMBER Detects strings consistent with Spain's Foreigner Identification number. Requires nine characters, not counting an optional separator: an initial X, Y, or Z, followed by seven digits, then an optional hyphen or space and a single checksum character.
SPAIN_NIF_NUMBER Detects strings consistent with Spain's Tax Identification number. Requires nine characters, not counting an optional separator: eight digits followed by an optional hyphen or space and a single checksum character.
SWEDEN_NATIONAL_ID_NUMBER Detects numeric strings consistent with Sweden's National ID number. Requires a ten- or twelve-digit string that must start with a date in either the YYMMDD or YYYYMMDD formats. An optional - or + character then separates four ending digits. The final digit is a checksum.
THAILAND_NATIONAL_ID_NUMBER Detects strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (-) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum.
US_SOCIAL_SECURITY_NUMBER Detects strings consistent with a US Social Security Number.