Immuta is pre-configured with a set of tags that can be used to write global policies before data sources even exist. See a list of the built-in Discovered tags below and the Built-in identifier reference page for information about where these tags will be applied by the built-in identifiers.
All the tags below belong to the Country
parent. For example, the full tag name will appear as Discovered . Country . Argentina
.
Argentina
This tag is applied to data recognized as specific to Argentina (e.g., an Argentina National Identity Number).
Australia
This tag is applied to data recognized as specific to Australia (e.g., an Australian Medicare number or Australian passport number).
Belgium
This tag is applied to data recognized as specific to Belgium (e.g., a Belgium National ID card).
Brazil
This tag is applied to data recognized as specific to Brazil (e.g., a Brazil CPF number).
Canada
This tag is applied to data recognized as specific to Canada (e.g., a British Columbia PHN, OHIP string, Canadian passport number, or Quebec's HIN).
Chile
This tag is for data specific to Chile.
China
This tag is for data specific to China.
Colombia
This tag is for data specific to Colombia.
Denmark
This tag is applied to data recognized as specific to Denmark (e.g., a Denmark CPR or Person-number).
Finland
This tag is applied to data recognized as specific to Finland (e.g., a Finland National ID number).
France
This tag is applied to data recognized as specific to France (e.g., a French National ID card number, France National ID number, or French passport number).
Germany
This tag is applied to data recognized as specific to Germany (e.g., a German driver's license number or a Germany Identity Card number).
Hong Kong
This tag is for data specific to Hong Kong.
India
This tag is for data specific to India.
Indonesia
This tag is for data specific to Indonesia.
Japan
This tag is for data specific to Japan.
Korea
This tag is for data specific to Korea.
Mexico
This tag is for data specific to Mexico.
Netherlands
This tag is for data specific to Netherlands.
Norway
This tag is for data specific to Norway.
Paraguay
This tag is for data specific to Paraguay.
Peru
This tag is for data specific to Peru.
Poland
This tag is for data specific to Poland.
Singapore
This tag is for data specific to Singapore.
Spain
This tag is applied to data recognized as specific to Spain (e.g., Spain Foreigner Identification number, Spain Tax Identification number, or Spanish passport number).
Sweden
This tag is applied to data recognized as specific to Sweden (e.g., a Sweden National ID number or Swedish passport number).
Taiwan
This tag is for data specific to Taiwan.
Thailand
This tag is applied to data recognized as specific to Thailand (e.g., a Thailand National ID number).
Turkey
This tag is for data specific to Turkey.
UK
This tag is applied to data recognized as specific to the United Kingdom (e.g., a United Kingdom driver's license number, United Kingdom National Insurance number, or United Kingdom Taxpayer Reference number).
Uruguay
This tag is for data specific to Uruguay.
US
This tag is applied to data recognized as specific to the U.S. (e.g., an FDA code, United States ATIN, ABA routing number, DEA number, United States EIN, United States NPI number, United States ITIN, United States passport number, United States Preparer Taxpayer ID number, United States SSN, United States territory or state, or United States toll-free phone number).
Venezuela
This tag is for data specific to Venezuela.
All the tags below belong to the Entity
parent. For example, the full tag name will appear as Discovered . Entity . Aadhaar Individual
.
Aadhaar Individual
This tag is for Aadhaar Individual numbers.
Adoption Taxpayer ID Number
This tag is applied to data recognized as a United States Adoption Taxpayer Identification number.
Age
This tag is applied to data recognized as an age.
Bank Account
This tag is for bank account numbers.
Bank Routing MICR
This tag is applied to data recognized as an American Bankers Association routing number.
Bankers CUSIP ID
This tag is for CUSP identification numbers for stocks and bonds.
British Columbia Health Network Number
This tag is applied to data recognized as British Columbia's Personal Health Number.
BSN Number
This tag is for Netherlands citizen service number.
BSN Number
This tag is for Netherlands citizen service numbers.
CDC Number
This tag is for CDC numbers.
CDI Number
This tag is for CDI numbers.
CIC Number
This tag is for CIC numbers.
CNI
This tag is applied to data recognized as a French National ID card number.
CPF Number
This tag is applied to data recognized as Brazil's CPF number.
CPR Number
This tag is applied to data recognized as Denmark's Personal Identification number.
Credit Card Number
This tag is applied to data recognized as a credit card number.
CURP Number
This tag is for Mexican CURP numbers.
CRYPTO
This tag is applied to data recognized as a Bitcoin Invoice Address.
Date
This tag is applied to data recognized as a date.
Date of Birth
This tag is applied to data recognized as a date of birth.
DEA Number
This tag is applied to data recognized as the DEA number of a healthcare provider.
DNI Number
This tag is applied to data recognized as an Argentina National Identity number.
Domain Name
This tag is applied to data recognized as a domain.
Driver's License Number
This tag is applied to data recognized as driver's licenses numbers from Germany or the United Kingdom.
Electronic Mail Address
This tag is applied to data recognized as an email address.
Employer ID Number
This tag is applied to data recognized as an Employer Identification number from the United States.
Ethnic Group
This tag is applied to data recognized as an ethnic group.
FDA Code
This tag is applied to data recognized as the code of a drug or ingredient registered with the FDA.
Gender
This tag is applied to data recognized as a gender.
GST Individual
This tag is for Indian GST individual numbers.
Healthcare NPI
This tag is applied to data recognized as a United States National Provider Identifier number.
IBAN Code
This tag is applied to data recognized as an International Bank Account number.
ICD10 Code
This tag is applied to data recognized as an ICD10 code from the International Statistical Classification of Diseases and Related Health Problems.
ICD9 Code
This tag is for ICD9 codes from the International Statistical Classification of Diseases and Related Health Problems.
ID Number
This tag is for any ID number.
Identity Card Number
This tag is applied to data recognized as an identity card number from Germany.
IMEI
This tag is applied to data recognized as an International Mobile Equipment Identity number.
Individual Number
This tag is for any individual number.
Individual Taxpayer ID Number
This tag is applied to data recognized as a United States Individual Taxpayer Identification Number.
IP Address
This tag is applied to data recognized as an IP address.
Location
This tag is applied to data recognized as a country, state, address, or municipality.
MAC Address
This tag is applied to data recognized as a Media Access Control address.
MAC Address Local
This tag is applied to data recognized as a local Media Access Control address.
Medicare Number
This tag is applied to data recognized as a Medicare number from Australia.
National Health Service Number
This tag is for national health service numbers.
National ID Card Number
This tag is applied to data recognized as a national ID card number from Belgium.
National ID Number
This tag is applied to data recognized as a national ID number from Finland, Sweden, and Thailand.
National Insurance Number
This tag is applied to data recognized as a United Kingdom national insurance number.
National Registration ID Number
This tag is for national registration ID numbers.
NI Number
This tag is for Norway NI numbers.
NIE Number
This tag is applied to data recognized as a Spanish Foreigner Identification number.
NIF Number
This tag is applied to data recognized as a Spanish Tax Identification number.
NIK Number
This tag is applied to data recognized as an Indonesian personal identification number (NIK).
NIR
This tag is applied to data recognized as France's National ID number.
Ontario Health Insurance Number
This tag is applied to data recognized as part of an Ontario Health Insurance Plan string.
PAN Individual
This tag is for PAN Individual numbers.
Passport
This tag is applied to data recognized as a passport number from Australia, Canada, France, Spain, Sweden, and the United States.
Person Name
This tag is applied to data recognized as people's names.
PESEL Number
This tag is for Poland PESEL numbers.
Postal Code
This tag is applied to data recognized as a United States zip code.
Preparer Taxpayer ID Number
This tag is applied to data recognized as a Preparer Taxpayer ID number.
Quebec Health Insurance Number
This tag is applied to data recognized as a Quebec Health Insurance Number.
Resident ID Number
This tag is for China Resident ID numbers.
RRN
This tag is for Korea Resident Registration numbers.
Social Insurance Number
This tag is applied to data recognized as a social insurance number.
Social Security Number
This tag is applied to data recognized as a United States Social Security Number.
State
This tag is applied to data recognized as a state of the United States.
Swift Code
This tag is applied to data recognized as a SWIFT code.
Tax File Number
This tag is applied to data recognized as a tax file number.
Taxpayer ID Number
This tag is applied to data recognized as Taxpayer ID numbers from the United States.
Taxpayer Reference
This tag is applied to data recognized as United Kingdom Taxpayer Reference numbers.
Telephone Number
This tag is applied to data recognized as a phone number.
Tollfree Telephone Number
This tag is applied to data recognized as a United States toll-free phone number.
URL
This tag is applied to data recognized as a URL.
Vehicle Identifier or Serial Number
This tag is applied to data recognized as a VIN.
Deprecation notice
The following identifier tags have been deprecated. New SaaS tenants will not see these tags applied by SDD. Current tenants relying on these tags for policies should contact their Immuta representative for support before these tags are removed from the product.
None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . Identifier Direct
.
Identifier Direct
This tag is applied to data recognized as a direct identifier that can be uniquely associated with an individual. Examples of direct identifiers include: name, username, email, official individual identification numbers such as passport or identity card numbers, or privately issued individual identification numbers such as a student ID.
Identifier Indirect
This tag is applied to data recognized as an indirect identifier that is not uniquely associated with an individual. However this indirect identifier could become distinguishable when combined with other attributes. Examples of indirect identifiers include: age and affinity.
Identifier Undetermined
This tag is applied to data which could be an identifier associated with an individual.
Deprecation notice
The following identifier tags have been deprecated. New SaaS tenants will not see these tags applied by SDD. Current tenants relying on these tags for policies should contact their Immuta representative for support before these tags are removed from the product.
None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . PCI
.
PCI
This tag is applied to data recognized as payment card information.
PHI
This tag is applied to data recognized as personal health data.
PII
This tag is applied to data recognized as personally identifiable information.
Immuta comes with a set of built-in identifiers that look for common data types. These identifiers were written by Immuta's research and development team and cannot be deleted or edited by users. However, users can add these built-in identifiers to their own frameworks and edit the tags applied by them.
Identifiers must match at least 90% of the sampled data to be tagged, with two exceptions noted below. See the How competitive pattern analysis works guide for more information about sampling and thresholds.
Deprecation notice
The following Discovered tags have been deprecated:
Discovered.Identifier Direct
Discovered.Identifier Indirect
Discovered.Identifier Undetermined
Discovered.PCI
Discovered.PHI
Discovered.PII
New SaaS tenants will not see these tags applied by SDD. Current tenants relying on these tags for policies should contact their Immuta representative for support before these tags are removed from the product in December 2024.
Matches numeric strings between 10 and 199.
Discovered.Entity.Age
ARGENTINA_DNI_NUMBER
Matches strings consistent with Argentina National Identity (DNI) Number. Requires an eight-digit number with optional periods between the second and third and fifth and sixth digit.
Discovered.Country.Argentina
Discovered.Entity.DNI Number
AUSTRALIA_MEDICARE_NUMBER
Matches numeric strings consistent with Australian Medicare number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Optional spaces can be placed between the fourth and fifth and ninth and tenth digit. The optional 11th digit separated by a /
can be present. A checksum is required.
Discovered.Country.Australia
Discovered.Entity.Medicare Number
AUSTRALIA_PASSPORT
Matches strings consistent with Australian Passport number. An 8- or 9-character string is required, with a starting upper case character (N, E, D, F, A, C, U, X) or a two-character starting character (P followed by A, B, C, D, E, F, U, W, X, or Z) followed by seven digits.
Discovered.Country.Australia
Discovered.Entity.Passport
BELGIUM_NATIONAL_ID_CARD_NUMBER
Matches numeric strings consistent with Belgium's National ID card. Requires a twelve-digit number with hyphen (-
) between the third and fourth digit and tenth and eleventh digits. A two checksum is required.
Discovered.Country.Belgium
Discovered.Entity.National ID Card Number
BITCOIN_INVOICE_ADDRESS
Matches strings consistent with the following Bitcoin Invoice Address formats: P2PKH, P2SH, and Bech32. P2PKH and P2SH must start with a 1 or a 3, respectively, followed by 25 - 34 alphanumeric characters, excluding l, I, O, and 0. Bech32 formats must begin with bc1
and be followed by 39 characters. To be identified, any addresses must have a valid checksum.
Discovered.Entity.CRYPTO
BRAZIL_CPF_NUMBER
Matches a numeric string consistent with Brazil's CPF (Cadastro Pessoal de Pessoa Física) number. An eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two digit checksum is required.
Discovered.Country.Brazil
Discovered.Entity.CPF Number
CANADA_BC_PHN
Matches numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphen (-
) or spaces after the fourth and seventh digits.
Discovered.Country.Canada
Discovered.Entity.British Columbia Health Network Number
CANADA_OHIP
Matches alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-digit alphanumeric code. Optional hyphens (-
) or spaces can appear after the fourth, seventh, and tenth digits. The final two characters are a checksum.
Discovered.Country.Canada
Discovered.Entity.Ontario Health Insurance Number
CANADA_PASSPORT
Discovered.Country.Canada
Discovered.Entity.Passport
CANADA_QUEBEC_HIN
Matches alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-
), and then eight digits with an optional hyphen or space after the fourth digit.
Discovered.Country.Canada
Discovered.Entity.Quebec Health Insurance Number
CREDIT_CARD_NUMBER
Matches strings consistent with a credit card number with prefixes matching major credit card companies. Must include a valid checksum.
Discovered.Entity.Credit Card Number
DATE
Matches strings consistent with dates. These can include days of the week, dates, and date times.
Discovered.Entity.Date
Matches numeric strings consistent with Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number with either a DDMMYY-SSSS
or DDMMYYSSSS
format. The first six digits are an individual's birth date in Day, Month, Year format. The final four digits comprise the sequence number.
Discovered.Country.Denmark
Discovered.Entity.CPR Number
DOMAIN_NAME
Matches domain names using a very broad pattern.
Discovered.Entity.Domain Name
EMAIL_ADDRESS
Detect strings consistent with an email address. Usernames are required to be fewer than 255 characters, follow by @a
, a domain of fewer than 255 characters, and a top level domain of between 2 and 20 characters.
Discovered.Entity.Electronic Mail Address
ETHNIC_GROUP
Matches strings consistent with the US Census race designations.
Discovered.Entity.Ethnic Group
FDA_CODE
Matches a string consistent with a drug or ingredient registered with Food and Drug Administration (FDA). Must start with between 4 to 6 digits, followed by a hyphen, followed by 3 to 4 digits, followed by a hyphen, and finishing with one to two digits.
Discovered.Country.US
Discovered.Entity.FDA Code
Matches a string consistent with Finland's National ID number. Requires an eleven-character string in a DDMMYYCZZZQ
format. The first six digits are an individual's birth date in Day, Month, Year format. The C
character is a century of birth indicator (+
for the years 1800-1899, -
for years 1900-1999, and A
for years 2000-2099). ZZZ
is an individual ID number, and Q
is a required checksum.
Discovered.Country.Finland
Discovered.PHI
Discovered.Entity.National ID Number
Matches numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string.
Discovered.Country.France
Discovered.Entity.CNI
FRANCE_NIR
Matches numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-
) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum.
Discovered.Country.France
Discovered.Entity.NIR
FRANCE_PASSPORT
Matches alphanumeric strings consistent with the French Passport number. Requires two numbers followed by two upper case letters and ends with 5 digits.
Discovered.Country.France
Discovered.Entity.Passport
GENDER
Matches strings consistent with gender or gender abbreviations.
Discovered.Entity.Gender
GERMANY_DRIVERS_LICENSE_NUMBER
Matches alphanumeric strings consistent with Germany's Driver's License number. Requires an eleven-element string, with a digit or a letter followed by two digits, 6 digits or letters, one digit, and one digit or letter.
Discovered.Country.Germany
Discovered.Entity.Drivers License Number
Matches alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits.
Discovered.Country.Germany
Discovered.Entity.Identity Card Number
IBAN_CODE
Matches strings consistent with an International Bank Account Number (IBAN). Must contain a valid country code.
Discovered.Entity.IBAN Code
ICD10_CODE
Matches strings consistent with codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), as drawn from the Clinical Modification lexicon from the year 2020.
Discovered.Entity.ICD10 Code
IMEI_HARDWARE_ID
Matches strings consistent with an International Mobile Equipment Identity (IMEI) number. Must contain 15 digits with optional hyphens or spaces after the second, 8th, and 14th digits.
Discovered.Entity.IMEI
IP_ADDRESS
Matches IP Addresses in the V4 and V6 formats.
Discovered.Entity.IP Address
LOCATION
Matches strings consistent with Countries or Municipalities. By default focuses on locations in the United States.
Discovered.Entity.Location
MAC_ADDRESS
Matches strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with every two digits separated by a colon.
Discovered.Entity.MAC Address
MAC_ADDRESS_LOCAL
Matches strings consistent with a local Media Access Control (MAC) address.
Discovered.Entity.MAC Address Local
PERSON_NAME
Matches strings consistent with a dictionary of people's names. Names are drawn from the US Social Security database. This identifier must match at least 45% of the data sampled.
Discovered.Entity.Person Name
PHONE_NUMBER
Matches strings consistent with telephone numbers. Primarily looks for strings consistent with the United States telephone numbers naming convention.
Discovered.Entity.Telephone Number
POSTAL_CODE
Matches strings consistent with a valid US zip code with an optional +4. Only valid 5 digit zip codes are detected.
Discovered.Entity.Postal Code
Matches strings consistent with Spain's Foreigner Identification number. Requires an eight-character string. The initial character must be X, Y, or Z, followed by seven digits, then by an optional hyphen or space and a single checksum character.
Discovered.Country.Spain
Discovered.Entity.NIE Number
SPAIN_NIF_NUMBER
Matches strings consistent with Spain's Tax Identification number. Requires an eight-character string. Requires eight digits followed by an optional hyphen or space and a single checksum character.
Discovered.Country.Spain
Discovered.Entity.NIF Number
SPAIN_PASSPORT
Matches strings consistent with Spain's Passport number. Requires an eight- or nine-character string, starting with either two or three letters followed by six digits.
Discovered.Country.Spain
Discovered.Entity.Passport
STREET_ADDRESS
Matches strings consistent with street addresses. Primarily looks for strings consistent with the United States street naming convention. This identifier must match at least 80% of the data sampled.
Discovered.Entity.Location
Matches numeric strings consistent with Sweden's Nation ID number. Requires a ten- or twelve-digit string that must start with a date in either the YYMMDD
or YYYYMMDD
formats. An optional -
or +
character then separates four ending digits. The final digit is a checksum.
Discovered.Country.Sweden
Discovered.Entity.National ID Number
Matches numeric strings consistent with Sweden's Passport number. Requires an 8-digit number.
Discovered.Country.Sweden
Discovered.Entity.Passport
SWIFT_CODE
Matches alphanumeric strings consistent with a SWIFT code (or Bank Identifier Code (BIC)) format.
Discovered.Entity.Swift Code
Matches strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (-
) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum.
Discovered.Country.Thailand
Discovered.Entity.National ID Number
TIME
Matches strings consistent with times. Can contain both date and time pieces.
Discovered.Entity.Date
UK_DRIVERS_LICENSE_NUMBER
Matches alphanumeric strings consistent with the United Kingdom's Driver's License number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9
s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance.
Discovered.Country.UK
Discovered.Entity.Drivers License Number
UK_NATIONAL_INSURANCE_NUMBER
Matches alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two digits must be letters, followed by an optional space, then six digits with optional spaces or hyphens (-
) every two digits, ending with a letter.
Discovered.Country.UK
Discovered.Entity.National Insurance Number
Matches ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum.
Discovered.Country.UK
Discovered.Entity.Taxpayer Reference
URL
Matches string consistent with a Uniform Resource Locator (URL). String must begin with http://
, https://
, ftp://
, file:///
, or mailto:
, followed by a string and ending with a top level domain of no more than 128 characters.
Discovered.Entity.URL
US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER
Matches a numeric string consistent United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having 93 as an allowed Group Number.
Discovered.Country.US
Discovered.Entity.Adoption Taxpayer ID Number
Matches numeric string consistent with an American Bankers Association (ABA) Routing Number. Must be a nine-digit number starting with 0, 1, 2, 3, 6, or 7, followed by eight digits. The final digit is a checksum.
Discovered.Country.US
Discovered.Entity.Bank Routing MICR
US_DEA_NUMBER
Matches alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number that is assigned to a health care provider. Must be a length of nine characters. The first two digits must be alphanumeric, and the last seven digits must be digits. The final digit is a checksum.
Discovered.Country.US
Discovered.Entity.DEA Number
US_EMPLOYER_IDENTIFICATION_NUMBER
Matches numeric string consistent United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit.
Discovered.Country.US
Discovered.Entity.Employer ID Number
US_HEALTHCARE_NPI
Matches numeric strings consistent with US National Provider Identifier (NPI). Strings must be either 10 or 15 digits with the final digit being a valid checksum.
Discovered.Country.US
Discovered.Entity.Healthcare NPI
US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER
Matches a numeric string consistent United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having a limited set of allowed Group Numbers.
Discovered.Country.US
Discovered.Entity.Individual Taxpayer ID Number
Matches numeric strings consistent with United States Passport number. Strings must contain nine digits.
Discovered.Country.US
Discovered.Entity.Passport
US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER
Matches strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P
that is followed by 8 digits.
Discovered.Country.US
Discovered.Entity.Preparer Taxpayer ID Number
US_SOCIAL_SECURITY_NUMBER
Matches strings consistent with a US Social Security Number. Strings must contain nine digits and comprise three parts: the three left-most digits designating the area number, the middle two digits designating the group number, and the four right-most digits designating the serial number. For a column to be tagged, none of these parts can contain all zeroes, and area numbers must not be 666 or in the range of 900-999.
Discovered.Country.US
Discovered.Entity.Social Security Number
US_STATE
Matches strings consistent with either a full name or two-letter abbreviation of a US state or territory.
Discovered.Country.US
Discovered.Entity.State
Matches strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88+any digit, or 899.
Discovered.Country.US
Discovered.Entity.Tollfree Telephone Number
VEHICLE_IDENTIFICATION_NUMBER
Matches strings consistent with Vehicle Identification Numbers. A checksum is required as well as a valid World Manufacturer Identifier.
Discovered.Country.US
Discovered.Entity.Vehicle Identifier or Serial Number
AGE
Matches strings consistent with the Canadian Passport Number format as .
DENMARK_CPR_NUMBER
FINLAND_NATIONAL_ID_NUMBER
FRANCE_CNI
GERMANY_IDENTITY_CARD_NUMBER
SPAIN_NIE_NUMBER
SWEDEN_NATIONAL_ID_NUMBER
SWEDEN_PASSPORT
THAILAND_NATIONAL_ID_NUMBER
UK_TAXPAYER_REFERENCE
US_BANK_ROUTING_MICR
US_PASSPORT
US_TOLLFREE_PHONE_NUMBER
Public preview
This feature is available to all tenants. Reach out to your Immuta support professional to use this feature.
Immuta comes with a pack of built-in identifiers that look for common data types. And since the first pack was released, improvements have been made. These improvements are now available in this improved pack, which includes some unchanged identifiers, but also many new and improved versions of legacy identifiers. These identifiers were written by Immuta's research and development team and cannot be deleted or edited by users. However, users can add these built-in identifiers to their own frameworks and edit the tags applied by them.
Identifiers must match at least 90% of the sampled data to be tagged, with three exceptions noted below. See the How competitive pattern analysis works guide for more information about sampling and thresholds.
ARGENTINA_DNI_NUMBER
Detects strings consistent with Argentina's National Identity (DNI) Number. Requires an eight-digit number with periods after the second and fifth digits.
Discovered.Country.Argentina
Discovered.Entity.DNI Number
Discovered.Country.Australia
Discovered.Entity.Medicare Number
Discovered.Country.Australia
Discovered.Entity.Passport
BELGIUM_NATIONAL_ID_CARD_NUMBER
Detects numeric strings consistent with Belgium's National ID card. Requires a twelve-digit number with a required hyphen (-
) between the third and fourth digits. Allows for an optional hyphen between the tenth and eleventh digits.
Discovered.Country.Belgium
Discovered.Entity.National ID Card Number
BELGIUM_NATIONAL_REGISTRATION_NUMBER New
Discovered.Country.Belgium
Discovered.Entity.National Registration Number
BITCOIN_INVOICE_ADDRESS
Detects strings consistent with the following Bitcoin Invoice Address formats: P2PKH, P2SH, and Bech32.
Discovered.Entity.CRYPTO
Discovered.Country.Brazil
Discovered.Entity.CPF Number
CANADA_BC_PHN
Detects numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with hyphens (-
) or spaces after the fourth and seventh digits.
Discovered.Country.Canada
Discovered.Entity.British Columbia Health Network Number
CANADA_OHIP
Detects alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-digit capitalized alphanumeric code. Optional hyphens (-
) or spaces can appear after the fourth, seventh, and tenth digits.
Discovered.Country.Canada
Discovered.Entity.Ontario Health Insurance Number
Discovered.Country.Canada
Discovered.Entity.Passport
CANADA_QUEBEC_HIN
Detects alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-
), and then eight digits with an optional hyphen or space after the fourth digit.
Discovered.Country.Canada
Discovered.Entity.Quebec Health Insurance Number
Detects strings consistent with the ISO 3166 format for countries and high population US and Canadian cities.
Discovered.Entity.Location
Detects strings consistent with a credit card number with prefixes matching major credit card companies.
Discovered.Entity.Credit Card Number
Discovered.Entity.Date
Detects strings that begin with a letter and are no more than 225 characters. A full domain can have one to four labels separated by a .
. Each label can be one to 63 alphanumeric characters long. And each label after the first must be in the dictionary list of possible labels.
Discovered.Entity.Domain Name
EMAIL_ADDRESS
Detect strings consistent with an email address. Usernames are required to be fewer than 255 characters, follow by @
, a domain of fewer than 255 characters, and a top level domain of between 2 and 20 characters.
Discovered.Entity.Electronic Mail Address
ETHNIC_GROUP
Discovered.Entity.Ethnic Group
Detects a string consistent with a drug or ingredient registered with the Food and Drug Administration (FDA). Must start with between 4 to 5 digits, followed by a hyphen, followed by 3 to 4 digits, followed by a hyphen, and finishing with 1 to 2 digits.
Discovered.Country.US
Discovered.Entity.FDA Code
Detects numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-) or space can appear after the 13th digit.
Discovered.Country.France
Discovered.Entity.NIR
FRANCE_PASSPORT
Detects alphanumeric strings consistent with the French Passport number. Requires two numbers followed by two upper-case letters and ends with five digits.
Discovered.Country.France
Discovered.Entity.Passport
Discovered.Entity.Gender
GERMANY_DRIVERS_LICENSE_NUMBER
Detects alphanumeric strings consistent with Germany's driver's license number. Requires an eleven-element string of the format CDDCCCCCCDC where C is an upper-case Latin letter and D is a numeric digit.
Discovered.Country.Germany
Discovered.Entity.Drivers License Number
Discovered.Country.UK
Discovered.Entity.Drivers License Number
IBAN_CODE
Detects strings consistent with an International Bank Account Number (IBAN). Requires a string in the form ZZ-DD-BBAN, where ZZ is a country code, DD is two numeric digits, and BBAN is a Basic Bank Account Number comprising two to seven groups of three to five uppercase alphanumeric characters, optionally separated by space or dash, and optionally followed by a final group of length one to three. Identifier is case sensitive.
Discovered.Entity.IBAN Code
Detects strings consistent with codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), as drawn from the Clinical Modification lexicon from the year 2025.
Discovered.Entity.ICD10 Code
ICD_10_PCS New
Discovered.Entity.ICD10 Procedure Code
Discovered.Entity.IMEI
IP_ADDRESS
Detects IP Addresses in the V4 and V6 formats.
Discovered.Entity.IP Address
Discovered.Entity.MAC Address
NAICS_CODE New
Discovered.Entity.NAICS Code
Detects strings consistent with a dictionary of people's names. The name dictionary is US-centric with person names drawn from the US Social Security database, covering 80% of the U.S. population. This identifier must match at least 45% of the data sampled.
Discovered.Entity.Person Name
Detects strings consistent with telephone numbers. Primarily looks for strings consistent with the United States telephone numbers naming convention. Optional area codes allowed.
Discovered.Entity.Telephone Number
Detects strings consistent with a valid US Zip code with an optional +4 separated by a dash. Only valid five-digit zip codes are detected.
Discovered.Entity.Postal Code
Discovered.Country.Spain
Discovered.Entity.NIF Number
SPAIN_PASSPORT
Detects string consistent with Spain's Passport Number. Requires a eight- or nine-character string starting with either two or three upper-case letters followed by six numeric digits.
Discovered.Country.Spain
Discovered.Entity.Passport
SWIFT_CODE
Detects alphanumeric strings consistent with a SWIFT code (or Bank Identifier Code (BIC)) format. Requires values consistent with AAAAAACCDDD, where A is an uppercase letter, C is an uppercase letter or numeric digit, and DDD is an optional three-character sequence of uppercase letters or numeric digits.
Discovered.Entity.Swift Code
Detects strings consistent with times in various formats or data type: time. If date is included in the time, it will not match. Use the DATE
identifier instead.
Discovered.Entity.Date
Detects alphanumeric strings consistent with the United Kingdom's National Insurance Number. Requires a nine-character string. The first two digits must be uppercase letters, followed by an optional space, then six digits with optional spaces or hyphens (-
) every two digits, ending with A, B, C, or D.
Discovered.Country.UK
Discovered.Entity.National Insurance Number
Detects string consistent with a URL. String must begin with a common schema, followed a string and ending with a top level domain of no more than 128 alphanumeric characters.
Discovered.Entity.URL
US_DEA_NUMBER
Detects alphanumeric strings consistent a Drug Enforcement Administration (DEA) number is assigned to a health care provider. It must have a length of nine characters. The first two digits must be uppercase alphanumeric characters, and the last seven characters are numeric digits. The first character may not be I
, N
, O
, Q
, V
, W
, Y
, or Z
.
Discovered.Country.US
Discovered.Entity.DEA Number
US_EMPLOYER_IDENTIFICATION_NUMBER
Detects numeric string consistent United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit.
Discovered.Country.US
Discovered.Entity.Employer ID Number
Detects 10-digit numeric strings consistent with US National Provider Identifier (NPI). It must either start with 80840 followed by a 1 or 2, or it must begin with a 1 or 2.
Discovered.Country.US
Discovered.Entity.Healthcare NPI
US_PERSON_FULL_NAME New
Detects strings consistent with a person's {first name} space {last name}. Uses the same names from the PERSON_NAME identifier. This identifier must match at least 20% of the data sampled.
Discovered.Entity.Person Name
US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER
Detects strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P
that is followed by eight digits.
Discovered.Country.US
Discovered.Entity.Preparer Taxpayer ID Number
Discovered.Country.US
Discovered.Entity.Social Security Number
Detects strings consistent with either a full name or two-letter abbreviation of a US state or territory.
Discovered.Country.US
Discovered.Entity.State
Detects strings consistent with U.S. street addresses. Requires the street naming convention of {address_number} {street_name} {unit number (optional)} with an optional road suffix after the street name. The maximum length for street name is 20 alphanumeric characters. This identifier must match at least 80% of the data sampled.
VEHICLE_IDENTIFICATION_NUMBER
Detects strings consistent with Vehicle Identification Numbers. A valid World Manufacturer Identifier is required.
Discovered.Country.US
Discovered.Entity.Vehicle Identifier or Serial Number
Of sensitive data discovery's three criteria options, regex and dictionary are competitive. This means that when assessing your data, if multiple identifiers could match, only one with competitive criteria will be chosen to tag the data. To better understand how Immuta executes this competition, read further.
Discover employs a three-phased competitive criteria analysis approach for sensitive data discovery (SDD):
Sampling: No data is moved, and Immuta checks the identifiers against a sample of data from the table.
Qualifying: Identifiers with a criteria match of less than a 90% match are filtered out.
Scoring: The remaining identifiers are compared with one another to find the most specific criteria that qualifies and matches the sample.
In the end, competitive criteria analysis aims to find a single identifier for each column that best describes the data format.
In the sampling process, no database contents are transmitted to Immuta; instead, Immuta receives only the column-wise hit rate (the number of times the criteria has matched a value in the column) information for each active identifier. To do this, Discover instructs a remote database to measure column-wise hit rate information for all active identifiers over a row sample.
The sample size is decided based on the number of identifiers and the data size, when available. In the most simplified case, the requested number of sampled rows depends only on the number of regex and dictionary criteria being run in the framework, not the data size. The sample size dependence on the number of identifiers is weak and will not exceed 13,000 rows.
5
7369 rows
50
9211 rows
500
11053 rows
5000
12895 rows
In practice, the number of sampled values for each column may be less than the requested number of rows. This happens when the target table has less than the requested number of rows, when many of the column values are null
, or because of technology-specific limitations.
Snowflake and Starburst (Trino): Discover implements native table sampling by row count.
Databricks and Redshift: Due to technology limitations and the inability to predict the size of the table, Discover implements a best-effort sampling strategy comprising a flat 10% row sample capped at the first 10,000 sampled rows. In particular, under-sampling may occur on tables with less than 100,000 rows. Moreover, the resulting sample is biased towards earlier records.
All platforms: Sampling from views can have significantly slower performance that varies by the performance of the query that defines the view.
During the qualification phase, identifiers that do not agree with the data are disqualified. An identifier agrees with the data if the hit rate on the remote sample exceeds the predefined threshold. This threshold is 90% match for most built-in identifiers; however, a few built-in identifiers have lower threshold . The 90% threshold is standard for all custom identifiers as well to ensure the criteria matches the data within the column and avoid false positives. If no identifiers qualify, then no identifier is assessed for scoring and the column is not tagged.
During the scoring phase, a machine inference is carried out among all qualified identifiers, combining criteria-derived complexity information with hit rate information to determine which identifier best describes the sample data. This process prefers the more restrictive of two competing identifiers since the ability to satisfy the more difficult-to-satisfy identifier itself serves as evidence that it is more likely. This phase ends by returning a single most likely identifier per the inference process.
Here are a set of regex identifiers and a sample of data:
Identifiers:
[a-zA-Z0-9]{3}
- This regex will match 3 character strings with the characters a-z, lowercase or uppercase, or digits 0-9.
[a-c]{3}
- This regex will match 3 character strings with the characters a-c, lowercase.
(a|b|d){3}
- This regex will match 3 character strings with the characters a, b, or d, lowercase.
dad
Yes
Yes
baa
Yes
Yes
add
Yes
Yes
add
Yes
Yes
cab
Yes
Yes
bad
Yes
Yes
aba
Yes
Yes
baa
Yes
Yes
dad
Yes
Yes
baa
Yes
Yes
When qualifying the identifiers, Identifier 1 and Identifier 3 both match 90% or more of the data. Identifier 2 does not, and is disqualified.
Then the qualified identifiers are scored. Here, Identifier 1, despite matching 100% of the data, is unspecific and could match over 200,000 values. On the other hand, Identifier 3 matches just at 90% but is very specific with only 27 available values.
Therefore, with the specificity taken into account, Identifier 3 would be the match for this column, and its tags would be applied to the data source in Immuta.
Dictionaries are part of the competitive process, while column-name regex are not.
Scoring ties are rare but can occur if the same criteria (either dictionary or regex) is specified more than once (even in different forms). Scoring ties are inconclusive, and the scoring phase will not return an identifier in the case of a tie.
Criteria complexity analysis is sensitive to the total number of strings an identifier accepts or, equivalently for dictionaries, the number of entries. Therefore, identifiers that accept much more than is necessary to describe the intended column data format may perform more poorly in the competitive analysis because they are easier to satisfy.
AUSTRALIA_MEDICARE_NUMBER
Detects numeric strings consistent with Australian Medicare Number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Spaces must be placed between the fourth and fifth and ninth and tenth digits. Optional eleventh digit separated by a /
or a space.
AUSTRALIA_PASSPORT
Detects strings consistent with the Australian Passport number. A string of 8 or 9 characters is required, with a starting upper-case character (A, B, C, D, E, F, G, H, J, L, M, N, R, X, or U) or a two-character alphabetic prefix (P followed by A, B, C, D, E, F, U, W, X, or Z) followed by seven numeric digits.
Detects numeric strings consistent with Belgium's National Registration Number. Requires 11 characters in the form YY.MM.DD-NNN-XX, where YY.MM.DD corresponds to birthdate, NNN is a number, and XX is a checksum digit.
BRAZIL_CPF_NUMBER
Detects a numeric string consistent with Brazil's CPF (Cadastro de Pessoas F\u00edsica) number. An eleven-digit numeric string with optional non-numeric separators (.
, -
, or space) after the third, sixth, and ninth digits.
CANADA_PASSPORT
Detects strings consistent with the Canadian Passport Number format. Allows for two formats. One format requires two capital letters followed by six digits. The other format requires one letter, followed by six digits, and ends in two letters.
COUNTRY
CREDIT_CARD_NUMBER
DATE
Detects strings consistent with dates in or data type: date, date+time, or timestamp.
DOMAIN_NAME
Detects strings consistent with the US Census . This case-insensitive identifier allows for dashes to be used in place of spaces.
FDA_CODE
FRANCE_NIR
GENDER
Detects strings consistent with and common abbreviations.
GREAT_BRITAIN_DRIVERS_LICENSE
Detects alphanumeric strings consistent with the United Kingdom's driver's license number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9
s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance.
ICD10_CODE
Detects strings consistent with procedure codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), as drawn from the Clinical Modification lexicon from 2020.
IMEI_HARDWARE_ID
Detects strings consistent with an International Mobile Equipment Identity (IMEI) number. Must contain 15 or 16 digits with optional hyphens or spaces after the 2nd, 8th, and 14th digits.
MAC_ADDRESS
Detects strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with every two digits separated by a colon or hyphen.
Detects strings consistent with North American Industry Classification System (NAICS). A two-digit number represents a basic sector and each preceding digit represents a more specific sub sector with a maximum of six digits.
PERSON_NAME
PHONE_NUMBER
POSTAL_CODE
SPAIN_NIF_NUMBER
Detects strings consistent with Spain's Tax Identification number. Requires a string with nine alphanumeric characters. Requires either eight digits followed by an optional hyphen or space and a single upper case letter or the initial character must be X, Y, or Z, followed by an optional dash or space, seven numeric digits, followed by an optional dash or space, and finally, by a single uppercase letter.
TIME
UK_NATIONAL_INSURANCE_NUMBER
URL
US_HEALTHCARE_NPI
US_SOCIAL_SECURITY_NUMBER
Detects strings consistent with a US Social Security Number. Strings must contain nine digits and comprise three parts: the three left-most digits designating the area number, the middle two digits designating the group number, and the four right-most digits designating the serial number. For a column to be tagged, none of these parts can contain all zeroes, and area numbers must not be 666 or in the range of 900-999.
US_STATE
US_STREET_ADDRESS