Reference Guides

Built-in Discovered Tags Reference

Immuta is pre-configured with a set of tags that can be used to write global policies before data sources even exist. See the list of built-in Discovered tags below and the Built-in pattern reference page for information about where the built-in rules apply these tags.

Country tags

All the tags below belong to the Country parent. For example, the full tag name will appear as Discovered . Country . Argentina.

Child tag name
Description

Argentina

This tag is applied to data recognized as specific to Argentina (e.g., an Argentina National Identity Number).

Australia

This tag is applied to data recognized as specific to Australia (e.g., an Australian Medicare number or Australian passport number).

Belgium

This tag is applied to data recognized as specific to Belgium (e.g., a Belgium National ID card).

Brazil

This tag is applied to data recognized as specific to Brazil (e.g., a Brazil CPF number).

Canada

This tag is applied to data recognized as specific to Canada (e.g., a British Columbia PHN, OHIP string, Canadian passport number, or Quebec's HIN).

Chile

This tag is for data specific to Chile.

China

This tag is for data specific to China.

Colombia

This tag is for data specific to Colombia.

Denmark

This tag is applied to data recognized as specific to Denmark (e.g., a Denmark CPR or Person-number).

Finland

This tag is applied to data recognized as specific to Finland (e.g., a Finland National ID number).

France

This tag is applied to data recognized as specific to France (e.g., a French National ID card number, France National ID number, or French passport number).

Germany

This tag is applied to data recognized as specific to Germany (e.g., a German driver's license number or a Germany Identity Card number).

Hong Kong

This tag is for data specific to Hong Kong.

India

This tag is for data specific to India.

Indonesia

This tag is for data specific to Indonesia.

Japan

This tag is for data specific to Japan.

Korea

This tag is for data specific to Korea.

Mexico

This tag is for data specific to Mexico.

Netherlands

This tag is for data specific to the Netherlands.

Norway

This tag is for data specific to Norway.

Paraguay

This tag is for data specific to Paraguay.

Peru

This tag is for data specific to Peru.

Poland

This tag is for data specific to Poland.

Singapore

This tag is for data specific to Singapore.

Spain

This tag is applied to data recognized as specific to Spain (e.g., Spain Foreigner Identification number, Spain Tax Identification number, or Spanish passport number).

Sweden

This tag is applied to data recognized as specific to Sweden (e.g., a Sweden National ID number or Swedish passport number).

Taiwan

This tag is for data specific to Taiwan.

Thailand

This tag is applied to data recognized as specific to Thailand (e.g., a Thailand National ID number).

Turkey

This tag is for data specific to Turkey.

UK

This tag is applied to data recognized as specific to the United Kingdom (e.g., a United Kingdom driver's license number, United Kingdom National Insurance number, or United Kingdom Taxpayer Reference number).

Uruguay

This tag is for data specific to Uruguay.

US

This tag is applied to data recognized as specific to the U.S. (e.g., an FDA code, United States ATIN, ABA routing number, DEA number, United States EIN, United States NPI number, United States ITIN, United States passport number, United States Preparer Taxpayer ID number, United States SSN, United States territory or state, or United States toll-free phone number).

Venezuela

This tag is for data specific to Venezuela.

Entity tags

All the tags below belong to the Entity parent. For example, the full tag name will appear as Discovered . Entity . Aadhaar Individual.

Child tag name
Description

Aadhaar Individual

This tag is for Aadhaar Individual numbers.

Adoption Taxpayer ID Number

This tag is applied to data recognized as a United States Adoption Taxpayer Identification number.

Age

This tag is applied to data recognized as an age.

Bank Account

This tag is for bank account numbers.

Bank Routing MICR

This tag is applied to data recognized as an American Bankers Association routing number.

Bankers CUSIP ID

This tag is for CUSIP identification numbers for stocks and bonds.

British Columbia Health Network Number

This tag is applied to data recognized as British Columbia's Personal Health Number.

BSN Number

This tag is for Netherlands citizen service numbers.

CDC Number

This tag is for CDC numbers.

CDI Number

This tag is for CDI numbers.

CIC Number

This tag is for CIC numbers.

CNI

This tag is applied to data recognized as a French National ID card number.

CPF Number

This tag is applied to data recognized as Brazil's CPF number.

CPR Number

This tag is applied to data recognized as Denmark's Personal Identification number.

Credit Card Number

This tag is applied to data recognized as a credit card number.

CURP Number

This tag is for Mexican CURP numbers.

CRYPTO

This tag is applied to data recognized as a Bitcoin Invoice Address.

Date

This tag is applied to data recognized as a date.

Date of Birth

This tag is applied to data recognized as a date of birth.

DEA Number

This tag is applied to data recognized as the DEA number of a healthcare provider.

DNI Number

This tag is applied to data recognized as an Argentina National Identity number.

Domain Name

This tag is applied to data recognized as a domain.

Driver's License Number

This tag is applied to data recognized as driver's license numbers from Germany or the United Kingdom.

Electronic Mail Address

This tag is applied to data recognized as an email address.

Employer ID Number

This tag is applied to data recognized as an Employer Identification number from the United States.

Ethnic Group

This tag is applied to data recognized as an ethnic group.

FDA Code

This tag is applied to data recognized as the code of a drug or ingredient registered with the FDA.

Gender

This tag is applied to data recognized as a gender.

GST Individual

This tag is for Indian GST individual numbers.

Healthcare NPI

This tag is applied to data recognized as a United States National Provider Identifier number.

IBAN Code

This tag is applied to data recognized as an International Bank Account number.

ICD10 Code

This tag is applied to data recognized as an ICD10 code from the International Statistical Classification of Diseases and Related Health Problems.

ICD9 Code

This tag is for ICD9 codes from the International Statistical Classification of Diseases and Related Health Problems.

ID Number

This tag is for any ID number.

Identity Card Number

This tag is applied to data recognized as an identity card number from Germany.

IMEI

This tag is applied to data recognized as an International Mobile Equipment Identity number.

Individual Number

This tag is for any individual number.

Individual Taxpayer ID Number

This tag is applied to data recognized as a United States Individual Taxpayer Identification Number.

IP Address

This tag is applied to data recognized as an IP address.

Location

This tag is applied to data recognized as a country, state, address, or municipality.

MAC Address

This tag is applied to data recognized as a Media Access Control address.

MAC Address Local

This tag is applied to data recognized as a local Media Access Control address.

Medicare Number

This tag is applied to data recognized as a Medicare number from Australia.

National Health Service Number

This tag is for national health service numbers.

National ID Card Number

This tag is applied to data recognized as a national ID card number from Belgium.

National ID Number

This tag is applied to data recognized as a national ID number from Finland, Sweden, or Thailand.

National Insurance Number

This tag is applied to data recognized as a United Kingdom national insurance number.

National Registration ID Number

This tag is for national registration ID numbers.

NI Number

This tag is for Norway NI numbers.

NIE Number

This tag is applied to data recognized as a Spanish Foreigner Identification number.

NIF Number

This tag is applied to data recognized as a Spanish Tax Identification number.

NIK Number

This tag is applied to data recognized as an Indonesian personal identification number (NIK).

NIR

This tag is applied to data recognized as France's National ID number.

Ontario Health Insurance Number

This tag is applied to data recognized as part of an Ontario Health Insurance Plan string.

PAN Individual

This tag is for PAN Individual numbers.

Passport

This tag is applied to data recognized as a passport number from Australia, Canada, France, Spain, Sweden, or the United States.

Person Name

This tag is applied to data recognized as people's names.

PESEL Number

This tag is for Poland PESEL numbers.

Postal Code

This tag is applied to data recognized as a United States zip code.

Preparer Taxpayer ID Number

This tag is applied to data recognized as a Preparer Taxpayer ID number.

Quebec Health Insurance Number

This tag is applied to data recognized as a Quebec Health Insurance Number.

Resident ID Number

This tag is for China Resident ID numbers.

RRN

This tag is for Korea Resident Registration numbers.

Social Insurance Number

This tag is applied to data recognized as a social insurance number.

Social Security Number

This tag is applied to data recognized as a United States Social Security Number.

State

This tag is applied to data recognized as a state of the United States.

Swift Code

This tag is applied to data recognized as a SWIFT code.

Tax File Number

This tag is applied to data recognized as a tax file number.

Taxpayer ID Number

This tag is applied to data recognized as Taxpayer ID numbers from the United States.

Taxpayer Reference

This tag is applied to data recognized as United Kingdom Taxpayer Reference numbers.

Telephone Number

This tag is applied to data recognized as a phone number.

Tollfree Telephone Number

This tag is applied to data recognized as a United States toll-free phone number.

URL

This tag is applied to data recognized as a URL.

Vehicle Identifier or Serial Number

This tag is applied to data recognized as a VIN.

Identifier tags

None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . Identifier Direct.

Tag name
Description

Identifier Direct

This tag is applied to data recognized as a direct identifier that can be uniquely associated with an individual. Examples of direct identifiers include: name, username, email, official individual identification numbers such as passport or identity card numbers, or privately issued individual identification numbers such as a student ID.

Identifier Indirect

This tag is applied to data recognized as an indirect identifier that is not uniquely associated with an individual. However, an indirect identifier could become distinguishing when combined with other attributes. Examples of indirect identifiers include age and affinity.

Identifier Undetermined

This tag is applied to data which could be an identifier associated with an individual.

Personal information tags

None of the tags below have an additional parent or child tag. For example, the full tag name will appear as Discovered . PCI.

Tag name
Description

PCI

This tag is applied to data recognized as payment card information.

PHI

This tag is applied to data recognized as personal health data.

PII

This tag is applied to data recognized as personally identifiable information.

How Competitive Pattern Analysis Works

Of sensitive data discovery's three pattern options, regex and dictionary are competitive. This means that when multiple patterns could match a column's data, only one of the competitive patterns is chosen to tag that data. The sections below explain how Immuta runs this competition.

Discover employs a three-phased competitive pattern analysis approach for sensitive data discovery (SDD):

  1. Sampling: No data is moved, and Immuta checks the patterns against a sample of data from the table.

  2. Qualifying: Patterns that match less than 90% of the sampled values are filtered out.

  3. Scoring: The remaining patterns are compared with one another to find the most specific pattern that qualifies and matches the sample.

In the end, competitive pattern analysis aims to find a single pattern for each column that best describes the data format.

Sampling

In the sampling process, no database contents are transmitted to Immuta; instead, Immuta receives only the column-wise hit rate (the number of times the pattern has matched a value in the column) information for each active pattern. To do this, Discover instructs a remote database to measure column-wise hit rate information for all active patterns over a row sample.

The sample size is decided based on the number of patterns and the data size, when available. In the simplest case, the requested number of sampled rows depends only on the number of regex and dictionary patterns being run in the framework, not on the data size. The sample size depends only weakly on the number of patterns and will not exceed 13,000 rows.

| Number of patterns | Sample size |
| --- | --- |
| 5 | 7369 rows |
| 50 | 9211 rows |
| 500 | 11053 rows |
| 5000 | 12895 rows |
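The four sample sizes listed above are numerically consistent with a Hoeffding-style bound combined with a union bound over the patterns, n = ceil(ln(2k / δ) / (2ε²)). This is an inference from the listed values, not a formula the documentation states. The function name and the constants ε = 0.025 and δ = 0.001 in the sketch below are assumptions chosen because they reproduce those four rows.

```python
import math


def requested_sample_size(num_patterns: int,
                          epsilon: float = 0.025,
                          delta: float = 0.001) -> int:
    """Rows needed so that, for all patterns at once, each sampled hit rate
    is within `epsilon` of its true value with probability >= 1 - delta.

    Hypothetical helper: epsilon and delta are assumed values fitted to the
    documented sample sizes, not published Immuta settings.
    """
    return math.ceil(math.log(2 * num_patterns / delta) / (2 * epsilon ** 2))


for k in (5, 50, 500, 5000):
    print(k, requested_sample_size(k))
# 5 7369
# 50 9211
# 500 11053
# 5000 12895
```

Because the pattern count appears inside a logarithm, a tenfold increase in patterns adds only about 1,842 rows, which matches the weak dependence described above. The sketch does not model the 13,000-row upper limit.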

Sampling considerations

In practice, the number of sampled values for each column may be less than the requested number of rows. This happens when the target table has fewer than the requested number of rows, when many of the column values are null, or because of technology-specific limitations.

  • Snowflake and Starburst (Trino): Discover implements native table sampling by row count.

  • Databricks and Redshift: Due to technology limitations and the inability to predict the size of the table, Discover implements a best-effort sampling strategy comprising a flat 10% row sample capped at the first 10,000 sampled rows. In particular, under-sampling may occur on tables with fewer than 100,000 rows. Moreover, the resulting sample is biased towards earlier records.

  • All platforms: Sampling from views can have significantly slower performance that varies by the performance of the query that defines the view.
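To make the Databricks and Redshift behavior concrete, the sketch below estimates how many rows a flat 10% sample capped at 10,000 rows would return for tables of different sizes. The function name is hypothetical, and the estimate ignores the randomness of real sampling and any null values. The requested size of 7,369 rows is taken from the 5-pattern row of the sample size table.

```python
def expected_best_effort_sample(table_rows: int,
                                fraction: float = 0.10,
                                cap: int = 10_000) -> int:
    """Expected rows from a flat-fraction sample that stops at `cap` rows.

    Hypothetical helper for illustration; not an Immuta API.
    """
    return min(round(table_rows * fraction), cap)


requested = 7369  # requested rows when 5 patterns are active
for rows in (20_000, 50_000, 100_000, 5_000_000):
    sampled = expected_best_effort_sample(rows)
    status = "under-sampled" if sampled < requested else "ok"
    print(f"{rows:>9} table rows -> about {sampled:>6} sampled ({status})")
```

Under these assumptions, a 50,000-row table yields roughly 5,000 sampled rows, which falls short of the requested 7,369.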

Qualifying

During the qualification phase, patterns that do not agree with the data are disqualified. A pattern agrees with the data if its hit rate on the remote sample meets or exceeds the predefined threshold. This threshold is a 90% match for most built-in patterns; however, two built-in patterns have lower threshold requirements. The 90% threshold also applies to all custom patterns, which ensures that a pattern matches the data within the column and avoids false positives. If no patterns qualify, then no pattern is assessed for scoring and the column is not tagged.
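The qualification check can be sketched as a hit-rate calculation followed by a threshold comparison. This is a minimal illustration, not Immuta's implementation: the function names are hypothetical, excluding null values from the denominator is an assumption, and so is matching the whole value rather than a substring. The comparison is inclusive because the worked example in the Example section treats a pattern that matches exactly 90% of the sample as qualifying.

```python
import re
from typing import Optional, Sequence


def hit_rate(pattern: str, values: Sequence[Optional[str]]) -> float:
    """Fraction of non-null sampled values that the regex matches in full."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    hits = sum(1 for v in non_null if re.fullmatch(pattern, v))
    return hits / len(non_null)


def qualifies(pattern: str, values: Sequence[Optional[str]],
              threshold: float = 0.90) -> bool:
    """True when the pattern's hit rate reaches the threshold."""
    return hit_rate(pattern, values) >= threshold


# Nine well-formed values, one malformed value, and one null.
column_sample = ["123-45-6789"] * 9 + ["n/a", None]

print(hit_rate(r"\d{3}-\d{2}-\d{4}", column_sample))   # 0.9
print(qualifies(r"\d{3}-\d{2}-\d{4}", column_sample))  # True
print(qualifies(r"\d+", column_sample))                # False
```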

Scoring

During the scoring phase, a machine inference is carried out among all qualified patterns, combining pattern-derived complexity information with hit rate information to determine which pattern best describes the sample data. This process prefers the more restrictive of two competing patterns since the ability to satisfy the more difficult-to-satisfy pattern itself serves as evidence that it is more likely. This phase ends by returning a single most likely pattern per the inference process.

Example

Here is a set of regex patterns and a sample of data:

Patterns:

  1. [a-zA-Z0-9]{3} - This pattern will match 3-character strings made up of the letters a-z, lowercase or uppercase, or the digits 0-9.

  2. [a-c]{3} - This pattern will match 3-character strings made up of the lowercase letters a-c.

  3. (a|b|d){3} - This pattern will match 3-character strings made up of the lowercase letters a, b, or d.

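| Sample data | Matches Pattern 1 | Matches Pattern 2 | Matches Pattern 3 |
| --- | --- | --- | --- |
| dad | Yes | ❌ | Yes |
| baa | Yes | Yes | Yes |
| add | Yes | ❌ | Yes |
| add | Yes | ❌ | Yes |
| cab | Yes | Yes | ❌ |
| bad | Yes | ❌ | Yes |
| aba | Yes | Yes | Yes |
| baa | Yes | Yes | Yes |
| dad | Yes | ❌ | Yes |
| baa | Yes | Yes | Yes |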

When qualifying the patterns, Pattern 1 and Pattern 3 both match 90% or more of the data. Pattern 2 does not, and is disqualified.

Then the qualified patterns are scored. Here, Pattern 1, despite matching 100% of the data, is unspecific and could match over 200,000 values. On the other hand, Pattern 3 matches only 90% of the data but is very specific, with just 27 possible values.

Therefore, with the specificity taken into account, Pattern 3 would be the match for this column, and its tags would be applied to the data source in Immuta.
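The example can be reproduced with a short script. This is a simplified sketch under stated assumptions: the source describes scoring as an inference that combines complexity with hit rate, whereas the code below simply picks the qualifying pattern that accepts the fewest strings. The accepted-string counts are worked out by hand (62³ and 3³), and all variable names are illustrative.

```python
import re

sample = ["dad", "baa", "add", "add", "cab", "bad", "aba", "baa", "dad", "baa"]

# Pattern -> number of distinct 3-character strings it accepts (hand-computed).
patterns = {
    "[a-zA-Z0-9]{3}": 62 ** 3,  # 238,328 possible values
    "[a-c]{3}": 3 ** 3,         # 27 possible values
    "(a|b|d){3}": 3 ** 3,       # 27 possible values
}


def hit_rate(pattern, values):
    """Fraction of sampled values that the regex matches in full."""
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


rates = {p: hit_rate(p, sample) for p in patterns}

# Qualifying: keep patterns whose hit rate is at least 90%.
qualified = [p for p, rate in rates.items() if rate >= 0.90]

# Simplified scoring: prefer the most restrictive qualifying pattern.
winner = min(qualified, key=lambda p: patterns[p])

print(rates)
print(qualified)
print(winner)  # (a|b|d){3}
```

Evaluated as real regular expressions, Pattern 2 matches cab, aba, and the three baa values, which is 50% of the sample. That is still below the 90% threshold, so it is disqualified either way.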

Important notes

  • Dictionaries are considered patterns by Immuta and are part of the competitive process, while column-name regex patterns are not.

  • Scoring ties are rare but can occur if the same pattern is specified more than once (even in different forms). Scoring ties are inconclusive, and the scoring phase will not return a pattern in the case of a tie.

  • Pattern complexity analysis is sensitive to the total number of strings a pattern accepts or, equivalently for dictionaries, the number of entries. Therefore, patterns that accept much more than is necessary to describe the intended column data format may perform more poorly in the competitive analysis because they are easier to satisfy.
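For small fixed-length patterns, the number of strings a pattern accepts can be counted by brute force, which also shows how the same pattern written in two forms ends up indistinguishable. The sketch below is only an illustration: the source does not describe how Immuta measures complexity, the 62-character alphabet is an assumption, and the helper name is hypothetical.

```python
import re
import string
from itertools import product

# Assumed alphabet for illustration: a-z, A-Z, and 0-9 (62 characters).
ALPHABET = string.ascii_letters + string.digits


def accepted_strings(pattern: str, length: int = 3) -> set:
    """All strings of `length` characters over ALPHABET that the regex fully matches."""
    compiled = re.compile(pattern)
    candidates = ("".join(chars) for chars in product(ALPHABET, repeat=length))
    return {s for s in candidates if compiled.fullmatch(s)}


broad = accepted_strings("[a-zA-Z0-9]{3}")
alternation = accepted_strings("(a|b|d){3}")
char_class = accepted_strings("[abd]{3}")

print(len(broad))                 # 238328
print(len(alternation))           # 27
print(alternation == char_class)  # True: two forms of the same pattern
```

Because (a|b|d){3} and [abd]{3} accept exactly the same strings, they would have identical hit rates and identical complexity on any sample, which is the tie situation described above.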

Built-in Pattern Reference

In previous documentation, rule and pattern are referred to as classifier or identifier. The language is being updated to rule to be more accurate and not conflate meaning with .

Immuta comes with a set of built-in patterns that look for common data types. These patterns were written by Immuta's research and development team and cannot be deleted or edited by users. However, users can build their own rules using these built-in patterns, which will customize the resulting tags based on the organization's needs.

When using SDD with , it is recommended to use the default resulting tags listed in the table below for these built-in patterns. This ensures that the framework rules apply sensitivity tags as intended.

Pattern descriptions and default resulting tags

Pattern
Description
Resulting tags from the default rules

AGE

Matches numeric strings between 10 and 199.

  • Discovered.PII

  • Discovered.Identifier Indirect

  • Discovered.PHI

  • Discovered.Entity.Age

ARGENTINA_DNI_NUMBER

Matches strings consistent with an Argentina National Identity (DNI) Number. Requires an eight-digit number with optional periods between the second and third digits and between the fifth and sixth digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Argentina

  • Discovered.PHI

  • Discovered.Entity.DNI Number

AUSTRALIA_MEDICARE_NUMBER

Matches numeric strings consistent with an Australian Medicare number. Requires a ten- or eleven-digit number. The starting digit must be between 2 and 6, inclusive. Optional spaces can be placed between the fourth and fifth digits and between the ninth and tenth digits. An optional eleventh digit, separated by a /, can be present. A checksum is required.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Australia

  • Discovered.PHI

  • Discovered.Entity.Medicare Number

AUSTRALIA_PASSPORT

Matches strings consistent with an Australian passport number. An 8- or 9-character string is required, starting with either a single uppercase letter (N, E, D, F, A, C, U, X) or a two-letter prefix (P followed by A, B, C, D, E, F, U, W, X, or Z), followed by seven digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Australia

  • Discovered.PHI

  • Discovered.Entity.Passport

BELGIUM_NATIONAL_ID_CARD_NUMBER

Matches numeric strings consistent with Belgium's National ID card. Requires a twelve-digit number with a hyphen (-) between the third and fourth digits and between the tenth and eleventh digits. A two-digit checksum is required.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Belgium

  • Discovered.PHI

  • Discovered.Entity.National ID Card Number

BITCOIN_INVOICE_ADDRESS

Matches strings consistent with the following Bitcoin Invoice Address formats: P2PKH, P2SH, and Bech32. P2PKH and P2SH must start with a 1 or a 3, respectively, followed by 25 to 34 alphanumeric characters, excluding l, I, O, and 0. Bech32 formats must begin with bc1 and be followed by 39 characters. Addresses must have a valid checksum to be identified.

  • Discovered.Entity.CRYPTO

  • Discovered.PCI

BRAZIL_CPF_NUMBER

Matches a numeric string consistent with Brazil's CPF (Cadastro de Pessoas Físicas) number. Requires an eleven-digit numeric string with non-numeric separators after the third, sixth, and ninth digits. A two-digit checksum is required.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Brazil

  • Discovered.PHI

  • Discovered.Entity.CPF Number

CANADA_BC_PHN

Matches numeric strings consistent with British Columbia's Personal Health Number (PHN). Requires a ten-digit numeric string with optional hyphen (-) or spaces after the fourth and seventh digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Canada

  • Discovered.PHI

  • Discovered.Entity.British Columbia Health Network Number

CANADA_OHIP

Matches alphanumeric strings consistent with Ontario's Health Insurance Plan (OHIP). Requires a twelve-character alphanumeric code. Optional hyphens (-) or spaces can appear after the fourth, seventh, and tenth characters. The final two characters are a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Canada

  • Discovered.PHI

  • Discovered.Entity.Ontario Health Insurance Number

CANADA_PASSPORT

Matches strings consistent with the Canadian passport number format.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Canada

  • Discovered.PHI

  • Discovered.Entity.Passport

CANADA_QUEBEC_HIN

Matches alphanumeric strings consistent with Quebec's Health Insurance Number (HIN). Requires four alphabetic characters followed by an optional space or hyphen (-), and then eight digits with an optional hyphen or space after the fourth digit.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Canada

  • Discovered.PHI

  • Discovered.Entity.Quebec Health Insurance Number

CREDIT_CARD_NUMBER

Matches strings consistent with a credit card number with prefixes matching major credit card companies. Must include a valid checksum.

  • Discovered.PCI

  • Discovered.Entity.Credit Card Number

DATE

Matches strings consistent with dates. These can include days of the week, dates, and date times.

  • Discovered.Entity.Date

DENMARK_CPR_NUMBER

Matches numeric strings consistent with Denmark's Personal Identification Number (CPR-number or Person-number). Requires a ten-digit number in either DDMMYY-SSSS or DDMMYYSSSS format. The first six digits are an individual's birth date in Day, Month, Year format. The final four digits comprise the sequence number.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Denmark

  • Discovered.PHI

  • Discovered.Entity.CPR Number

DOMAIN_NAME

Matches domain names using a very broad pattern.

  • Discovered.Entity.Domain Name

EMAIL_ADDRESS

Matches strings consistent with an email address. Requires a username of fewer than 255 characters, followed by @, a domain of fewer than 255 characters, and a top-level domain of between 2 and 20 characters.

  • Discovered.PHI

  • Discovered.Entity.Electronic Mail Address

  • Discovered.Identifier Direct

ETHNIC_GROUP

Matches strings consistent with the US Census race designations.

  • Discovered.PII

  • Discovered.Entity.Ethnic Group

FDA_CODE

Matches a string consistent with a drug or ingredient registered with the Food and Drug Administration (FDA). Must start with 4 to 6 digits, followed by a hyphen, followed by 3 to 4 digits, followed by a hyphen, and finishing with 1 to 2 digits.

  • Discovered.Country.US

  • Discovered.Entity.FDA Code

FINLAND_NATIONAL_ID_NUMBER

Matches a string consistent with Finland's National ID number. Requires an eleven-character string in a DDMMYYCZZZQ format. The first six digits are an individual's birth date in Day, Month, Year format. The C character is a century of birth indicator (+ for the years 1800-1899, - for years 1900-1999, and A for years 2000-2099). ZZZ is an individual ID number, and Q is a required checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Finland

  • Discovered.PHI

  • Discovered.Entity.National ID Number

FRANCE_CNI

Matches numeric strings consistent with the French National ID card number (carte nationale d'identité). Requires a twelve-digit numeric string.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.France

  • Discovered.PHI

  • Discovered.Entity.CNI

FRANCE_NIR

Matches numeric strings consistent with France's National ID number (Numéro d'Inscription au Répertoire). Requires a fifteen-digit numeric string. An optional hyphen (-) or space can appear after the 13th digit. The 14th and 15th digits act as a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.France

  • Discovered.PHI

  • Discovered.Entity.NIR

FRANCE_PASSPORT

Matches alphanumeric strings consistent with the French Passport number. Requires two numbers followed by two upper case letters and ends with 5 digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.France

  • Discovered.PHI

  • Discovered.Entity.Passport

GENDER

Matches strings consistent with gender or gender abbreviations.

  • Discovered.PII

  • Discovered.Identifier Indirect

  • Discovered.PHI

  • Discovered.Entity.Gender

GERMANY_DRIVERS_LICENSE_NUMBER

Matches alphanumeric strings consistent with Germany's Driver's License number. Requires an eleven-element string, with a digit or a letter followed by two digits, 6 digits or letters, one digit, and one digit or letter.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Germany

  • Discovered.PHI

  • Discovered.Entity.Drivers License Number

GERMANY_IDENTITY_CARD_NUMBER

Matches alphanumeric strings consistent with Germany's Identity Card number. Requires a single letter followed by eight digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Germany

  • Discovered.PHI

  • Discovered.Entity.Identity Card Number

IBAN_CODE

Matches strings consistent with an International Bank Account Number (IBAN). Must contain a valid country code.

  • Discovered.Entity.IBAN Code

ICD10_CODE

Matches strings consistent with codes from the International Statistical Classification of Diseases and Related Health Problems (ICD), as drawn from the Clinical Modification lexicon from the year 2020.

  • Discovered.Entity.ICD10 Code

IMEI_HARDWARE_ID

Matches strings consistent with an International Mobile Equipment Identity (IMEI) number. Must contain 15 digits with optional hyphens or spaces after the second, 8th, and 14th digits.

  • Discovered.Entity.IMEI

IP_ADDRESS

Matches IP Addresses in the V4 and V6 formats.

  • Discovered.Entity.IP Address

LOCATION

Matches strings consistent with Countries, States, Addresses, or Municipalities. By default focuses on locations in the United States.

  • Discovered.Entity.Location

MAC_ADDRESS

Matches strings consistent with a Media Access Control (MAC) address. Must contain twelve hexadecimal digits, with every two digits separated by a colon.

  • Discovered.Entity.MAC Address

MAC_ADDRESS_LOCAL

Matches strings consistent with a local Media Access Control (MAC) address.

  • Discovered.Entity.MAC Address Local

PERSON_NAME

Matches strings consistent with a dictionary of people's names. Names are drawn from the US Social Security database.

  • Discovered.PII

  • Discovered.PHI

  • Discovered.Entity.Person Name

  • Discovered.Identifier Indirect

PHONE_NUMBER

Matches strings consistent with telephone numbers. Primarily looks for strings consistent with the United States telephone numbers naming convention.

  • Discovered.Entity.Telephone Number

POSTAL_CODE

Matches strings consistent with a valid US zip code with an optional +4. Only valid 5 digit zip codes are detected.

  • Discovered.Entity.Postal Code

SPAIN_NIE_NUMBER

Matches strings consistent with Spain's Foreigner Identification number. Requires an eight-character string. The initial character must be X, Y, or Z, followed by seven digits, then by an optional hyphen or space and a single checksum character.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Spain

  • Discovered.PHI

  • Discovered.Entity.NIE Number

SPAIN_NIF_NUMBER

Matches strings consistent with Spain's Tax Identification number. Requires an eight-character string. Requires eight digits followed by an optional hyphen or space and a single checksum character.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Spain

  • Discovered.PHI

  • Discovered.Entity.NIF Number

SPAIN_PASSPORT

Matches strings consistent with Spain's Passport number. Requires an eight- or nine-character string, starting with either two or three letters followed by six digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Spain

  • Discovered.PHI

  • Discovered.Entity.Passport

STREET_ADDRESS

Matches strings consistent with street addresses. Primarily looks for strings consistent with the United States street naming convention.

  • Discovered.Entity.Location

SWEDEN_NATIONAL_ID_NUMBER

Matches numeric strings consistent with Sweden's National ID number. Requires a ten- or twelve-digit string that must start with a date in either the YYMMDD or YYYYMMDD format. An optional - or + character then separates four ending digits. The final digit is a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Sweden

  • Discovered.PHI

  • Discovered.Entity.National ID Number

SWEDEN_PASSPORT

Matches numeric strings consistent with Sweden's Passport number. Requires an 8-digit number.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Sweden

  • Discovered.PHI

  • Discovered.Entity.Passport

SWIFT_CODE

Matches alphanumeric strings consistent with a SWIFT code (or Bank Identifier Code (BIC)) format.

  • Discovered.Entity.Swift Code

THAILAND_NATIONAL_ID_NUMBER

Matches strings consistent with Thailand's National ID number. Requires a 13-digit number with optional spaces or hyphens (-) after the first, fifth, tenth, and twelfth digits. The final digit is a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.Thailand

  • Discovered.PHI

  • Discovered.Entity.National ID Number

TIME

Matches strings consistent with times. Can contain both date and time pieces.

  • Discovered.Entity.Date

UK_DRIVERS_LICENSE_NUMBER

Matches alphanumeric strings consistent with the United Kingdom's Driver's License number. Requires either a 16- or 18-character string. The first five characters represent the driver's surname, padded with 9s, followed by a single digit for decade of birth, two digits for month of birth (incremented by 50 for female drivers), two digits for day of birth, one digit for year of birth, two letters, an arbitrary digit, and two digits. Two additional digits can be present for each license issuance.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.UK

  • Discovered.PHI

  • Discovered.Entity.Drivers License Number

UK_NATIONAL_INSURANCE_NUMBER

Matches alphanumeric strings consistent with the United Kingdom's National Insurance number. Requires a nine-character string. The first two characters must be letters, followed by an optional space, then six digits with optional spaces or hyphens (-) every two digits, ending with a letter.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.UK

  • Discovered.PHI

  • Discovered.Entity.National Insurance Number
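
The layout described for UK_NATIONAL_INSURANCE_NUMBER maps naturally onto a regular expression. The sketch below is illustrative only: the name `is_uk_nino` is hypothetical, and it assumes that letters may be upper or lower case and that a separator is allowed after each of the three digit pairs.

```python
import re

# Two letters, optional space, three digit pairs each optionally followed
# by a space or hyphen, then a closing letter.
_UK_NINO = re.compile(r"[A-Za-z]{2} ?(?:[0-9]{2}[ -]?){3}[A-Za-z]")


def is_uk_nino(value: str) -> bool:
    """Return True if value matches the described National Insurance layout."""
    return _UK_NINO.fullmatch(value) is not None
```

Both `"AB123456C"` and `"AB 12-34-56 C"` satisfy this pattern. A string with a digit in the first two positions does not.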

UK_TAXPAYER_REFERENCE

Matches ten-digit numeric strings consistent with UK Taxpayer Reference (UTR) numbers. The final digit is a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.UK

  • Discovered.PHI

  • Discovered.Entity.Taxpayer Reference

URL

Matches strings consistent with a Uniform Resource Locator (URL). The string must begin with http://, https://, ftp://, file:///, or mailto:, followed by a string and ending with a top-level domain of no more than 128 characters.

  • Discovered.Entity.URL

US_ADOPTION_TAXPAYER_IDENTIFICATION_NUMBER

Matches numeric strings consistent with a United States Adoption Taxpayer Identification Number (ATIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having 93 as an allowed Group Number.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.PHI

  • Discovered.Entity.Adoption Taxpayer ID Number

US_BANK_ROUTING_MICR

Matches numeric strings consistent with an American Bankers Association (ABA) Routing Number. Must be a nine-digit number whose first digit is 0, 1, 2, 3, 6, or 7. The final digit is a checksum.

  • Discovered.Country.US

  • Discovered.Entity.Bank Routing MICR
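
As an illustration of the US_BANK_ROUTING_MICR rules, the sketch below checks the length, the allowed first digit, and the standard ABA checksum (weights 3, 7, 1 repeated across the nine digits, with the weighted sum divisible by 10). The function name `is_aba_routing_number` is hypothetical, and the description above does not confirm which checksum Immuta applies, so treat the weighting as an assumption.

```python
import re


def is_aba_routing_number(value: str) -> bool:
    """Return True if value is nine digits, starts with an allowed digit,
    and satisfies the 3-7-1 weighted checksum."""
    if re.fullmatch(r"[0-9]{9}", value) is None:
        return False
    if value[0] not in "012367":
        return False
    d = [int(ch) for ch in value]
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0
```

A value such as `"400000008"` satisfies the checksum arithmetic but is still rejected here because its first digit is not in the allowed set.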

US_DEA_NUMBER

Matches alphanumeric strings consistent with a Drug Enforcement Administration (DEA) number that is assigned to a health care provider. Must be nine characters long. The first two characters must be alphanumeric, and the last seven characters must be digits. The final digit is a checksum.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.Entity.DEA Number
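
The US_DEA_NUMBER rules can be sketched as a layout check followed by a checksum. The checksum shown is the widely published DEA scheme (the sum of the 1st, 3rd, and 5th digits plus twice the sum of the 2nd, 4th, and 6th digits, whose last digit must equal the 7th digit). Whether Immuta computes it exactly this way is an assumption, and `is_dea_number` is a hypothetical name.

```python
import re

# Two alphanumeric characters followed by seven digits.
_DEA = re.compile(r"[A-Za-z0-9]{2}([0-9]{7})")


def is_dea_number(value: str) -> bool:
    """Return True if value matches the nine-character layout and checksum."""
    match = _DEA.fullmatch(value)
    if match is None:
        return False
    d = [int(ch) for ch in match.group(1)]
    # Assumed checksum: odd positions once, even positions doubled.
    total = d[0] + d[2] + d[4] + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]
```

For the digits `123456`, the sum is 9 + 2 × 12 = 33, so the expected check digit is 3 and `"AB1234563"` passes.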

US_EMPLOYER_IDENTIFICATION_NUMBER

Matches numeric strings consistent with a United States Employer Identification Number (EIN). Strings must contain nine digits with a hyphen after the second digit.

  • Discovered.Country.US

  • Discovered.Entity.Employer ID Number
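
A minimal sketch of the US_EMPLOYER_IDENTIFICATION_NUMBER layout, assuming nothing beyond what the description states (two digits, a hyphen, seven digits). The name `is_us_ein` is hypothetical.

```python
import re

# Nine digits with a mandatory hyphen after the second digit.
_EIN = re.compile(r"[0-9]{2}-[0-9]{7}")


def is_us_ein(value: str) -> bool:
    """Return True if value matches the described EIN layout."""
    return _EIN.fullmatch(value) is not None
```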

US_HEALTHCARE_NPI

Matches numeric strings consistent with a US National Provider Identifier (NPI). Strings must be either 10 or 15 digits, with the final digit being a valid checksum.

  • Discovered.PII

  • Discovered.Country.US

  • Discovered.Entity.Healthcare NPI

  • Discovered.Identifier Undetermined
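
The NPI check digit is conventionally computed with the Luhn algorithm over the identifier prefixed with 80840. The sketch below assumes that convention: a 10-digit value is prefixed before the Luhn check, and a 15-digit value is checked as is. The names `_luhn_ok` and `is_us_npi` are hypothetical, and the description above does not confirm that Immuta uses this exact procedure.

```python
import re


def _luhn_ok(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        n = int(ch)
        if index % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def is_us_npi(value: str) -> bool:
    """Return True for a 10- or 15-digit string with a valid check digit."""
    if re.fullmatch(r"[0-9]{10}", value):
        # Assumption: 10-digit NPIs are checked with the 80840 prefix.
        return _luhn_ok("80840" + value)
    if re.fullmatch(r"[0-9]{15}", value):
        return _luhn_ok(value)
    return False
```

Under these assumptions, `"1234567893"` and its prefixed form `"808401234567893"` both validate.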

US_INDIVIDUAL_TAXPAYER_IDENTIFICATION_NUMBER

Matches numeric strings consistent with a United States Individual Taxpayer Identification Number (ITIN). Requires a string similar in format to a US Social Security Number, but starting with a 9 in the Area Number and having a limited set of allowed Group Numbers.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.PHI

  • Discovered.Entity.Individual Taxpayer ID Number

US_PASSPORT

Matches numeric strings consistent with a United States Passport number. Strings must contain nine digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.PHI

  • Discovered.Entity.Passport

US_PREPARER_TAXPAYER_IDENTIFICATION_NUMBER

Matches strings consistent with a Preparer Taxpayer ID number. Strings must have nine characters, starting with a P that is followed by eight digits.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.Entity.Preparer Taxpayer ID Number

US_SOCIAL_SECURITY_NUMBER

Matches strings consistent with a US Social Security Number. Strings must contain nine digits and comprise three parts: the three left-most digits designating the area number, the middle two digits designating the group number, and the four right-most digits designating the serial number. For a column to be tagged, none of these parts can contain all zeroes, and area numbers must not be 666 or in the range of 900-999.

  • Discovered.PII

  • Discovered.Identifier Direct

  • Discovered.Country.US

  • Discovered.PHI

  • Discovered.Entity.Social Security Number
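
The structural rules for US_SOCIAL_SECURITY_NUMBER can be expressed as a short check. This is a sketch rather than Immuta's implementation: the name `is_us_ssn` is hypothetical, and accepting optional hyphens between the three parts is an assumption, since the description only requires nine digits.

```python
import re

# Area (3 digits), group (2 digits), serial (4 digits).
# Assumption: hyphens between the parts are optional.
_SSN = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")


def is_us_ssn(value: str) -> bool:
    """Return True if value satisfies the described SSN rules."""
    match = _SSN.fullmatch(value)
    if match is None:
        return False
    area, group, serial = match.groups()
    # No part may be all zeroes.
    if area == "000" or group == "00" or serial == "0000":
        return False
    # Area numbers 666 and 900-999 are excluded.
    return area != "666" and not area.startswith("9")
```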

US_STATE

Matches strings consistent with either a full name or two-letter abbreviation of a US state or territory.

  • Discovered.Country.US

  • Discovered.Entity.State

US_TOLLFREE_PHONE_NUMBER

Matches strings consistent with a US toll-free telephone number. Allowed area codes are 800, 88 followed by any digit, or 899.

  • Discovered.Country.US

  • Discovered.Entity.Tollfree Telephone Number

VEHICLE_IDENTIFICATION_NUMBER

Matches strings consistent with Vehicle Identification Numbers. A checksum is required as well as a valid World Manufacturer Identifier.

  • Discovered.Country.US

  • Discovered.Entity.Vehicle Identifier or Serial Number
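
The checksum half of VEHICLE_IDENTIFICATION_NUMBER can be sketched with the standard VIN check-digit calculation (letters transliterated to numbers, position weights applied, and the sum taken modulo 11, with 10 written as X in the ninth position). The World Manufacturer Identifier lookup is not shown because it depends on a registry of assigned codes. The name `has_valid_vin_check_digit` is hypothetical, and this is an illustration rather than Immuta's implementation.

```python
# Transliteration table; the letters I, O, and Q are not valid in a VIN.
_VALUES = {
    **{str(n): n for n in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Weight per position; the ninth position (the check digit) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def has_valid_vin_check_digit(vin: str) -> bool:
    """Return True if vin has 17 valid characters and a correct check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in _VALUES for ch in vin):
        return False
    remainder = sum(_VALUES[ch] * w for ch, w in zip(vin, _WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```

For `"1M8GDM9AXKP042788"` the weighted sum is 351, and 351 modulo 11 is 10, so the expected check character is X, which matches the ninth position.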
