# Implementing Identification: A Strategic Guide

Organizations today manage more data than ever, yet many still struggle to understand what they have, where it lives, and how to protect it. With users ranging from analysts and AI agents to executives and external partners, access must be **fast**, **secure**, and **compliant**.

The key to enabling secure access at scale? **Metadata**. When data is accurately identified and tagged, you can apply dynamic protections, deliver tailored access, and confidently scale data sharing. Without trustworthy metadata, you risk slowing down access, or worse, exposing sensitive data.

That’s where Immuta’s [**identification**](/SaaS/configuration/tags/data-discovery.md) service comes in. Identification is the automated process of detecting and tagging sensitive data (such as personal identifiers, business-critical fields, or regulatory attributes) based on configurable patterns like column names, data values, or dictionaries. This provides the foundation for:

* **Faster provisioning** of governed data
* **Automated policy enforcement**
* **Consistent masking and filtering** as data evolves
* **Audit-ready visibility** into data protections

Without identification, every new table or column requires manual review. With it, protections keep pace with your data so access remains both fast and secure.

## Key concepts <a href="#key-concepts" id="key-concepts"></a>

### Identification <a href="#identification" id="identification"></a>

Immuta’s identification service automatically scans data sources to identify and tag data based on configurable criteria. These tags drive access controls, masking, and audit logging, helping you enforce policies at scale.

### Identifiers <a href="#identifiers" id="identifiers"></a>

An identifier defines detection criteria and the tags applied to data that matches those criteria. Immuta includes built-in identifiers for common data types, which you can use as-is, edit, or build upon with custom identifiers for your unique needs. Identifier criteria can be based on:

* **Regex patterns** (e.g., for SSNs or phone numbers)
* **Dictionary matching** (e.g., known hospitals or countries)
* **Column name patterns** (e.g., a column named “patient notes” or “comments”, which holds unstructured text that often includes sensitive details like names, emails, or IDs)

Tags applied through identifiers can come from **Immuta’s built-in discovered hierarchy**, **external catalogs**, or **manual tagging**, all of which contribute to a consistent metadata layer across your environment.
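To make the three kinds of criteria concrete, here is a minimal Python sketch of how an identifier could match a column by name, by value pattern, or by dictionary. It is not Immuta's implementation or API: the `Identifier` class, the `tag_columns` helper, the `Example.*` tag names, and the 80% value-match threshold are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Identifier:
    """Hypothetical identifier: detection criteria plus the tags to apply on a match."""
    name: str
    tags: set[str]
    column_name_regex: str | None = None               # matched against the column name
    value_regex: str | None = None                     # matched against each sampled value
    dictionary: set[str] = field(default_factory=set)  # case-insensitive exact matches
    min_match_ratio: float = 0.8                       # share of sampled values that must match

    def matches(self, column_name: str, sample: list[str]) -> bool:
        # Column-name patterns need no data at all.
        if self.column_name_regex and re.search(self.column_name_regex, column_name, re.IGNORECASE):
            return True
        if not sample:
            return False
        # Value-based criteria: count how many sampled values match.
        if self.value_regex:
            hits = sum(1 for v in sample if re.fullmatch(self.value_regex, v))
        elif self.dictionary:
            known = {d.lower() for d in self.dictionary}
            hits = sum(1 for v in sample if v.lower() in known)
        else:
            return False
        return hits / len(sample) >= self.min_match_ratio


def tag_columns(columns: dict[str, list[str]], identifiers: list[Identifier]) -> dict[str, set[str]]:
    """Return, per column, the union of tags from every identifier that matches it."""
    return {
        col: {tag for ident in identifiers if ident.matches(col, sample) for tag in ident.tags}
        for col, sample in columns.items()
    }


# Hypothetical identifiers, one per kind of criteria.
ssn = Identifier("SSN", {"Example.SSN"}, value_regex=r"\d{3}-\d{2}-\d{4}")
country = Identifier("Country", {"Example.Country"}, dictionary={"France", "Japan", "Brazil"})
free_text = Identifier("Free text", {"Example.FreeText"}, column_name_regex=r"notes|comments")

result = tag_columns(
    {
        "ssn": ["123-45-6789", "987-65-4321"],
        "country": ["france", "Japan"],
        "patient_notes": ["Called back on Tuesday"],
        "amount": ["12.50", "3.10"],
    },
    [ssn, country, free_text],
)
print(result)
```

Requiring a share of sampled values to match, rather than a single hit, is one common way to keep a stray SSN-shaped string from tagging a whole column; the thresholds Immuta actually applies are not modeled here.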

There are two types of identifiers:

* **Reference identifiers:** Reusable identifiers kept in a shared library that can be added to domains. Once added, they become domain-specific copies.
* **Domain-specific identifiers:** These identifiers exist only within a specific domain and apply only to that domain’s data. Edits to reference identifiers won’t affect domain-specific copies.
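The copy-on-add behavior can be sketched in a few lines of Python. This is a hypothetical model, not Immuta's data structures: `Identifier`, `Domain`, `add_reference_identifier`, and the tag names are invented for illustration. What it shows is that adding a reference identifier to a domain stores a deep copy, so a steward's edits and later edits to the reference stay independent.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Identifier:
    """Hypothetical identifier: a detection pattern plus the tags it applies."""
    name: str
    pattern: str
    tags: list[str] = field(default_factory=list)


class Domain:
    def __init__(self, name: str) -> None:
        self.name = name
        self.identifiers: dict[str, Identifier] = {}

    def add_reference_identifier(self, ref: Identifier) -> Identifier:
        # Adding a reference identifier stores an independent deep copy in this domain.
        self.identifiers[ref.name] = copy.deepcopy(ref)
        return self.identifiers[ref.name]


reference_library = {"SSN": Identifier("SSN", r"\d{3}-\d{2}-\d{4}", ["Example.SSN"])}

billing = Domain("Billing")
research = Domain("Research")
billing_ssn = billing.add_reference_identifier(reference_library["SSN"])
research_ssn = research.add_reference_identifier(reference_library["SSN"])

# A Research steward adapts only their domain's copy (e.g., SSNs stored without dashes).
research_ssn.pattern = r"\d{9}"

# A later edit to the reference identifier does not reach copies already in domains.
reference_library["SSN"].tags.append("Example.Restricted")
```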

## Getting started <a href="#getting-started" id="getting-started"></a>

### 1. Define what’s sensitive <a href="#id-1.-define-whats-sensitive" id="id-1.-define-whats-sensitive"></a>

Start by determining which data fields are sensitive and how they should be handled. Align this with compliance requirements (e.g., HIPAA, GDPR), business priorities (e.g., protecting IP), and access needs. Also identify **business-critical fields** that commonly drive access controls, such as region, department, or hospital name.

* Use **built-in identifiers** for common types like SSNs, names, and emails.
* Create **custom identifiers** for organization-specific fields like `customer_id`, `project_code`, or `hospital_name`.
* Define detection logic using **regex**, **dictionaries**, or **column name patterns**.
* Establish a **tagging hierarchy** based on your data model. Tags applied by identifiers can come from external catalogs, be manually created in Immuta, or originate from Immuta’s built-in discovered hierarchy.

### 2. Create and organize identifiers <a href="#id-2.-create-and-organize-identifiers" id="id-2.-create-and-organize-identifiers"></a>

Build a reusable library of **reference identifiers** for data types that require consistent tagging, such as SSNs, email addresses, or account IDs. Reference identifiers can be applied across multiple domains and adapted to fit different data contexts.

### 3. Assign identifiers to domains <a href="#id-3.-assign-identifiers-to-domains" id="id-3.-assign-identifiers-to-domains"></a>

Use **domains** to group related data sources and apply relevant identifiers. Assigning a reference identifier to a domain creates a domain-specific copy, allowing each team to tailor tagging logic without affecting others.

For example, a healthcare organization might create Clinical, Billing, and Research domains, each using domain-specific versions of relevant identifiers.

You can also configure a **Global SDD (sensitive data discovery) domain** to centrally manage tagging across enterprise-wide data sources. Because Immuta allows data sources to belong to multiple domains, a single source can be scanned by both global and business-specific identifiers. Each domain’s tagging logic is evaluated independently, so discovery runs in parallel without conflict.
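A minimal Python sketch of one data source scanned by a global domain and a business domain at the same time. The configuration dictionaries, function names, and tags are hypothetical, and combining each domain's results as a set union is an assumption made for illustration; the sketch only shows why independent per-domain evaluation can run in parallel without conflict.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration: each domain maps identifier names to (column-name regex, tag).
DOMAIN_IDENTIFIERS = {
    "Global": {"Email": (r"e_?mail", "Example.Email")},
    "Clinical": {"Hospital": (r"hospital", "Example.Hospital")},
}
# A data source may belong to several domains.
SOURCE_DOMAINS = {"patients": ["Global", "Clinical"]}
SOURCE_COLUMNS = {"patients": ["patient_email", "hospital_name", "visit_date"]}


def scan_domain(domain: str, columns: list[str]) -> dict[str, set[str]]:
    """Evaluate one domain's identifiers; reads shared config but writes nothing shared."""
    found: dict[str, set[str]] = {}
    for pattern, tag in DOMAIN_IDENTIFIERS[domain].values():
        for col in columns:
            if re.search(pattern, col, re.IGNORECASE):
                found.setdefault(col, set()).add(tag)
    return found


def scan_source(source: str) -> dict[str, set[str]]:
    columns = SOURCE_COLUMNS[source]
    domains = SOURCE_DOMAINS[source]
    # Each domain is scanned independently, so the scans can run concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda d: scan_domain(d, columns), domains))
    # Assumption: the tags found by every domain are combined as a union.
    merged: dict[str, set[str]] = {col: set() for col in columns}
    for found in results:
        for col, tags in found.items():
            merged[col] |= tags
    return merged


tags = scan_source("patients")
```

Because `scan_domain` only reads shared configuration and returns its own result, no locking is needed; the merge happens once, after every domain has finished.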

### 4. Empower domain-level data discovery <a href="#id-4.-empower-domain-level-discovery" id="id-4.-empower-domain-level-discovery"></a>

Give domain stewards permission to manage identifiers. They can customize reference identifiers—adjusting regex, renaming, or modifying tags—to fit their domain’s context. This balance of local control and global consistency supports scalable, accurate tagging across the organization.

### 5. Iterate and improve <a href="#id-5.-iterate-and-improve" id="id-5.-iterate-and-improve"></a>

Identification is an ongoing process. Regularly review tagging results, refine detection logic, and adjust as data evolves or new data sources are added. With object sync, Immuta automatically detects new tables and columns and re-scans to keep metadata current with minimal manual effort.
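The change-detection part of keeping metadata current can be sketched as a schema diff: compare the columns seen at the last sync with the columns present now and hand only the new ones to identification. This Python sketch illustrates the general idea under that assumption; it does not describe how object sync works internally, and the snapshot format and the `new_columns` helper are hypothetical.

```python
from __future__ import annotations


def new_columns(previous: dict[str, set[str]], current: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per table, the columns present now but absent at the previous sync."""
    changes: dict[str, set[str]] = {}
    for table, cols in current.items():
        added = cols - previous.get(table, set())
        if added:
            changes[table] = added
    return changes


previous_snapshot = {"orders": {"id", "amount"}}
current_snapshot = {
    "orders": {"id", "amount", "customer_email"},  # column added to an existing table
    "refunds": {"id", "reason"},                    # brand-new table
}

changes = new_columns(previous_snapshot, current_snapshot)
# Hypothetical scan queue: only the new (table, column) pairs are sent for identification.
scan_queue = sorted((table, col) for table, cols in changes.items() for col in cols)
print(scan_queue)  # [('orders', 'customer_email'), ('refunds', 'id'), ('refunds', 'reason')]
```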

## Best practices <a href="#best-practices" id="best-practices"></a>

**Start small, scale strategically:** Begin with a focused set of identifiers and one or two domains. Once tagging is accurate and policies are working as intended, expand incrementally to more domains and data sources. Scaling gradually helps maintain quality, because false positives are caught and fixed before they spread across many data sources.

**Use reference identifiers to drive consistency:** Establish a set of reference identifiers as the foundation for tagging common data types, like SSNs, names, or email addresses. Apply these consistently across domains to avoid duplication and ensure alignment on enterprise-wide policies.

**Customize only where necessary:** Allow domain stewards to adapt reference identifiers only when the data context truly differs; for example, when a regex pattern needs to match local naming conventions or different tags are required. This balance preserves standardization while enabling flexibility.

**Review and refine regularly:** Establish a cadence for reviewing identifier performance. Audit tag accuracy, monitor false positives or missed fields, and update patterns as your data evolves. Use object sync and automatic scanning to keep tags up to date without needing to reconfigure each time a table changes.
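One way to audit tag accuracy is to compare the columns an identifier tagged against a sample a reviewer has labeled by hand, then track precision (the share of tags that were correct) and recall (the share of sensitive columns that were caught). The Python sketch below assumes such a reviewed sample exists; `identifier_accuracy` and the column names are hypothetical.

```python
from __future__ import annotations


def identifier_accuracy(tagged: set[str], confirmed: set[str]) -> dict[str, object]:
    """Compare columns an identifier tagged against columns a reviewer confirmed."""
    true_pos = tagged & confirmed
    false_pos = tagged - confirmed   # tagged, but the reviewer says they are not that data type
    missed = confirmed - tagged      # sensitive columns the identifier did not tag
    precision = len(true_pos) / len(tagged) if tagged else 0.0
    recall = len(true_pos) / len(confirmed) if confirmed else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": sorted(false_pos),
        "missed": sorted(missed),
    }


# Hypothetical review of an SSN identifier.
report = identifier_accuracy(
    tagged={"users.ssn", "claims.member_ssn", "orders.ref_no"},
    confirmed={"users.ssn", "claims.member_ssn", "hr.tax_id"},
)
print(report)
```

Falling precision points to false positives worth tightening a pattern for; falling recall points to missed fields that may need a new column-name pattern or dictionary entry.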

**Involve the right stakeholders:** Bring in data owners, compliance, legal, and business partners to help define sensitive data criteria and review tagging outcomes. Their input ensures tagging logic reflects both regulatory requirements and operational realities.

## What's next? <a href="#whats-next" id="whats-next"></a>

Accurate and automated identification is the foundation for scalable data governance. Once sensitive fields and business-critical attributes are tagged, you can dynamically apply policies, safely provision data, and drive governed self-service with confidence.

### Add classification <a href="#add-classification" id="add-classification"></a>

Immuta’s [classification](/SaaS/configuration/tags/data-classification.md) service builds on identification by categorizing data based on content and risk level, considering existing tags on the column, tags from neighboring columns, and table-level tags on the data source.

It then assigns a **sensitivity level** to each column. These classifications enhance governance by unlocking smarter automation:

* **Audit dashboards** show access activity by sensitivity level.
* **Approval flows** can use **AI-driven risk scoring** to determine whether to auto-approve access or require review.

For example, masked access to low-risk data might be auto-approved, while requests for unmasked or highly sensitive data could trigger manual approval. Classification helps you scale governed access while adapting to data risk in real time.
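The idea of combining a column's own tags with neighboring-column and table-level tags can be illustrated with a small rule-based Python sketch. The `Sensitivity` scale, the `TAG_LEVELS` table, and both context rules are hypothetical; Immuta's classification service defines its own levels and rules, and this code does not reproduce them.

```python
from __future__ import annotations

from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_SENSITIVE = 3


# Hypothetical rule table: the base sensitivity implied by a column tag.
TAG_LEVELS = {
    "Example.Country": Sensitivity.INTERNAL,
    "Example.Name": Sensitivity.CONFIDENTIAL,
    "Example.Diagnosis": Sensitivity.HIGHLY_SENSITIVE,
}


def classify_column(column_tags: set[str], neighbor_tags: set[str], table_tags: set[str]) -> Sensitivity:
    # Start from the most sensitive tag on the column itself.
    level = max((TAG_LEVELS.get(t, Sensitivity.PUBLIC) for t in column_tags), default=Sensitivity.PUBLIC)
    # Neighboring columns add context: a name beside health data is treated as highly sensitive.
    if "Example.Name" in column_tags and "Example.Diagnosis" in neighbor_tags:
        level = Sensitivity.HIGHLY_SENSITIVE
    # A table-level tag sets a floor for every column in the data source.
    if "Example.HealthRecord" in table_tags:
        level = max(level, Sensitivity.CONFIDENTIAL)
    return level


print(classify_column({"Example.Name"}, set(), set()).name)                        # CONFIDENTIAL
print(classify_column({"Example.Name"}, {"Example.Diagnosis"}, set()).name)        # HIGHLY_SENSITIVE
print(classify_column({"Example.Country"}, set(), {"Example.HealthRecord"}).name)  # CONFIDENTIAL
```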

### Automate policy enforcement <a href="#automate-policy-enforcement" id="automate-policy-enforcement"></a>

Identification also drives dynamic policy enforcement. Once identifiers tag sensitive fields and business-critical attributes, Immuta can automatically apply policies that mask, filter, or restrict access without manual intervention.

For example:

* An SSN column tagged by an identifier can automatically trigger a masking policy.
* A `department` or `hospital_name` tag can drive row-level filtering based on the user’s attributes.
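The sketch below illustrates why tag-driven policies protect new columns automatically: the policies reference tags rather than column names, so any column carrying the tag is masked or used for filtering. It is plain Python over in-memory rows, not Immuta's policy engine; `MASK_TAGS`, `FILTER_TAG`, the hashing mask, and `apply_policies` are hypothetical.

```python
from __future__ import annotations

import hashlib

# Hypothetical tag-based policies: they reference tags, never column names.
MASK_TAGS = {"Example.SSN"}
FILTER_TAG = "Example.Department"


def mask(value: object) -> str:
    # Illustrative masking: a truncated SHA-256 digest of the value.
    return hashlib.sha256(str(value).encode()).hexdigest()[:12]


def apply_policies(rows: list[dict], column_tags: dict[str, set[str]], user_departments: set[str]) -> list[dict]:
    masked_cols = {c for c, tags in column_tags.items() if tags & MASK_TAGS}
    filter_cols = [c for c, tags in column_tags.items() if FILTER_TAG in tags]
    visible = []
    for row in rows:
        # Row-level filter: keep the row only if every department-tagged column
        # matches one of the user's department attributes.
        if any(row[c] not in user_departments for c in filter_cols):
            continue
        visible.append({c: mask(v) if c in masked_cols else v for c, v in row.items()})
    return visible


column_tags = {
    "name": set(),
    "ssn": {"Example.SSN"},
    "department": {"Example.Department"},
}
rows = [
    {"name": "Ana", "ssn": "123-45-6789", "department": "Cardiology"},
    {"name": "Bo", "ssn": "987-65-4321", "department": "Oncology"},
]
result = apply_policies(rows, column_tags, user_departments={"Cardiology"})
print(result)
```

Tagging another column with `Example.SSN` would get it masked on the next call without touching the policy definitions.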

Because Immuta policies are driven by tags, newly added tables and columns that match identifier logic are automatically protected. With policies in place, your data is ready to be safely shared through the Request app, giving analysts, researchers, and partners timely access to the data they need while maintaining strong security and compliance controls.

### Accelerate provisioning through the Request app <a href="#accelerate-provisioning-through-immuta-marketplace" id="accelerate-provisioning-through-immuta-marketplace"></a>

Once data is tagged through **identification**, governed through **policies**, and optionally classified by **sensitivity**, it becomes ready for safe and scalable delivery.

[The Request app](/SaaS/request/introduction.md) makes this provisioning process seamless. Tagged and protected datasets can be published as data products in the Request app, where users can discover and request access based on governed policies.

This process allows you to:

* **Automatically approve** access to low-risk, fully masked datasets
* **Route higher-risk requests** for manual approval using classification-driven logic
* **Track activity** through audit logs enriched with sensitivity context

For example, a data product containing masked employee emails may be auto-approved for analytics teams, while a request for a table with unmasked patient records would trigger a manual approval workflow due to its classification as highly sensitive.
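The routing logic in this example can be sketched as a small decision function. Everything here is hypothetical: the `low`/`medium`/`high` scale, the `AccessRequest` fields, the `medium` auto-approval ceiling, and the in-memory `audit_log` stand in for whatever approval rules and audit records you configure in the Request app.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ordered sensitivity scale, lowest to highest.
LEVELS = ["low", "medium", "high"]

audit_log: list[dict[str, str]] = []


@dataclass
class AccessRequest:
    requester: str
    data_product: str
    max_sensitivity: str  # highest classification among the product's columns
    fully_masked: bool    # True if every sensitive column would be masked for this requester


def route(request: AccessRequest, auto_approve_ceiling: str = "medium") -> str:
    if request.max_sensitivity not in LEVELS:
        raise ValueError(f"unknown sensitivity: {request.max_sensitivity}")
    within_ceiling = LEVELS.index(request.max_sensitivity) <= LEVELS.index(auto_approve_ceiling)
    decision = "auto-approve" if request.fully_masked and within_ceiling else "manual-review"
    # Record the decision with its sensitivity context for later audit.
    audit_log.append({
        "requester": request.requester,
        "data_product": request.data_product,
        "sensitivity": request.max_sensitivity,
        "decision": decision,
    })
    return decision


masked_emails = AccessRequest("analytics-team", "employee_emails_masked", "medium", True)
patient_records = AccessRequest("research-team", "patient_records", "high", False)
decisions = [route(masked_emails), route(patient_records)]
print(decisions)  # ['auto-approve', 'manual-review']
```

Requiring both full masking and a sensitivity at or below the ceiling keeps the sketch conservative: a masked but highly sensitive product still goes to manual review.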

By combining identification, classification, and policy enforcement, Immuta enables intelligent access decisions that balance data utility with risk. The Request app is the final step: allowing teams to access the data they need, when they need it, while giving governance teams full confidence that sensitive data remains protected.
