Implementing Identification: A Strategic Guide
Organizations today manage more data than ever, yet many still struggle to understand what they have, where it lives, and how to protect it. With users ranging from analysts and AI agents to executives and external partners, access must be fast, secure, and compliant.
The key to enabling secure access at scale? Metadata. When data is accurately identified and tagged, you can apply dynamic protections, deliver tailored access, and confidently scale data sharing. Without trustworthy metadata, you risk slowing down access, or worse, exposing sensitive data.
That’s where Immuta’s identification service comes in. Identification is the automated process of detecting and tagging sensitive data (such as personal identifiers, business-critical fields, or regulatory attributes) based on configurable patterns like column names, data values, or dictionaries. This provides the foundation for:
Faster provisioning of governed data
Automated policy enforcement
Consistent masking and filtering as data evolves
Audit-ready visibility into data protections
Without identification, every new table or column requires manual review. With it, protections keep pace with your data so access remains both fast and secure.
Key concepts
Identification
Immuta’s Identification service automatically scans data sources to identify and tag data based on configurable criteria. These tags drive access controls, masking, and audit logging, helping you enforce policies at scale.
Identifiers
An identifier defines the criteria used to detect a specific type of data and the tags applied to data that matches those criteria. Immuta includes built-in identifiers for common data types, which you can use as-is, edit, or build upon with custom identifiers for your unique needs. Detection criteria can be based on:
Regex patterns (e.g., for SSNs or phone numbers)
Dictionary matching (e.g., known hospitals or countries)
Column name patterns (e.g., a column named “patient notes” or “comments” that may contain unstructured text, but still often includes sensitive details like names, emails, or IDs)
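The three kinds of criteria above can be sketched in a few lines of Python. This is only an illustration of the matching ideas, not Immuta's implementation or configuration format: the function names, the sample dictionary, and the 80% match threshold are all assumptions made for the example.

```python
import re

# Hypothetical, simplified criteria; not Immuta's actual identifier format.
SSN_REGEX = re.compile(r"\d{3}-\d{2}-\d{4}")
COUNTRY_DICTIONARY = {"france", "japan", "brazil"}
FREE_TEXT_COLUMN = re.compile(r"(patient[_ ]notes|comments)", re.IGNORECASE)


def matches_regex(values, pattern, threshold=0.8):
    """True when at least `threshold` of the non-empty values fully match."""
    values = [v for v in values if v]
    if not values:
        return False
    hits = sum(1 for v in values if pattern.fullmatch(v))
    return hits / len(values) >= threshold


def matches_dictionary(values, dictionary, threshold=0.8):
    """True when at least `threshold` of the non-empty values are dictionary terms."""
    values = [v for v in values if v]
    if not values:
        return False
    hits = sum(1 for v in values if v.strip().lower() in dictionary)
    return hits / len(values) >= threshold


def matches_column_name(column_name, pattern):
    """True when the column name itself matches, regardless of the values."""
    return pattern.search(column_name) is not None
```

Requiring a share of sampled values to match, rather than a single hit, is one common way to reduce false positives on columns that only occasionally contain a lookalike value.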
Tags applied through identifiers can come from Immuta’s built-in discovered hierarchy, external catalogs, or manual tagging, all of which contribute to a consistent metadata layer across your environment.
There are two types of identifiers:
Reference identifiers - A library of reusable identifiers that can be added to domains. Once added, they become domain-specific copies.
Domain-specific identifiers - These identifiers exist only within a specific domain and apply only to that domain’s data. Edits to reference identifiers won’t affect domain-specific copies.
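The copy semantics described above can be modeled with a short sketch. The Identifier and Domain classes, the add_reference method, and the tag names are hypothetical and exist only to illustrate that a domain receives an independent copy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Identifier:
    name: str
    regex: str
    tags: list = field(default_factory=list)


@dataclass
class Domain:
    name: str
    identifiers: dict = field(default_factory=dict)

    def add_reference(self, reference):
        # Store an independent copy so later edits on either side stay isolated.
        local = copy.deepcopy(reference)
        self.identifiers[local.name] = local
        return local


# Hypothetical reference identifier and tag name.
reference_ssn = Identifier("SSN", r"\d{3}-\d{2}-\d{4}", ["PII.SSN"])
clinical = Domain("Clinical")
clinical_ssn = clinical.add_reference(reference_ssn)

# Editing the reference afterwards leaves the domain-specific copy unchanged.
reference_ssn.regex = r"\d{9}"
reference_ssn.tags.append("Reviewed")
```

After these edits, clinical_ssn still carries the original regex and a single tag, which mirrors the statement that changes to a reference identifier do not propagate to domain-specific copies.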
Getting started
1. Define what’s sensitive
Start by determining which data fields are sensitive and how they should be handled. Align this with compliance requirements (e.g., HIPAA, GDPR), business priorities (e.g., protecting IP), and access needs. Also identify business-critical fields that commonly drive access controls, such as region, department, or hospital name.
Use built-in identifiers for common types like SSNs, names, and emails.
Create custom identifiers for organization-specific fields like customer_id, project_code, or hospital_name.
Define detection logic using regex, dictionaries, or column name patterns.
Establish a tagging hierarchy based on your data model. Tags applied by identifiers can come from external catalogs, be manually created in Immuta, or originate from Immuta’s built-in discovered hierarchy.
2. Create and organize identifiers
Build a reusable library of reference identifiers for data types that require consistent tagging, such as SSNs, email addresses, or account IDs. Reference identifiers can be applied across multiple domains and adapted to fit different data contexts.
3. Assign identifiers to domains
Use domains to group related data sources and apply relevant identifiers. Assigning a reference identifier to a domain creates a domain-specific copy, allowing each team to tailor tagging logic without affecting others.
For example, a healthcare organization might create Clinical, Billing, and Research domains, each using domain-specific versions of relevant identifiers.
You can also configure a Global SDD domain to centrally manage tagging across enterprise-wide data sources. Because Immuta allows data sources to belong to multiple domains, a single source can be scanned by both global and business-specific identifiers. Each domain’s tagging logic is evaluated independently, so discovery runs in parallel without conflict.
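As a rough model of independent, per-domain evaluation, the sketch below scans one data source against every domain it belongs to and keeps each domain's results separate. The domain names, patterns, and tags are hypothetical, and matching on column names alone is a simplification chosen for brevity.

```python
import re

# Hypothetical per-domain rules: column-name regex -> tag.
DOMAIN_IDENTIFIERS = {
    "Global SDD": {r"ssn|social_security": "PII.SSN", r"email": "PII.Email"},
    "Clinical": {r"hospital_name": "Clinical.Hospital", r"ssn": "Clinical.PatientSSN"},
}

# A data source may belong to several domains.
SOURCE_DOMAINS = {"patients": ["Global SDD", "Clinical"], "invoices": ["Global SDD"]}


def scan(source, columns):
    """Return {domain: {column: [tags]}}, evaluating each domain on its own."""
    results = {}
    for domain in SOURCE_DOMAINS.get(source, []):
        tagged = {}
        for column in columns:
            tags = [
                tag
                for pattern, tag in DOMAIN_IDENTIFIERS[domain].items()
                if re.search(pattern, column, re.IGNORECASE)
            ]
            if tags:
                tagged[column] = tags
        results[domain] = tagged
    return results
```

Because each domain's rules are applied in their own pass, the ssn column in the patients source can receive one tag from the global rules and another from the Clinical rules without either result overwriting the other.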
4. Empower domain-level data discovery
Give domain stewards permission to manage identifiers. They can customize reference identifiers—adjusting regex, renaming, or modifying tags—to fit their domain’s context. This balance of local control and global consistency supports scalable, accurate tagging across the organization.
5. Iterate and improve
Identification is an ongoing process. Regularly review tagging results, refine detection logic, and adjust as data evolves or new data sources are added. With object sync, Immuta automatically detects new tables and columns and re-scans to keep metadata current with minimal manual effort.
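The idea of re-scanning only what has changed can be illustrated with a simple schema comparison. This is a conceptual sketch, not a description of how object sync works internally; the new_columns function and the snapshot format are assumptions.

```python
def new_columns(previous, current):
    """Return {table: [columns]} present in `current` but not in `previous`.

    Both arguments are hypothetical schema snapshots: {table_name: [column_names]}.
    """
    added = {}
    for table, columns in current.items():
        known = set(previous.get(table, []))
        fresh = [c for c in columns if c not in known]
        if fresh:
            added[table] = fresh
    return added


previous = {"patients": ["id", "ssn"]}
current = {"patients": ["id", "ssn", "email"], "visits": ["id", "notes"]}
to_scan = new_columns(previous, current)
```

Here to_scan contains the new email column on patients and every column of the new visits table, which are the only objects that need to be run through identifiers again.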
Best practices
Start small, scale strategically: Begin with a focused set of identifiers and one or two domains. Once tagging is accurate and policies are working as intended, expand incrementally to more domains and data sources. Scaling gradually helps ensure quality and reduces false positives.
Use reference identifiers to drive consistency: Establish a set of reference identifiers as the foundation for tagging common data types, like SSNs, names, or email addresses. Apply these consistently across domains to avoid duplication and ensure alignment on enterprise-wide policies.
Customize only where necessary: Allow domain stewards to adapt reference identifiers only when the data context truly differs; for example, when a regex pattern needs to match local naming conventions or different tags are required. This balance preserves standardization while enabling flexibility.
Review and refine regularly: Establish a cadence for reviewing identifier performance. Audit tag accuracy, monitor false positives or missed fields, and update patterns as your data evolves. Use object sync and automatic scanning to keep tags up to date without needing to reconfigure each time a table changes.
Involve the right stakeholders: Bring in data owners, compliance, legal, and business partners to help define sensitive data criteria and review tagging outcomes. Their input ensures tagging logic reflects both regulatory requirements and operational realities.
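One way to make the review practice above measurable is to compare the tags an identifier applied against a manually reviewed sample. The sketch below assumes both sets are available as (column, tag) pairs; the tag_accuracy function and the sample data are hypothetical.

```python
def tag_accuracy(predicted, expected):
    """Compare applied tags with a reviewed sample; both are sets of (column, tag)."""
    true_positives = predicted & expected
    false_positives = predicted - expected
    missed = expected - predicted
    # Treat an empty set as perfect on that measure to avoid dividing by zero.
    precision = len(true_positives) / len(predicted) if predicted else 1.0
    recall = len(true_positives) / len(expected) if expected else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positives": false_positives,
        "missed": missed,
    }


predicted = {("ssn", "PII.SSN"), ("notes", "PII.SSN")}
expected = {("ssn", "PII.SSN"), ("email", "PII.Email")}
report = tag_accuracy(predicted, expected)
```

Low precision suggests a pattern is too broad and is producing false positives, while low recall points to fields the current identifiers miss.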
What's next?
Accurate and automated identification is the foundation for scalable data governance. Once sensitive fields and business-critical attributes are tagged, you can dynamically apply policies, safely provision data, and drive governed self-service with confidence.
Add classification
Immuta’s classification service builds on identification by categorizing data based on content and risk level, considering existing tags on the column, tags from neighboring columns, and table-level tags on the data source.
It then assigns a sensitivity level to each column. These classifications enhance governance by unlocking smarter automation:
Audit dashboards show access activity by sensitivity level.
Approval flows can use AI-driven risk scoring to determine whether to auto-approve access or require review.
For example, masked access to low-risk data might be auto-approved, while requests for unmasked or highly sensitive data could trigger manual approval. Classification helps you scale governed access while adapting to data risk in real time.
Automate policy enforcement
Identification is also the foundation for dynamic policy enforcement. Once fields are tagged, Immuta can automatically apply policies that mask, filter, or restrict access without manual intervention.
For example:
An SSN column tagged by an identifier can automatically trigger a masking policy.
A department or hospital_name tag can drive row-level filtering based on the user’s attributes.
Because Immuta policies are driven by tags, newly added tables and columns that match identifier logic are automatically protected. With policies in place, your data is ready to be safely shared through Immuta Marketplace, giving analysts, researchers, and partners timely access to the data they need while maintaining strong security and compliance controls.
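The tag-driven behavior described above can be sketched as follows. This is a simplified model rather than Immuta's policy engine: the tag names, the hashing-based mask, and the hospitals user attribute are assumptions made for the example.

```python
import hashlib

# Hypothetical tags applied by identifiers.
COLUMN_TAGS = {
    "ssn": {"PII.SSN"},
    "hospital_name": {"Business.Hospital"},
    "diagnosis": set(),
}


def mask(value):
    """Replace a value with a short, irreversible hash (one possible masking choice)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]


def apply_policies(rows, column_tags, user_attributes):
    """Filter rows by the user's hospitals and mask every column tagged PII.SSN."""
    filter_columns = [c for c, t in column_tags.items() if "Business.Hospital" in t]
    mask_columns = {c for c, t in column_tags.items() if "PII.SSN" in t}
    allowed = set(user_attributes.get("hospitals", []))
    visible = []
    for row in rows:
        if any(row[c] not in allowed for c in filter_columns):
            continue  # row-level filter: the user lacks the matching attribute
        visible.append({c: mask(v) if c in mask_columns else v for c, v in row.items()})
    return visible


rows = [
    {"ssn": "123-45-6789", "hospital_name": "General", "diagnosis": "flu"},
    {"ssn": "987-65-4321", "hospital_name": "Mercy", "diagnosis": "cold"},
]
result = apply_policies(rows, COLUMN_TAGS, {"hospitals": ["General"]})
```

Because the policy logic refers only to tags, a newly discovered column that receives the PII.SSN tag would be masked by the same function with no change to the policy itself.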
Accelerate provisioning through Immuta Marketplace
Once data is tagged through identification, governed through policies, and optionally classified by sensitivity, it becomes ready for safe and scalable delivery.
Immuta Marketplace makes this provisioning process seamless. Tagged and protected datasets can be published to Marketplace, where users can discover and request access based on governed policies.
This process allows you to:
Automatically approve access to low-risk, fully masked datasets
Route higher-risk requests for manual approval using classification-driven logic
Track activity through audit logs enriched with sensitivity context
For example, a data product containing masked employee emails may be auto-approved for analytics teams, while a request for a table with unmasked patient records would trigger a manual approval workflow due to its classification as highly sensitive.
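The routing in this example can be expressed as a small decision rule. The sketch below is illustrative only: the AccessRequest fields, the sensitivity levels, and the rule itself are assumptions, and a real approval flow would be configured in Marketplace rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    dataset: str
    sensitivity: str  # hypothetical levels: "low", "medium", "high"
    masked: bool


def route(request):
    """Auto-approve only low-risk, fully masked data; send everything else to review."""
    if request.sensitivity == "low" and request.masked:
        return "auto-approve"
    return "manual-review"


emails = AccessRequest("employee_emails", "low", True)
patients = AccessRequest("patient_records", "high", False)
decisions = {r.dataset: route(r) for r in (emails, patients)}
```

Defaulting to manual review for anything that is not explicitly low risk and masked keeps the rule conservative: an unclassified or newly added dataset is never auto-approved by accident.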
By combining identification, classification, and policy enforcement, Immuta enables intelligent access decisions that balance data utility with risk. Marketplace is the final step: allowing teams to access the data they need, when they need it, while giving governance teams full confidence that sensitive data remains protected.