
# Managing Data Metadata

## Prerequisites

[Your schema metadata is registered](/latest/configuration/integrations/registering-metadata/data-sources/register-data-sources.md)

## Considerations

<figure><img src="/files/1ynFB7ZC5RAnxWCw9AQ8" alt=""><figcaption></figcaption></figure>

Now that we’ve [enriched facts about our users](/latest/governance/getting-started-with-secure/automate-data-access-control-decisions/managing-user-metadata.md), let’s focus on the second point on the policy triangle diagram: the data tags. Just like you need user metadata, you need metadata on your data (tags) in order to decouple policy logic from referencing physical tables or columns. You must choose between the [orchestrated RBAC](/latest/governance/getting-started-with-secure/automate-data-access-control-decisions/the-two-paths.md#orchestrated-rbac-one-to-one) or [ABAC](/latest/governance/getting-started-with-secure/automate-data-access-control-decisions/the-two-paths.md#abac-many-to-many) method of data access:

* Orchestrated RBAC method: tag data sources at the table level
* ABAC method: tag data at the table and column level

While it is possible to target policies using both table- and column-level tags, for ABAC it’s more common to target column tags because they represent more granularly what is in the table. Just like user metadata needs to be facts about your users, the data metadata must be facts about the data. The tags on your tables should not contain any policy logic.

Fact-based column tags are descriptive (recommended):

* Column `ssn` has column tag `social security number`
* Column `f_name` has column tag `name`
* Column `dob` has column tags `date` and `date of birth`

Logic-based column tags require subjective decisions (not recommended):

* Column `ssn` has column tag `PII`
* Column `f_name` has column tag `sensitive`
* Column `dob` has column tag `indirect identifier`

*But can't I get policy authoring scalability by tagging things with higher level classifications, like PII, so I can build broader policies?* This is what Immuta’s [classification frameworks](/latest/configuration/manage-data-metadata/data-classification.md) are for.

Entity tags are facts about the contents of individual columns in isolation. Entity tags are what we listed above: social security number, name, date, and date of birth. Entity tags do not attempt to contextualize column contents with neighboring columns' contents. By contrast, *categorization* and *classification* tags describe the sensitive contents of a table with the context of all its columns. These are the logic-based tags listed above: PII, sensitive, and indirect identifier.

For example, under the HIPAA framework a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a single column-by-column basis, it can’t reason whether or not a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. But *categorization* tagging will tag it PHI if it detects patient identity information in the other columns of the table.
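The distinction can be sketched in a few lines of Python. This is only an illustration of the idea, not Immuta's implementation or API: the function name, the `procedure code` and `currency` tags, and the set of identity tags are all assumptions made for the example.

```python
# Hypothetical sketch: not Immuta's implementation or API.
# Entity tags are facts about single columns; the categorization rule
# looks at every column in the table before it adds "PHI".

# Assumed set of entity tags that reveal a patient's identity.
PATIENT_IDENTITY_TAGS = {"name", "social security number", "date of birth"}


def categorize_phi(column_entity_tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the categorization tags to add to each column of one table."""
    # Table-level context: does any column carry a patient identity tag?
    table_has_identity = any(
        tags & PATIENT_IDENTITY_TAGS for tags in column_entity_tags.values()
    )
    derived = {}
    for column, tags in column_entity_tags.items():
        # Procedure codes become PHI only alongside patient identity.
        if "procedure code" in tags and table_has_identity:
            derived[column] = {"PHI"}
        else:
            derived[column] = set()
    return derived


# Same entity tag on the procedure column, different table context.
with_identity = categorize_phi({
    "f_name": {"name"},
    "procedure": {"procedure code"},
})
without_identity = categorize_phi({
    "procedure": {"procedure code"},
    "billed_amount": {"currency"},
})
```

In this sketch the `procedure` column carries the same entity tag in both tables, but only the table that also holds a patient name has it categorized as `PHI`.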

Additionally, entity tagging does not indicate how sensitive the data is, but *categorization* tags carry a sensitivity level, the *classification* tag. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly-listed phone number of a company might not be considered sensitive.

Contextual tags (the categorization and classification tags) are what you should target with policy where possible. They give you higher level objects for more scalable and generic policy. Rather than building a policy like “allow access to tables with columns tagged `person name` and `phone number`,” it would be much easier to build it like “allow access to tables with columns tagged `PII`.”

In short, you must tag your entities, and then rely on a classification framework (provided by Immuta or customized by you) to provide the higher level context, also as tags. Remember, the owners of the tables (those who created them) can tag the data with facts about what is in the columns without having to understand the higher level implications of those tags (categorization and classification). This allows better separation of duty.

For orchestrated RBAC, the data tags are no longer facts about your data; instead, each tag is a single variable that determines access. As such, they should be table-level tags, which also reduces the amount of processing Immuta must do.

## Applying data tags

There are several options for applying data tags:

1. **Data identification**: This is the most powerful option. Immuta is able to [discover your sensitive data](/latest/configuration/manage-data-metadata/data-discovery.md), and you are able to extend what types of entities are discovered to those specific to your business. Identification can run completely within your data platform, with no data leaving at all for Immuta to analyze. Identification is more relevant for the ABAC approach because the tags are facts about the data.
2. **Tags from an external source**: You may have already done all the work tagging your data in some external catalog or your own homegrown tool. If so, Immuta can pull those tags in and use them. See the [Support matrix](/latest/releases/support-matrix.md#external-catalogs) for a list of the supported external catalogs. But remember, just like user metadata, these should represent facts about your data and not policy decisions.
3. **Manually tag**: Just like with user metadata, you are able to manually tag tables and columns in Immuta from [within the UI](/latest/configuration/manage-data-metadata/tags/how-to-guides/managing-tags.md), using the [Immuta API](/latest/developer-guides/api-intro/immuta-v1-api/configure-your-instance-of-immuta/tagging.md), or when registering the data, either during initial registration or later, when schema monitoring discovers new tables.

## Data tag hierarchy

Just as hierarchy matters for user metadata, it can also matter for data tags. We discussed the matching of user metadata to data metadata in the [Managing user metadata](/latest/governance/getting-started-with-secure/automate-data-access-control-decisions/managing-user-metadata.md) guide. However, there are even simpler approaches that can leverage data tag hierarchy beyond matching. This will be covered in more detail in the [Author policy](/latest/governance/getting-started-with-secure/automate-data-access-control-decisions/author-policy.md) guide, but it is important to understand as you think through data tagging.

As a quick example, it is possible to tag your data with `Cars` and then also tag that same data with more specific tags (in the hierarchy) such as `Cars.Nissan.Xterra`. Then, when you build policies, you could allow access to tables tagged `Cars` to `administrators`, but only those tagged `Cars.Nissan.Xterra` to `suv_inspectors`. This will result in two separate policies landing on the same table, and the beauty of Immuta is that it will handle the conflict of those two separate policies. This provides a large amount of scalability because you have to manage far fewer policies.

Imagine if you didn’t have this capability. You would have to include `administrators` access in every policy you created for the different vehicle makes, and if that policy needed to evolve, such as granting access to all cars to groups beyond `administrators`, it would be an enormous effort to make that change. With Immuta, it’s one policy change.
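The `Cars` example can be sketched as a small Python model. This is not Immuta's policy engine: the function names and the policy table are hypothetical, and the sketch assumes that a policy targeting a parent tag also applies to tags beneath it in the dotted hierarchy.

```python
# Hypothetical sketch of hierarchical tag matching; not Immuta's policy engine.

def tag_matches(policy_tag: str, data_tag: str) -> bool:
    """True if data_tag is policy_tag itself or sits beneath it in the hierarchy."""
    return data_tag == policy_tag or data_tag.startswith(policy_tag + ".")


# Assumed policy table: tag targeted by the policy -> group granted access.
POLICIES = {
    "Cars": "administrators",
    "Cars.Nissan.Xterra": "suv_inspectors",
}


def groups_with_access(table_tags: set[str]) -> set[str]:
    """Collect every group whose policy targets one of the table's tags."""
    return {
        group
        for policy_tag, group in POLICIES.items()
        if any(tag_matches(policy_tag, tag) for tag in table_tags)
    }


xterra_groups = groups_with_access({"Cars.Nissan.Xterra"})
camry_groups = groups_with_access({"Cars.Toyota.Camry"})
```

Under these assumptions both policies land on the Xterra table, while a table tagged `Cars.Toyota.Camry` is reachable only by `administrators`. Adding another group to all cars means editing the single `Cars` entry rather than one policy per vehicle make.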

