Managing Data Metadata

This guide describes how to organize and manage data metadata, which is used by Immuta to identify data targeted by policy.

Prerequisites

Your schema metadata is registered

Enriching your data metadata with tags

To manage data metadata with this particular use case, you should use the ABAC method as described in the Governance use cases introduction.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. With ABAC, you tag your columns with facts: what is in the column. The latter is feasible; the former is extremely hard (unless you use tag lineage, described below), especially in a data ecosystem with constant change.

With that understanding, read the Managing data metadata guide.

However, there are some considerations specific to this use case with regard to data metadata.

Automated data tagging

Automated data tagging with identification is recommended for this use case because it significantly reduces the manual effort involved in classifying and managing data. It ensures that data is consistently and accurately tagged, which is crucial for implementing effective data policies. It also allows data to be tagged in real time, so that new or updated data is immediately protected by the appropriate policies. This is critical when every column must be considered for policy immediately, as is the case here.

Schema monitoring enabled

While not directly related to data tagging, schema monitoring is a feature in Immuta that allows organizations to keep track of changes in their data environments. When enabled, Immuta actively monitors the data platform to detect when tables or columns are created or deleted. It then automatically registers or disables these tables in Immuta and updates the tags. This feature ensures that any global policies set in Immuta are applied to newly created or updated data sources. This use case assumes that you use schema monitoring together with identification. Without schema monitoring, data will remain in data downtime while waiting for the policies to update.
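The change detection described above can be illustrated with a minimal sketch that compares two snapshots of a data platform's schema. This is not Immuta's implementation; the `Snapshot` shape, the `SchemaDiff` class, and the `diff_schema` function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical snapshot shape: table name -> set of column names.
Snapshot = dict[str, set[str]]


@dataclass
class SchemaDiff:
    new_tables: set[str] = field(default_factory=set)        # candidates to register
    dropped_tables: set[str] = field(default_factory=set)    # candidates to disable
    new_columns: dict[str, set[str]] = field(default_factory=dict)      # need tagging
    dropped_columns: dict[str, set[str]] = field(default_factory=dict)  # tags to retire


def diff_schema(previous: Snapshot, current: Snapshot) -> SchemaDiff:
    """Compare two snapshots and report what was created or deleted."""
    diff = SchemaDiff(
        new_tables=current.keys() - previous.keys(),
        dropped_tables=previous.keys() - current.keys(),
    )
    # Only tables present in both snapshots can have column-level changes.
    for table in current.keys() & previous.keys():
        added = current[table] - previous[table]
        removed = previous[table] - current[table]
        if added:
            diff.new_columns[table] = added
        if removed:
            diff.dropped_columns[table] = removed
    return diff


previous = {"customers": {"id", "f_name"}, "orders": {"id"}}
current = {"customers": {"id", "f_name", "ssn"}, "payments": {"id"}}
changes = diff_schema(previous, current)
print(changes.new_tables, changes.dropped_tables, changes.new_columns)
```

In this sketch, every entry in `new_tables` and `new_columns` represents data that has no tags yet, which is why monitoring and automated tagging are used together.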

Data tags are facts

As discussed above, with ABAC your data tags need to be facts. Otherwise, a human must be involved to bake access logic into your tags. As an example, it's easy for identification to find an address, but it's harder for a human to decide if that address should be sensitive and who should have access to it - all defined with a single tag - for every column!
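To make the distinction concrete, here is a minimal sketch in which column tags only state facts and the access logic is kept separately in the policy. The column names, tag names, group names, and the `is_masked` helper are all hypothetical; this is not Immuta's policy engine or API.

```python
# Fact tags: what each column contains (hypothetical columns and tags).
COLUMN_TAGS = {
    "customers.home_addr": {"address"},
    "customers.ssn": {"social security number"},
    "customers.signup_ts": {"date"},
}

# Access logic lives in the policy, not in the tags: which fact tags are
# masked, and which groups are exempt from the masking (hypothetical groups).
MASKING_POLICY = {
    "address": {"exempt_groups": {"fraud_investigators"}},
    "social security number": {"exempt_groups": set()},
}


def is_masked(column: str, user_groups: set[str]) -> bool:
    """Return True if any fact tag on the column is masked for this user."""
    for tag in COLUMN_TAGS.get(column, set()):
        rule = MASKING_POLICY.get(tag)
        if rule is not None and not (user_groups & rule["exempt_groups"]):
            return True
    return False


print(is_masked("customers.home_addr", {"analysts"}))
print(is_masked("customers.home_addr", {"fraud_investigators"}))
```

With this separation, deciding that addresses should be visible to another group means changing one policy entry rather than retagging every address column.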

Tag lineage

There is one way you can accomplish this use case using orchestrated RBAC: lineage. Immuta's lineage feature, currently in private preview and for Snowflake only, can propagate tags based on transform lineage. This means that when a downstream table is derived from upstream tables, the tags on the upstream tables' columns are carried through to the corresponding columns of the new downstream table. This is powerful because you can tag the root table(s) once, with your policy logic tags as described in Managing data metadata, and they will carry through to all downstream tables' columns without any additional work. Be aware that tag lineage only works for tables; for views, the tags will not be propagated. However, policies on backing tables will be enforced on the views (which is why the tags are not propagated). Also be aware that if a table exists after some series of views, the tags will propagate to that table.
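The propagation rules described above (tags flow to downstream tables, skip views, and still reach a table that sits behind a series of views) can be sketched as a walk over column-level lineage. The object names, the `PII` tag, and the `KINDS`, `LINEAGE`, and `ROOT_TAGS` structures are hypothetical; the sketch assumes lineage is acyclic and does not represent how Immuta or Snowflake store lineage.

```python
# Hypothetical objects: a root table, a view over it, and a table built from the view.
KINDS = {"raw.customers": "table", "v_customers": "view", "mart.customers": "table"}

# Column-level lineage: downstream (object, column) -> upstream (object, column) list.
LINEAGE = {
    ("v_customers", "ssn"): [("raw.customers", "ssn")],
    ("mart.customers", "ssn"): [("v_customers", "ssn")],
}

# Tags applied once, on the root table only.
ROOT_TAGS = {("raw.customers", "ssn"): {"PII"}}


def inherited_tags(column: tuple[str, str]) -> set[str]:
    """Collect tags from the column and all of its upstream columns."""
    tags = set(ROOT_TAGS.get(column, set()))
    for upstream in LINEAGE.get(column, []):
        tags |= inherited_tags(upstream)
    return tags


def propagated_tags(column: tuple[str, str]) -> set[str]:
    """Tags that end up on a column: views receive none, tables inherit."""
    obj, _ = column
    if KINDS[obj] == "view":
        return set()
    return inherited_tags(column)


print(propagated_tags(("v_customers", "ssn")))
print(propagated_tags(("mart.customers", "ssn")))
```

The view itself carries no tags in this sketch, but the walk still passes through it, so the table derived from the view inherits the root table's tag.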

Next steps

Learn

Read these guides to learn more about using Immuta to mask sensitive data.

  1. Compliantly open more sensitive data for ML and analytics: Review this use case to understand how to mask or open up sensitive data to certain users for machine learning and analytics while remaining compliant.

  2. Managing user metadata: This guide explains how meaningful user metadata is critical to building scalable policy and understanding the considerations around how and what to capture.

  3. Author policy: This guide describes how to define your global data policy logic.

Implement

Follow these guides to start using Immuta to mask sensitive data.

  1. Manage user metadata. Tag your users with attributes and groups that are meaningful for Immuta global policies.

  2. Manage data metadata. Tag your columns with tags that are meaningful.

  3. Author policy. Define your global data policy logic.

Once your external data catalog tags, identification tags, and classification tags are applied to your data sources, you can use those tags to author policies to protect your data.

Optionally test and deploy policy.

Manage Data Metadata How-to Guide

Before authoring global data policies to mask columns or redact rows, data metadata must exist in Immuta so that it can be used in the policy to identify the data that should be masked or redacted.

This how-to guide demonstrates how to manually manage tags, use data identification, or use existing tags in external catalogs to identify data that should be targeted by a data policy.

For detailed explanations and examples of how to manage data metadata, see the Managing data metadata guide.

Requirement

Immuta permission: APPLICATION_ADMIN (if using an external catalog or identification) or GOVERNANCE (if manually adding tags in Immuta)

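Prerequisites

  • Data platform connected to Immuta

  • Data sources registered in Immuta

  • External catalog connected to Immuta (optional)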

Select your strategy

  • Fact-based (ABAC): Use this strategy to tag data sources at the column and table level.

  • Logic-based (orchestrated RBAC): Use this strategy to tag data sources at the table level.
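For the logic-based strategy, this guide later recommends arranging tags in a hierarchy so that user attributes can be matched hierarchically. A minimal sketch of that matching follows, assuming tags are stored as dotted paths and that a user attribute value matches a tag when it equals the tag or is one of its ancestors. Both the dotted-path representation and the `hierarchy_match` and `visible_levels` helpers are assumptions for illustration, not Immuta's implementation.

```python
# Hypothetical tag hierarchy, written as dotted paths from most to least restrictive.
TAGS = [
    "Strictly Confidential",
    "Strictly Confidential.Confidential",
    "Strictly Confidential.Confidential.Internal",
    "Strictly Confidential.Confidential.Internal.Public",
]


def hierarchy_match(user_value: str, tag: str) -> bool:
    """True if the user's attribute value is the tag itself or an ancestor of it."""
    user_parts = user_value.split(".")
    tag_parts = tag.split(".")
    return tag_parts[: len(user_parts)] == user_parts


def visible_levels(user_value: str) -> list[str]:
    """Classification levels (last path segment) the user's attribute matches."""
    return [t.split(".")[-1] for t in TAGS if hierarchy_match(user_value, t)]


print(visible_levels("Strictly Confidential"))
print(visible_levels("Strictly Confidential.Confidential.Internal"))
```

Under these assumptions, a user whose attribute is the top of the hierarchy matches every level, while a user whose attribute points deeper into the hierarchy matches only that level and the levels beneath it.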

Organize your data metadata

Create tags that describe the data source columns.

Fact-based (ABAC) - Recommended

Fact-based column tags are descriptive:

  • Column f_name has column tag name

  • Column dob has column tags date and date of birth

  • Column ssn has column tag social security number

Logic-based (orchestrated RBAC)

Logic-based column tags require subjective decisions (not recommended):

  • Column f_name has column tag sensitive

  • Column dob has column tag indirect identifier

  • Column ssn has column tag PII

There is one way you can accomplish this use case using orchestrated RBAC: lineage. Immuta's lineage feature (for Snowflake only) can propagate tags based on transform lineage.

  1. Enable Snowflake lineage.

  2. Ensure tags are in a hierarchy that will support hierarchical matching. For example, if you have the tags Strictly Confidential, Confidential, Internal, and Public, you would want to ensure that user attributes follow the same hierarchy. For example,

    1. A user with access to all data: Classification: Strictly Confidential

    2. A user with access to only Internal and Public: Classification: Strictly Confidential.Confidential.Internal

Enable schema monitoring

Enable schema monitoring to allow Immuta to actively monitor your data platform and detect when tables or columns are created or deleted. Immuta will then automatically register or disable those tables and update the tags.

If you registered your data through connections, object sync will ensure the objects in your database stay synchronized with the registered objects in Immuta.

Apply tags to data in Immuta

There are several options for applying data tags:

  1. Use identification: This is the most powerful option. Immuta can discover your sensitive data, and you can extend the types of entities it discovers to include those specific to your business. Identification can run completely within your data platform, with no data leaving it for Immuta to analyze. Identification is more relevant for the ABAC approach because the tags are facts about the data.

  2. Sync tags from an external source: You may have already done all the work of tagging your data in an external catalog or your own homegrown tool. If so, Immuta can pull those tags in and use them.

  3. Manually tag: Manually tag tables and columns in Immuta from within the UI, using the Immuta API, or when registering the data, either during initial registration or for tables discovered later through schema monitoring.

Next steps

Learn

Read these guides to learn more about using Immuta to mask sensitive data.

  1. Choose your path: orchestrated RBAC and ABAC: This section describes the two different approaches (or mix) you can take to managing policy and their tradeoffs.

  2. Managing user metadata: This guide explains how meaningful user metadata is critical to building scalable policy and understanding the considerations around how and what to capture.

  3. Author policy: This guide describes how to define your global data policy logic.

Implement

Follow these guides to start using Immuta to mask sensitive data.

  1. Manage user metadata. Tag your users with attributes and groups that are meaningful for Immuta global policies.

  2. Author policy. Define your global data policy logic.

  3. Optionally test and deploy policy.
