
Data Classification

About Classification in Immuta

Public preview: This feature is available to all accounts.

Classification is the process in which data is categorized by the content and the associated risk level based on context. To classify your data, Discover evaluates your data in phases:

Sensitive data discovery (SDD) runs to identify your data by content type. The data is discovered and evaluated by the pattern it matches and is tagged.
The Data Security Framework scans those tags and any other tags applied to the data source and columns to categorize the data by context. This phase considers the data and the data surrounding it to understand the category of the data within the context of the data source.
Other regulatory-based frameworks scan and build off of the Data Security Framework tags. These frameworks are specific to regulations and standards and tag the data that matters to each framework.
The Risk Assessment Framework scans and builds off of the Data Security Framework. This framework tags data with specific risk assessment tags that describe the risk the data poses to your organization or the data subject. They also contain additional metadata used in the Detect dashboards to describe the risk as sensitivity and visualize when that sensitive data is accessed.

Every phase of classification in Immuta can be customized to find and tag the data your organization cares about. Users can customize the Data Security Framework to find, match, and tag data they want categorized based on the organization's processes. Then, users can modify the Risk Assessment Framework by adjusting the sensitivity of classification tags to the organization’s policies or creating new tags and rules in customized frameworks. After data is classified, classification tags can be used to build Secure policies or visualize sensitive data access in Detect dashboards.

Using Discover classification to assign risk and sensitivity levels to your data and Detect dashboards to visualize the risk levels offers these benefits:

Increasing the semantic understanding of your data to better meet compliance requirements
Reducing the time to make decisions about what data access is allowed under what purposes
Reducing the effort and time to respond to auditors about data access in your company
Reducing the labor of classifying data to enumerate what data is within the scope of security or regulatory compliance frameworks

What is the difference between entity tags and classification tags?

Both entity and classification tags describe the content of data on a per-column basis, and you can use them to monitor data access and build access policies. However, there are key differences between the two:

Entity tags are applied through identification and describe what the data is. SDD applies entity tags to columns based on the patterns of the data.
Classification tags are applied through categorization and risk assessment and describe the context of the data and the risk it poses. Using classification frameworks, classification tags are applied to columns based on the entity tags previously applied by SDD. Additional classification tags can then be applied, providing even more context or expressing the property of the record rather than just the column.

Why isn’t entity tagging sufficient for classification?

Entity tags describe the contents of individual columns, in isolation. But you don't access individual columns in isolation, so why would you determine their sensitivity that way? Entity tags do not attempt to and cannot contextualize column contents with neighboring columns' contents. This means that connections between data are lost if they cannot be identified through a pattern within the column itself. Classification tags describe the contents of a table with the context of all its columns, providing a holistic view of the risk of the data for what it is, rather than the pattern it fits. Context is necessary to understand whether your data is public or private data, risky or safe to have ungoverned access, or sensitive and creating toxic joins when accessed with other tables.

For example, under HIPAA, a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a single column-by-column basis, it cannot reason whether or not a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. But classification tagging will tag it PHI if it detects patient identity information in the other columns of the table. This is an example that Immuta built-in frameworks can address out-of-the-box using the Data Security and HIPAA frameworks.
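The difference can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not Immuta's implementation; the tag names `Entity.ProcedureCode`, `Entity.PersonName`, and `PHI` are hypothetical placeholders rather than tags from any built-in framework.

```python
# Illustrative sketch only: tag names are hypothetical, not Immuta's actual tags.
IDENTITY_TAGS = {"Entity.PersonName"}


def classify_phi(columns: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return extra classification tags per column, using table-wide context.

    `columns` maps a column name to the entity tags found on that column alone.
    """
    # Context check: does any column in the table identify a patient?
    has_identity = any(tags & IDENTITY_TAGS for tags in columns.values())
    result = {}
    for name, tags in columns.items():
        extra = set()
        # A procedure code is only treated as PHI when identity data sits beside it.
        if "Entity.ProcedureCode" in tags and has_identity:
            extra.add("PHI")
        result[name] = extra
    return result


anonymous_table = {"procedure": {"Entity.ProcedureCode"}}
patient_table = {
    "procedure": {"Entity.ProcedureCode"},
    "patient_name": {"Entity.PersonName"},
}

print(classify_phi(anonymous_table))  # the procedure column gets no PHI tag
print(classify_phi(patient_table))    # the procedure column gets the PHI tag
```

The same column content yields different results in the two tables, which is the point of classifying with context instead of by pattern alone.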

Additionally, entity tagging does not indicate how sensitive the data is, but classification tags can carry a sensitivity level. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly listed phone number of a company might not be considered sensitive.

After you understand what entities your data contains using SDD, you need to adopt frameworks that determine what combinations of data constitute sensitive data and their level of sensitivity.

What is a framework?

Frameworks are a set of data categories and a set of classification rules to place data into those categories. In Immuta, the data categories are represented by tags, and when data fits a classification rule the tag is applied:

Classification tags are applied based on the Discovered tags from SDD or other tags on the data source. Classification tags contain additional metadata about each column, such as the source of the tag, the dimension, and the sensitivity level. This metadata is used in the framework rules and complex formulas that assign the sensitivity of queries visible in Detect dashboards.
Classification rules determine how each classification tag is applied. These rules can apply tags based on tags already on the column, tags applied to neighboring columns, and tags applied to the data source. This means that the complete data source is considered when classifying your data sources, and even tags applied to individual columns can affect the risk level of the entire data source.

Frameworks are often built off of an interpretation of regulatory frameworks or standards, such as the US Health Insurance Portability and Accountability Act (HIPAA) and the PCI standard. However, organizations can also build frameworks that represent their internal business processes. When used in Immuta, they automate data tagging and provide, through the data inventory dashboard, information about what data you have immediately after it is registered in Immuta.

See the Built-in classification frameworks guide for more information about the frameworks Immuta provides out-of-the-box.

What are the benefits of classification?

Data classification is a process, and with Immuta, much of it is automated. This means that you can reap the benefits of classified and tagged data more quickly and easily than by classifying and tagging it manually:

Quick data access control: Use Discover to identify and classify your data immediately after registration in Immuta. Then, build Secure governance policies off of those tags. This repeatable process will protect your data in its current state and whenever any new data sources are created. Automate the process further with schema monitoring; schema monitoring allows you to register data just once. Then, Immuta will monitor your data environment for changes and, when found, update the data source in Immuta, update the tags on that data source, and then update user access based on your governance policies when changes happen.
Scale your data monitoring: Use Discover to identify and classify your data immediately after registration in Immuta. Then, view your data users' access to your sensitive and risky data through the Detect dashboards.
Build data platform compliance: Use and customize the built-in compliance frameworks to identify and classify your data based on the industry practices and regulations your organization needs to abide by. The Immuta compliance frameworks are templates to provide a strong starting point for further customization to what matters to your organization. Once those frameworks are built, use them to classify your data immediately after data registration in Immuta.

How-to Guides

Activate Classification Frameworks

Private preview

This feature is only available to select accounts. To activate classification frameworks without the private preview feature, use the /frameworks endpoint in the API.

Requirements:

Native SDD enabled and turned on
Frameworks enabled
Registered Snowflake, Databricks, Redshift, or Starburst (Trino) data sources
Immuta permission GOVERNANCE

To activate a classification framework:

Navigate to Discover and select the Classification tab.
Click the more actions icon in the Actions column for the framework you want to activate.
Select Activate.

Repeat this process for all frameworks relevant to your data. See the Frameworks reference guide for information on Immuta's built-in frameworks.

Deactivate a classification framework

Navigate to Discover and select the Classification tab.
Click the more actions icon in the Actions column for the framework you want to deactivate.
Select Deactivate.

Activate and manage classification frameworks using the API

To activate a framework using the Immuta API, see the Frameworks API reference page.

Adjust Identification and Classification Framework Tags

Requirements:

Native SDD enabled and turned on
Frameworks enabled
Registered Snowflake, Databricks, Redshift, or Starburst (Trino) data sources
Immuta permission GOVERNANCE

Immuta Discover provides identification frameworks out-of-the-box to recognize and tag data, and Discover also provides classification frameworks out-of-the-box to categorize and classify data. These frameworks are all generic to industry practices and should be customized to each organization's specific needs.

Tune SDD frameworks, rules, and patterns first to adjust where Discovered tags are applied. Because classification frameworks apply classification tags from the Discovered tags, tuning SDD should come first and will have trickle-down effects on classification. Customizing SDD requires some initial work but will automate data tagging for all data sources in the future.

Follow the steps below to tune SDD from the Default Framework:

Create a new identification framework: It is recommended to copy the Default Framework and adjust the rules from there.
Configure the resulting tags in the rules.
Create a pattern and rule specific to your organization.
Add a few data sources to your new framework: This will remove the tags from any previous identification frameworks and rerun SDD with your new framework. From here, either continue to edit patterns and rules to reconfigure the applied tags, or if you are happy with the results, proceed to the next step.
Configure SDD to run your new framework on all data sources.

After SDD has applied entity tags, classification frameworks will automatically reapply their tags to account for any changes to Discovered tags. It may be necessary to adjust the classification tags based on your organization's data, security, and compliance needs.

Assess your queries with Detect

Requirements:

Immuta permission AUDIT
Snowflake integration (If you are using Databricks, use the assess your data source tags how-to below.)

Use the Detect dashboards to review queries at different sensitivity levels and to check which tags Immuta applied to the columns of your data sources:

Have an Immuta user subscribed to a data source make multiple queries to a data source in Snowflake. The user should query both non-sensitive and sensitive data.
Navigate to the Audit page and click ↻Native Query Audit to pull in queries made in Snowflake.
Navigate to the Events (Beta) page. Note that Snowflake has a 15-minute data latency for all audit events.
Select the Event Id of one of the queries. Click the Columns tab.
The Columns tab lists the columns in the query organized from highest to lowest sensitivity and the tags applied to each column. Check that the columns you know to be sensitive are here.
For example, if the query has a column with last names, you should see a minimum of the following tags: Discovered.PII, DSF. Personal, DSF.Record.Subject.Type.Individual, DSF.Record.Identifiability.Identifiable, and DSF.Control.Personal.
Note any sensitive columns not labeled as sensitive.
Complete steps 2-5 for as many queries as you want.

Assess your data source tags

Requirement: Immuta permission GOVERNANCE or data owner

Target some data sources to manually review tags:

Navigate to the data dictionary for the data source by opening the Data Sources page and selecting a data source. Click the Data Dictionary tab to open the data dictionary.
The data dictionary lists the data source columns, with details about the name, data type, and a list of the tags on each column. Assess whether the tags are accurate to your data.

If you find that too many tags are applied

Tags may be unexpected but still accurate to your data. Additionally, they may have been applied because they were found to be the best match from the SDD patterns in the framework.

If you want to improve SDD and personalize it to your data:

Assess why the tag was applied to your data.
Is the pattern incorrectly matching your data and irrelevant to your organization? Delete the rule that applied the tag from the identification framework.
Is the pattern incorrectly matching this specific column, but correct in other places? It must have been the most correct match found by SDD. Create a better match by completing the following steps:
1. Create a pattern specific to the column.
2. Create a Discovered tag for the column and new pattern.
3. Add the pattern and the tag to a rule in the identification framework so this column is correctly matched by SDD.

If you want to remove the unexpected tags, use one of the following how-to guides:

Deactivate frameworks irrelevant to your organization.
Ensure the Discovered tags are applied properly by adjusting SDD.
Remove any excess tags. Note that classification tags build off of other tags, so removing a single classification or Discovered tag can have trickle-down effects on the data source.
Adjust the classification framework rules using the frameworks API.

If you find that tags are missing

If you were expecting some sensitive data to be tagged and it is not, enable additional tags using one of the following how-to guides:

Activate additional frameworks relevant to your organization.
Ensure the Discovered tags are applied properly by adjusting SDD.
Add additional tags. Note that classification tags build off of other tags, so adding a single classification or Discovered tag can have trickle-down effects on the data source.
Adjust the classification framework rules using the frameworks API.

Tune your data dictionaries

Requirement: Immuta permissions GOVERNANCE and AUDIT

Tags can be edited on an individual basis for each data source. If broad changes to the classification framework are necessary to re-tag your data, use the frameworks API.

Navigate to the Data Sources page and select a data source in which you noted issues during your assessment.
Click the Data Dictionary tab.
Delete unnecessary tags by clicking the tag you want to remove from the column and selecting Disable from the tag side sheet.
To add tags,
1. Click Add Tags in the Actions column.
2. Begin typing the name of the tag you want to add in the Search by Name field and select the tag from the dropdown list.
3. Click Add.

How to Use a Built-In Classification Framework with Your Own Tags

The built-in classification frameworks in Immuta provide a quick way to leverage your own catalog or data platform tags to establish classification tags. These classification tags can then be used in the Immuta Data Platform for query activity visualizations, monitors, reports, and policies. After you have configured a data catalog integration and registered data sources in Immuta, you can start automating data classification of a column based on its context by considering the combination of its associated tags, its neighboring columns' tags, or its table tag. Classification frameworks also provide query event context. To use classification frameworks with your current tags from an external catalog, use one of the following options:

Follow the tutorial below: This starter framework maps a classification scale of restricted, confidential, internal, and public to Immuta's three-level scale. It requires an external catalog to be set up, but all other steps are described below.
Use the Risk Assessment Framework (RAF): This minimal framework allows you to map your own classification tags to Immuta classification tags. Then, your users' queries will have a sensitivity score on the Detect dashboard and in audit logs based on the classification tags on the data columns they queried. Use this option if you have already classified your organization’s data in an external catalog and want that metadata reflected in Immuta as Sensitive and Highly Sensitive.
Use a compliance framework: This option allows you to map your own tags describing your data to Immuta's predefined classification tags in the context of a specific compliance framework. Immuta provides built-in frameworks for GDPR, CCPA, and HIPAA. Map your tags to the most comparable Data Security Framework (DSF) tag, and Immuta will apply the classification tag based on the framework. Use this option if you have descriptive tags on your data and want that metadata mapped to a specific compliance framework.

Follow this guide to map your external catalog tags to the example framework, or consult the framework API guide for more information about the framework schema.

Customize the framework

Using the example framework below, customize the framework for your organization's classification tags.

Example framework

{
  "shortName": "ECMC Framework",
  "name": "External Catalog Mapping Classification Framework",
  "description": "This framework maps the classification tags the organization has in Collibra to Immuta data sources.",
  "tags": [
    {
      "name": "ECMC.Confidentiality.Highly Sensitive",
      "source": "curated",
      "sensitivities": [
        {
          "dimension": "confidentiality",
          "sensitivity": 2
        }
      ]
    },
    {
      "name": "ECMC.Confidentiality.Sensitive",
      "source": "curated",
      "sensitivities": [
        {
          "dimension": "confidentiality",
          "sensitivity": 1
        }
      ]
    },
    {
      "name": "ECMC.Confidentiality.Nonsensitive",
      "source": "curated",
      "sensitivities": []
    }
  ],
  "rules": [
    {
      "name": "ECMC 00001",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Highly Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Restricted",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00002",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Confidential",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00003",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Internal",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00004",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Nonsensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Public",
          "source": "curated"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    }
  ],
  "active": true
}
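Before submitting a customized framework, it can help to check that the payload is internally consistent. The following Python sketch is a local sanity check only; its rules are derived from the parameter descriptions in this guide (every classificationTag should be defined under tags, the dimension is confidentiality, and sensitivity is 1 or 2), not from the API's own validation, and the function name `validate_framework` is hypothetical.

```python
def validate_framework(framework: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    defined = {tag["name"] for tag in framework.get("tags", [])}

    for tag in framework.get("tags", []):
        for entry in tag.get("sensitivities", []):
            # This guide only describes the "confidentiality" dimension.
            if entry.get("dimension") != "confidentiality":
                problems.append(f"{tag['name']}: unexpected dimension {entry.get('dimension')!r}")
            # 1 is shown as sensitive and 2 as highly sensitive in Detect dashboards.
            if entry.get("sensitivity") not in (1, 2):
                problems.append(f"{tag['name']}: sensitivity must be 1 or 2")

    for rule in framework.get("rules", []):
        target = rule.get("classificationTag", {}).get("name")
        # Every tag a rule applies should be declared in the framework's tags list.
        if target not in defined:
            problems.append(f"{rule.get('name')}: classificationTag {target!r} is not defined in tags")

    return problems


good = {
    "tags": [
        {
            "name": "ECMC.Confidentiality.Sensitive",
            "source": "curated",
            "sensitivities": [{"dimension": "confidentiality", "sensitivity": 1}],
        }
    ],
    "rules": [
        {
            "name": "ECMC 00002",
            "classificationTag": {"name": "ECMC.Confidentiality.Sensitive", "source": "curated"},
            "columnTags": [{"name": "Confidential", "source": "collibra"}],
            "neighborColumnTags": [],
            "tableTags": [],
        }
    ],
    "active": True,
}

# Same framework, but the rule now points at a tag that was never declared.
bad = {**good, "tags": []}

print(validate_framework(good))
print(validate_framework(bad))
```

Running a check like this locally catches a misspelled tag name before the framework is sent to Immuta.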

Parameters

For more information about these parameters see the Frameworks API reference guide.

tags: These tags are automatically created in Immuta with the sensitivity you assign. All tags used in the classificationTag parameter should be defined here.
tags.sensitivities: This is metadata for the sensitivity of the new tag. Use confidentiality for dimension. Options for sensitivity are 1 (shown as sensitive in Detect dashboards) and 2 (shown as highly sensitive in Detect dashboards). For nonsensitive, leave this parameter empty.
rules: These are the rules for applying the tags defined above.
rules.classificationTag: This classification tag must be defined in tags. Set name to the tag you want to apply and source to curated. This is the tag that will be applied if the rule requirement is met.
rules.columnTags: This object represents tags on a column. If the tag defined here is found on a column, then the rule's classificationTag will be applied to the same column.
rules.neighborColumnTags: This object represents tags on other columns in the data source. If the tag defined here is found on any column in the data source, then the rule's classificationTag will be applied to all the neighboring columns.
rules.tableTags: This object represents tags on the data source. If the tag defined here is found on the data source, then the rule's classificationTag will be applied to all the columns in that data source.
active: When true the framework is active and will apply tags when the rules are met.
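To make the three rule conditions concrete, the Python sketch below mimics how columnTags, neighborColumnTags, and tableTags are described above. It is an illustration, not Immuta's rule engine: the function names are hypothetical, and because this guide does not say how several entries or several lists in one rule combine, the sketch assumes each entry is evaluated independently.

```python
def has_tag(tags, wanted):
    """True if the (name, source) pair described by `wanted` is in `tags`."""
    return (wanted["name"], wanted["source"]) in tags


def apply_rule(rule, column_tags, table_tags):
    """Return the column names that would receive the rule's classificationTag.

    column_tags: dict of column name -> set of (name, source) tuples
    table_tags:  set of (name, source) tuples on the data source itself
    Assumption: every listed entry is checked on its own (not combined).
    """
    tagged = set()
    # tableTags: a match on the data source tags every column in it.
    for wanted in rule.get("tableTags", []):
        if has_tag(table_tags, wanted):
            tagged.update(column_tags)
    # columnTags: a match on a column tags that same column.
    for wanted in rule.get("columnTags", []):
        tagged.update(c for c, tags in column_tags.items() if has_tag(tags, wanted))
    # neighborColumnTags: a match on a column tags all the other columns.
    for wanted in rule.get("neighborColumnTags", []):
        carriers = {c for c, tags in column_tags.items() if has_tag(tags, wanted)}
        for carrier in carriers:
            tagged.update(c for c in column_tags if c != carrier)
    return tagged


confidential = {"name": "Confidential", "source": "collibra"}
columns = {
    "email": {("Confidential", "collibra")},
    "order_id": set(),
    "amount": set(),
}

column_rule = {"columnTags": [confidential], "neighborColumnTags": [], "tableTags": []}
neighbor_rule = {"columnTags": [], "neighborColumnTags": [confidential], "tableTags": []}
table_rule = {"columnTags": [], "neighborColumnTags": [], "tableTags": [confidential]}

print(apply_rule(column_rule, columns, set()))    # only the email column
print(apply_rule(neighbor_rule, columns, set()))  # every column except email
print(apply_rule(table_rule, columns, {("Confidential", "collibra")}))  # all columns
```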

How to edit rules

Follow the example below to map your external tags to the rules in the example framework.

The Immuta built-in Risk Assessment Framework has a rule where columns tagged DSF.Interpretation.Credentials.Secret by sensitive data discovery will be tagged RAF.Confidentiality.High:

"rules": [
{
    "name": "RAF 00004",
    "classificationTag": {
      "name": "RAF.Confidentiality.High",
      "source": "curated"
    },
    "columnTags": [
    {
        "name": "DSF.Interpretation.Credentials.Secret",
        "source": "curated"
    }
    ],
    "neighborColumnTags": [],
    "tableTags": []
}
]

To translate this to your tags, replace the name and source values of the columnTags, neighborColumnTags, or tableTags with your own. This new example is for a Collibra tag that an organization uses for confidential data. The rule now states: apply the classification tag RAF.Confidentiality.High to a column if it has the Collibra tag Confidential. Repeat this for your organization's remaining classification levels.

"rules": [
{
    "name": "RAF 00004",
    "classificationTag": {
      "name": "RAF.Confidentiality.High",
      "source": "curated"
    },
    "columnTags": [
    {
        "name": "Confidential",
        "source": "collibra"
    }
    ],
    "neighborColumnTags": [],
    "tableTags": []
}
]
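If you have many classification levels, writing each rule by hand becomes repetitive. As a rough sketch, the rules array could be generated from a simple mapping of external tag names to classification tags. The helper name `build_rules`, the name prefix, and the assumption that all external tags share one source are illustrative choices, and the numbering format simply imitates the example framework; none of it is required by the API as far as this guide states.

```python
import json


def build_rules(mapping, prefix="ECMC", external_source="collibra"):
    """Build one column-level rule per external tag.

    mapping: dict of external tag name -> classification tag name
    Assumption: every external tag comes from the same source.
    """
    rules = []
    for index, (external_tag, classification_tag) in enumerate(mapping.items(), start=1):
        rules.append({
            # Rule names imitate the "ECMC 00001" style used in the example framework.
            "name": f"{prefix} {index:05d}",
            "classificationTag": {"name": classification_tag, "source": "curated"},
            "columnTags": [{"name": external_tag, "source": external_source}],
            "neighborColumnTags": [],
            "tableTags": [],
        })
    return rules


# The four-level scale from the example framework.
mapping = {
    "Restricted": "ECMC.Confidentiality.Highly Sensitive",
    "Confidential": "ECMC.Confidentiality.Sensitive",
    "Internal": "ECMC.Confidentiality.Sensitive",
    "Public": "ECMC.Confidentiality.Nonsensitive",
}

rules = build_rules(mapping)
print(json.dumps(rules, indent=2))
```

The printed array can be pasted into the rules parameter of your framework payload after review.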

Find the `name` and `source` for your tags

If you do not know the name or source for your tags, you can list your tags using the Immuta API:

curl \
    --request GET \
    --header "accept: application/json" \
    --header "Authorization: Bearer <your-token>" \
    https://your-immuta-url.com/tag

This request will list all the tags in your Immuta environment, similar to this example response:

[
  {
    "id": 114,
    "name": "DataProperties.Cross-Sectional",
    "source": "curated",
    "deleted": false,
    "systemCreated": true
  },
  {
    "id": 2,
    "name": "Discovered.Country.Argentina",
    "source": "curated",
    "deleted": false,
    "systemCreated": true
  },
  {
    "id": 9,
    "name": "Discovered.Country.Australia",
    "source": "collibra",
    "deleted": false,
    "systemCreated": true
  }
]
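Because the list can be long, you may want to filter it down to the tags from your external catalog. The Python sketch below does this on the example response above; the helper name `tags_by_source` is hypothetical, and the code only assumes the `name`, `source`, and `deleted` fields shown in that response.

```python
import json

# The example response from above, saved as text.
response_body = """
[
  {"id": 114, "name": "DataProperties.Cross-Sectional", "source": "curated",
   "deleted": false, "systemCreated": true},
  {"id": 2, "name": "Discovered.Country.Argentina", "source": "curated",
   "deleted": false, "systemCreated": true},
  {"id": 9, "name": "Discovered.Country.Australia", "source": "collibra",
   "deleted": false, "systemCreated": true}
]
"""


def tags_by_source(tags, source):
    """Return the names of non-deleted tags that come from the given source."""
    return [tag["name"] for tag in tags if tag["source"] == source and not tag["deleted"]]


tags = json.loads(response_body)
print(tags_by_source(tags, "collibra"))  # ['Discovered.Country.Australia']
```

The `name` and `source` values found this way are the ones to use in columnTags, neighborColumnTags, or tableTags.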

Activate your new framework

Requirement: Immuta permission GOVERNANCE

Once you have made all the customizations to the example framework, make the following request using the Immuta API, with your full customized framework as the payload.

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer <your-token>" \
    --data @example-payload.json \
    https://your.immuta.com/frameworks/

Your new framework will now be visible in the Immuta UI by navigating to the Classification section under Discover.
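If you prefer to script the request instead of using curl, the same call can be prepared with Python's standard library. This is a minimal sketch that mirrors the curl example (POST to /frameworks/ with a JSON body and a bearer token); the helper name `build_framework_request` is hypothetical, the URL and token are the same placeholders as above, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_framework_request(base_url, token, framework):
    """Prepare (but do not send) the POST request that creates a framework."""
    body = json.dumps(framework).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/frameworks/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values; replace them with your Immuta URL, token, and full payload.
request = build_framework_request(
    "https://your.immuta.com",
    "<your-token>",
    {"shortName": "ECMC Framework"},
)
print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```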

Built-in Classification Frameworks Reference Guide

Public preview: This feature is public preview and available to all accounts.

Discover comes preconfigured with a bundle of classification frameworks for use out-of-the-box once endorsed by your organization's admins. These frameworks are designed by Immuta’s Legal Engineering and Research Engineering teams and informed by data privacy regulations and security standards: GDPR, CCPA, GLBA, HIPAA, PCI, and global best practices. They are a starting point for companies to customize to their own classification, security, and risk policies.

Data Security Framework

The Data Security Framework is the general classification framework. It provides the groundwork for categorizing data based on its context but is not specific to any regulatory framework and does not assign sensitivity or risk values to the data it tags. It provides a consistent taxonomy used throughout Immuta, from other built-in frameworks to customized frameworks that classify data valuable to your organization to Secure data and subscription policies.

The Data Security Framework is a supportive tool that accelerates data classification. Use the Data Security Framework in tandem with Discover identification frameworks out-of-the-box for the easy and quick onboarding of data sources and tags. Then, choose the compliance frameworks that matter to your industry or start building your own classification frameworks that assign sensitivity to the specific data of your organization. Your organization's compliance team should review the compliance frameworks as you would a template for a policy or contract and adapt them as needed to ensure a complete inventory and proper classification of your data.

You can view the Data Security Framework tags and their descriptions from the tags page in the UI or from the data dictionary when they are applied to a data source. Note the field and record tags. While they seem similar, the field and record tags are both necessary to convey the content of your data. Field tags describe the content of the columns, and record tags describe the content of the table.

Use the Data Security Framework with the Risk Assessment Framework

To classify your data, use both the Data Security Framework, which sets the groundwork for classification, and the Risk Assessment Framework, which applies tags with sensitivity metadata based on the Data Security Framework tags. With Snowflake, these frameworks together will show sensitive queries in Detect dashboards.

Risk Assessment Framework

The Risk Assessment Framework provides tags that make your data's sensitivity visible, based on the confidentiality risks the data poses to your organization or the data subjects.

Use the Risk Assessment Framework out-of-the-box with the Data Security Framework and Discover identification frameworks to provide sensitivities to view in the Detect dashboards. Additionally, you can copy the framework using the API and create new rules to assign risk level and sensitivity to other data specific to your use case.

Risk assessment tags

The risk assessment tags have sensitivity level metadata assigned to them that will appear in the Detect dashboards as non-sensitive (when no risk assessment tag is applied), sensitive, and highly-sensitive. Additionally, use the risk assessment tags to build Secure policies to restrict access to highly-risky and confidential data.

Tag Name | Description | Sensitivity | Sensitivity Level

Compliance frameworks

Private preview: This feature is private preview and available to select accounts.

Use the Data Security Framework with regulatory frameworks

The Data Security Framework provides the necessary translation of Discovered entity tags to classification tags. Without the Data Security Framework on, the regulatory frameworks will not automatically work with your data and will require customization.

Immuta comes with four regulatory frameworks informed by the best practices of a specific regulation or standard. These are designed by Immuta’s Legal Engineering and Research Engineering teams as a general interpretation, but each organization should customize them based on their internal practices:

CCPA Framework: Classifies personal sensitive information controlled under the California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA). This framework tags personal information, including communication content (like the body of a text message) and details about an individual's sexual orientation, religion, race, or biometric data.
GDPR Framework: Classifies personal data of specific categories protected under the EU General Data Protection Regulation (GDPR). This framework tags personal data, including details about an individual's health, sexual orientation, religion, race, or biometric data.
HIPAA Framework: Classifies protected health data controlled under the US Health Insurance Portability and Accountability Act (HIPAA). This framework tags health data connected to a specific individual.
PCI Framework: Classifies payment card information relevant to the Payment Card Industry (PCI) standard. This framework tags payment card information, including account, authentication, and cardholder data.

Some compliance frameworks are used to add context and apply Data Security Framework tags. Use the data inventory dashboard to enable frameworks with information on the other frameworks they depend on.

About Immuta's frameworks

Organizations are responsible for making their own independent assessment of the framework rules. The framework rules are only templates and are not necessarily adapted to the specific context in which an organization operates. Framework rules do not constitute legal advice. They do not create any commitments or assurances from Immuta that users will necessarily comply with the statutes or standards that have informed these framework rules.


See the Built-in classification frameworks guide for more information about the frameworks Immuta provides out-of-the-box.

What are the benefits of classification?

Quick data access control: Use Discover to identify and classify your data immediately after registration in Immuta. Then, build Secure governance policies off of those tags. This repeatable process will protect your data in its current state and whenever any new data sources are created. Automate the process further with schema monitoring; schema monitoring allows you to register data just once. Then, Immuta will monitor your data environment for changes and, when found, update the data source in Immuta, update the tags on that data source, and then update user access based on your governance policies when changes happen.
Scale your data monitoring: Use Discover to identify and classify your data immediately after registration in Immuta. Then, view your data users' access to your sensitive and risky data through the Detect dashboards.
Build data platform compliance: Use and customize the built-in compliance frameworks to identify and classify your data based on the industry practices and regulations your organization needs to abide by. The Immuta compliance frameworks are templates to provide a strong starting point for further customization to what matters to your organization. Once those frameworks are built, use them to classify your data immediately after data registration in Immuta.

Adjust Identification and Classification Framework Tags

Requirements:

Native SDD enabled and turned on
Frameworks enabled
Registered Snowflake, Databricks, Redshift, or Starburst (Trino) data sources
Immuta permission GOVERNANCE

Follow the steps below to tune SDD from the Default Framework:

Create a new identification framework: It is recommended to copy the Default Framework and adjust the rules from there.
Configure the resulting tags in the rules.
Create a pattern and rule specific to your organization.
Add a few data sources to your new framework: This will remove the tags from any previous identification frameworks and rerun SDD with your new framework. From here, either continue to edit patterns and rules to reconfigure the applied tags, or if you are happy with the results, proceed to the next step.
Configure SDD to run your new framework on all data sources.

Assess your queries with Detect

Requirements:

Immuta permission AUDIT
Snowflake integration (If you are using Databricks, use the assess your data source tags how-to below.)

Use the Detect dashboards to review queries at different sensitivity levels and review the tags that have been applied to your data source columns to understand the tags that Immuta applied there:

Have an Immuta user subscribed to a data source make multiple queries to a data source in Snowflake. The user should query both non-sensitive and sensitive data.
Navigate to the Audit page and click ↻Native Query Audit to pull in queries made in Snowflake.
Navigate to the Events (Beta) page. Note that Snowflake has a 15-minute data latency for all audit events.
Select the Event Id of one of the queries. Click the Columns tab.
The Columns tab lists the columns in the query, ordered from highest to lowest sensitivity, along with the tags applied to each column. Check that the columns you know to be sensitive are listed here.
For example, if the query has a column with last names, you should see a minimum of the following tags: Discovered.PII, DSF.Personal, DSF.Record.Subject.Type.Individual, DSF.Record.Identifiability.Identifiable, and DSF.Control.Personal.
Note any sensitive columns not labeled as sensitive.
Complete steps 2-5 for as many queries as you want.

Assess your data source tags

Requirement: Immuta permission GOVERNANCE or data owner

Target some data sources to manually review tags:

Navigate to the data dictionary for the data source by opening the Data Sources page and selecting a data source. Click the Data Dictionary tab to open the data dictionary.
The data dictionary lists the data source columns, with details about the name, data type, and a list of the tags on each column. Assess whether the tags are accurate to your data.

If you find that too many tags are applied

Tags may be unexpected but still accurate to your data. Additionally, they may have been applied because they were found to be the best match from the SDD patterns in the framework.

If you want to improve SDD and personalize it to your data,

Assess why the tag was applied to your data.
Is the pattern incorrectly matching your data and irrelevant to your organization? Delete the rule that applied the tag from the identification framework.
Is the pattern incorrectly matching this specific column, but correct in other places? SDD applied it because it was the closest match it found. Create a better match by completing the following steps:
1. Create a pattern specific to the column.
2. Create a Discovered tag for the column and new pattern.
3. Add the pattern and the tag to a rule in the identification framework so this column is correctly matched by SDD.
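Before adding a new pattern to the identification framework, it can help to check it locally against a few sample values from the column. The sketch below is only a sanity check in plain Python: the `EMP-` identifier format, the sample values, and the `match_fraction` helper are all hypothetical, and it does not reproduce how SDD itself scores or selects matches.

```python
import re


def match_fraction(pattern, values):
    """Return the share of non-empty sample values fully matched by the regex."""
    compiled = re.compile(pattern)
    non_empty = [value for value in values if value]
    if not non_empty:
        return 0.0
    hits = sum(1 for value in non_empty if compiled.fullmatch(value))
    return hits / len(non_empty)


# Hypothetical organization-specific identifier format (an assumption for illustration).
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"

# Hypothetical sample values copied from the column under review.
samples = ["EMP-004211", "EMP-118702", "n/a", "EMP-000017"]

fraction = match_fraction(EMPLOYEE_ID_PATTERN, samples)
print(f"{fraction:.0%} of sample values match")
```

A low fraction on a column you expect to match suggests the pattern is too strict; a high fraction on unrelated columns suggests it is too loose.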

If you want to remove the unexpected tags, use one of the following how-to guides:

Deactivate frameworks irrelevant to your organization.
Ensure the Discovered tags are applied properly by adjusting SDD.
Remove any excess tags. Note that classification tags build off of other tags, so removing a single classification or Discovered tag can have trickle-down effects on the data source.
Adjust the classification framework rules using the frameworks API.

If you find that tags are missing

If you were expecting some sensitive data to be tagged and it is not, enable additional tags using one of the following how-to guides:

Activate additional frameworks relevant to your organization.
Ensure the Discovered tags are applied properly by adjusting SDD.
Add additional tags. Note that classification tags build off of other tags, so adding a single classification or Discovered tag can have trickle-down effects on the data source.
Adjust the classification framework rules using the frameworks API.

Tune your data dictionaries

Requirement: Immuta permissions GOVERNANCE and AUDIT

Tags can be edited on an individual basis for each data source. If broad changes to the classification framework are necessary to re-tag your data, use the frameworks API.

Navigate to the Data Sources page and select a data source for which you noted issues during your assessment.
Click the Data Dictionary tab.
Delete unnecessary tags by clicking the tag you want to remove from the column and selecting Disable from the tag side sheet.
To add tags,
1. Click Add Tags in the Actions column.
2. Begin typing the name of the tag you want to add in the Search by Name field and select the tag from the dropdown list.
3. Click Add.

How to Use a Built-In Classification Framework with Your Own Tags

Follow the tutorial below: This starter framework is built to map a classification scale of restricted, confidential, internal, and public to Immuta's three-level scale. It requires an external catalog to be set up, but all other steps are described below.
Use Risk Assessment Framework (RAF): This minimal framework allows you to map your own classification tags to Immuta classification tags. Then, your users' queries will have a sensitivity score on the Detect dashboard and in audit logs based on the classification tags on the data columns they queried. Use this option if you have already classified your organization’s data in an external catalog and want that metadata reflected in Immuta as Sensitive and Highly Sensitive.
Use a compliance framework: This option allows you to map your own tags describing your data to Immuta's predefined classification tags in the context of a specific compliance framework. Immuta provides built-in frameworks for GDPR, CCPA, and HIPAA. Map your tags to the most comparable Data Security Framework (DSF) tag, and Immuta will apply the classification tag based on the framework. Use this option if you have descriptive tags on your data and want that metadata mapped to a specific compliance framework.

Follow this guide to map your external catalog tags to the example framework, or consult the framework API guide for more information about the framework schema.

Customize the framework

Using the example framework below, customize the framework for your organization's classification tags.

Example framework

{
  "shortName": "ECMC Framework",
  "name": "External Catalog Mapping Classification Framework",
  "description": "This framework maps the classification tags the organization has in Collibra to Immuta data sources.",
  "tags": [
    {
      "name": "ECMC.Confidentiality.Highly Sensitive",
      "source": "curated",
      "sensitivities": [
        {
          "dimension": "confidentiality",
          "sensitivity": 2
        }
      ]
    },
    {
      "name": "ECMC.Confidentiality.Sensitive",
      "source": "curated",
      "sensitivities": [
        {
          "dimension": "confidentiality",
          "sensitivity": 1
        }
      ]
    },
    {
      "name": "ECMC.Confidentiality.Nonsensitive",
      "source": "curated",
      "sensitivities": []
    }
  ],
  "rules": [
    {
      "name": "ECMC 00001",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Highly Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Restricted",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00002",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Confidential",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00003",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Sensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Internal",
          "source": "collibra"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    },
    {
      "name": "ECMC 00004",
      "classificationTag": {
        "name": "ECMC.Confidentiality.Nonsensitive",
        "source": "curated"
      },
      "columnTags": [
        {
          "name": "Public",
          "source": "curated"
        }
      ],
      "neighborColumnTags": [],
      "tableTags": []
    }
  ],
  "active": true
}
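If your external catalog uses a different scale, the same payload can be generated from a small mapping rather than edited by hand. The following is a minimal sketch, not an Immuta tool: the `CATALOG_LEVELS` mapping and the `build_framework` helper are hypothetical, and it assumes every external tag comes from a single catalog source (here `collibra`).

```python
import json

# Hypothetical mapping from external catalog tag names to a sensitivity level:
# 2 = highly sensitive, 1 = sensitive, None = nonsensitive.
CATALOG_LEVELS = {
    "Restricted": 2,
    "Confidential": 1,
    "Internal": 1,
    "Public": None,
}

# Classification tag names taken from the example framework.
LEVEL_TAGS = {
    2: "ECMC.Confidentiality.Highly Sensitive",
    1: "ECMC.Confidentiality.Sensitive",
    None: "ECMC.Confidentiality.Nonsensitive",
}


def build_framework(catalog_levels, catalog_source="collibra"):
    """Build a framework payload with one rule per external catalog tag."""
    tags = []
    for level, tag_name in LEVEL_TAGS.items():
        # Nonsensitive tags carry an empty sensitivities list.
        sensitivities = (
            [{"dimension": "confidentiality", "sensitivity": level}] if level else []
        )
        tags.append(
            {"name": tag_name, "source": "curated", "sensitivities": sensitivities}
        )

    rules = []
    for index, (catalog_tag, level) in enumerate(catalog_levels.items(), start=1):
        rules.append(
            {
                "name": f"ECMC {index:05d}",
                "classificationTag": {"name": LEVEL_TAGS[level], "source": "curated"},
                "columnTags": [{"name": catalog_tag, "source": catalog_source}],
                "neighborColumnTags": [],
                "tableTags": [],
            }
        )

    return {
        "shortName": "ECMC Framework",
        "name": "External Catalog Mapping Classification Framework",
        "description": "Maps external catalog classification tags to Immuta data sources.",
        "tags": tags,
        "rules": rules,
        "active": True,
    }


framework = build_framework(CATALOG_LEVELS)
print(json.dumps(framework, indent=2))
```

The printed JSON can be saved to a file and used as the request payload when you activate the framework.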

Parameters

For more information about these parameters, see the Frameworks API reference guide.

tags: These tags are automatically created in Immuta with the sensitivity you assign. All tags used in the classificationTag parameter should be defined here.
tags.sensitivities: This is metadata for the sensitivity of the new tag. Use confidentiality for dimension. Options for sensitivity are 1 (shown as sensitive in Detect dashboards) and 2 (shown as highly sensitive in Detect dashboards). For nonsensitive, leave this parameter empty.
rules: These are the rules for applying the tags defined above.
rules.classificationTag: This classification tag must be defined in tags. Specify the name of the tag and set the source to curated. This is the tag that will be applied if the rule requirement is met.
rules.columnTags: This object represents tags on a column. If the tag defined here is found on a column, then the rule's classificationTag will be applied to the same column.
rules.neighborColumnTags: This object represents tags on other columns in the data source. If the tag defined here is found on any column in the data source, then the rule's classificationTag will be applied to all the neighboring columns.
rules.tableTags: This object represents tags on the data source. If the tag defined here is found on the data source, then the rule's classificationTag will be applied to all the columns in that data source.
active: When true, the framework is active and will apply tags when the rules are met.
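The way the three rule conditions interact can be easier to follow in code. The sketch below models the behavior described above in plain Python; it is an illustration, not Immuta's implementation. It assumes that every tag listed in a condition must be present, that non-empty conditions on the same rule must all hold, and that a rule with no conditions never applies. The second rule in the example is hypothetical.

```python
def tag_key(tag):
    """Identify a tag by its (name, source) pair."""
    return (tag["name"], tag["source"])


def has_all(required, present):
    """True when every required tag is in the set of present (name, source) pairs."""
    return all(tag_key(tag) in present for tag in required)


def classify(rules, column_tags, table_tags):
    """Return {column: set of classification tag names} for one data source.

    column_tags maps a column name to the set of (name, source) pairs on it.
    table_tags is the set of (name, source) pairs on the data source itself.
    """
    applied = {column: set() for column in column_tags}
    for rule in rules:
        conditions = (rule["columnTags"], rule["neighborColumnTags"], rule["tableTags"])
        if not any(conditions):
            continue  # assumption: a rule without conditions applies nothing
        for column, own_tags in column_tags.items():
            # Tags found on every other column in the same data source.
            neighbor_tags = set()
            for other, tags in column_tags.items():
                if other != column:
                    neighbor_tags |= tags
            if (
                has_all(rule["columnTags"], own_tags)
                and has_all(rule["neighborColumnTags"], neighbor_tags)
                and has_all(rule["tableTags"], table_tags)
            ):
                applied[column].add(rule["classificationTag"]["name"])
    return applied


restricted = {"name": "Restricted", "source": "collibra"}
rules = [
    {   # From the example framework: tag the column that carries Restricted.
        "name": "ECMC 00001",
        "classificationTag": {"name": "ECMC.Confidentiality.Highly Sensitive", "source": "curated"},
        "columnTags": [restricted],
        "neighborColumnTags": [],
        "tableTags": [],
    },
    {   # Hypothetical rule: tag the neighbors of any Restricted column.
        "name": "EXAMPLE 00001",
        "classificationTag": {"name": "ECMC.Confidentiality.Sensitive", "source": "curated"},
        "columnTags": [],
        "neighborColumnTags": [restricted],
        "tableTags": [],
    },
]
column_tags = {"ssn": {("Restricted", "collibra")}, "city": set()}

result = classify(rules, column_tags, table_tags=set())
print(result)
```

In this example the `ssn` column receives the highly sensitive tag through columnTags, while the untagged `city` column receives the sensitive tag only because a neighboring column is tagged Restricted.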

How to edit rules

Follow the example below to map your external tags to the rules in the example framework.

Immuta's built-in Risk Assessment Framework has a rule where columns tagged DSF.Interpretation.Credentials.Secret by sensitive data discovery will be tagged RAF.Confidentiality.High:

"rules": [
{
    "name": "RAF 00004",
    "classificationTag": {
      "name": "RAF.Confidentiality.High",
      "source": "curated"
    },
    "columnTags": [
    {
        "name": "DSF.Interpretation.Credentials.Secret",
        "source": "curated"
    }
    ],
    "neighborColumnTags": [],
    "tableTags": []
}
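]

To apply the same classification tag to columns that carry your own tag instead, replace the entry in columnTags with your tag's name and source. In the edited rule below, columns tagged Confidential from the collibra source will be tagged RAF.Confidentiality.High:

"rules": [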
{
    "name": "RAF 00004",
    "classificationTag": {
      "name": "RAF.Confidentiality.High",
      "source": "curated"
    },
    "columnTags": [
    {
        "name": "Confidential",
        "source": "collibra"
    }
    ],
    "neighborColumnTags": [],
    "tableTags": []
}
]
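If you need to make the same substitution across several rules, the edit can be scripted. This is a small sketch under the assumption that a rule is handled as a plain dictionary with the shape shown above; `remap_rule` is a hypothetical helper, not part of the Immuta API.

```python
import copy


def remap_rule(rule, tag_name, tag_source):
    """Return a copy of the rule whose columnTags entry points at your own tag."""
    edited = copy.deepcopy(rule)  # leave the original rule untouched
    edited["columnTags"] = [{"name": tag_name, "source": tag_source}]
    return edited


# The built-in rule shown above.
raf_rule = {
    "name": "RAF 00004",
    "classificationTag": {"name": "RAF.Confidentiality.High", "source": "curated"},
    "columnTags": [
        {"name": "DSF.Interpretation.Credentials.Secret", "source": "curated"}
    ],
    "neighborColumnTags": [],
    "tableTags": [],
}

edited_rule = remap_rule(raf_rule, "Confidential", "collibra")
print(edited_rule["columnTags"])
```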

Find the `name` and `source` for your tags

If you do not know the name or source for your tags, you can list your tags using the Immuta API:

curl \
    --request GET \
    --header "accept: application/json" \
    --header "Authorization: Bearer <your-token>" \
    https://your-immuta-url.com/tag

This request will list all the tags in your Immuta environment, similar to this example response:

[
  {
    "id": 114,
    "name": "DataProperties.Cross-Sectional",
    "source": "curated",
    "deleted": false,
    "systemCreated": true
  },
  {
    "id": 2,
    "name": "Discovered.Country.Argentina",
    "source": "curated",
    "deleted": false,
    "systemCreated": true
  },
  {
    "id": 9,
    "name": "Discovered.Country.Australia",
    "source": "collibra",
    "deleted": false,
    "systemCreated": true
  }
]
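In an environment with many tags, the response is easier to search programmatically. The sketch below filters a saved response for tags whose name contains a search term and returns the `name` and `source` pairs needed in a rule. It uses the example response above as inline data; `find_tags` is a hypothetical helper.

```python
import json

# The example response above, saved as text (for instance, from the curl output).
response_body = """
[
  {"id": 114, "name": "DataProperties.Cross-Sectional", "source": "curated",
   "deleted": false, "systemCreated": true},
  {"id": 2, "name": "Discovered.Country.Argentina", "source": "curated",
   "deleted": false, "systemCreated": true},
  {"id": 9, "name": "Discovered.Country.Australia", "source": "collibra",
   "deleted": false, "systemCreated": true}
]
"""


def find_tags(tags, text):
    """Return (name, source) pairs for non-deleted tags whose name contains text."""
    needle = text.lower()
    return [
        (tag["name"], tag["source"])
        for tag in tags
        if needle in tag["name"].lower() and not tag["deleted"]
    ]


tags = json.loads(response_body)
matches = find_tags(tags, "country")
for name, source in matches:
    print(f"name={name!r} source={source!r}")
```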

Activate your new framework

Requirement: Immuta permission GOVERNANCE

Once you have made all the customizations to the example framework, make the following request using the Immuta API, with your full customized framework as the payload.

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer <your-token>" \
    --data @example-payload.json \
    https://your.immuta.com/frameworks/
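Because every classificationTag used in a rule should also be defined in tags, it can be worth checking the payload before sending it. The sketch below runs that check and then builds the same POST request as the curl command using Python's standard library. The helper names, the sample payload, and the placeholder URL and token are assumptions; the request is only constructed here, not sent.

```python
import json
import urllib.request


def undefined_classification_tags(framework):
    """Return classification tag names used in rules but missing from tags."""
    defined = {tag["name"] for tag in framework["tags"]}
    used = {rule["classificationTag"]["name"] for rule in framework["rules"]}
    return sorted(used - defined)


def build_request(framework, base_url, token):
    """Validate the payload and build the POST request for the frameworks endpoint."""
    missing = undefined_classification_tags(framework)
    if missing:
        raise ValueError(f"classification tags not defined in tags: {missing}")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/frameworks/",
        data=json.dumps(framework).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Minimal sample payload for illustration; use your full customized framework instead.
sample_framework = {
    "shortName": "ECMC Framework",
    "name": "External Catalog Mapping Classification Framework",
    "description": "Sample payload for illustration.",
    "tags": [
        {"name": "ECMC.Confidentiality.Nonsensitive", "source": "curated", "sensitivities": []}
    ],
    "rules": [
        {
            "name": "ECMC 00004",
            "classificationTag": {"name": "ECMC.Confidentiality.Nonsensitive", "source": "curated"},
            "columnTags": [{"name": "Public", "source": "curated"}],
            "neighborColumnTags": [],
            "tableTags": [],
        }
    ],
    "active": True,
}

request = build_request(sample_framework, "https://your.immuta.com", "<your-token>")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```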

Your new framework will now be visible in the Immuta UI by navigating to the Classification section under Discover.
