
Data Classification

Learn about how data classification categorizes the risk level associated with your data

Classification is the process in which data is categorized by the content and the associated risk level based on context. Classification complements identification, and the tags classification applies can give additional information in the audit dashboards for data sources.

How-to guides

  • Activate classification frameworks: Use the API to activate a classification framework.

  • How to use a classification framework with your own tags: Create a classification framework using a provided template.

Reference guide

Classification frameworks: This reference guide describes classification frameworks and how classification works in Immuta.

How to Use a Classification Framework with Your Own Tags

Customize a classification framework to use your own catalog tags

After you have registered data sources in Immuta, you can start automating data classification of a column based on its context, which is the combination of

  • associated tags already applied to the column

  • tags applied to the neighboring columns and

  • table tags on the data source.
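The three kinds of context above correspond to the columnTags, neighborColumnTags, and tableTags requirements of a framework rule. The Python sketch below is a simplified, hypothetical model of that matching, not Immuta's implementation. It assumes a rule applies to a column only when the column carries every columnTags entry, some other column in the same data source carries each neighborColumnTags entry, and the data source carries every tableTags entry. Tags are modeled as (name, source) pairs, and Example.PersonName and Example.PII.Context are made-up tag names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A tag is modeled as a (name, source) pair, e.g. ("Confidential", "collibra").
Tag = Tuple[str, str]


@dataclass
class Rule:
    """Simplified stand-in for one entry of a framework's rules array."""
    classification_tag: Tag
    column_tags: List[Tag] = field(default_factory=list)
    neighbor_column_tags: List[Tag] = field(default_factory=list)
    table_tags: List[Tag] = field(default_factory=list)


def rule_applies(rule: Rule, column: str, columns: Dict[str, Set[Tag]],
                 table_tags: Set[Tag]) -> bool:
    """Assumed semantics: every requirement listed in the rule must be satisfied."""
    own = columns[column]
    # Union of the tags on every *other* column in the same data source.
    neighbors: Set[Tag] = set().union(
        *(tags for name, tags in columns.items() if name != column))
    return (all(t in own for t in rule.column_tags)
            and all(t in neighbors for t in rule.neighbor_column_tags)
            and all(t in table_tags for t in rule.table_tags))


def classify(rules: List[Rule], columns: Dict[str, Set[Tag]],
             table_tags: Set[Tag]) -> Dict[str, Set[Tag]]:
    """Return the classification tags each column would receive under this model."""
    return {name: {r.classification_tag for r in rules
                   if rule_applies(r, name, columns, table_tags)}
            for name in columns}


rules = [
    # Mirrors rule ECMC 00002 from the example framework in this guide.
    Rule(("ECMC.Confidentiality.Sensitive", "curated"),
         column_tags=[("Confidential", "collibra")]),
    # Hypothetical neighbor rule; both tag names are made up.
    Rule(("Example.PII.Context", "curated"),
         neighbor_column_tags=[("Example.PersonName", "curated")]),
]
columns = {
    "customer_name": {("Example.PersonName", "curated")},
    "notes": {("Confidential", "collibra")},
}
result = classify(rules, columns, table_tags=set())
print(result["notes"])          # both classification tags
print(result["customer_name"])  # set(): no *other* column has Example.PersonName
```

Keeping all three requirement lists in one function makes the "all requirements must be met" behavior explicit; in this model an empty requirement list is treated as trivially satisfied.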

The starter framework in this how-to is built to map a classification scale of restricted, confidential, internal, and public to Immuta's three-level scale of sensitivity. The sensitivity in the classification tags will then appear in the data source and query event dashboards.

Follow this guide to map your tags to the example framework, or consult the framework API guide for more information about the framework schema.

Customize the framework

Using the example framework below, customize the framework for your organization's classification tags:

    {
      "shortName": "ECMC Framework",
      "name": "External Catalog Mapping Classification Framework",
      "description": "This framework maps the classification tags the organization has in Collibra to Immuta data sources.",
      "tags": [
        {
          "name": "ECMC.Confidentiality.Highly Sensitive",
          "source": "curated",
          "sensitivities": [
            {
              "dimension": "confidentiality",
              "sensitivity": 2
            }
          ]
        },
        {
          "name": "ECMC.Confidentiality.Sensitive",
          "source": "curated",
          "sensitivities": [
            {
              "dimension": "confidentiality",
              "sensitivity": 1
            }
          ]
        },
        {
          "name": "ECMC.Confidentiality.Nonsensitive",
          "source": "curated",
          "sensitivities": []
        }
      ],
      "rules": [
        {
          "name": "ECMC 00001",
          "classificationTag": {
            "name": "ECMC.Confidentiality.Highly Sensitive",
            "source": "curated"
          },
          "columnTags": [
            {
              "name": "Restricted",
              "source": "collibra"
            }
          ],
          "neighborColumnTags": [],
          "tableTags": []
        },
        {
          "name": "ECMC 00002",
          "classificationTag": {
            "name": "ECMC.Confidentiality.Sensitive",
            "source": "curated"
          },
          "columnTags": [
            {
              "name": "Confidential",
              "source": "collibra"
            }
          ],
          "neighborColumnTags": [],
          "tableTags": []
        },
        {
          "name": "ECMC 00003",
          "classificationTag": {
            "name": "ECMC.Confidentiality.Sensitive",
            "source": "curated"
          },
          "columnTags": [
            {
              "name": "Internal",
              "source": "collibra"
            }
          ],
          "neighborColumnTags": [],
          "tableTags": []
        },
        {
          "name": "ECMC 00004",
          "classificationTag": {
            "name": "ECMC.Confidentiality.Nonsensitive",
            "source": "curated"
          },
          "columnTags": [
            {
              "name": "Public",
              "source": "collibra"
            }
          ],
          "neighborColumnTags": [],
          "tableTags": []
        }
      ],
      "active": true
    }

Parameters

For more information about these parameters, see the Frameworks API reference guide.

  1. tags: These tags are automatically created in Immuta with the sensitivity you assign. They must not already exist in Immuta. All tags used in the classificationTag parameter should be defined here.

  2. tags.sensitivities: This is metadata for the sensitivity of the new tag. Use confidentiality for the dimension. Options for sensitivity are 1 (shown as sensitive in audit dashboards) and 2 (shown as highly sensitive in audit dashboards). For nonsensitive, leave this parameter empty.

  3. rules: These are the rules for applying the tags defined above. Each rule contains the classification tag to apply and the requirements that must be met: the column tags, neighboring column tags, and table tags that must be present. All requirements within a rule must be met for its classification tag to be applied.

  4. rules.classificationTag: The name and source of the tag you want applied if the rule requirements are met. This classification tag must be defined in tags. The source is curated.

  5. rules.columnTags: These are the required tags for a column. If the tags defined here are found on a column, and the other tag rules are met, then the rule's classificationTag will be applied to the same column.

  6. rules.neighborColumnTags: These are the required tags on other columns in the data source (or in the query if dynamic query classification is enabled). If the tags defined here are found on any column in the data source, and the other tag rules are met, then the rule's classificationTag will be applied to all the neighboring columns.

  7. rules.tableTags: These are the required tags on the data source. If the tags defined here are found on the data source, and the other tag rules are met, then the rule's classificationTag will be applied to all the columns in that data source.

  8. active: When true, the framework is active and applies tags when the rules are met.

How to edit rules

Follow the example below to map your tags to the rules in the example framework.

For example, the following rule applies the classification tag RAF.Confidentiality.High to columns that identification tagged DSF.Interpretation.Credentials.Secret:

    "rules": [
      {
        "name": "RAF 00004",
        "classificationTag": {
          "name": "RAF.Confidentiality.High",
          "source": "curated"
        },
        "columnTags": [
          {
            "name": "DSF.Interpretation.Credentials.Secret",
            "source": "curated"
          }
        ],
        "neighborColumnTags": [],
        "tableTags": []
      }
    ]

To translate this to your tags, replace the name and source values in columnTags, neighborColumnTags, or tableTags with your own. In the version below, the rule uses Confidential, a tag from an external Collibra catalog that the organization applies to confidential data. The rule now states: apply the classification tag RAF.Confidentiality.High to a column if it has the Collibra tag Confidential. Repeat this for your organization's remaining classification levels.

    "rules": [
      {
        "name": "RAF 00004",
        "classificationTag": {
          "name": "RAF.Confidentiality.High",
          "source": "curated"
        },
        "columnTags": [
          {
            "name": "Confidential",
            "source": "collibra"
          }
        ],
        "neighborColumnTags": [],
        "tableTags": []
      }
    ]

Find the name and source for your tags

If you do not know the name or source of your tags, you can list them using the Immuta API:

    curl \
        --request GET \
        --header "accept: application/json" \
        --header "Authorization: Bearer <your-token>" \
        https://your.immuta.com/tag

This request lists all the tags in your Immuta environment, similar to this example response:

    [
      {
        "id": 114,
        "name": "DataProperties.Cross-Sectional",
        "source": "curated",
        "deleted": false,
        "systemCreated": true
      },
      {
        "id": 2,
        "name": "Discovered.Country.Argentina",
        "source": "curated",
        "deleted": false,
        "systemCreated": true
      },
      {
        "id": 9,
        "name": "Discovered.Country.Australia",
        "source": "collibra",
        "deleted": false,
        "systemCreated": true
      }
    ]

Activate your new framework

Requirement: Immuta permission GOVERNANCE

Once you have made all your customizations to the example framework, save the full customized framework as a JSON file (example-payload.json in the request below) and send it as the payload of the following Immuta API request:

    curl \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer <your-token>" \
        --data @example-payload.json \
        https://your.immuta.com/frameworks/

Your new framework will now be visible in the Immuta UI: click Metadata in the navigation menu and select Classifications.

Classification Frameworks

Learn about how classification frameworks categorize the risk level of your data

Classification is the process in which data is categorized by the content and the associated risk level based on context. To classify your data, Immuta evaluates your data in two phases:

  1. Identification runs to identify your data by content type. The data is evaluated against identifiers and tagged according to the identifier it matches.

  2. Classification runs to classify your data by its context. The data is classified by the rules within a framework and the tags currently applied to the column and table. Once the data is classified, it's tagged with classification tags that carry additional metadata, which the audit dashboards use to show sensitivity and to visualize when that sensitive data is accessed.

Both phases of classification in Immuta can be customized to find and tag the data your organization cares about. After data is classified, classification tags can be used to build policies or visualize sensitive data access in the audit dashboards.

Using classification to assign risk and sensitivity levels to your data, and audit dashboards to visualize those risk levels, offers these benefits:

  • Increasing the semantic understanding of your data to better meet compliance requirements

  • Reducing the time to make decisions about what data access is allowed under what purposes

  • Reducing the effort and time to respond to auditors about data access in your company

  • Reducing the labor of classifying data to enumerate what data is within the scope of security or regulatory compliance frameworks

What is the difference between entity tags and classification tags?

Both entity and classification tags describe the content of data on a per-column basis, and you can use them to monitor data access and build access policies. However, there are key differences between the two:

  • Entity tags are applied through identification and describe what the data is. Identification applies entity tags to columns based on the patterns of the data.

  • Classification tags are applied through categorization and risk assessment and describe the context of the data and the risk it poses. Using classification frameworks, classification tags are applied to columns based on the entity tags previously applied by identification. Additional classification tags can then be applied, providing even more context or expressing a property of the record rather than just the column.

Why isn’t entity tagging sufficient for classification?

Entity tags describe the contents of individual columns in isolation. But you don't access individual columns in isolation, so you shouldn't determine their sensitivity that way either. Entity tags do not attempt to, and cannot, contextualize a column's contents with its neighboring columns' contents, so connections between data are lost whenever they cannot be identified from a pattern within the column itself. Classification tags describe the contents of a table with the context of all its columns, providing a holistic view of the risk the data poses for what it is, rather than for the pattern it fits. Context is necessary to understand whether your data is public or private, whether it is safe to leave ungoverned, or whether it is sensitive and creates toxic joins when accessed with other tables.

For example, under HIPAA, a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on one column at a time, it cannot reason about whether a column containing procedure codes merits classification as PHI, so it will not tag procedure codes as PHI. Classification, however, can tag the column as PHI if it detects patient identity information in the other columns of the table.

Additionally, entity tagging does not indicate how sensitive the data is, but classification tags can carry a sensitivity level. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly listed phone number of a company might not be considered sensitive.

After you understand what entities your data contains using identification, you need to adopt frameworks that determine what combinations of data constitute sensitive data and their level of sensitivity.

What is a framework?

A framework is a set of data categories plus a set of classification rules that place data into those categories. In Immuta, the data categories are represented by tags, and when data fits a classification rule, the tag is applied:

  • Classification tags are applied based on the tags applied by identification or other tags on the data source. Classification tags contain additional metadata about each column, such as the source of the tag, the dimension, and the sensitivity level. This metadata is used in the framework rules and in the complex formulas that assign the sensitivity of queries visible in audit dashboards.

  • Classification rules determine how each classification tag is applied. These rules can apply tags based on tags already on the column, tags applied to neighboring columns, and tags applied to the data source. This means that the complete data source is considered when classifying your data, and even tags applied to individual columns can affect the risk level of the entire data source.

Frameworks are often built from an interpretation of regulatory frameworks or standards, such as the US Health Insurance Portability and Accountability Act (HIPAA) and the PCI standard. However, organizations can also build frameworks that represent their internal business processes. When used in Immuta, frameworks automate data tagging and provide information about what data you have immediately after it is registered in Immuta.

What are the benefits of classification?

Data classification is a process, and with Immuta much of it is automated. This means you can reap the benefits of classified and tagged data more quickly and easily than by classifying and tagging it manually:

  • Quick data access control: Use classification to identify and classify your data immediately after registration in Immuta. Then, build governance policies off of those tags. This repeatable process protects your data in its current state and whenever new data sources are created. Automate the process further with schema monitoring, which lets you register data just once: Immuta then monitors your data environment for changes and, when it finds them, updates the data source in Immuta, updates the tags on that data source, and updates user access based on your governance policies.

  • Scale your data monitoring: Use classification to identify and classify your data immediately after registration in Immuta. Then, view your data users' access to your sensitive and risky data through the audit dashboards.

  • Build data platform compliance: Create classification frameworks to identify and classify your data based on the industry practices and regulations your organization needs to abide by. Once the frameworks are built, they automatically tag data as it's registered, ensuring your data sources are properly tagged to abide by the regulations you care about.

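Classification tags carry their sensitivity as metadata in a framework's tags array. As a small illustration, this Python sketch reads that array from a payload shaped like the External Catalog Mapping example in How to Use a Classification Framework with Your Own Tags and reports the label that guide associates with each value: 1 is shown as sensitive, 2 as highly sensitive, and an empty sensitivities list means nonsensitive. The sensitivity_labels function and the DASHBOARD_LABELS mapping are hypothetical helpers, not part of the Immuta API.

```python
import json
from typing import Dict

# Labels the how-to guide gives for each confidentiality sensitivity value.
DASHBOARD_LABELS = {1: "sensitive", 2: "highly sensitive"}


def sensitivity_labels(framework: dict) -> Dict[str, str]:
    """Map each tag defined in a framework payload to its dashboard label."""
    labels = {}
    for tag in framework.get("tags", []):
        levels = [s["sensitivity"] for s in tag.get("sensitivities", [])
                  if s.get("dimension") == "confidentiality"]
        if not levels:
            labels[tag["name"]] = "nonsensitive"  # empty sensitivities list
        else:
            # Assumption: if several confidentiality entries exist, report the highest.
            level = max(levels)
            labels[tag["name"]] = DASHBOARD_LABELS.get(level, f"unknown ({level})")
    return labels


payload = json.loads("""
{"tags": [
  {"name": "ECMC.Confidentiality.Highly Sensitive", "source": "curated",
   "sensitivities": [{"dimension": "confidentiality", "sensitivity": 2}]},
  {"name": "ECMC.Confidentiality.Sensitive", "source": "curated",
   "sensitivities": [{"dimension": "confidentiality", "sensitivity": 1}]},
  {"name": "ECMC.Confidentiality.Nonsensitive", "source": "curated",
   "sensitivities": []}
]}
""")
for name, label in sensitivity_labels(payload).items():
    print(f"{name}: {label}")
```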
How-to Guides

Adjust Identification and Classification Framework Tags

Tune identification and classification frameworks to adjust where tags are applied based on your own security and compliance needs

Requirements:

  • Registered data sources; see the reference page for supported technologies

  • Immuta permission GOVERNANCE

Immuta provides identifiers out of the box to recognize and tag data. You can then use classification frameworks, and build your own, to apply tags based on those identifier tags and your own catalog tags.

Tune identifiers first to adjust where tags are applied. Because classification frameworks apply classification tags based on the tags applied by identification, tuning identification should come first, and its changes will trickle down to classification. Customizing identification requires some initial work, but it automates data tagging for all future data sources.

Follow the steps below to tune identification for your data:

  1. Create new identifiers.

  2. Add identifiers to a domain.

  3. Add data sources to your domain.

  4. Run identification for the domain: This will remove the tags from any previous identification runs and re-run identification with your new identifiers. From here, either continue to edit identifiers to reconfigure the applied tags, or stop if you are happy with the results.

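Because classification rules read the entity tags that identification applies, tuning an identifier changes what classification sees. The Python sketch below is a toy model of that dependency, not how Immuta identifiers work: its hypothetical regex identifiers tag a column only when every sampled value matches, and a single made-up column rule maps an entity tag to a classification tag. The sample values and the Example.* tag names are invented for illustration.

```python
import re
from typing import Dict, List, Set


def identify(samples: Dict[str, List[str]],
             identifiers: Dict[str, str]) -> Dict[str, Set[str]]:
    """Toy identification: apply an entity tag when every sampled value matches."""
    return {
        column: {tag for tag, pattern in identifiers.items()
                 if values and all(re.fullmatch(pattern, v) for v in values)}
        for column, values in samples.items()
    }


def classify(entity_tags: Dict[str, Set[str]],
             column_rules: Dict[str, str]) -> Dict[str, Set[str]]:
    """Toy classification: column_rules maps a required entity tag to a classification tag."""
    return {column: {column_rules[t] for t in found if t in column_rules}
            for column, found in entity_tags.items()}


samples = {"phone": ["555-0100", "(555) 0199"]}  # hypothetical sampled values
rules = {"Example.PhoneNumber": "Example.Confidentiality.Sensitive"}

narrow = {"Example.PhoneNumber": r"\d{3}-\d{4}"}  # misses "(555) 0199"
tuned = {"Example.PhoneNumber": r"\d{3}-\d{4}|\(\d{3}\) \d{4}"}

before = classify(identify(samples, narrow), rules)
after = classify(identify(samples, tuned), rules)
print(before)  # {'phone': set()}
print(after)   # {'phone': {'Example.Confidentiality.Sensitive'}}
```

The classification rule never changed between the two runs; only the identifier did, which is why this guide recommends tuning identification before adjusting frameworks.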
Assess

After identification has applied entity tags, any active classification frameworks will automatically reapply their tags to account for any changes to tags. It may be necessary to adjust the classification tags based on your organization's data, security, and compliance needs.

After identification runs, you will receive a notification that the job is complete. Then, you can view the results from the data source columns.

  1. Navigate to the data source overview page of the data source you added to the framework.

  2. Click the Columns tab.

  3. Assess whether the tags are applied as expected.

  4. If you are happy with the tags, add the rest of your data sources to the domain and run identification on the rest of your data sources.

  • If you want additional tags, follow the Create an identifier guide to create identifiers that matter to your data.

Assess your data source tags

Requirement: Immuta permission GOVERNANCE or data owner

Target some data sources to manually review their tags:

  1. Navigate to the Data Sources page, select a data source, and click the Columns tab to open the data source columns.

  2. You will see the data source columns, with details about the name, data type, and a list of the tags on each column. Assess whether the tags are accurate to your data.

If you find that too many tags are applied

Tags may be unexpected but still accurate to your data. They may also have been applied because identification found them to be the best match among the identifiers in the framework.

If you want to improve identification and personalize it to your data, assess why the tag was applied to your data:

  1. Is the identifier incorrectly matching your data and irrelevant to your organization? Delete the identifier that applied the tag from the domain.

  2. Is the identifier incorrectly matching this specific column, but correct in other places? It must have been the closest match that identification found. Create a better match by completing the following steps:

    1. Create an identifier specific to the column with a new tag.

    2. Add the identifier to the domain so this column is correctly matched by identification.

If you want to remove the unexpected tags, use one of the following how-to guides:

  1. Remove unnecessary identifiers from the domain.

  2. Ensure the tags are applied properly by adjusting identification.

  3. Adjust the classification framework rules using the frameworks API. Note that classification tags build off of other tags, so removing a single classification or identification tag can have trickle-down effects on the data source.

If you find that tags are missing

If you were expecting some sensitive data to be tagged and it is not, enable additional tags using one of the following how-to guides:

  1. Create new identifiers.

  2. Ensure the tags are applied properly by adjusting identification.

  3. Adjust the classification framework rules using the frameworks API. Note that classification tags build off of other tags, so adding a single tag can have trickle-down effects on the data source.

Tune your data source columns

Requirement: Immuta permissions GOVERNANCE and AUDIT

Tags can be edited individually on each data source. If broad changes to the classification framework are necessary to re-tag your data, use the frameworks API.

  1. Navigate to the Data Sources page and select the data sources that you assessed and noted issues with.

  2. Click the Columns tab.

  3. To delete unnecessary tags, click the tag you want to remove from the column and select Disable from the tag side sheet.

  4. To add tags,

    1. Click Add Tags in the Actions column.

    2. Begin typing the name of the tag you want to add in the Search by Name field and select the tag from the dropdown list.

    3. Click Add.

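Because classification tags build off of other tags, it can help to see which classification tags depend on a tag before you remove or add it. This Python sketch scans a framework payload offline: starting from one tag, it follows every rule that lists the tag in columnTags, neighborColumnTags, or tableTags, collects the classification tags those rules produce, and repeats for rules that build on those tags in turn. It is a static over-approximation (it ignores whether a rule's other requirements are met and does not query Immuta); the function name is hypothetical, and the rule Example 00001 and tag Example.Audit.Review are made up.

```python
from typing import Set, Tuple

Tag = Tuple[str, str]  # (name, source)
REQUIREMENT_KEYS = ("columnTags", "neighborColumnTags", "tableTags")


def affected_classification_tags(framework: dict, changed: Tag) -> Set[Tag]:
    """Classification tags whose rules depend, directly or transitively, on `changed`."""
    affected: Set[Tag] = set()
    pending = [changed]
    while pending:
        current = pending.pop()
        for rule in framework.get("rules", []):
            required = {(t["name"], t["source"])
                        for key in REQUIREMENT_KEYS for t in rule.get(key, [])}
            if current in required:
                produced = (rule["classificationTag"]["name"],
                            rule["classificationTag"]["source"])
                if produced not in affected:  # also guards against cycles
                    affected.add(produced)
                    pending.append(produced)  # follow rules that build on this tag
    return affected


framework = {
    "rules": [
        {"name": "RAF 00004",
         "classificationTag": {"name": "RAF.Confidentiality.High", "source": "curated"},
         "columnTags": [{"name": "DSF.Interpretation.Credentials.Secret", "source": "curated"}],
         "neighborColumnTags": [], "tableTags": []},
        # Hypothetical rule that builds on the classification tag above.
        {"name": "Example 00001",
         "classificationTag": {"name": "Example.Audit.Review", "source": "curated"},
         "columnTags": [], "tableTags": [],
         "neighborColumnTags": [{"name": "RAF.Confidentiality.High", "source": "curated"}]},
    ]
}
print(sorted(affected_classification_tags(
    framework, ("DSF.Interpretation.Credentials.Secret", "curated"))))
# [('Example.Audit.Review', 'curated'), ('RAF.Confidentiality.High', 'curated')]
```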
Activate Classification Frameworks

Activate a classification framework to categorize the sensitivity of your data

Requirement: Immuta permission GOVERNANCE

To activate a classification framework,

  1. Click Metadata in the navigation menu and select Classifications.

  2. Click the more actions icon in the Actions column for the framework you want to activate.

  3. Select Activate.

Deactivate a classification framework

  1. Click Metadata in the navigation menu and select Classifications.

  2. Click the more actions icon in the Actions column for the framework you want to deactivate.

  3. Select Deactivate.

Activate and manage classification frameworks using the API

To activate a framework using the Immuta API, see the Frameworks API reference page.

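For activating an existing framework, follow the Frameworks API reference page; the sketch below does not reproduce those requests. It covers only the case shown in How to Use a Classification Framework with Your Own Tags, where a new framework is created with "active": true, which makes it apply tags when its rules are met. Using only the Python standard library, it checks two constraints from that guide (every rules.classificationTag must be defined in tags, and each sensitivity entry uses the confidentiality dimension with a value of 1 or 2), then builds, without sending, the same POST request as that guide's curl example. check_framework and build_create_request are hypothetical helpers, and the base URL and token are placeholders.

```python
import json
import urllib.request
from typing import List


def check_framework(framework: dict) -> List[str]:
    """Report violations of two constraints from the how-to guide."""
    problems = []
    tags = framework.get("tags", [])
    defined = {(t["name"], t["source"]) for t in tags}
    for rule in framework.get("rules", []):
        ct = rule["classificationTag"]
        if (ct["name"], ct["source"]) not in defined:
            problems.append(f"{rule['name']}: {ct['name']} is not defined in tags")
    for t in tags:
        for s in t.get("sensitivities", []):
            if s.get("dimension") != "confidentiality" or s.get("sensitivity") not in (1, 2):
                problems.append(f"{t['name']}: unexpected sensitivity entry {s}")
    return problems


def build_create_request(base_url: str, token: str,
                         framework: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /frameworks/ request with active set to true."""
    body = json.dumps({**framework, "active": True}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/frameworks/",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


framework = {
    "shortName": "ECMC Framework",
    "name": "External Catalog Mapping Classification Framework",
    "description": "Trimmed copy of the example framework.",
    "tags": [{"name": "ECMC.Confidentiality.Sensitive", "source": "curated",
              "sensitivities": [{"dimension": "confidentiality", "sensitivity": 1}]}],
    "rules": [{"name": "ECMC 00002",
               "classificationTag": {"name": "ECMC.Confidentiality.Sensitive",
                                     "source": "curated"},
               "columnTags": [{"name": "Confidential", "source": "collibra"}],
               "neighborColumnTags": [], "tableTags": []}],
    "active": False,
}
assert check_framework(framework) == []
request = build_create_request("https://your.immuta.com", "<your-token>", framework)
# To actually send it: urllib.request.urlopen(request)
```

Validating locally first means a framework whose rules point at undefined classification tags is caught before the request ever reaches Immuta.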

Reference Guide