Data Classification

About Classification in Immuta

Public preview: This feature is available to all accounts.

Classification is the process in which data is categorized by the content and the associated risk level based on context. To classify your data, Discover evaluates your data in phases:

Sensitive data discovery (SDD) runs to identify your data by content type. The data is discovered and evaluated by the identifier it matches and is tagged.
The Data Security Framework scans those tags and any other tags applied to the data source and columns to categorize the data by context. This phase considers the data and the data surrounding it to understand the category of the data within the context of the data source.
Other regulatory-based frameworks scan and build off of the Data Security Framework tags. These frameworks are specific to regulations and standards and tag the data that matters to each framework.
The Risk Assessment Framework scans and builds off of the Data Security Framework. This framework tags data with specific risk assessment tags that describe the risk the data poses to your organization or the data subject. These tags also contain additional metadata used to express that risk as sensitivity and to visualize when sensitive data is accessed.
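
The phases above can be read as a pipeline in which each step consumes the tags produced by the step before it. The following Python sketch illustrates only that ordering. The tag names, identifier patterns, and functions are hypothetical and are not Immuta's API or its actual framework rules.

```python
import re

# Hypothetical identifier patterns standing in for SDD identifiers.
IDENTIFIERS = {
    "entity:email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "entity:phone_number": re.compile(r"^\+?\d[\d\- ]{6,}\d$"),
}

def discover(columns):
    """Phase 1: tag each column by the pattern all of its sample values match."""
    tags = {}
    for name, values in columns.items():
        tags[name] = {
            tag for tag, pattern in IDENTIFIERS.items()
            if values and all(pattern.match(v) for v in values)
        }
    return tags

def categorize(tags):
    """Phase 2: add a category tag based on the entity tags already present."""
    for col_tags in tags.values():
        if col_tags & {"entity:email", "entity:phone_number"}:
            col_tags.add("category:personal_contact")
    return tags

def assess_risk(tags):
    """Later phase: build risk tags off of the category tags."""
    for col_tags in tags.values():
        if "category:personal_contact" in col_tags:
            col_tags.add("risk:elevated")
    return tags

# Hypothetical sample data: column name -> sample values.
columns = {
    "contact": ["ana@example.com", "li@example.org"],
    "notes": ["called twice", "no answer"],
}
result = assess_risk(categorize(discover(columns)))
```

In this sketch the `contact` column ends up with an entity tag, a category tag, and a risk tag, while `notes` matches no identifier and stays untagged, so the later phases have nothing to build on for it.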

Every phase of classification in Immuta can be customized to find and tag the data your organization cares about. Users can customize the Data Security Framework to find, match, and tag the data they want categorized based on the organization's processes. Then, users can adjust the sensitivity of classification tags to match the organization’s policies or create new tags and rules in customized frameworks. After data is classified, classification tags can be used to control access to data and to monitor how it is accessed.

Using Discover classification to assign risk and sensitivity levels to your data and Detect dashboards to visualize the risk levels offers these benefits:

Increasing the semantic understanding of your data to better meet compliance requirements
Reducing the time needed to decide which data access is allowed for which purposes
Reducing the effort and time to respond to auditors about data access in your company
Reducing the labor of classifying data to enumerate what data is within the scope of security or regulatory compliance frameworks

What is the difference between entity tags and classification tags?

Both entity and classification tags describe the content of data on a per-column basis, and you can use them to control access to data and to monitor how it is accessed. However, there are key differences between the two:

Entity tags are applied through identification and describe what the data is. SDD applies entity tags to columns based on the pattern of the data.
Classification tags are applied through categorization and risk assessment and describe the context of the data and the risk it poses. Using classification frameworks, classification tags are applied to columns based on the entity tags previously applied by SDD. Additional classification tags can then be applied, providing even more context or expressing the property of the record rather than just the column.

Why isn’t entity tagging sufficient for classification?

Entity tags describe the contents of individual columns in isolation. But you don't access individual columns in isolation, so why would you determine their sensitivity that way? Entity tags do not attempt to, and cannot, contextualize a column's contents with the contents of neighboring columns. This means that connections between data are lost if they cannot be identified through a pattern within the column itself. Classification tags describe the contents of a table with the context of all its columns, providing a holistic view of the risk of the data for what it is, rather than the pattern it fits. Context is necessary to understand whether your data is public or private, risky or safe to leave ungoverned, or sensitive in a way that creates toxic joins when accessed with other tables.

Additionally, entity tagging does not indicate how sensitive the data is, but classification tags can carry a sensitivity level. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly listed phone number of a company might not be considered sensitive.

As another example, under the US Health Insurance Portability and Accountability Act (HIPAA), a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Because entity tagging operates column by column, it cannot determine whether a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. Classification tagging, however, will tag the column as PHI if it detects patient identity information in the other columns of the table. This is an example that Immuta's built-in frameworks can address out-of-the-box.

After you understand what entities your data contains using SDD, you need to adopt frameworks that determine what combinations of data constitute sensitive data and their level of sensitivity.

What is a framework?

Frameworks are a set of data categories and a set of classification rules that place data into those categories. In Immuta, the data categories are represented by tags, and when data fits a classification rule, the tag is applied:

Classification tags are applied based on the Discovered tags from SDD or other tags on the data source. Classification tags contain additional metadata about each column, such as the source of the tag, the dimension, and the sensitivity level. This metadata is used in the framework rules and in the formulas that assign a sensitivity to queries.
Classification rules determine how each classification tag is applied. These rules can apply tags based on tags already on the column, tags applied to neighboring columns, and tags applied to the data source. This means that the complete data source is considered when classifying your data sources, and even tags applied to individual columns can affect the risk level of the entire data source.

Frameworks are often built on an interpretation of regulations or standards, such as HIPAA and the PCI standard. However, organizations can also build frameworks that represent their internal business processes. When used in Immuta, frameworks automate data tagging and provide information about what data you have immediately after it is registered in Immuta.

See the Immuta documentation on built-in frameworks for more information about the frameworks Immuta provides out-of-the-box.

What are the benefits of classification?

Data classification is a process, and with Immuta, much of it is automated. This means that you can reap the benefits of classified and tagged data more quickly and easily than by classifying and tagging it manually:

Quick data access control: Use Discover to identify and classify your data immediately after registration in Immuta. Then, build data access controls off of those tags. This repeatable process protects your data in its current state and whenever new data sources are created. Automate the process further with schema monitoring, which allows you to register data just once. Immuta then monitors your data environment for changes and, when it finds them, updates the data source in Immuta, updates the tags on that data source, and updates user access based on your governance policies.
Scale your data monitoring: Use Discover to identify and classify your data immediately after registration in Immuta. Then, view your data users' access to your sensitive and risky data through the Detect dashboards.
Build data platform compliance: Use and customize the compliance frameworks to identify and classify your data based on the industry practices and regulations your organization needs to abide by. The Immuta compliance frameworks are templates that provide a strong starting point for further customization to what matters to your organization. Once those frameworks are built, use them to classify your data immediately after data registration in Immuta.
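
To make the idea of a classification rule more concrete, here is a minimal Python sketch of the behavior described under "What is a framework?": a rule that looks at the tags on a column, on its neighboring columns, and on the data source. It is modeled on the HIPAA procedure-code example. The `DataSource` class, the tag names, and `classify_phi` are all hypothetical illustrations, not Immuta's rule syntax or API.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    tags: set = field(default_factory=set)       # tags on the data source itself
    columns: dict = field(default_factory=dict)  # column name -> set of tags

# Hypothetical entity tags that identify a patient.
IDENTITY_TAGS = {"entity:person_name", "entity:patient_id"}

def classify_phi(source: DataSource) -> DataSource:
    """Tag procedure-code columns as PHI when the table also identifies patients."""
    for name, col_tags in source.columns.items():
        # Condition on the column's own tags.
        if "entity:procedure_code" not in col_tags:
            continue
        # Collect the tags of every other column in the same data source.
        neighbor_tags = set().union(
            *(tags for other, tags in source.columns.items() if other != name)
        )
        # Condition on neighboring columns or on the data source as a whole.
        if neighbor_tags & IDENTITY_TAGS or "source:patient_records" in source.tags:
            col_tags.add("class:phi")
    return source

# Procedure codes next to a patient name: classified as PHI.
billing = classify_phi(DataSource(columns={
    "procedure": {"entity:procedure_code"},
    "patient": {"entity:person_name"},
}))

# Procedure codes in a price catalog with no patient identity: not PHI.
catalog = classify_phi(DataSource(columns={
    "procedure": {"entity:procedure_code"},
    "price": set(),
}))

# A tag on the data source alone can also trigger the rule.
registry = classify_phi(DataSource(
    tags={"source:patient_records"},
    columns={"procedure": {"entity:procedure_code"}},
))
```

The same `procedure` column receives a different classification in each data source, which is the distinction that column-by-column entity tagging cannot make.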