Data Classification

Private Preview

This feature is only available to select accounts.

Classification is the categorization of data by the content and the level of responsibility that the data requires. Both of these factors inform a sensitivity scale of highly sensitive, sensitive, and non-sensitive. To classify your data, Immuta Detect scans your data in two phases:

  1. Immuta Detect runs sensitive data discovery (SDD) to find the content type, such as age, country, or phone number, and tags the column that contains that content type.
  2. Immuta Detect evaluates your data using the Immuta Data Security Framework (Immuta DSF) to determine the level of responsibility that the data requires based on the content type. The Immuta DSF is a customizable set of rules that is informed by data compliance best practices and data regulations.

Once Immuta Detect has completed those steps, you can use it to review user and data activity based on the classification of your data. That classification follows either the common regulatory frameworks reflected in the Immuta DSF or your organization’s security rules, if you choose to customize the Immuta DSF.

The sensitivity scale reflects levels of risk, which enables you to react to, triage, and prioritize governance improvements with context and nuance. You can also align the levels of risk with your organization’s policies by adjusting the sensitivity of classification tags through customized frameworks.

Visualizing the sensitivity of your data using a scale of risk levels offers these benefits:

  • Increasing the semantic understanding of your data to better meet compliance requirements
  • Reducing the time to make decisions about what data access is allowed for what purposes
  • Reducing the effort and time to respond to auditors about data access in your company
  • Reducing the effort of classifying data to enumerate what data is within scope of security or regulatory compliance frameworks

Entity tags versus classification tags

Classification results are represented in the Immuta UI as classification tags. Like the existing entity tags, classification tags describe the content of data on a per-column basis and you can use them to monitor data access and build access policies.

However, Immuta applies entity and classification tags differently. Immuta SDD applies entity tags to columns based on the patterns of the data. Immuta DSF applies classification tags to columns based on the entity tags that were previously applied by SDD and based on the rules contained within a specific framework.

For example, if most or all of the values in a column consist of four alphabetic characters followed by an optional space or hyphen (-), and then eight digits with an optional hyphen or space after the fourth digit, then that column is likely to contain Quebec Health Insurance Numbers, and SDD will tag it with Discovered.Entity.Quebec Health Insurance Number. This is an example of entity tagging.
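
The pattern described above can be approximated with a regular expression. The following is a minimal sketch of that idea, not Immuta's actual SDD implementation: the regular expression is derived only from the prose description, and the 0.8 cutoff for "most" and the function names are assumptions made for illustration.

```python
import re

# Pattern built from the prose description: four letters, an optional
# space or hyphen, then eight digits with an optional space or hyphen
# after the fourth digit.
QHIN_PATTERN = re.compile(r"[A-Za-z]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}")

# Assumed cutoff for "most or all"; the real threshold is not documented here.
MATCH_THRESHOLD = 0.8

QHIN_TAG = "Discovered.Entity.Quebec Health Insurance Number"


def looks_like_qhin_column(values, threshold=MATCH_THRESHOLD):
    """Return True if at least `threshold` of the non-empty values match."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if QHIN_PATTERN.fullmatch(v))
    return matches / len(non_empty) >= threshold


def tag_column(values):
    """Return the entity tags this sketch would apply to a column."""
    if looks_like_qhin_column(values):
        return [QHIN_TAG]
    return []
```

Because the decision is made per column from a share of matching values, a few malformed or placeholder entries do not prevent the column from being tagged.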

Why isn’t entity tagging sufficient for classification?

Entity tags describe the contents of individual columns, in isolation. Entity tags do not attempt to contextualize column contents with neighboring columns' contents. Classification tags describe the sensitive contents of a table with the context of all its columns.

For example, under the HIPAA framework, a list of procedures a doctor performed is only considered protected health information (PHI) if it can be associated with the identity of patients. Since entity tagging operates on a column-by-column basis, it cannot determine whether a column containing procedure codes merits classification as PHI. Therefore, entity tagging will not tag procedure codes as PHI. Classification tagging, however, will tag the column as PHI if it detects patient identity information in the other columns of the table.
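
The difference between per-column and table-wide reasoning can be sketched in a few lines. This is an illustrative model under stated assumptions, not the Immuta DSF rule engine: every tag name in the snippet is hypothetical and chosen only to mirror the HIPAA example.

```python
# All tag names here are hypothetical placeholders for illustration.
IDENTITY_TAGS = {
    "Discovered.Entity.Person Name",
    "Discovered.Entity.Social Security Number",
}
PROCEDURE_TAG = "Discovered.Entity.Procedure Code"
PHI_TAG = "Example.HIPAA.PHI"


def classify_table(entity_tags_by_column):
    """Map each column to its classification tags using whole-table context.

    `entity_tags_by_column` maps a column name to the set of entity tags
    already applied to that column.
    """
    # Table-wide context: does any column carry an identity entity tag?
    table_has_identity = any(
        tags & IDENTITY_TAGS for tags in entity_tags_by_column.values()
    )
    result = {}
    for column, tags in entity_tags_by_column.items():
        classification = set()
        # Procedure codes count as PHI only alongside patient identity.
        if PROCEDURE_TAG in tags and table_has_identity:
            classification.add(PHI_TAG)
        result[column] = classification
    return result
```

The same procedure-code column receives a different classification depending on its neighbors, which is something a per-column entity tag cannot express.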

Additionally, entity tagging does not indicate how sensitive the data is, but classification tags carry a sensitivity level. For example, an entity tag may identify a column that contains telephone numbers, but the entity tag alone cannot say that the column is sensitive. A phone number associated with a person may be classified as sensitive, while the publicly-listed phone number of a company might not be considered sensitive.

Frameworks

After you understand what entities your data contains using SDD, you need to adopt frameworks that determine what combinations of data constitute sensitive data and how sensitive those combinations are. Frameworks include classification tags and rules:

  • Classification tags are applied based on the Discovered tags from SDD. Classification tags contain additional metadata about each column such as the source of the tag, the dimension, and the sensitivity level. This metadata is used in the rules of the framework and in complex formulas used to assign the sensitivity of queries that you can see in the Detect dashboards.

  • Classification rules determine how each classification tag is applied. These rules can apply tags based on tags already on the column, tags applied to neighboring columns, and tags applied to the data source. This means that the entire data source is taken into consideration during classification, and even tags that are applied to individual columns can affect the risk level of the entire data source.
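
To make the tag metadata more concrete, the sketch below models a classification tag with the three metadata fields named above and derives a data source level from its column tags. It is a simplified illustration: the class and field names are hypothetical, and taking the maximum column sensitivity is an assumed stand-in for the more complex formulas Immuta Detect uses.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """The three-level sensitivity scale, ordered from lowest to highest."""
    NON_SENSITIVE = 0
    SENSITIVE = 1
    HIGHLY_SENSITIVE = 2


@dataclass(frozen=True)
class ClassificationTag:
    """Hypothetical model of a classification tag and its metadata."""
    name: str
    source: str       # where the tag came from, e.g. a framework name
    dimension: str    # illustrative value such as "confidentiality"
    sensitivity: Sensitivity


def data_source_sensitivity(tags_by_column):
    """Return the highest sensitivity found on any column (assumed rule)."""
    levels = [
        tag.sensitivity
        for tags in tags_by_column.values()
        for tag in tags
    ]
    return max(levels, default=Sensitivity.NON_SENSITIVE)
```

Under this assumed rule, a single highly sensitive column raises the level of the whole data source, which matches the point that column-level tags can affect the data source's risk level.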

Immuta DSF

The Immuta DSF was designed by Immuta’s Legal Engineering and Research Engineering teams and is informed by data privacy regulations and security standards: GDPR, CCPA, GLBA, HIPAA, PCI, and global best practices. Tags should be reviewed by your organization's compliance team to ensure complete inventory and proper classification. Once activated, Immuta DSF will immediately work within Immuta Detect by assessing the entity tags applied by SDD and applying the built-in rules and conditions to apply classification tags to the necessary columns. These classification tags then inform the sensitivity type of the data in the dashboards and reflect the risk levels of the columns, data sources, and queries that contain your data. The Immuta DSF is a supportive tool that accelerates data classification.

Workflow

The classification process runs as part of a job. You can trigger it manually through the data source health check dropdown menu, and it also runs automatically when any of the following events occurs:

  • A framework is created
  • A framework is updated
  • A framework is deleted
  • A tag is added to a column
  • A tag is removed from a column
  • A tag is added to a data source

Caveats

  • Enabling and customizing frameworks currently requires users to use the Immuta API.
  • Sensitive data discovery tags are automatically applied to columns that match the pattern, as described above. However, they will also be automatically removed from that column if the pattern is no longer recognized. This means that it is possible for Discovered tags, and therefore classification tags, to be removed from data sources without user action.
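
The second caveat can be illustrated with a small simulation of how a removed Discovered tag cascades to the classification tags that depend on it. This is a sketch under stated assumptions rather than Immuta's implementation: the rule table, the classification tag name, and the function names are hypothetical.

```python
# Hypothetical rule table: classification tag -> entity tag it depends on.
RULES = {
    "Example.Framework.Health Identifier":
        "Discovered.Entity.Quebec Health Insurance Number",
}


def classification_tags(column_tags):
    """Return the classification tags whose required entity tag is present."""
    return {
        ctag for ctag, required in RULES.items() if required in column_tags
    }


def rescan(column_tags, still_matching):
    """Simulate a rescan of one column.

    Discovered tags whose pattern is no longer recognized are dropped;
    tags from other sources are kept. Classification is then recomputed
    from the remaining tags.
    """
    kept = {
        tag for tag in column_tags
        if not tag.startswith("Discovered.") or tag in still_matching
    }
    return kept, classification_tags(kept)
```

For example, if a column's values change so that the pattern no longer matches, calling `rescan` with an empty `still_matching` set removes the Discovered tag and, with it, the dependent classification tag, while a manually applied tag on the same column is left in place.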