Discover Module

The Discover module automates discovering and tagging data across your data platform. It identifies and classifies data using frameworks.

Requirements

Components

The Immuta UI has separate sections for identification frameworks and classification frameworks. Both framework types consist of rules, criteria, and resulting tags, but the criteria types differ between them. Identification frameworks use competitive pattern analysis and column name matching to discover data types and tag them. Classification frameworks use tags on the column, neighboring columns, and data source for context and then tag the columns based on that context. Find more information about each framework type below.

Identification frameworks

Identification frameworks run with sensitive data discovery (SDD). They use data patterns to discover data and tag it based on what the data is.

Supported criteria and pattern types

  • Competitive pattern analysis: This criterion reviews all the regex and dictionary patterns within the framework's rules and searches for the pattern with the best fit. During this review, the competitive pattern analysis criteria in the framework compete against one another to find the best and most specific pattern that fits the data. The resulting tags for the best pattern's rule are then applied to the column.

  • Column name: This criterion matches a column name pattern against the column names in the data sources. The rule's resulting tags are applied to each column whose name matches.
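
The competition between patterns can be pictured with a small Python sketch. It is illustrative only and is not Immuta's scoring algorithm: the `Rule` class, the 0.9 match-rate threshold, the use of regex length as a stand-in for specificity, and the `Example.*` tags are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    regex: str   # pattern that a whole value must match
    tags: list   # resulting tags applied when this rule wins


def match_rate(rule, values):
    """Fraction of sampled values that fully match the rule's regex."""
    if not values:
        return 0.0
    pattern = re.compile(rule.regex)
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def best_rule(rules, values, min_rate=0.9):
    """Return the best-fitting rule, or None if no rule fits well enough.

    Ties on match rate are broken by regex length, a crude (assumed)
    proxy for how specific a pattern is.
    """
    scored = [(match_rate(r, values), len(r.regex), r) for r in rules]
    scored = [s for s in scored if s[0] >= min_rate]
    if not scored:
        return None
    return max(scored, key=lambda s: (s[0], s[1]))[2]


def column_name_matches(name_pattern, column_name):
    """Column name criterion: case-insensitive search on the column name."""
    return re.search(name_pattern, column_name, re.IGNORECASE) is not None


rules = [
    Rule("generic_digits", r"[\d-]+", ["Example.Number"]),
    Rule("ssn", r"\d{3}-\d{2}-\d{4}", ["Example.SSN"]),
]
winner = best_rule(rules, ["123-45-6789", "987-65-4321"])
```

Both patterns match every sampled value here, so the more specific `ssn` rule wins and its tags would be applied to the column.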

Classification frameworks

Classification frameworks run with the classify service. They evaluate each rule's criteria against proximity tags (tags on the column, on neighboring columns, and on the data source) and then tag data based on the context it appears in.

Supported criteria

  • Match column tag: This criterion applies resulting tags based on specific tags already on the column.
  • Match neighboring column tag: This criterion applies resulting tags based on specific tags on neighboring columns.
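
As a rough illustration of how these two criteria combine, the following Python sketch evaluates rules against the tags of a single data source. The `ClassificationRule` class, the `classify` function, and the `Example.*` tags are hypothetical names chosen for this example and do not reflect the classify service's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationRule:
    resulting_tags: list
    column_tag: Optional[str] = None     # "match column tag" criterion
    neighbor_tag: Optional[str] = None   # "match neighboring column tag" criterion


def classify(columns, rules):
    """Return a copy of `columns` (name -> set of tags) with resulting tags added.

    Rules are evaluated once against the original tags; this sketch does
    not re-run when the tags it adds would match further rules.
    """
    result = {name: set(tags) for name, tags in columns.items()}
    for name, tags in columns.items():
        # Tags found on every other column in the same data source.
        neighbor_tags = set()
        for other, other_tags in columns.items():
            if other != name:
                neighbor_tags |= other_tags
        for rule in rules:
            column_ok = rule.column_tag is None or rule.column_tag in tags
            neighbor_ok = rule.neighbor_tag is None or rule.neighbor_tag in neighbor_tags
            if column_ok and neighbor_ok:
                result[name].update(rule.resulting_tags)
    return result


columns = {
    "full_name": {"Example.Name"},
    "diagnosis": {"Example.Diagnosis"},
    "visit_id": set(),
}
rules = [
    ClassificationRule(
        resulting_tags=["Example.PHI"],
        column_tag="Example.Diagnosis",
        neighbor_tag="Example.Name",
    )
]
classified = classify(columns, rules)
```

Only `diagnosis` gains the `Example.PHI` tag, because it carries the required column tag and sits next to a column tagged as a name.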

Data inventory dashboard

Private Preview

This feature is only available to select accounts.

The data inventory dashboard visualizes information about your organization's data. It presents your entire data corpus in the context of the frameworks that are actively tagging your data, with details such as when your data was last scanned and how much of the scanned data is relevant to your active frameworks.

The data inventory dashboard shows tiles for scanned coverage and for the percentage of data scanned within a specific time frame. These tiles reference data scanned by an identification framework with SDD. To increase the number of your data sources that have been scanned, run SDD.

The next section of the dashboard shows tiles for the compliance frameworks. Each graph separates the columns found to contain data important to the compliance framework from the columns that do not. These graphs update every time classification runs; the events that trigger classification are listed in the Frequency section.

For information on the frameworks visualized in the dashboard, see the Immuta frameworks reference guide.

Workflow

The Discover workflow involves both identification with SDD and classification:

  1. A user with the GOVERNANCE permission enables SDD and activates classification frameworks.
  2. Users register data in Immuta.
  3. SDD runs:
    1. Immuta generates a SQL query using the identification framework's rules.
    2. That query is executed in the native database.
    3. Immuta receives the query results containing the column name and the matching rules but no raw data values.
    4. SDD applies the resulting tags to the relevant columns.
  4. Classification runs:
    1. The data source's current tags are checked against the framework's rules.
    2. When a matching rule is found, the resulting tags are applied to the relevant columns.
  5. Users with the GOVERNANCE permission or data owners can view the data inventory dashboard with visualizations of their scanned data.
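
The SDD step of this workflow, in which a generated query runs inside the database and returns only column names and matching rules, can be sketched in Python with an in-memory SQLite database. This is a minimal sketch under stated assumptions, not the SQL Immuta generates: the `build_scan_query` helper, the `patients` table, the 0.9 threshold, and the use of SQLite's `GLOB` operator in place of a regex are all hypothetical.

```python
import sqlite3

# Hypothetical rule: a SQLite GLOB pattern stands in for a regex pattern.
RULES = {"SSN": "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]"}


def build_scan_query(table, columns, rules):
    """Build one aggregate query per (column, rule) pair.

    Table and column names are interpolated directly, so this sketch
    assumes they come from a trusted catalog rather than user input.
    """
    parts = [
        f"SELECT '{col}' AS column_name, '{rule}' AS rule_name, "
        f"AVG({col} GLOB '{glob}') AS match_rate FROM {table}"
        for col in columns
        for rule, glob in rules.items()
    ]
    return " UNION ALL ".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Ada", "123-45-6789"), ("Grace", "987-65-4321")],
)

query = build_scan_query("patients", ["name", "ssn"], RULES)

# Only column names, rule names, and an aggregate leave the database.
matches = [
    (col, rule)
    for col, rule, rate in conn.execute(query)
    if rate is not None and rate >= 0.9
]
```

The result set contains one row per column and rule with an aggregate match rate, so the caller can decide which tags to apply without ever receiving a raw value such as an individual SSN.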

Frequency

The full workflow runs when a new data source is manually registered in Immuta or detected by schema monitoring. Additionally, SDD alone runs in response to the following events:

  • A new data source is created.
  • Schema monitoring is enabled, and a new data source is detected.
  • Column detection is enabled, and new columns are detected. Here, SDD will only run on new columns, and no existing tags will be removed or changed.
  • A user manually triggers it from the data source health check menu.
  • A user manually triggers it from the identification frameworks page.
  • A user manually triggers it through the API.

Classification will run from the following events:

  • A framework gets created, updated, or deleted.
  • A tag gets added to or removed from a column manually or by SDD.
  • A tag gets added to a data source.
  • A user manually triggers it from the data source health check menu.
  • A user manually triggers it through the API.
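
The trigger lists above can be summarized as a lookup table. The Python sketch below restates them for quick reference; the event keys and the `processes_for` helper are hypothetical labels invented for this example and are not Immuta event names or API calls.

```python
SDD = "sdd"
CLASSIFICATION = "classification"

# Processes started directly by each event, per the lists above.
DIRECT_TRIGGERS = {
    "data_source_registered": {SDD, CLASSIFICATION},
    "data_source_detected_by_schema_monitoring": {SDD, CLASSIFICATION},
    "new_columns_detected": {SDD},  # SDD scans the new columns only
    "sdd_triggered_from_health_check": {SDD},
    "sdd_triggered_from_identification_frameworks_page": {SDD},
    "sdd_triggered_through_api": {SDD},
    "framework_created_updated_or_deleted": {CLASSIFICATION},
    "column_tag_added_or_removed": {CLASSIFICATION},
    "data_source_tag_added": {CLASSIFICATION},
    "classification_triggered_from_health_check": {CLASSIFICATION},
    "classification_triggered_through_api": {CLASSIFICATION},
}


def processes_for(event, sdd_applied_tags=False):
    """Return the processes an event leads to.

    Tags added by SDD count as column tag changes, so an SDD run that
    applies tags is followed by classification.
    """
    processes = set(DIRECT_TRIGGERS[event])
    if SDD in processes and sdd_applied_tags:
        processes.add(CLASSIFICATION)
    return processes
```

For example, `processes_for("new_columns_detected")` yields only SDD, while passing `sdd_applied_tags=True` adds classification because SDD changed the column's tags.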

Caveat

  • Customizing classification frameworks currently requires users to use the Immuta API.
