Data Processing
To understand how Immuta processes data, first consider the purpose of the Immuta components deployed in the Immuta Cloud infrastructure:
Immuta Tenant Metadata Database: The database dedicated to a single SaaS tenant. It contains the tenant's metadata that powers the core functionality of Immuta, including policy data and attributes about data sources (tags, audit data, etc.).
Immuta Web Service: This component includes the Immuta UI and API and is responsible for all web-based user interaction with Immuta, metadata ingest, and the data fingerprinting process.
Data Categories
Data processed by Immuta falls into one of the following categories. Each category is described in more detail in its own section below.
Audit logs include details about data access, such as who subscribes to a data source, when they access the data, and the queries they've run.
This data is stored in the tenant's Metadata Database.
Identity management data includes user account data, such as email addresses, names, and entitlements.
This data is stored in the tenant's Metadata Database, unless an organization has opted to use an external identity provider.
Data dictionary and data source metadata includes column names, tags, free-text descriptions of columns, and health check results, such as row counts and high cardinality checks. Additionally, this data source metadata may include the schema, column data types, and information about the host.
This data is stored in the tenant's Metadata Database.
Fingerprint data includes summary statistics regarding changes to data sources, including when policies have been applied, when external views have been created, when sensitive data elements have been added, and when users have enabled checks for new tables through schema monitoring.
This data is stored in the tenant's Metadata Database.
Policy decision data includes the metadata (such as usernames, group information, or other kinds of personal identifiers) sent to the Immuta Web Service to determine whether a user has access. When such information is relevant for access determination, it may be retained as part of the policy definition.
This data is stored in the tenant's Metadata Database.
User metrics data includes tenant metrics (statistics about activities occurring within Immuta, such as how many policies, projects, or tags have been created and how many users are authenticated within Immuta) and user metrics, such as user, session, and event properties (user and session IDs, page views, and clicks).
This data is stored in a single, US-based region.
TCP Connection
Immuta communicates with remote databases over a TCP connection.
Immuta Audit Logs
Audit data includes metadata (e.g., who subscribes to a data source, when they access data, and potentially what SQL queries were run) that is generated by a variety of actions and processes in Immuta. The most common processes are illustrated in the diagram below.
All audit logs flow from the Web Service to the Metadata Database (local to the region) and are stored for 90 days.
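The 90-day retention window can be illustrated with a minimal sketch. This is not Immuta's implementation; the record layout and the `within_retention` helper are hypothetical and only show how a fixed retention cutoff behaves.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the text above.
RETENTION = timedelta(days=90)


def within_retention(records, now):
    """Return only the audit records that are not older than the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]


# Hypothetical audit records; field names are assumptions for illustration.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"user": "alice", "action": "query", "timestamp": now - timedelta(days=10)},
    {"user": "bob", "action": "subscribe", "timestamp": now - timedelta(days=120)},
]
kept = within_retention(records, now)
```

In this example only the 10-day-old record survives; the 120-day-old record falls outside the window.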
Immuta Identity Management Data
This process is only relevant when using an external identity provider service to manage user accounts in Immuta.
The initial Immuta user account is created on the Immuta SaaS tenant, and this data is stored in the tenant's Metadata Database.
A System Administrator configures an external IAM with Immuta.
User account information is collected from the external IAM and stored in the tenant's Metadata Database.
Data Dictionary and Data Source Metadata
This data is processed to support data source creation, health checks, policy enforcement, and dictionary features.
A System Administrator configures the integration in Immuta.
A Data Owner registers data sources from their remote data platform with Immuta. Note: Data Owners can see sample data when editing a data source. However, this action requires the database password, and the small sample of data visible is only displayed in the UI and is not stored in Immuta.
When a data source is created or updated, the Metadata Database pulls in and stores statistics about the data source, including row count and high cardinality calculations.
The data source health check runs daily to ensure existing tables are still valid.
If an external catalog is enabled, the daily health check will pull in data source attributes (e.g., tags and definitions) and store them in the Metadata Database.
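As a rough illustration of the kind of statistics mentioned above (row count and a high cardinality check), the following sketch computes them for an in-memory table. The `health_stats` function, the 0.9 threshold, and the definition of "high cardinality" as the ratio of distinct values to rows are all assumptions made for this example, not Immuta's actual calculation.

```python
def health_stats(rows, column, threshold=0.9):
    """Compute a row count and a simple high-cardinality flag for one column.

    A column is flagged when distinct values / rows >= threshold
    (an assumed definition, chosen only for illustration).
    """
    row_count = len(rows)
    distinct = len({row[column] for row in rows})
    ratio = distinct / row_count if row_count else 0.0
    return {
        "row_count": row_count,
        "distinct": distinct,
        "high_cardinality": ratio >= threshold,
    }


# Hypothetical table contents.
rows = [
    {"email": "a@example.com", "state": "NY"},
    {"email": "b@example.com", "state": "NY"},
    {"email": "c@example.com", "state": "CA"},
    {"email": "d@example.com", "state": "CA"},
]
email_stats = health_stats(rows, "email")
state_stats = health_stats(rows, "state")
```

Only the statistics are retained in this sketch; the row values themselves are not part of the result.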
Policy Decision Data
Policy decision data is transmitted to ensure end users querying data are limited to the appropriate access as defined by the policies in Immuta.
A user runs a query against data in their environment.
The query is sent to the Immuta Web Service.
The Web Service queries the Metadata Database to obtain the policy definition, which includes data source metadata (tags, column names, etc.) and user entitlements (groups and attributes).
The policy information is transmitted to the remote data system for policy enforcement.
Query results are displayed according to the policy definition that was applied.
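The steps above can be sketched as a lookup that combines data source metadata with user entitlements. Everything here is hypothetical: the `METADATA_DB` dictionary stands in for the Metadata Database, and the policy shape (`allowed_groups`, `masked_tags`) is an assumed structure rather than Immuta's policy format.

```python
# Stand-in for the tenant's Metadata Database (assumed structure).
METADATA_DB = {
    "policies": {
        "sales.orders": {"allowed_groups": {"analysts"}, "masked_tags": {"PII"}},
    },
    "columns": {
        "sales.orders": {"order_id": set(), "email": {"PII"}},
    },
    "users": {
        "alice": {"groups": {"analysts"}},
        "bob": {"groups": {"marketing"}},
    },
}


def policy_decision(user, source):
    """Build the policy information that would be sent to the remote data system."""
    policy = METADATA_DB["policies"][source]
    groups = METADATA_DB["users"][user]["groups"]
    # No shared group means the user has no access to the data source.
    if not groups & policy["allowed_groups"]:
        return {"allowed": False, "masked_columns": []}
    # Mask every column carrying a tag that the policy targets.
    masked = sorted(
        column
        for column, tags in METADATA_DB["columns"][source].items()
        if tags & policy["masked_tags"]
    )
    return {"allowed": True, "masked_columns": masked}


alice_decision = policy_decision("alice", "sales.orders")
bob_decision = policy_decision("bob", "sales.orders")
```

Note that the decision contains only metadata (column names, tags, group membership); enforcement against the actual rows happens in the remote data system.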
Sample Raw Data
Fingerprinting Process
In the Snowflake integration, statistical queries made during data source registration are distilled into summary statistics, called fingerprints. Fingerprinting allows Immuta to implement advanced privacy-enhancing masking and data policies.
During this process, queries return statistics about the data (not data samples) to Immuta, and no PII is included. The fingerprinting process checks for new tables through schema monitoring (when enabled) and captures summary statistics of changes to data sources, including when policies were applied, external views were created, or sensitive data elements were added.
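To make the distinction between statistics and data samples concrete, here is a minimal sketch of a summary that carries only aggregates. The `fingerprint` function and its chosen fields are assumptions for illustration; the text does not specify which statistics Immuta's fingerprints contain, and in the Snowflake integration the statistical queries run in the remote platform rather than in Python.

```python
import statistics


def fingerprint(values):
    """Summarize a numeric column with aggregates only; no individual value is returned."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "mean": statistics.fmean(present) if present else None,
    }


# Hypothetical column values.
fp = fingerprint([10, 20, 20, None, 30])
```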
Policy Processes
Sample data is processed when k-anonymization or randomized response policies are applied to Snowflake data sources.
Sample data exists temporarily in memory during the computation.
Raw data is processed for masking, producing either a distinct set of values or aggregated groups of values.
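As a generic illustration of why k-anonymization needs to look at column values, the sketch below suppresses any value that appears fewer than k times. This is a simplified textbook-style suppression over a single column, not Immuta's algorithm; the `k_anonymize` name and the use of `None` as the suppressed value are assumptions.

```python
from collections import Counter


def k_anonymize(values, k):
    """Replace values that occur fewer than k times with None (suppression)."""
    counts = Counter(values)
    return [v if counts[v] >= k else None for v in values]


# Hypothetical column: "TX" appears only once, so it is suppressed for k=2.
masked = k_anonymize(["NY", "NY", "CA", "TX", "NY", "CA"], k=2)
```

The counts exist only in memory while the function runs, which mirrors the temporary, in-memory handling described above.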
If either of the following policy types targets a column that contains PII, Immuta stores that PII in the Metadata Database in order to enforce the policy:
User Metrics Data
Immuta collects a variety of metrics and details about app usage, which are stored in a single US-based region.
Data about activity within the tenant is aggregated nightly.
Aggregates create metrics (the number of policies created, number of users authenticated, number of tags created, etc.). This data is stored in our data warehouse, which resides in a single, US-based region (AWS us-east-1).
Telemetry data (session ID, length, event properties, page views, etc.) is collected using Segment and Heap.
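The nightly aggregation step can be pictured as reducing a list of activity events to a handful of counts. The event types, field names, and the `aggregate_tenant_metrics` function below are hypothetical and serve only to show how raw activity becomes aggregate metrics.

```python
from collections import Counter


def aggregate_tenant_metrics(events):
    """Reduce activity events to aggregate counts for one tenant."""
    counts = Counter(e["type"] for e in events)
    # Count each authenticated user once, however many times they logged in.
    authenticated = {e["user"] for e in events if e["type"] == "login"}
    return {
        "policies_created": counts["policy_created"],
        "tags_created": counts["tag_created"],
        "users_authenticated": len(authenticated),
    }


# Hypothetical activity events for one day.
events = [
    {"type": "login", "user": "alice"},
    {"type": "login", "user": "alice"},
    {"type": "login", "user": "bob"},
    {"type": "policy_created", "user": "alice"},
    {"type": "tag_created", "user": "bob"},
    {"type": "tag_created", "user": "bob"},
]
metrics = aggregate_tenant_metrics(events)
```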