Terminology: Local Region
The Local Region is the customer's operating region, which determines where an Immuta tenant is deployed and the Immuta Metadata Database lives. Immuta SaaS can deploy in these AWS regions.
To understand how Immuta processes data, it's imperative to understand the purpose of the Immuta components (illustrated in the diagram below) deployed in the Immuta Cloud infrastructure:
Fingerprint Service: When enabled, additional statistical queries made during the health check are distilled into summary statistics, called fingerprints. During this process, statistical query results and data samples (which may contain PII) are temporarily held in memory by the Fingerprint Service.
Immuta Tenant Metadata Database: The tenant-specific database that stores the metadata powering Immuta's core functionality, including policy data and attributes about data sources (tags, audit data, etc.).
Immuta Web Service: This component includes the Immuta UI and API and is responsible for all web-based user interaction with Immuta, metadata ingest, and the data fingerprinting process.
Immuta tenants are localized to the customer
An Immuta tenant and its components (Metadata Database, Fingerprint Service, and Web Service) are localized to the customer.
Data processed by Immuta falls into one of the following categories. For additional details, click a category to navigate to that section.
Audit logs include details about data access, such as who subscribes to a data source, when they access the data, and the queries they've run.
This data is stored in the tenant's Metadata Database.
This data includes user account data, such as email addresses, names, and entitlements.
This data is stored in the tenant's Metadata Database, unless a customer has opted to use an external identity provider.
This data includes column names, tags, free-text descriptions of columns, and health check results, such as row counts and high cardinality checks. Additionally, this data source metadata may include the schema, column data types, and information about the host.
This data is stored in the tenant's Metadata Database.
This data includes summary statistics regarding changes to data sources, including when policies have been applied, when external views have been created, when sensitive data elements have been added, and when users have enabled checks for new tables through schema monitoring.
This data is stored in the tenant's Metadata Database.
This data includes the metadata (such as usernames, group information, or other kinds of personal identifiers) sent to the Immuta Web Service to determine if a user has access. When such information is relevant for access determination, it may be retained as part of the policy definition.
This data is stored in the tenant's Metadata Database.
This data is processed and aggregated or reduced as part of the Immuta fingerprinting process and specific policy processes.
Data exists temporarily in memory in the Fingerprint Service.
This data includes tenant metrics (statistics about activities occurring within Immuta, such as how many policies, projects, or tags have been created and how many users are authenticated within Immuta) and user metrics, such as user, session, and event properties (user and session IDs, page views, and clicks).
This data is stored in a single, US-based region.
Immuta communicates with remote databases over a TCP connection.
Audit data includes metadata (e.g., who subscribes to a data source, when they access data, potentially what SQL queries were run, etc.) that is generated by a variety of actions and processes in Immuta. The most common processes are illustrated in the diagram below.
All audit logs flow from the Web Service to the Metadata Database (local to the customer's region) and are stored for 90 days.
This process is only relevant to customers using an external identity provider service to manage user accounts in Immuta.
The initial Immuta user account is created on the Immuta SaaS tenant, and this data is stored in the tenant's Metadata Database.
A System Administrator configures an external IAM with Immuta.
User account information is collected from the external IAM and stored in the tenant's Metadata Database.
This data is processed to support data source creation, health checks, policy enforcement, and dictionary features.
A System Administrator configures the integration in Immuta.
A Data Owner registers data sources from their remote data platform with Immuta. Note: Data Owners can see sample data when editing a data source. However, this action requires the database password, and the small sample of data visible is only displayed in the UI and is not stored in Immuta.
When a data source is created or updated, the Metadata Database pulls in and stores statistics about the data source, including row count and high cardinality calculations.
The data source health check runs daily to ensure existing tables are still valid.
If an external catalog is enabled, the daily health check will pull in data source attributes (e.g., tags and definitions) and store them in the Metadata Database.
Policy decision data is transmitted to ensure end users querying data are limited to the appropriate access as defined by the policies in Immuta.
Spark plugin
In the Databricks Spark integration, the user, data source information, and query are sent to Immuta through the Spark Plugin to determine what policies need to be applied while the query is being processed. Data that travels from Immuta to the Databricks cluster could include the following:
user attributes.
what columns to mask.
the entire predicate itself (for row-level policies).
A user runs a query against data in their environment.
The query is sent to the Immuta Web Service.
The Web Service queries the Metadata Database to obtain the policy definition, which includes data source metadata (tags, column names, etc.) and user entitlements (groups and attributes).
The policy information is transmitted to the remote data system for native policy enforcement.
Query results are displayed based on what policy definition was applied.
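The steps above can be sketched as a small simulation. This is only an illustration of the flow, not Immuta's implementation: the `PolicyDefinition` class, the `METADATA_DB` dictionary, the function names, and the `"REDACTED"` placeholder are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyDefinition:
    """Hypothetical stand-in for a policy definition held in the Metadata Database."""
    masked_columns: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


# Hypothetical in-memory "Metadata Database": data source name -> policy definition.
METADATA_DB = {
    "sales.customers": PolicyDefinition(
        masked_columns={"email", "ssn"},
        allowed_groups={"analysts"},
    ),
}


def decide_policy(user_groups: set, data_source: str) -> dict:
    """Mimic the Web Service step: look up the policy and build enforcement details."""
    policy = METADATA_DB[data_source]
    has_access = bool(user_groups & policy.allowed_groups)
    return {
        "allow": has_access,
        # Masking instructions are only sent when access is granted.
        "mask": sorted(policy.masked_columns) if has_access else [],
    }


def enforce(row: dict, decision: dict) -> Optional[dict]:
    """Mimic native enforcement in the remote data system."""
    if not decision["allow"]:
        return None
    return {k: ("REDACTED" if k in decision["mask"] else v) for k, v in row.items()}


decision = decide_policy({"analysts"}, "sales.customers")
row = {"id": 1, "email": "a@example.com", "ssn": "123-45-6789"}
result = enforce(row, decision)
```

In this sketch only policy information (which columns to mask, whether access is allowed) crosses from the policy lookup to the enforcement step; the row itself never leaves the `enforce` function, which stands in for the remote data system.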
Sample data is processed and aggregated or reduced during Immuta's fingerprinting process and specific policy processes. Note: Data Owners can see sample data when editing a data source. However, this action requires the database password, and the small sample of data visible is only displayed in the UI and is not stored in Immuta.
When enabled, statistical queries made during data source registration are distilled into summary statistics, called fingerprints. Fingerprinting allows Immuta to implement advanced privacy enhancing masking and data policies.
During this process, statistical query results and data samples (which may contain PII) are temporarily held in memory by the Fingerprint Service only for the amount of time it takes to calculate the statistics needed. For Snowflake, no data sample is needed, and only statistics about the data are returned to Immuta (no PII).
The fingerprinting process checks for new tables through schema monitoring (when enabled) and captures summary statistics of changes to data sources, including when policies were applied, external views were created, or sensitive data elements were added.
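As a rough illustration of distilling a sample into summary statistics, the sketch below reduces an in-memory sample to a few aggregates and then drops the sample. The specific statistics and the function names are assumptions; the source does not say which statistics a fingerprint contains.

```python
import statistics


def fingerprint_column(sample):
    """Reduce an in-memory sample to summary statistics only."""
    values = list(sample)
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def run_fingerprint(fetch_sample):
    """Fetch a sample, summarize it, and return only the summary."""
    sample = fetch_sample()      # raw values exist only inside this function
    summary = fingerprint_column(sample)
    del sample                   # drop the reference; only the summary is returned
    return summary


# Hypothetical sample of a numeric column.
summary = run_fingerprint(lambda: [34, 45, 45, 29, 52])
```

The caller receives only aggregate values, which mirrors the idea that raw samples are held just long enough to calculate the statistics.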
Immuta does not sample data for row-level policies
Immuta does not sample data for row-level policies; Immuta only pulls samples of data to determine if a column is a candidate for randomized response and aggregates of user-defined cohorts for k-anonymization. Both datasets only exist in memory during the computation.
Sample data is processed when k-anonymization or randomized response policies are applied to data sources.
Sample data exists temporarily in memory in the Fingerprint Service during the computation.
k-Anonymization Policies: At the time of its application, the columns of a k-anonymization policy are queried under a separate fingerprinting process that generates rules enforcing k-anonymity. The results of this query, which may contain PII, are temporarily held in memory by the Fingerprint Service. The final rules are stored in the Metadata Database as the policy definition for enforcement. Immuta requires that you opt in to use this masking policy type.
Randomized Response Policies: If the list of substitution values for a categorical column is not part of the policy specification (e.g., when specified via the API), a list is obtained via query and merged into the policy definition in the Metadata Database. Immuta requires that you opt in to use this masking policy type.
Raw data is processed for masking, producing either a distinct set of values or aggregated groups of values.
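The two outcomes described above (aggregated groups for k-anonymization and a distinct set of values for randomized response) can be illustrated with textbook versions of both techniques. This is a generic sketch, not Immuta's rule-generation algorithm; the function names, the suppression strategy, and the replacement probability are assumptions.

```python
import random
from collections import Counter


def k_anonymity_rules(values, k):
    """Return the values that occur fewer than k times and would need suppression."""
    counts = Counter(values)
    return {value for value, n in counts.items() if n < k}


def apply_k_anonymity(values, suppress):
    """Replace rare values with None so every remaining value appears at least k times."""
    return [None if v in suppress else v for v in values]


def substitution_values(values):
    """Derive the distinct set of categories used for randomized response."""
    return sorted(set(values))


def randomized_response(value, categories, p_replace, rng):
    """With probability p_replace, swap the value for a uniformly chosen category."""
    if rng.random() < p_replace:
        return rng.choice(categories)
    return value


cities = ["Austin", "Austin", "Boston", "Austin", "Boston", "Cody"]

# k-anonymization: only the resulting rules would need to be kept.
rules = k_anonymity_rules(cities, k=2)
masked = apply_k_anonymity(cities, rules)

# Randomized response: the distinct values become the substitution list.
categories = substitution_values(cities)
rng = random.Random(0)  # seeded so the example is repeatable
noisy = [randomized_response(c, categories, 0.3, rng) for c in cities]
```

In both cases the raw column values are needed only while the rules or the substitution list are computed; what persists is the derived rule set or category list.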
Immuta collects a variety of metrics and details about app usage that are stored in a single US-based region.
Data about activity within the tenant is aggregated nightly.
These aggregates produce metrics (the number of policies created, number of users authenticated, number of tags created, etc.). This data is stored in Immuta's data warehouse, which resides in a single, US-based region (AWS us-east-1).
Telemetry data (session ID, length, event properties, page views, etc.) is collected using Segment and Heap.
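The nightly aggregation step can be pictured as a simple count of events by type. The event names and the `nightly_aggregate` function below are hypothetical and only illustrate how activity records reduce to metrics.

```python
from collections import Counter
from datetime import date

# Hypothetical activity events: (day, event type).
events = [
    (date(2024, 1, 1), "policy_created"),
    (date(2024, 1, 1), "user_authenticated"),
    (date(2024, 1, 1), "policy_created"),
    (date(2024, 1, 2), "tag_created"),
]


def nightly_aggregate(events, day):
    """Count events by type for one day; only the counts are kept."""
    return dict(Counter(kind for when, kind in events if when == day))


metrics = nightly_aggregate(events, date(2024, 1, 1))
```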
SaaS: This deployment option provides data access control through Immuta's integrations with automatic software updates and no infrastructure or maintenance costs.
Self-Managed: Immuta supports self-managed deployments for users who store their data on-premises or in private clouds, such as a VPC. Users can connect to on-premises data sources and cloud data platforms that run on Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
Immuta SaaS tenants are deployed into global segments, which are groups of cloud provider regions. Global segments are designed to help customers with data locality restrictions meet their compliance needs, as data stored for a given tenant does not leave its global segment. Each global segment is built using multiple regions for disaster recovery purposes.
The IP addresses below must be authorized in your network firewall configuration to allow Immuta to connect.
Global segment: Asia Pacific (AP)
Regions: Sydney, Tokyo
IP addresses: 3.106.147.24, 13.114.123.176, 13.238.102.94, 13.55.39.43, 13.55.159.66, 18.176.33.125, 35.74.60.182, 35.79.88.51, 52.63.167.163, 52.68.224.114, 54.79.122.121, 52.196.249.32
Global segment: Europe (EU)
Regions: Frankfurt, Ireland, London
IP addresses: 3.9.86.222, 3.72.210.214, 3.74.143.208, 3.126.62.66, 13.41.25.70, 18.158.34.136, 18.169.27.181, 18.169.63.243, 18.169.76.225, 18.195.8.208, 35.158.87.229, 35.179.66.5, 52.16.117.91, 52.211.82.12, 54.171.9.121, 63.34.27.228, 63.34.155.116, 108.129.45.8
Global segment: North America (NA)
Regions: N. Virginia, Oregon
IP addresses: 34.192.38.214, 34.223.179.107, 35.155.223.131, 35.163.162.139, 44.205.48.68, 44.237.62.198, 52.2.174.14, 54.68.252.84, 54.88.42.98, 54.205.215.251, 54.225.122.15, 100.20.168.64
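When turning the addresses listed above into firewall rules, a short script can help avoid transcription mistakes. The sketch below uses Python's standard `ipaddress` module; the list contains only a few of the addresses for brevity, and the `/32` CIDR output format is an assumption about what a given firewall expects.

```python
import ipaddress

# A few of the addresses from the list above; a real allowlist would include all of them.
IMMUTA_ADDRESSES = ["3.106.147.24", "13.114.123.176", "3.9.86.222", "34.192.38.214"]


def to_cidr_rules(addresses):
    """Convert single addresses to /32 CIDR strings, validating each one."""
    return [str(ipaddress.ip_network(a)) for a in addresses]


def is_allowed(source_ip, addresses):
    """Return True if source_ip matches one of the allowlisted addresses."""
    allowed = {ipaddress.ip_address(a) for a in addresses}
    return ipaddress.ip_address(source_ip) in allowed


rules = to_cidr_rules(IMMUTA_ADDRESSES)
```

Because `ipaddress.ip_network` raises `ValueError` on malformed input, a typo in the list surfaces before any rule is written.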
Immuta captures metadata and stores it in an internal PostgreSQL database (Metadata Database). Customers can encrypt the volumes backing the database using an external Key Management Service to ensure that data is encrypted at rest.
To encrypt data in transit, Immuta uses the TLS protocol, which is configured by the customer.
Immuta encrypts values with data encryption keys that are either system-generated or managed using an external key management service (KMS). Immuta recommends using a KMS to encrypt and decrypt data keys and supports the AWS Key Management Service. If no KMS is configured, Immuta generates a data encryption key on a user-defined rollover schedule, using the most recent data key to encrypt new values while preserving old data keys to decrypt old values.
Immuta employs three families of functions in its masking policies:
One-way Hashing: One-way (irreversible) hashing is performed via a salted SHA-256 hash. A consistent salt is used for values throughout the data source, so users can count or track specific values without revealing the true value. Since hashed values are different across data sources, users are unable to join on hashed values. Note: Joining on masked values can be enabled in Immuta Projects.
Reversible Masking: For reversible masking, values are encrypted using AES-256 CBC encryption. Encryption is performed using a cell-specific initialization vector. The resulting values can be unmasked by an authorized user. Note that this is dynamic encryption of individual fields as results are streamed to the querying system; Immuta is not modifying records in the data store.
Reversible Format Preserving Masking: Format preserving masking maintains the format of the data while masking the value and is achieved by initializing and applying the NIST standard method FF1 at the column level. The resulting values can be unmasked by an authorized user.
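The one-way hashing behavior described above (consistent within a data source, different across data sources) can be sketched with Python's standard `hashlib`. This is a minimal illustration: using one random salt per data source and prepending it to the value are assumptions, since the source does not specify how the salt is generated or combined with the value.

```python
import hashlib
import secrets


def make_salt():
    """Generate a random salt; in this sketch each data source gets its own."""
    return secrets.token_bytes(16)


def mask_value(value, salt):
    """Return the hex SHA-256 digest of the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


salt_a = make_salt()  # hypothetical salt for data source A
salt_b = make_salt()  # hypothetical salt for data source B

h1 = mask_value("alice@example.com", salt_a)
h2 = mask_value("alice@example.com", salt_a)
h3 = mask_value("alice@example.com", salt_b)
```

Here `h1` and `h2` match, so counting and grouping still work within data source A, while `h3` differs, which is why a join across the two data sources on the hashed column would find no matches.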