Encryption and Hashing Practices
Encryption of Data at Rest
Immuta captures metadata and stores it in an internal PostgreSQL database. Customers can encrypt the volumes backing the database using an external Key Management Service to ensure that data is encrypted at rest.
Encryption of Data in Transit
To encrypt data in transit, Immuta uses the TLS protocol, which the customer configures.
Encryption Key Management
Immuta encrypts values with data encryption keys that are either system-generated or managed through an external key management service (KMS). Immuta recommends using a KMS to encrypt and decrypt data keys and supports the AWS Key Management Service. If no KMS is configured, Immuta generates data encryption keys on a user-defined rollover schedule, using the most recent data key to encrypt new values while preserving older data keys to decrypt older values.
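The rollover behavior described above (the newest key encrypts, older keys are retained for decryption) can be sketched as follows. This is an illustrative sketch only, not Immuta's implementation: the KeyRing class, its method names, and the key-id tagging scheme are hypothetical, and the XOR keystream is a toy stand-in for a real cipher so that the example runs with the Python standard library alone.

```python
import hashlib
import secrets
from typing import Dict, Tuple

# (key id, nonce, ciphertext): the key id records which data key was used.
Token = Tuple[int, bytes, bytes]


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (NOT secure): XOR data with a SHAKE-256 keystream.

    Applying it twice with the same key and nonce returns the original data.
    """
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyRing:
    """Hypothetical keyring: the newest key encrypts, any retained key decrypts."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}
        self._current_id = 0
        self.rotate()  # create the first data key

    def rotate(self) -> int:
        """Generate a new 256-bit data key and make it the active key."""
        self._current_id += 1
        self._keys[self._current_id] = secrets.token_bytes(32)
        return self._current_id

    def encrypt(self, plaintext: bytes) -> Token:
        """Encrypt with the most recent key and tag the result with its id."""
        nonce = secrets.token_bytes(16)
        key = self._keys[self._current_id]
        return self._current_id, nonce, _toy_cipher(key, nonce, plaintext)

    def decrypt(self, token: Token) -> bytes:
        """Look up the key named in the token, which may be an older key."""
        key_id, nonce, ciphertext = token
        return _toy_cipher(self._keys[key_id], nonce, ciphertext)


ring = KeyRing()
old_token = ring.encrypt(b"value written before rollover")
ring.rotate()  # in practice this would follow the configured schedule
new_token = ring.encrypt(b"value written after rollover")
```

Because each stored value carries the id of the key that produced it, a rollover never requires re-encrypting existing values; the older key simply stays available for reads.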
Immuta has three families of hashing functions:
Irreversible Hashing Function: The irreversible hashing function uses a SHA-256 hash. The hash is consistent for the same value throughout a data source, so users can count or track specific values without learning the true raw value. However, hashed values differ across data sources, so users cannot join on hashed values. Note: joining on masked values can be enabled in Immuta Projects.
Reversible Hashing Function: For the reversible hashing function, values are encrypted using AES-256 in CBC mode. Encryption is performed using an internal or external encryption key together with a data-source- and cell-specific initialization vector. The resulting values can be unmasked by an authorized user.
Reversible Format-Preserving Hashing Function: For the reversible format-preserving hashing function, values are encrypted using the NIST-standard FF1 encryption method. Format-preserving masking maintains the underlying format of the data but obscures the nominal value, using either an internal or external encryption key together with a data-source- and cell-specific initialization vector. This method is identical to the reversible hashing function above, aside from the preservation of the data format.
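The two properties of the irreversible hashing function (the same value hashes identically within one data source but differently across data sources) can be illustrated with a short sketch. The text above does not say how Immuta makes hashes differ between data sources; keying SHA-256 with a per-data-source salt through HMAC is an assumption made here for illustration, and mask_value and the salt values are hypothetical names.

```python
import hashlib
import hmac


def mask_value(value: str, data_source_salt: bytes) -> str:
    """Return a hex digest of the value, keyed by a per-data-source salt.

    Assumption: a per-source salt is what makes hashes differ across sources.
    """
    return hmac.new(data_source_salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical salts, one per data source.
salt_a = b"data-source-A-salt"
salt_b = b"data-source-B-salt"

# Same value, same data source: identical digests, so counting still works.
same_source_1 = mask_value("alice@example.com", salt_a)
same_source_2 = mask_value("alice@example.com", salt_a)

# Same value, different data source: a different digest, so joins fail.
other_source = mask_value("alice@example.com", salt_b)
```

Under this assumption, a GROUP BY on the masked column within one data source still yields correct counts, while a join between the two sources on the masked column matches nothing.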
Metadata, including schema, column data types, and information about the host, is collected and distilled into a series of summary statistics (called fingerprints) that are stored in Immuta's Metadata Database to drive policies.
To generate the fingerprint, Immuta runs an initial SQL query against the data source during a health check, and the results of that query are held in memory within the fingerprint container. Immuta also pulls a sample of data from the data source through a Postgres proxy (called the Immuta Query Engine); this sample resides temporarily in the fingerprint container.
Similar to the fingerprint process, to generate k-anonymization policies Immuta executes SQL queries through a Postgres proxy (called the Immuta Query Engine). The results of these queries, which contain some PII, exist temporarily in memory while Immuta generates the rules required to enforce k-anonymity. Once these rules are generated, Immuta pushes them back to the Metadata Database, where they are stored and then inserted into subsequent SQL queries to enforce k-anonymity.
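The two-phase flow described above (derive rules from query results held in memory, then apply only the stored rules to later queries) can be sketched with a generic suppression-based approach to k-anonymity. This is not Immuta's algorithm: the function names, the sample rows, and the choice to null out rare combinations are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[object]]


def build_rules(sample: List[Row], quasi_identifiers: List[str], k: int) -> Set[Tuple]:
    """Return the combinations of quasi-identifier values seen fewer than k times."""
    counts = Counter(tuple(row[c] for c in quasi_identifiers) for row in sample)
    return {combo for combo, n in counts.items() if n < k}


def enforce(row: Row, quasi_identifiers: List[str], rare_combos: Set[Tuple]) -> Row:
    """Null out the quasi-identifiers of a row whose combination is too rare."""
    combo = tuple(row[c] for c in quasi_identifiers)
    if combo in rare_combos:
        return {**row, **{c: None for c in quasi_identifiers}}
    return dict(row)


# Hypothetical query results that exist only in memory during rule generation.
sample = [
    {"zip": "20001", "age": 34},
    {"zip": "20001", "age": 34},
    {"zip": "20002", "age": 51},
]
columns = ["zip", "age"]

# Only the rules (not the sampled rows) would be persisted for later use.
rules = build_rules(sample, columns, k=2)
masked = [enforce(row, columns, rules) for row in sample]
```

In this sketch the rules contain only the rare value combinations, so the raw sample can be discarded as soon as the rules are computed.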