Encryption and Masking Practices
Encryption of Data at Rest
Immuta captures metadata and stores it in an internal PostgreSQL database. Customers can encrypt the volumes backing the database using an external Key Management Service to ensure that data is encrypted at rest.
Encryption of Data in Transit
To encrypt data in transit, Immuta uses the TLS protocol, which the customer configures.
Encryption Key Management
Immuta encrypts values with data encryption keys, which are either system-generated or managed using an external key management service (KMS). Immuta recommends using a KMS to encrypt and decrypt data keys and supports the AWS Key Management Service. If no KMS is configured, Immuta generates a data encryption key on a user-defined rollover schedule, using the most recent data key to encrypt new values while preserving old data keys to decrypt old values.
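The rollover behavior described above (encrypt with the newest data key, keep older keys for decryption) can be sketched as follows. This is an illustrative sketch only, not Immuta's implementation: the KeyRing class, its method names, and the (key id, nonce, ciphertext) token layout are hypothetical, and the cipher is a toy stand-in so that the example runs with the Python standard library alone.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder cipher (SHA-256 keystream XOR). NOT secure; it only
    stands in for a real cipher so the rollover logic can be shown."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    # zip() stops at len(data), so extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyRing:
    """Hypothetical key ring: the newest key encrypts, all keys can decrypt."""

    def __init__(self):
        self.keys = {}       # key id -> data encryption key
        self.current_id = 0  # id of the most recent key

    def rollover(self) -> int:
        """Generate a new data key; older keys are preserved."""
        self.current_id += 1
        self.keys[self.current_id] = secrets.token_bytes(32)
        return self.current_id

    def encrypt(self, plaintext: bytes):
        """Encrypt with the most recent key and record which key was used."""
        nonce = secrets.token_bytes(16)
        key = self.keys[self.current_id]
        return self.current_id, nonce, _toy_cipher(key, nonce, plaintext)

    def decrypt(self, token) -> bytes:
        """Look up the key by the id stored with the value, then decrypt."""
        key_id, nonce, ciphertext = token
        return _toy_cipher(self.keys[key_id], nonce, ciphertext)


ring = KeyRing()
ring.rollover()
old_token = ring.encrypt(b"value written before rollover")
ring.rollover()
new_token = ring.encrypt(b"value written after rollover")

# The old value still decrypts because its key was preserved.
print(ring.decrypt(old_token), ring.decrypt(new_token))
```

Storing the key identifier alongside each encrypted value is one common way to let older values remain readable after a rollover; the source does not say how Immuta associates values with keys.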
Immuta employs three families of functions in its masking policies:
One-way Hashing: One-way (irreversible) hashing is performed via a salted SHA-256 hash. A consistent salt is used for values throughout the data source, so users can count or track specific values without revealing the true value. Because hashed values differ across data sources, users cannot join on hashed values. Note: joining on masked values can be enabled in Immuta Projects.
Reversible Masking: For reversible masking, values are encrypted using AES-256 CBC encryption. Encryption is performed using a cell-specific initialization vector. The resulting values can be unmasked by an authorized user.
Reversible Format-Preserving Masking: Format-preserving masking maintains the format of the data while masking the value. It is achieved by initializing and applying the NIST-standard FF1 method at the column level. The resulting values can be unmasked by an authorized user.
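As a minimal sketch of the one-way hashing item above, the following example shows why a consistent salt keeps equal values equal within a data source while different salts prevent joins across data sources. The function names, the 16-byte salt length, and the choice to prepend the salt to the value are assumptions made for illustration; the source does not state how Immuta generates the salt or combines it with the value.

```python
import hashlib
import secrets


def make_salt() -> bytes:
    """Generate a random salt (hypothetical: one per data source)."""
    return secrets.token_bytes(16)


def mask_hash(value: str, salt: bytes) -> str:
    """Return the salted SHA-256 digest of a value as a hex string.
    Prepending the salt is an assumption, not a documented detail."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


salt_source_a = make_salt()
salt_source_b = make_salt()

# Same value, same data source: identical hashes, so counts and grouping work.
a1 = mask_hash("alice@example.com", salt_source_a)
a2 = mask_hash("alice@example.com", salt_source_a)

# Same value, different data source: a different hash, so joins do not match.
b1 = mask_hash("alice@example.com", salt_source_b)

print(a1 == a2, a1 == b1)
```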
Immuta collects and stores the following kinds of metadata in Immuta's Metadata Database for policy enforcement. Further, policy information may be transmitted to data source host systems for enforcement purposes as part of a query, or to enable the host system to perform native enforcement.
Identity Management Information: Usernames, group information, and other kinds of personal identifiers may be stored and referenced for the purposes of performing authentication and access control and may be retained in audit logs. When such information is relevant for access determination under policy, it may be retained as part of the policy definition.
Schema Information: Data source metadata such as schema, column data types, and information about the host.
Fingerprints: When enabled, additional statistical queries made during the health check are distilled into summary statistics, called fingerprints. During this process, statistical query results, data samples (which may contain PII), and the resulting fingerprints are temporarily held in memory by the fingerprint service.
k-Anonymization Policies: When a k-anonymization policy is applied, its columns are queried under a separate fingerprinting process that generates rules enforcing k-anonymity. The results of this query, which may contain PII, are temporarily held in memory by the fingerprint service. The final rules are stored for enforcement.
Randomized Response Policies: If the list of substitution values for a categorical column is not part of the policy specification (e.g., when specified via the API), a list is obtained via query and merged into the policy definition.
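To make the k-anonymization item above more concrete, the sketch below derives a simple rule set from queried column values and then applies it. It is a generic illustration under stated assumptions, not Immuta's algorithm: the function names are hypothetical, and representing the rules as the set of value combinations that occur at least k times, with all other combinations suppressed to None, is one simple way to enforce k-anonymity. Note that only the derived rules, not the queried rows, need to be kept afterward.

```python
from collections import Counter


def build_k_anonymity_rules(rows, columns, k):
    """Count each combination of values in the policy columns and keep
    only the combinations that appear at least k times."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {combo for combo, n in counts.items() if n >= k}


def enforce(row, columns, allowed):
    """Return a copy of the row; suppress the policy columns (set them to
    None) when the row's combination is not in the allowed set."""
    masked = dict(row)
    if tuple(row[c] for c in columns) not in allowed:
        for c in columns:
            masked[c] = None
    return masked


rows = [
    {"zip": "20001", "age": 34, "diagnosis": "A"},
    {"zip": "20001", "age": 34, "diagnosis": "B"},
    {"zip": "20002", "age": 51, "diagnosis": "C"},
]
policy_columns = ["zip", "age"]

# Rules are derived once from the queried data, then stored for enforcement.
rules = build_k_anonymity_rules(rows, policy_columns, k=2)

for row in rows:
    print(enforce(row, policy_columns, rules))
```

In this example the combination ("20001", 34) occurs twice and is kept, while ("20002", 51) occurs once and is suppressed.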