When activated by data governors, these templated policies are automatically enforced on data sources that have had relevant tags applied to them by users or Sensitive Data Discovery.
This page outlines the types of templated policies users can manage in Immuta. To learn how to activate a templated policy, navigate to the tutorial.
Deprecation notice
Support for this policy has been deprecated.
HIPAA De-identification requires that
the 18 direct identifiers be removed from data sources, and
data owners have no actual knowledge that Data Users could re-identify individuals.
The HIPAA De-identification policy is a Global Policy included in Immuta by default. When combined with Sensitive Data Discovery, this policy automatically applies to relevant data sources. However, to fully comply with HIPAA Safe Harbor, data owners must certify that the tags on their data sources are accurate. After the policy is applied, several warnings indicate that certification is required, including a "Policy Certification Required" label on both the data source and the policy. Owners also receive a notification to certify the policy.
Note: The HIPAA De-identification Policy is staged by default and cannot be edited by any user. However, governors can clone this policy and then edit the clone.
The data owner and data user certifications serve as official acknowledgements that the users and data comply with HIPAA Safe Harbor:
Data Owner Certification: Data owners certify that all 18 identifiers have been correctly tagged and that they have no knowledge that the information in the data sources could be used by Data Users to identify individuals.
Data User Certification: Data Users agree to use the data only for the stated purpose of the project; refrain from sharing that data outside the project; not re-identify or take any steps to re-identify individuals' health information; notify the Project Owner or Governance team in the event that individuals have been identified or could be identified; and refrain from contacting individuals who might be identified.
HIPAA Expert Determination allows data scientists to increase the utility of datasets while still complying with strict HIPAA regulations that require a "very low" re-identification risk. It works within a project: the project owner can adjust the k-anonymization noise across multiple columns to gain utility. To learn more about Expert Determination, see Policy Adjustments and HIPAA Expert Determination in Immuta.
Deprecation notice
Support for this policy has been deprecated.
The CCPA policy is a Global Policy included in Immuta by default. When combined with Sensitive Data Discovery, this policy automatically applies to relevant data sources.
CCPA sets forth two routes to achieve compliance:
businesses processing consumer personal information abide by all applicable restrictions (e.g., purpose restrictions or consumer rights), and/or
businesses transform consumer personal information into de-identified or aggregate data so that restrictions, such as consumer rights, become inapplicable.
Under CCPA, de-identification is successfully performed if data “cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer,” provided that an organization that uses de-identified information
implements technical safeguards that prohibit re-identification of the consumer to whom the information may pertain,
implements business processes that specifically prohibit re-identification of the information,
implements business processes to prevent inadvertent release of de-identified information, and
makes no attempt to re-identify the information.
Immuta’s CCPA de-identification policy was created to comply with this definition and consists of four main components, each of which addresses at least one prong of CCPA's de-identification test:
a self-executing data policy that applies a de-identification technique that serves as a technical safeguard to prohibit re-identification of the consumer.
certifications by the data owner. These serve as an official acknowledgement that the covered business initially labeled consumer information appropriately and is not aware that the data user is in a position to re-identify consumers before the data is reused. This component is crucial to prevent inadvertent release of de-identified information.
certifications by the data user. These serve as official acknowledgements that the data user is subject to business processes that prohibit re-identification and inadvertent release of de-identified information to third parties.
functionalities to enable real-time monitoring and auditing of query-based access to data. These aim to deter and detect attempts to re-identify.
Note: The language used in certifications can be customized to meet specific needs of customers, such as when customers want to use specific language found in data-sharing agreements.
The data policy is made of four rules.
The first rule ensures that access to data can only happen for two types of use cases: those that require access to de-identified data (Re-identification Prohibited.CCPA) and those that require access to identifying data (Use Case Outside De-identification). Data Users are then strictly segmented by use case through attribute-based access control and purpose acknowledgement.
The second rule nulls direct identifiers and undetermined identifiers for Data Users with access to de-identified data.
The third rule generalizes indirect identifiers with k-anonymization so that the re-identifiability probability is always equal to or below 5% for Data Users with access to de-identified data. Note: Immuta has analyzed industry standards and thresholds recommended by statistical methods experts and selected the most restrictive value of 5% for the maximum re-identifiability probability.
The fourth rule applies the first three rules to all data sources containing columns tagged Discovered.Identifier Direct, Discovered.Identifier Indirect, or Discovered.Identifier Undetermined.
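As a rough illustration of the third rule, the 5% ceiling can be restated as a minimum group size. This rests on an assumption the text above does not state: that a record's re-identification probability is measured as one over the number of records sharing its indirect-identifier values (its equivalence class $E$). Immuta's actual risk measure may differ.

```latex
p_{\text{re-id}} = \max_{E} \frac{1}{|E|} = \frac{1}{\min_{E} |E|} \le 0.05
\quad\Longleftrightarrow\quad
\min_{E} |E| \ge \frac{1}{0.05} = 20
```

Under that assumption, a 5% threshold corresponds to k-anonymity with k = 20.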
Immuta's CCPA policy addresses both direct and indirect identifiers because robust de-identification requires considering all types of identifying attributes, and the identifiers are masked differently to maximize utility. With this combination of masking techniques, the data re-identification risk (the degree of re-identification possible for each data source) meets CCPA’s de-identification criteria.
Note: The CCPA policy is staged by default and cannot be edited by any user. However, governors can clone this policy and then edit the clone; after customizing it, customers must verify that the overall re-identification risk is still acceptable.
When paired with Schema Monitoring, this policy masks columns newly added to data sources until data owners review and approve these changes from the Requests tab of their profile page.
Categorical Randomized Response: Categorical values are randomized by replacing a value with some non-zero probability. Not all values are randomized, and the consumer of the data is not told which values are randomized and which ones remain unchanged. Values are replaced by selecting a different value uniformly at random from among all other values. For example, if a randomized response policy were applied to a “state” column, a person’s residency could flip from Maryland to Virginia, which would provide ambiguity to the actual state of residency. This policy is appropriate when obscuring sensitive values such as medical diagnosis or survey responses.
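The categorical mechanism described above can be sketched as follows. This is a minimal illustration, not Immuta's implementation: the replacement probability `p_replace` and the function name are hypothetical, and the source only says values are replaced "with some non-zero probability" by a different value chosen uniformly at random.

```python
import random
from typing import Sequence


def categorical_randomized_response(
    value: str, domain: Sequence[str], p_replace: float, rng: random.Random
) -> str:
    """With probability p_replace, swap `value` for a different member of `domain`
    chosen uniformly at random; otherwise return it unchanged."""
    if not 0.0 <= p_replace <= 1.0:
        raise ValueError("p_replace must be between 0 and 1")
    if rng.random() >= p_replace:
        return value  # kept; the reader is not told which values were kept
    others = [v for v in domain if v != value]
    return rng.choice(others) if others else value


# Hypothetical usage: obscure a "state" column with a 25% replacement probability.
rng = random.Random(7)
states = ["Maryland", "Virginia", "Delaware"]
masked = [categorical_randomized_response("Maryland", states, 0.25, rng) for _ in range(10)]
```

Raising `p_replace` adds ambiguity about any single record; lowering it keeps more of the column's true distribution.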
Custom Function: This function uses SQL functions native to the underlying database to transform the values in a column. This can be used in numerous use cases, but notional examples include top-coding to some upper limit, a custom hash function, and string manipulation.
K-Anonymization: Masking through k-anonymization is a distinct policy that can operate over multiple attributes. A k-anonymization policy applies rounding and NULL masking policies over multiple columns so that each combination of values across those columns appears in at least K records, where K is a positive integer. As a result, attributes will only be disclosed when there is a sufficient number of observations. This policy is appropriate to apply over indirect identifiers, such as zip code, gender, or age. Generally, each of these identifiers is not uniquely linked to an individual, but when combined with other identifiers can be associated with a single person. Applying k-anonymization to these attributes provides the anonymity of crowds so that individual rows are made indistinct from each other, reducing the re-identification risk by making it unclear which record corresponds to a specific person. Immuta supports k-anonymization of text, numeric, and time-based data types.
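The k-anonymity condition above can be checked and enforced with a short sketch. It assumes the indirect identifiers have already been rounded or generalized and shows only the NULL-masking half of the policy; the function names are hypothetical and this is not how Immuta computes its masks.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# A row's (already generalized) indirect-identifier values, e.g. (zip3, age_band, gender).
Row = Tuple[Optional[str], ...]


def class_sizes(rows: Sequence[Row]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(rows)


def is_k_anonymous(rows: Sequence[Row], k: int) -> bool:
    """True if every combination of values appears in at least k records."""
    return all(n >= k for n in class_sizes(rows).values())


def suppress_small_classes(rows: Sequence[Row], k: int) -> List[Row]:
    """NULL (None) every quasi-identifier in a combination shared by fewer than k records.

    The suppressed rows form one all-NULL class, which can itself be smaller
    than k; a fuller implementation would then round or generalize further.
    """
    sizes = class_sizes(rows)
    return [row if sizes[row] >= k else (None,) * len(row) for row in rows]


rows = [("207", "30-39", "F")] * 3 + [("208", "40-49", "M")] * 2 + [
    ("209", "20-29", "F"),
    ("210", "50-59", "M"),
]
masked = suppress_small_classes(rows, k=2)
```

The two outlier rows are masked to NULL, and together they form a class of size 2, so the result is 2-anonymous.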
Mask with Format Preserving Masking: This function masks using a reversible function but does so in a way that the underlying structure of a value is preserved. This means the length and type of a value are maintained. This is appropriate when the masked value should appear in the same format as the underlying value. Examples include social security numbers and credit card numbers, where Mask with Format Preserving Masking returns masked values in a format consistent with social security numbers or credit card numbers, respectively. This masking type carries more overhead, so use it only when format is critically important, such as when an engineer is building an application whose downstream systems validate content. In almost all analytical use cases, format should not matter.
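To show what "format preserving" and "reversible" mean together, here is a deliberately simple toy: each digit is shifted by a keyed, position-dependent offset, and separators are left in place. This is an illustrative assumption, not Immuta's algorithm and not a secure cipher; production format-preserving encryption uses standardized constructions such as FF1 or FF3-1 (NIST SP 800-38G). The key and function names are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _shift(key: bytes, position: int) -> int:
    """Derive a per-position digit offset (0-9) from a secret key."""
    digest = hmac.new(key, position.to_bytes(4, "big"), hashlib.sha256).digest()
    return digest[0] % 10


def _transform(value: str, key: bytes, direction: int) -> str:
    out, pos = [], 0
    for ch in value:
        if ch in DIGITS:
            out.append(str((int(ch) + direction * _shift(key, pos)) % 10))
            pos += 1
        else:
            out.append(ch)  # separators such as "-" stay where they were
    return "".join(out)


def fp_mask(value: str, key: bytes) -> str:
    """Toy format-preserving mask: same length, digits stay digits."""
    return _transform(value, key, +1)


def fp_unmask(masked: str, key: bytes) -> str:
    """Reverse fp_mask for a holder of the same key."""
    return _transform(masked, key, -1)


key = b"hypothetical-key"
masked_ssn = fp_mask("123-45-6789", key)
```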
Mask with Reversibility: This function masks values in a way that lets an authorized user "unmask" a value and reveal the original. Masking with Reversibility is appropriate when there is a need to obscure a value while allowing an authorized user to recover the underlying value. All of the same use cases and caveats that apply to Replace with Hashing apply to this function. Reversibly masked fields can leak the length of their contents, so consider whether this may be an attack vector for applications that use them.
Randomized Response: This function randomizes the displayed value to make the true value uncertain, but maintains some analytic utility. The randomization is applied differently to both categorical and quantitative values. In both cases, the noise can be increased to enhance privacy or reduced to preserve more analytic value.
Datetime and Numeric Randomized Response: Numeric and datetime randomized response apply a tunable, unbiased noise to the nominal value. This noise can obscure the underlying value, but the impact of the noise is reduced in aggregate. This masking type can be applied to sensitive numerical attributes, such as salary, age, or treatment dates.
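A minimal sketch of unbiased, tunable noise for numbers and datetimes follows. The source does not name Immuta's noise distribution; zero-mean Laplace noise is an assumption chosen here because it is a common choice, and `scale` plays the role of the tunable noise level. Function names are hypothetical.

```python
import random
from datetime import datetime, timedelta


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_number(value: float, scale: float, rng: random.Random) -> float:
    return value + laplace_noise(scale, rng)


def noisy_datetime(value: datetime, scale_seconds: float, rng: random.Random) -> datetime:
    return value + timedelta(seconds=laplace_noise(scale_seconds, rng))


rng = random.Random(42)
salaries = [50_000.0] * 20_000
noisy = [noisy_number(s, 1_000.0, rng) for s in salaries]
```

Individual values move by hundreds or thousands, but because the noise has mean zero, the average of the noisy column stays close to the true average, which is the "reduced in aggregate" behavior described above.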
Replace with Constant: This function replaces any value in a column with a specified value. The underlying data will appear to be a constant. This masking carries the same privacy and utility guarantees as Replace with NULL. Apply this policy to strings that require a specific repeated value.
Replace with Hashing: This function masks the values with an irreversible hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value. This is appropriate for cases where the underlying value is sensitive, but there is a need to segment the population. Such attributes could be addresses, time segments, or countries. It is important to note that hashing is susceptible to inference attacks based on prior knowledge of the population distribution. For example, if “state” is hashed, and the dataset is a sample across the United States, then an adversary could assume that the most frequently occurring hash value is California. As such, it's most secure to use the hashing mask on attributes that are evenly distributed across a population.
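The sketch below illustrates both the usefulness and the weakness of a consistent hash. Keyed HMAC-SHA256 is an assumption; the source does not specify which hash Immuta uses. The key and the sample data are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def hash_mask(value: str, key: bytes) -> str:
    """One-way, deterministic token: equal inputs always map to the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"per-deployment-secret"  # hypothetical key
countries = ["US", "US", "FR", "DE", "US"]
tokens = [hash_mask(c, key) for c in countries]

# Counts survive hashing, which enables segmentation but also the
# frequency-based inference attack described above: the most common token is "US".
most_common_token, _ = Counter(tokens).most_common(1)[0]
```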
Replace with Null: This function replaces any value in a column with NULL. This removes any identifiability from the column and removes all utility of the data. Apply this policy to numeric or text attributes that have a high re-identification risk but little analytic value (names and personal identifiers).
Replace with REGEX: This function uses a regular expression to replace all or a portion of an attribute. REGEX replacement allows for some groupings to be maintained, while providing greater ambiguity to the disclosed value. This masking technique is useful when the underlying data has some consistent structure, the unmasked underlying data represents some re-identification risk, and a regular expression can be used to mask the underlying data to be less identifiable.
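Two illustrative regex masks are sketched below using Python's `re` module; the patterns and function names are hypothetical. In Immuta the expression runs in the underlying database, so the exact regex dialect and replacement syntax there may differ.

```python
import re

ZIP5 = re.compile(r"^(\d{3})\d{2}$")
IPV4 = re.compile(r"^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}$")


def mask_zip(value: str) -> str:
    """Keep the 3-digit ZIP prefix as a grouping; hide the last two digits."""
    return ZIP5.sub(r"\1XX", value)


def mask_ipv4(value: str) -> str:
    """Keep the network portion of an IPv4 address; zero out the host octet."""
    return IPV4.sub(r"\1.0", value)
```

Values that do not match a pattern pass through unchanged, so a pattern should be checked against the column's real structure before it is relied on.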
Rounding: Immuta’s rounding policy reduces, rounds, or truncates numeric or datetime values to a fixed precision. This policy is appropriate when it is important to maintain analytic value of a quantity, but not at its native precision.
Date/Time Rounding: This policy truncates the precision of a datetime value to a user-defined precision. The supported precisions are minute, hour, day, months, and year.
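Datetime truncation can be sketched in a few lines. The precision names and function are illustrative assumptions, not Immuta's configuration values.

```python
from datetime import datetime

# Fields reset to their minimum value for each precision (names are illustrative).
_RESETS = {
    "minute": {"second": 0, "microsecond": 0},
    "hour": {"minute": 0, "second": 0, "microsecond": 0},
    "day": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
    "month": {"day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0},
    "year": {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}


def truncate_datetime(value: datetime, precision: str) -> datetime:
    """Drop every field finer than `precision`, leaving any tzinfo unchanged."""
    if precision not in _RESETS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return value.replace(**_RESETS[precision])
```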
Numeric Rounding: This policy maps the nominal value to the ceiling of some specified bandwidth. Immuta has a recommended bandwidth based on the Freedman-Diaconis rule.
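The numeric rounding described above can be sketched as mapping each value up to the next multiple of a bandwidth, with the bandwidth suggested by the Freedman-Diaconis rule, $h = 2 \cdot \mathrm{IQR} / n^{1/3}$. The quantile method used here (Python's default "exclusive") is an assumption; Immuta's exact computation is not documented in this section.

```python
import math
import statistics
from typing import Sequence


def freedman_diaconis_bandwidth(data: Sequence[float]) -> float:
    """Bin width h = 2 * IQR / n**(1/3)."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


def round_up_to_band(value: float, bandwidth: float) -> float:
    """Map value to the ceiling of its band, i.e. the next multiple of bandwidth."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return math.ceil(value / bandwidth) * bandwidth


ages = list(range(1, 101))
h = freedman_diaconis_bandwidth(ages)
```

For the values 1 through 100 the interquartile range is 50.5, giving a bandwidth of roughly 21.8.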
The masking functions described above can be implemented in a variety of use cases. Use the table below to determine the circumstances under which a function should be used.
Applicable to Numeric Data: The masking function can be applied to numeric values.
Column-Value Determinism: Repeated values in the same column are masked with the same output.
Introduces NULLs: The masking function may, under normal or irregular circumstances, return NULL values.
Performance: How performant the masking function will be (10/10 being the best).
Preserves Appearance: The output masked value resembles the valid column values. For example, a masking function would output phone numbers when given phone numbers. Here, NULL values are not counted against this property.
Preserves Averages: The average of the masked values (avg(mask(v))) will be near the average of the values in the clear (avg(v)).
Suitable for De-Identification: The masking function can be used to obscure record identifiers, hiding data subject identities and preventing future linking against other identified data.
Provides Deniability of Record Content: A (possibly identified) person can plausibly attribute the appearance of the value to the masking function. This is a desirable property of masking functions that retain analytic utility, as such functions must necessarily leak information about the original value. Fields masked with these functions provide strong protections against value inference attacks.
Preserves Equality and Grouping: Each value will be masked to the same value consistently without colliding with others. Therefore, equal values remain equal under masking while unequal values remain unequal, preserving equality. This implies that counting statistics are also preserved.
Preserves Message Length: The length of the masked value is equal to the length of the original value.
Preserves Range Statistics: The number of data values falling in a particular range is preserved. For strings, this can be interpreted as the number of strings falling between any two values by alphabetical order.
Preserves Value Locality: The output will remain near the input, which may be important for analytic purposes.
Reversible: Qualified individuals can reveal the original input value.
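A few of the properties defined above can be made concrete as sample-based checks. This is a hypothetical test harness, not part of Immuta; passing on a sample is evidence for a property, not proof.

```python
import hashlib
from typing import Callable, List, Optional

Mask = Callable[[str], Optional[str]]


def is_column_value_deterministic(mask: Mask, values: List[str]) -> bool:
    """Masking the same value twice gives the same output."""
    return all(mask(v) == mask(v) for v in values)


def preserves_equality(mask: Mask, values: List[str]) -> bool:
    """Equal inputs stay equal and distinct inputs stay distinct (no collisions)."""
    distinct = set(values)
    outputs = {mask(v) for v in distinct}
    return is_column_value_deterministic(mask, list(distinct)) and len(outputs) == len(distinct)


def preserves_message_length(mask: Mask, values: List[str]) -> bool:
    """Every masked value is non-NULL and as long as its input."""
    masked = [mask(v) for v in values]
    return all(m is not None and len(m) == len(v) for m, v in zip(masked, values))


def sha256_mask(v: str) -> str:
    return hashlib.sha256(v.encode("utf-8")).hexdigest()


def null_mask(v: str) -> Optional[str]:
    return None


sample = ["alice", "bob", "alice", "carol"]
```

A hash mask preserves equality but not length, while NULL masking is deterministic yet collapses every value into one.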
Masking Policy Support by Integration
Because Global Policies can apply masking policies across multiple databases at once, Immuta reverts to NULLing a column if an unsupported masking policy is applied to it.
See the integration support matrix for an outline of masking policies supported by each integration.
Immuta policies restrict access to data and apply to data sources at the local or global level:
Local policies refer to specific tables.
Global policies refer to tags instead of specific tables, allowing you to build a single policy that impacts a large percentage of your data rather than building separate local policies for each table.
Consider the following local and global policy examples:
Local policy example: Mask using hashing the values in the columns ssn, last_name, and home_address.
Global policy example: Mask using hashing values in columns tagged PII on all data sources.
In this scenario, the local policy would mask the sensitive columns specified in the data policy on the single data source it was created for. With only local policies, a data owner or governor would have to write that policy for every data source on which they wanted to mask sensitive data. The second policy would mask any column tagged PII on every data source that had the PII tag applied to a column. Because this global policy automatically applies to those qualifying data sources, it only needs to be written once.
Consequently, global policies are the best practice for using Immuta: they provide the most scalability and manageability of access control.
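The difference between the local and global examples above can be sketched with a toy data model: a local policy names one data source and its columns, while a global policy names a tag. The classes, functions, and sample data are hypothetical and only mirror the examples; they are not Immuta's API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Column:
    name: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class DataSource:
    name: str
    columns: List[Column]


def local_policy_columns(source: DataSource, policy_source: str, column_names: Set[str]) -> List[str]:
    """A local policy is written for one data source and names its columns."""
    if source.name != policy_source:
        return []
    return [c.name for c in source.columns if c.name in column_names]


def global_policy_columns(source: DataSource, tag: str) -> List[str]:
    """A global policy names a tag, so it reaches every column carrying that tag."""
    return [c.name for c in source.columns if tag in c.tags]


customers = DataSource("customers", [
    Column("ssn", {"PII"}), Column("last_name", {"PII"}),
    Column("home_address", {"PII"}), Column("plan"),
])
claims = DataSource("claims", [Column("member_ssn", {"PII"}), Column("amount")])
local_cols = {"ssn", "last_name", "home_address"}
```

The local policy masks nothing on `claims` because it was written for `customers`; the global policy reaches `member_ssn` as soon as it carries the PII tag.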
For details about subscription and data policy types, see the subscription policies reference guide or data policies reference guide.
Best Practice: Access Controls
In most cases, the goal is to share as much data as possible while still being compliant with privacy regulations. Immuta recommends pairing broad subscription policies with specific data policies to give as much access as possible.
Global policies can be authored by users with GOVERNANCE permission or data owners. Data owners' global policies will only attach to data sources they own (also called restricted global policies), even if the tags their policies target go beyond their data sources.
Immuta policies compare data and user attributes at query-time to determine whether or not the querying user should access the data.
Data attributes are information about the data within the data source. These attributes are then matched against policy logic to determine if a row or object should be visible to a specific user. This matching is usually done between the data attribute and the user attribute.
For example, consider the policy below:
Only show rows where user is a member of a group that matches the value in the column tagged Department.
The data attribute (the value in the column tagged Department) is matched against the user attribute (their group) to determine whether or not rows will be visible to the user accessing the data.
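The matching in the example policy above can be sketched in memory. Immuta evaluates such policies at query time in the underlying platform, so this is only an illustration of the logic; the `User` class, column name, and rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]


def visible_rows(rows: List[Dict[str, object]], user: User, department_column: str) -> List[Dict[str, object]]:
    """Keep rows whose value in the Department-tagged column matches one of the user's groups."""
    return [row for row in rows if row.get(department_column) in user.groups]


rows = [
    {"id": 1, "dept": "Finance"},
    {"id": 2, "dept": "HR"},
    {"id": 3, "dept": "Finance"},
    {"id": 4},  # no department value, so it never matches
]
analyst = User("ana", frozenset({"Finance"}))
```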
User attributes are values connected to specific Immuta user accounts and are used in policies to restrict access to data. These attributes fall into three categories:
attributes: Attributes are custom tags that are applied to users to restrict what data users can see. Attributes can be added manually or mapped from an external catalog.
groups: Groups allow System Administrators to group sets of users together. Users can belong to any number of groups and can be added or removed from groups at any time. Like attributes, groups can be used to restrict what data a set of users has access to.
permissions: Permissions control what actions a user can take in Immuta, both API and UI actions. Permissions can be added and removed from user accounts by a System Administrator (an Immuta user with the USER_ADMIN permission); however, the permissions themselves are managed by Immuta, and the actions associated with the permissions cannot be altered.
The table below outlines the various states of global policies in Immuta.
Note: Policies that contain the circumstance When selected by data owners cannot be staged.
Data owners who are not governors can write restricted global policies for data sources that they own. With this feature, data owners have higher-level policy controls and can write and enforce policies on multiple data sources simultaneously, eliminating the need to write redundant local policies on data sources.
Unlike unrestricted global policies, these policies apply only to the data sources owned by the users or groups specified in the policy, and the set of data sources they apply to changes as data source ownership changes.
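The scoping rule for restricted global policies can be sketched as a filter on ownership. The data model and function are hypothetical: `restrict_to_owners=None` stands in for an unrestricted global policy, and a set of owner names stands in for a restricted one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DataSource:
    name: str
    owners: Set[str]
    column_tags: Dict[str, Set[str]] = field(default_factory=dict)


def attached_sources(sources: List[DataSource], tag: str, restrict_to_owners: Optional[Set[str]]) -> List[str]:
    """Names of data sources that a policy targeting `tag` attaches to."""
    attached = []
    for source in sources:
        has_tag = any(tag in tags for tags in source.column_tags.values())
        in_scope = restrict_to_owners is None or bool(source.owners & restrict_to_owners)
        if has_tag and in_scope:
            attached.append(source.name)
    return attached


sales = DataSource("sales", {"dana"}, {"email": {"PII"}})
hr = DataSource("hr", {"lee"}, {"ssn": {"PII"}})
logs = DataSource("logs", {"dana"}, {"path": set()})
sources = [sales, hr, logs]
```

Because scope is recomputed from current ownership, adding `dana` as an owner of `hr` brings it under her restricted policy without editing the policy.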
Data owners and governors can access a data source's policies on the policies tab of the data source. There, these users can view existing policies or apply new policies to the data source with the following features:
Apply Existing Policies Button: By clicking this button in the top right corner of the policies tab, users can search for and apply existing policies to the data source from other data sources or global policies.
Subscription Policy Builder: In this section, users can determine who may access the data source. If a subscription policy has already been set by a global policy, a notification and a Disable button appear at the bottom of this section. Users can click the Disable button to make changes to the subscription policy.
Data Policy Builder: In this section, users can create policies to enforce privacy controls on the data source and see a list of data policies currently applied to the data source.
All changes made to policies by data owners or governors appear in a collapsible Activity panel on the right side of the screen.
The information recorded in the activity panel includes when the data source was created, the name and type of the policy, when the policy was applied or changed, and if the policy is in conflict on the data source. Additionally, global policy changes are identified by the governance icon; all other updates are labeled by the data sources icon.
Policy state | Enforcement | Description |
---|---|---|
Active policies | Enforced | If policies are edited in this state, the changes will be immediately enforced on data sources when the changes are saved. |
Deleted policies | Not enforced | Once a policy has been deleted, it cannot be recovered or reactivated. |
Disabled policies | Not enforced | Data owners or governors can place a policy in this state at the local level for a specific data source. Although this is similar to the staged policy state, this policy will still be enforced on other data sources after it is disabled for a specific data source. |
Staged policies | Not enforced | This state is useful when regularly editing and reviewing policies. This state also allows you to lift a policy's enforcement without deleting the policy so that it can easily be re-enforced. See Clone, Activate, or Stage a Global Policy for a tutorial. |
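The enforcement column of the policy-state table can be summarized in a small model. It is a hypothetical sketch: "disabled" is represented as a per-data-source set rather than a global state, because the table describes disabling as local to one data source, and the model does not capture that deleted policies cannot be reactivated.

```python
from enum import Enum
from typing import Set


class PolicyState(Enum):
    ACTIVE = "active"
    STAGED = "staged"
    DELETED = "deleted"


def is_enforced(state: PolicyState, data_source: str, disabled_on: Set[str]) -> bool:
    """An active policy is enforced everywhere except on data sources where it was disabled."""
    return state is PolicyState.ACTIVE and data_source not in disabled_on
```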