Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
In some cases, two conflicting global data policies may apply to a single data source. When this happens, the policy containing a tag deeper in the hierarchy will apply to the data source to resolve the conflict.
Consider the following global data policies created by a data governor:
Data Policy 1: Mask columns tagged
PII
by making null for everyone on data sources with columns taggedPII
Data Policy 2: Mask columns tagged
PII.SSN
using hashing for everyone on data sources with columns taggedPII.SSN
If a Data Owner creates a data source and applies the PII.SSN
tag, both of these global data policies will apply. Instead of having a conflict, the policy containing a deeper tag in the hierarchy will apply:
In this example, Data Policy 2 cannot be applied to the data source. If data owners wanted to use Data Policy 2 on the data source instead, they would need to disable Data Policy 1.
Once enabled on a data source, global data policies can be edited and disabled by data owners.
The Immuta policy builder allows you to use custom functions that reference important Immuta metadata from within your where clause. These custom functions can be seen as utilities that help you create policies easier. Using the Immuta Policy Builder, you can include these functions in your policy queries by choosing where in the sub-action drop-down menu.
@attributeValuesContains()
FunctionThis function returns true
for a given row if the provided column evaluates to an attribute value for which the querying user has a corresponding attribute value. This function requires two arguments and accepts no more than three arguments.
@columnTagged()
FunctionThis function returns the column name with the specified tag.
If this function is used in a Global Policy and the tag doesn't exist on a data source, the policy will not be applied.
@groupsContains()
FunctionThis function returns true
for a given row if the provided column evaluates to a group to which the querying user belongs. This function requires at least one argument.
@hasAttribute()
FunctionThis function returns a boolean indicating if the current user has the specified attribute name and value combination. If the specified attribute name or attribute value has a single quote, you will need to escape it using a \'\'
expression within a custom WHERE
policy.
@iam
FunctionThis function returns the IAM ID for the current user.
None.
@interpolatedComparison()
FunctionDeprecation notice
Support for this function has been deprecated.
The Interpolated Comparison function is perhaps the most complex custom WHERE clause function. It essentially generates chained WHERE clauses. As opposed to the rest of the custom WHERE
clause functions, its output is not a list of values or a boolean, but rather the comparison itself.
The @interpolatedComparison() function does not work with Immuta's database integrations.
¹ A custom function will not accept a placeholder argument when used inside @interpolatedComparison
.
The following invocation of interpolatedComparison()
:
WHERE @interpolatedComparison("group", "=", "upper('##')", "##", @groupsContains, "OR")
is equivalent to writing the following SQL WHERE
clause, given that a user belongs to two groups called founders and engineers:
WHERE ((group = upper('founders')) or (group = upper('engineers')))
@isInGroups()
FunctionThis function returns a boolean indicating if the current user is a member of all of the specified groups. If any of the specified groups has a single quote, you will need to escape it using a \'\'
expression within a custom WHERE
policy.
@isUsingPurpose()
FunctionThis function returns a boolean indicating if the current user is using the specified purpose. If the specified purpose has a single quote, you will need to escape it using a \'\'
expression within a custom WHERE
policy.
@purposesContains()
FunctionThis function returns true
for a given row if the provided column evaluates to a purpose under which the querying user is currently acting. This function requires at least one argument and accepts no more than two arguments.
@username
FunctionThis function returns the current user's user name.
None.
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
# | Parameter | Type | Required | Description |
---|---|---|---|---|
1
Attribute Name
String
Required
The name of the attribute to retrieve values for
2
Column Name/SQL Expression
String
Required
The column that contains the value to match the attribute key against
3
Placeholder
String
Optional
A placeholder in case the list of values is empty
1
Tag Name
String
Required
The name of the tag
1
Column Name/SQL Expression
String
Required
The column that contains the value to match the group against
2
Placeholder
String
Optional
A placeholder in case the list of values is empty
1
Attribute Name
String
Required
The name of the attribute
2
Attribute Value
String
Required
The value to correspond with the attribute name
1
Column Name
String
Required
The name of a column in the remote database to use in the left-hand side of the clauses
2
Comparison Operator
String
Required
The comparison operator to use to compare the column and array value
3
Literal Template
String
Required
A necessary SQL logic and the template string which will be replaced with an array
4
Template String to Replace
String
Required
The string to replace in Literal Template with the array value
5
Custom function
Function
Required
A tagged function¹. Valid functions: @groupsContains
, @purposesContains
and @attributeValuesContains
6
Logical (Chain) Operator
String
Required
The logical operator used to link each generated clause (AND
, OR
, etc.)
1
Group names
Array (String)
Required
A list of group names, e.g. groups('group_a', 'group_b', 'group_c')
1
Purpose
String
Required
The name of the purpose to check the user against
1
Column Name/SQL Expression
String/Expression
Required
The column that contains the value to match the purpose against
2
Placeholder
String
Optional
A placeholder in case the list of values is empty
Private preview
This feature is only available to select accounts. Contact your Immuta representative to enable this feature.
Orchestrated masking policies (OMP) reduces conflicts between masking policies that apply to a single column, allowing policies to scale more effectively across your organization. Furthermore, OMP fosters distributed data stewardship, empowering policy authors who share responsibility of a data set to protect it while allowing data consumers acting under various roles or purposes to access the data.
When multiple masking policies apply to a column, Immuta combines the exception conditions of the masking policy so that data subscribers can access the data when they satisfy one of those exception conditions. Multiple masking policies will be enforced on a column if the following conditions are true:
Policies use the same masking type.
Policies use the for everyone except
condition.
Databricks Spark or Starburst (Trino) integration
OMP supports the following masking types:
Constant
Hashing
Format preserving masking
Null
Regex
Rounding
Governors can apply policies to all columns in a data source or target specific columns with tags or a regular expression. Without orchestrated masking policies enabled, when multiple global policies apply to the same columns, Immuta could only apply one of those policies.
Consider the following example to examine how policies behaved when one tag is used in two different policies:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email
except when user is acting under the purpose Email Campaign
.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email
except when user is acting under purpose Marketing
.
For columns tagged email
, only one of these policies is enforced. The Mask PII Global Policy 2 is not applied to the data source, so Immuta is not enforcing the masking policy properly for users who should be able to see emails because they are acting under the Marketing purpose.
Consider the following example where multiple masking policies apply to columns that have multiple tags, resulting in one policy applying:
Global Policy 3: Mask using hashing the value in columns tagged Employee Data
unless users are acting under the purpose Retention Analysis
.
Global Policy 4: Mask using hashing the value in columns tagged HR Data
unless users are acting under the purpose Employee Satisfaction Survey
.
If a column is tagged Employee Data
and HR Data
, Immuta will only apply one of the policies.
With orchestrated masking policies, Immuta applies multiple global masking policies that apply to a single column by combining the policy exceptions with OR. For these policies to combine, the masking type must be identical and the policy must use the for everyone except condition.
Consider the following example, both of these policies will apply to the data source:
Mask PII Global Policy 1: Mask using hashing the value in columns tagged email
except when user is acting under the purpose Email Campaign
.
Mask PII Global Policy 2: Mask using hashing the value in columns tagged email
except when user is acting under purpose Marketing
.
Users acting under the purpose Marketing
or Email Campaign
will be able to see emails in the clear.
However, in the following example, only one of these policies will apply to the data source because one masks using a constant and the other masks using hashing:
Global Policy 5: Mask using the constant REDACTED
the value in columns tagged Employee Data
unless users are acting under the purpose Retention Analysis
.
Global Policy 6: Mask using hashing the value in columns tagged HR Data
unless users are acting under the purpose Employee Satisfaction Survey
.
No UI enhancements were made in this release. Multiple masking policies applied to the same column are visible on a data source, but there is no indication that the exceptions are combined with OR
.
Masking types must match exactly for the policies to be combined. For example, both policies must mask using rounding.
Existing policies will not automatically migrate to the new policy logic when you enable the feature. To re-compute existing policies with the new logic, you must manually trigger global policy changes by staging and re-enabling each policy.
Deprecation notice
Support for this feature has been deprecated.
Use Deterministic IVs/Salt
Use deterministic IVs/salt to ensure the same value is masked consistently throughout the data, as Immuta always pushes down the masked version of the literal when the querying user is exempt from the policy.
Immuta can make requests to your External Masking service with one of two authentication methods:
Username and password authentication: Immuta can send requests with a username and a password in the Authorization
HTTP header. In this case, your service will need to be able to parse a Basic Authorization Header and validate the credentials sent with it.
PKI Certificate: Immuta can send requests using a CA certificate, a certificate, and a key.
Alternatively, Immuta can make unauthenticated requests to your REST masking service. This is recommended only if you have other security measures in place (e.g., if the service is in an isolated network that's reachable only by your Immuta environment.)
The unmask
action allows Immuta to build predicates that can be used to query data that is consistently masked at rest in the remote database; it does not dynamically mask data at query time.
To dynamically mask data, use Immuta’s standard masking policies.
This endpoint accepts a set of values and a directive to either mask or unmask them.
Your service will need to parse and process the following body parameters:
Below is an example request payload to mask values in the ssn
and ccn
columns:
Below is an example request payload to unmask values in the ssn
and ccn
columns:
Your service will need to return a map of values that corresponds to the columns and values that were specified in the request. It is important that your service returns the same column keys and that the position of each masked/unmasked value in your response corresponds to the masked/unmasked value from the request.
For example, the following request
could return the following body:
Notice that both ssn
and ccn
columns are present and that each of them contains the exact number of values specified in the request. Immuta will fail to validate responses to its request under the following circumstances:
The response contains column keys that were not present in the request.
The response is missing column keys that were present in the request.
The response doesn't contain the exact number of values for each of the corresponding column keys in the request.
Below are some very simplistic implementation examples of a service with mask()
and unmask()
functions:
For example, data governors could add a custom certification that states that data owners must verify that tags have been added correctly to their data sources before certifying the policy.
When a global data policy with a custom certification is cloned, the certification is also cloned. If the user who clones the policy and custom certification is not a governor, the policy will only be applied to data sources that user owns.
Once a subscription policy is applied to a data source, Immuta users must be subscribed to that data source to access the data. The subscription policy determines who can request access and has one of four possible restriction levels:
Anyone: Users will automatically be granted access (least restricted).
Anyone who asks (and is approved): Users will need to request access and be granted permission by the configured approvers (moderately restricted).
Users with specific groups or attributes: Only users with the specified groups/attributes will be able to see the data source and subscribe (moderately restricted). This restriction type is referred to as an attribute-based access control (ABAC) global subscription policy throughout this page.
Individual users you select: The data source will not appear in search results; data owners must manually add/remove users (most restricted).
In some cases, multiple global subscription policies created by a data governor may apply to a single data source.
Anyone
Anyone who asks (and is approved)
Individual users you select
When such conflicts occur, data owners can manually choose which policy will apply. To do this the data owner must
Disable the applied global subscription policy in the policies tab on a data source.
Provide a reason the global policy should be disabled.
Select which conflicting global subscription policy they want to apply.
You can build ABAC global subscription policies in Immuta by selecting the users with specific groups or attributes restriction level. These policies will automatically subscribe users who possess the user entitlements specified in the policy.
In some cases, multiple ABAC global policies may apply to a single data source. Rather than allowing the two policies to conflict, Immuta combines the conditions of the subscription policies.
When creating a users with specific groups or attributes global policy, data governors select whether the global subscription policy should be
Always Required: Users must meet all the conditions outlined in each policy to get access (i.e., the conditions of the policies are combined with AND
).
Share Responsibility: Users need to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR
).
Consider the following global subscription policies created by a data governor:
Policy 1: (Always Required) Allow users to subscribe to the data source when user is a member of group HR; otherwise, allow users to subscribe when approved by an Owner of the data source.
Policy 2: (Shared Responsibility) Allow users to subscribe to the data source when user is a member of group Analytics; otherwise, allow users to subscribe when approved by anyone with permission Governance.
Policy 3: (Shared Responsibility) Allow users to subscribe to the data source when user has attribute Office Location Ohio; otherwise, allow users to subscribe when approved by anyone with permission Audit.
If a data owner creates a data source and all of these policies apply, the subscription policies are combined, so the subscribing user must meet the requirements of the Always Required policy and the requirements of at least one of the Shared Responsibility policies to get access to the data source.
By default, users must meet all the conditions outlined in each global subscription policy that has been combined on a data source to get access (i.e., the conditions of the policies are combined with AND
). However, governors can opt to check the Shared Responsibility box if they would like users to meet the condition of at least one policy that applies (i.e., the conditions of the policies are combined with OR
).
Once enabled on a data source, combined global subscription policies can be edited and disabled by data owners.
Users can create more complex policies using functions and variables in the advanced DSL policy builder than the subscription policy builder allows.
Users can specify more than one level of criteria for the ABAC global subscription policy. Specifically, users can combine manual approvals with an ABAC subscription policy.
After a user selects Users with Specific Groups/Attributes in the global subscription policy builder, they can also enable Request Approval to Access. Selecting this option allows users to request access to a data source and be manually approved by a specified user, even if the requesting user does not meet the group or attribute conditions in the policy. By allowing an approval workflow as an alternative method of access if a user does not meet the ABAC conditions, this feature can reduce the number of policies required and allow more flexibility in approval workflows.
Manually added users will now see data regardless of meeting policy conditions.
In previous versions of Immuta, users who had been manually subscribed to a data source could not see any data. Now, these manually added users will see data even though they don't meet the ABAC conditions in the policy. Governors can change the behavior by switching the subscription policy to auto-subscribe (which removes any users who don't meet the subscription policy) or by adding a data policy that redacts rows for users who do not have the groups or attributes specified in the subscription policy.
Beyond Request Approval to Access, these options are also available:
Allow Data Source Discovery: Users can still see that this data source exists in Immuta, even if they do not have these attributes and groups. This option will automatically be selected and locked if users select Request Approval to Access.
Note: When these policies merge, if one or more policies do not have discovery enabled, then approvals will be removed from the final merged policy.
Require Manual Subscription: Users will not be automatically subscribed to the data source. They must manually subscribe to gain access.
Governors must also select how this policy should be merged with other policies that apply to the same data source:
Always Required: Users must always meet this condition, no matter what other policies apply (it will be AND'ed with other policies that apply to the data source) or
Share Responsibility: Users need to meet the conditions in this policy OR another share responsibility policy that applies to the data source. (It will be OR'ed with other shared responsibility policies that apply.)
Once a user is subscribed to a data source, the data policies that are applied to that data source determine what data the user sees.
For all data policies, you must establish the conditions for which they will be enforced. Immuta allows you to append multiple conditions to the data. Those conditions are based on user attributes and groups (which can come from multiple identity management systems and applied as conditions in the same policy), or purposes they are acting under through Immuta projects.
Conditions can be directed as exclusionary or inclusionary, depending on the policy that's being enforced:
exclusionary condition example: Mask using hashing values in columns tagged PII
on all data sources for everyone except users in the group AUDIT
.
Certain policies are not supported, or supported with caveats*, depending on the integration:
*Supported with Caveats:
On Databricks data sources, joins will not be allowed on data protected with replace with NULL/constant policies.
On Trino data sources, the Immuta functions @iam
and @interpolatedComparison
for WHERE clause policies can block the creation of views.
For example, governors could mask values using hashing for users acting under a specified purpose while masking those same values by making null for everyone else who accesses the data.
This variation can be created by selecting for everyone who when available from the condition dropdown menus and then completing the Otherwise clause.
For example, if the purpose Research
included Marketing
, Product
, and Onboarding
as sub-purposes, a governor could write the following global policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
This hierarchy allows you to create this as a single purpose instead of creating separate purposes, which must then each be added to policies as they evolve.
Now, any user acting under the purpose or sub-purpose of Research
- whether Research.Marketing
or Research.Onboarding
- will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. Furthermore, if new projects with new Research purposes are added, for example, the relevant global policy will automatically be enforced.
Masking policies hide values in data, providing various levels of utility while still preserving privacy. In order to create masking policies on object-backed data sources, you must create data dictionary entries and the data format must be either, csv, tsv, or json.
This policy masks the values with an irreversible sha256 hash, which is consistent for the same value throughout the data source, so you can count or track the specific values, but not know the true raw value.
This policy makes values null, removing any utility of the data the policy applies to.
With this policy, you can replace the values with the same constant value you choose, such as 'Redacted', removing any utility of that data.
This policy is similar to replacing with a constant, but it provides more utility because you can retain portions of the true value. For example, the following regex rule would mask the final digits of an IP address:
Mask using a regex
\d+$
the value in the columnsip_address
for everyone.
In this case, the regular expression \d+$
\d
matches a digit (equal to [0-9])
+
Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$
asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
This ensures we capture the last digit(s) after the last .
in the ip address. We then can enter the replacement for what we captured, which in this case is XXX
. So the outcome of the policy, would look like this: 164.16.13.XXX
This is a technique to hide precision from numeric values while providing more utility than simply hashing. For example, you could remove precision from a geospatial coordinate. You can also use this type of policy to remove precision from dates and times by rounding to the nearest hour, day, month, or year.
Note: The user receiving the unmasking request must send the unmasked value to the requester.
With Reversible Masking, the raw values are switched out with consistent values to allow analysis without revealing the underlying sensitive data. The direct identifier is replaced with a token that can still be tracked or counted.
This option masks the value, but preserves the length and type of the value.
Preserving the data format is important if the format has some relevance to the analysis at hand. For example, if you need to retain the integer column type or if the first 6 digits of a 12-digit number have an important meaning.
This option uses functions native to the underlying database to transform the column.
Limitations
The masking functions are executed against the remote database directly. A poorly written function could lead to poor quality results, data leaks, and performance hits.
Using custom functions can result in changes to the original data type. In order to prevent query errors you must ensure that you cast this result back to the original type.
The function must be valid for the data type of the selected column. If it is not
Local policies will error and show a message that the function is not valid.
Global policies will error and change to the default masking type (hashing for text and NULL for all others).
For all of the policies above, both at the local and global policy levels, you can conditionally mask the value based on a value in another column. This allows you to build a policy that looks something like: "Mask bank account number where country = 'USA'" instead of blindly stating you want bank account masked always.
K-anonymity is measured by grouping records in a data source that contain the same values for a common set of quasi identifiers (QIs) - publicly known attributes (such as postal codes, dates of birth, or gender) that are consistently, but ambiguously, associated with an individual.
The k-anonymity of a data source is defined as the number of records within the least populated cohort, which means that the QIs of any single record cannot be distinguished from at least k other records. In this way, a record with QIs cannot be uniquely associated with any one individual in a data source, provided k is greater than 1.
In Immuta, masking with k-anonymization examines pairs of values across columns and hides groups that do not appear at least the specified number of times (k). For example, if one column contains street numbers and another contains street names, the group 123, "Main Street"
probably would appear frequently while the group 123, "Diamondback Drive"
probably would show up much less. Since the second group appears infrequently, the values could potentially identify someone, so this group would be masked.
After the fingerprint service identifies columns with a low number of distinct values, users will only be able to select those columns when building the policy. Users can either use a minimum group size (k) given by the fingerprint or manually select the value of k.
Masking multiple columns with k-anonymization
Governors can write global data policies using k-anonymization in the global data policy builder.
When this global policy is applied to data sources, it will mask all columns matching the specified tag.
Applying k-anonymization over disjoint sets of columns in separate policies does not guarantee k-anonymization over their union.
If you select multiple columns to mask with k-anonymization in the same policy, the policy is driven by how many times these values appear together. If the groups appear fewer than k times, they will be masked.
For example, if Policy A
Policy A: Mask with k-anonymization the values in the columns
gender
andstate
requiring a group size of at least 2 for everyone
was applied to this data source
the values would be masked like this:
Note: Selecting many columns to mask with k-anonymization increases the processing that must occur to calculate the policy, so saving the policy may take time.
However, if you select to mask the same columns with k-anonymization in separate policies, Policy C and Policy D,
Policy C: Mask with k-anonymization the values in the column
gender
requiring a group size of at least 2 for everyonePolicy D: Mask with k-anonymization the values in the column
state
requiring a group size of at least 2 for everyone
the values in the columns will be masked separately instead of as groups. Therefore, the values in that same data source would be masked like this:
All members of this cohort have indicated substance abuse, sensitive personal information that could have damaging consequences, and, even though direct identifiers have been removed and k-anonymization has been applied, outsiders could infer substance abuse for an individual if they knew a male welder in this zip code.
In this scenario, using randomized response would change some of the Y's in substance_abuse
to N's and vice versa; consequently, outsiders couldn't be sure of the displayed value of substance_abuse
given in any individual row, as they wouldn't know which rows had changed.
How the randomization works
Immuta applies a random number generator (RNG) that is seeded with some fixed attributes of the data source, column, backing technology, and the value of the high cardinality column, an approach that simulates cached randomness without having to actually cache anything.
For numeric data, Immuta uses the RNG to add a random shift from a 0-centered Laplace distribution with the standard deviation specified in the policy configuration. For most purposes, knowing the distribution is not important, but the net effect is that on average the reported values should be the true value plus or minus the specified deviation value.
Preserving data utility
Using randomized response doesn't destroy the data because data is only randomized slightly; aggregate utility can be preserved because analysts know how and what proportion of the values will change. Through this technique, values can be interpreted as hints, signals, or suggestions of the truth, but it is much harder to reason about individual rows.
Additionally, randomized response gives deniability of record content not dataset participation, so individual rows can be displayed.
In some cases, you may want several different masking policies applied to the same column through Otherwise policies. To build these policies, select everyone who instead of everyone or everyone except. After you specify who the masking policy applies to, select how it applies to everyone else in the Otherwise condition.
You can add and remove tags in Otherwise conditions for global policies (unlike local policy Otherwise conditions), as illustrated above; however, all tags or regular expressions included in the initial everyone who rule must be included in an everyone or everyone except rule in the additional clauses.
Feature limitations
Masking struct and array columns is only available for Databricks data sources.
Immuta only supports Parquet and Delta table types.
If the table is queried through the Query Engine instead of natively in Databricks, any struct fields that have an applied masking policy will have the root struct column made null.
Spark supports a class of data types called complex types, which can represent multiple data values in a single column. Immuta supports masking fields within array and struct columns:
array: an ordered collection of elements
struct: a collection of elements that are primitive or complex types
Without this feature enabled, the struct and array columns of a data source default to jsonb
in the Data Dictionary, and the masking policies that users can apply to jsonb
columns are limited. For example, if a user wanted to mask PII inside the column patient
in the image below, they would have to apply null masking to the entire column or use a custom function instead of just masking name
or address
.
After a global or local policy masks the columns containing PII, users who do not meet the exception specified in the policy will see these values masked:
Note: Immuta uses the >
delimiter to indicate that a field is nested instead of the .
delimiter, since field and column names could include .
.
Caveats
Struct Columns with Many Fields
If users have struct columns with many fields, they will need to either
create the data source against a cluster running Spark 3 or
add spark.debug.maxToStringFields 1000
to their Spark 2 cluster's configuration.
To get column information about a data source, Immuta executes a DESCRIBE
call for the table. In this call, Spark returns a simple string representation of the schema for each column in the table. For the patient
column above, the simple string would look like this:
struct<name:string,ssn:string,age:int,address:struct<city:string,state:string,zipCode:string,street:text>>
Immuta then parses this string into the following format for the data source's dictionary:
However, if the struct contains more than 25 fields, Spark truncates the string, causing the parser to fail and fall back to jsonb
. Immuta will attempt to avoid this failure by increasing the number of fields allowed in the server-side property setting, maxToStringFields
; however, this only works with clusters on a Spark 3 runtime. The maxToStringFields
configuration in Spark 2 cannot be set through the ODBC driver and can only be set through the Spark configuration on the cluster with spark.debug.maxToStringFields 1000
on cluster startup.
Deprecation notice
Support for this feature has been deprecated.
This feature allows Immuta to unmask data that is masked at rest in a remote database using a customer-provided encryption or masking algorithm. To do so,
Data owners apply these tags to columns that are masked (with encryption or another algorithm) in the remote database.
Unmasking process
Immuta will only unmask externally masked data if two conditions are met:
A masking policy is applied against that tagged column.
The querying user is exempt from that policy.
When a user who is exempt from the policy restrictions queries that masked column using a filter, Immuta converts the literal being queried using the external algorithm provided. Consider the following example:
The social_security_number
column is masked on-ingest and has the tag externally_masked_data
applied to it.
This masking policy is applied to the data source in Immuta: Mask using hashing the values in the column tagged externally_masked_data
except for users who belong to the group view_masked_values
.
The querying user belongs to the view_masked_values
group.
When the user above runs the query select * from table A where social_security_number = 220869988
, Immuta converts 220869988
to the masked value using the provided algorithm to query the database and return matching rows.
Use Equality Queries Only
Queries against masked values on-ingest should be equality queries only. For example, if an exempt user ran a query like select * from table A where social_security_number > 220869988
, the results may not make sense (depending on the algorithm used for masking the data).
Tutorials
These policies hide entire rows or objects of data based on the policy being enforced; some of these policies require the data to be tagged as well.
These policies match a user attribute with a row/object/file attribute to determine if that row/object/file should be visible. This process uses a direct string match, so the user attribute would have to match exactly the data attribute in order to see that row of data.
For example, to restrict access to insurance claims data to the state for which the user's home office is located, you could build a policy such as this:
Only show rows where user possesses an attribute in
Office Location
that matches the value in the columnState
for everyone except when user is a member of groupLegal
.
In this case, the Office Location
is retrieved by the identity management system as a user attribute or group. If the user's attribute (Office Location
) was Missouri
, rows containing the value Missouri
in the State
column in the data source would be the only rows visible to that user.
This policy can be thought of as a table "view" created automatically for the user based on the condition of the policy. For example, in the policy below, users who are not members of the Admins
group will only see taxi rides where passenger_count < 2
.
Only show rows where
public.us.taxis.passenger_count <2
for everyone except when user is a member of group Admins.
WHERE clause policy requirement
All columns referenced in the policy must have fully qualified names. Any column names that are unqualified (just the column name) will default to a column of the data source the policy is being applied to (if one matches the name).
These policies restrict access to rows/objects/files that fall within the time restrictions set in the policy. If a data source has time-based restriction policies, queries run against the data source by a user will only return rows/blobs with a date in its event-time
column/attribute from within a certain range.
The time window is based on the event time you select when creating the data source. This value will come from a date/time column in relational sources.
These policies return a limited percentage of the data, which is randomly sampled, at query time. but it is the same sample for all the users. For example, you could limit certain users to only 10% of the data. Immuta uses a hashing policy to return approximately 10% of the data, and the data returned will always be the same; however, the exact number of rows exposed depends on the distribution of high cardinality columns in the database and the hashing type available. Additionally, Immuta will adjust the data exposed when new rows are added or removed.
Best practice: row count
Immuta recommends you use a table with over 1,000 rows for the best results when using a data minimization policy.
Public preview
This feature is currently in public preview and available to all accounts.
If a global masking policy applies to a column, you can still use that masked column in a global row-level policy.
Consider the following policy examples:
Masking policy: Mask values in columns tagged Country
for everyone except users in group Admin
.
Row-level policy: Only show rows where user possesses an attribute in OfficeLocation
that matches the value in column tagged Country
for everyone.
Both of these policies use the Country
tag to restrict access. Therefore, the masking policy and the row-level policy would apply to data source columns with the tag Country
for users who are not in the Admin
group.
Limitations
Deprecation notice
Support for these policies has been deprecated.
Parameter | Type | Description |
---|---|---|
When building a global data policy, governors can , which must then be acknowledged by data owners when the policy is applied to data sources.
See for a tutorial.
By default, Immuta does not apply subscription policies to data sources when you create them. This behavior ensures that existing policies in your remote data platform remain on that data and that current workflows stay intact. However, you can change this default setting on the so that data sources are immediately locked down when data is registered in Immuta.
For details, see the .
When two or more global subscription policies of the listed below apply to the same data source they may conflict:
For instructions on creating these policies, see the .
After an application admin has , data governors and owners can create global subscription policies using all the functions and variables outlined below.
Variable/Function | Description | Example |
---|
See for a tutorial.
: Only show rows where user is a member of a group that matches the value in the column tagged Department
.
For all policies except , inclusionary logic allows governors to vary policy actions with an Otherwise clause.
Purposes help define the scope and use of data within a project and allow users to meet . Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.
Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
Please refer to the for a tutorial on purpose-based restrictions on data.
Hashed values are different across data sources, so you cannot join on hashed values unless you . Immuta prevents joins on hashed values to protect against link attacks where two data owners may have exposed data with the same masked column (a quasi-identifier), but their data combined by that masked value could result in a sensitive data leak.
This option masks the values using hashing, but allows users to submit an to users who meet the exceptions of the policy.
This option also allows users to submit an to users who meet the exceptions of the policy.
Note: When building conditional masking policies with custom SQL statements, avoid using a column that is masked using in the SQL statement, as this can lead to different behavior depending on whether you’re using the Query Engine, Spark, or Snowflake and may produce results that are unexpected.
Note: The default cardinality cutoff for columns to qualify for k-anonymization is 500. For details about adjusting this setting, navigate to the .
Gender | State |
---|
Gender | State |
---|
Gender | State |
---|
This policy masks data by slightly randomizing the values in a column, while preventing outsiders from inferring content of specific records.
For example, if an analyst wanted to publish data from a health survey she conducted, she could remove direct identifiers and apply to indirect identifiers to make it difficult to single out individuals. However, consider these survey participants, a cohort of male welders who share the same zip code:
participant_id | zip_code | gender | occupation | substance_abuse |
---|
For string data, the random number generator essentially flips a biased coin. If the coin comes up as tails, which it does with the frequency of the replacement rate , then the value is changed to any other possible value in the column, selected uniformly at random from among those values. If the coin comes up as heads, the true value is released.
After Complex Data Types is enabled on the , the column type for struct columns for new data sources will display as struct
in the Data Dictionary. (For data sources that are already in Immuta, users can edit the data source and change the column types for the appropriate columns from jsonb
to struct
.) Once struct fields are available, they can be searched, tagged, and used in masking policies. For example, a user could tag name
, ssn
, and street
as PII instead of the entire patient
column.
Immuta’s External Masking feature expects data to be masked at rest by an external tool consistently on a per-cell basis in the remote database. Immuta then provides policy-based unmasking (and additional masking on top of this using ) through the Query Engine.
System Administrators build their own custom logic and security in an . Because Immuta always pushes down the masked version of the literal when the user is exempt from the policy, the organization should use deterministic IVs/salt to ensure the same value is masked consistently throughout the data.
System Administrators give Immuta access to the that will be used by data owners to indicate that data is masked at rest in the remote database.
Data owners or governors that allow Immuta to reach out to this external REST service to unmask data according to the specifications in the policy.
To configure External Masking, see the .
For an implementation guide, see the .
Note: When building row-level policies with custom SQL statements, avoid using a column that is masked using in the SQL statement, as this can lead to different behavior depending on whether you’re using the Query Engine, Spark, or Snowflake and may produce results that are unexpected.
You can put any valid SQL in the policy. See the for a list of custom functions.
This feature is only available for and integrations.
This feature is only supported for , not local data policies.
Immuta includes the and the . Governors can activate these global policies to automatically enforce restrictions on data sources that have had relevant tags applied to them by users or .
To learn how to activate a templated policy, navigate to the .
action
Enum(MASK
, UNMASK
)
Either MASK
or UNMASK
the values
values
Map(string
, String[]
)
A map of columns containing an array of masked/unmasked values
@database | Users who have an attribute key that matches a database will be subscribed to the data source(s) within the database. | @hasAttribute('SpecialAccess', '@hostname.@database.*'): If a user had the attribute |
@hasAttribute('Attribute Name', 'Attribute Value') | Users who have the specified attribute are subscribed to the data source. | @hasAttribute('Occupation', 'Manager'): Any user who has the attribute |
@hasTagAsAttribute('Attribute Name', 'dataSource' or 'column' ) | Users who have an attribute key that matches a tag on a data source or column will be subscribed to that data source. | @hasTagAsAttribute('PersonalData', 'dataSource'): Users who have the attribute key |
@hasTagAsGroup('dataSource' or 'column' ) | Users who are members of a group that matches a tag on a data source or column (respectively) will be subscribed to that data source. | @hasTagAsGroup('dataSource'): If Data Source 1 has the tags |
@hostname | Users who have an attribute key that match a hostname will be subscribed to the data source(s) with that hostname. | @hasAttribute('SpecialAccess', '@hostname.*'): If a user had the attribute |
@iam | Users who sign in with the IAM with the specified ID (ID that displays on the App Settings page) will be subscribed to the data source. | @iam == 'oktaSamlIAM': Any user whose IAM ID is |
@isInGroups('List', 'of', 'Groups') | Users who are members of the specified group(s) can be subscribed to the data source. | @isInGroups('finance','marketing','newhire'): Users who are members of the groups |
@schema | Users who have an attribute key that match this schema will be subscribed to the data source(s) under that schema. | @hasAttribute('SpecialAccess', '@hostname.@database.@schema'): If a user had the attribute |
@table | Users who have an attribute key that match this table will be subscribed to the data source(s). | @hasAttribute('SpecialAccess', '@hostname.@database.@schema.@table'): If a user had the attribute |
Female | Ohio |
Female | Florida |
Female | Florida |
Female | Arkansas |
Male | Florida |
Null | Null |
Female | Florida |
Female | Florida |
Null | Null |
Null | Null |
Female | Null |
Female | Florida |
Female | Florida |
Female | Null |
Null | Florida |
... | ... | ... | ... | ... |
| 75002 | Male | Welder | Y |
| 75002 | Male | Welder | Y |
| 75002 | Male | Welder | Y |
| 75002 | Male | Welder | Y |
| 75002 | Male | Welder | Y |
... | ... | ... | ... | ... |