Compliantly Open More Sensitive Data for ML and Analytics

Who is this for?

This guide is intended for users who want to open more data for access by creating more granular and powerful policies at the data layer.

Prerequisites

Data platform integration configured in Immuta
Users and data sources have been registered in Immuta

Goals

Firstly, it's crucial to remember that just because a subscription policy, as described in the Automate data access control decisions use case, grants a user access to data, it doesn’t mean that they should have access to all of that data. Often, organizations stop at just granting access without considering the nuances of what specific columns or rows should be accessible to different users. It's important to see the process all the way through by masking sensitive values that are not necessary for a user's role. This ensures that while users have the data access they need, sensitive information is appropriately protected.

Secondly, when considering subscription policies in the context of global data policies, an interesting perspective emerges. A subscription policy could essentially be seen as mirroring the functionality of a global masking policy. This is because, like a global masking policy, a subscription policy can be used to mask or redact the entirety of a table. This interpretation underscores the potential of global data policies for comprehensive data protection.

One of the primary advantages is an easy and maintainable way to manage data leak risks without impeding data access, which means more data for ML and analytics. By focusing on global data policies, organizations can ensure that sensitive data, down to the row and column level, is appropriately protected, regardless of who has access to it. This means that while data remains broadly accessible for business operations and decision-making, the risk of data leaks is significantly reduced. This is because you can:

be more specific with your policies as described above and
mask using advanced privacy enhancing technologies (PETs) that allow you to get utility from data in a column while still preserving privacy in that same column.

However, it's important to note that this approach does not mean that you should never create subscription policies. Subscription policies still have their place in data governance. The key point here is that the primary focus shifts away from subscription policies and towards global data policies, which offer a more comprehensive and effective approach to data protection. This shift in focus allows for more nuanced control over data access, enhancing both data security and compliance.

When is this appropriate?

This use case is particularly suitable in scenarios where you already have a process for granting access to tables. If your organization has established procedures for table access that are working effectively, introducing global data policies can enhance your data governance without disrupting existing workflows.

It's also fitting when you grant access to tables to everyone. In such cases, the focus is less on who has access and more on what they can access. Global data policies can help ensure that while data is broadly accessible, sensitive information is appropriately masked or redacted, maintaining compliance and security.

Lastly, this approach is appropriate when you have very generic subscription policies. Your native tool may refer to subscription policies as table GRANTs. If your subscription policies are not tailored to specific user attributes, roles, or data sensitivity levels, they may not provide adequate data protection. Shifting the focus to global data policies, such as data masking, allows for more nuanced and effective control over data access, enhancing both security and compliance.

In essence, this use case is appropriate when you want to maintain or improve data accessibility while ensuring robust data protection, regardless of your current table grants.

When is this not appropriate?

Data sensitivity

If the existence of certain tables, schemas, or columns is considered sensitive information within your organization, this solution pattern may not be appropriate. Revealing the existence of certain data, even without granting access to the actual data, can pose a security risk in some contexts. In such cases, a more restrictive strategy may be required.

Data navigation

With this use case, users might have to navigate through a large number of tables to find the data they need. This could potentially hinder user experience, especially in large organizations with extensive data environments.

Configuration steps

Follow these steps to learn more about and start using Immuta to compliantly open more sensitive data for ML and analytics:

Complete the Monitor and secure sensitive data platform query activity use case to configure Immuta.
Optionally, review the Automate data access control decisions use case.
Manage user metadata. This step is critical to building scalable policy and understanding the considerations around how and what to capture. Tag your users with attributes and groups that are meaningful for Immuta global data policies.
Manage data metadata. This is the final setup step you must complete before authoring policy. Tag your columns with tags that are meaningful for Immuta global data policies.
Author policy. In this step, you will define your global data policy logic for granularly masking and redacting rows and columns. Optionally test and deploy policy.

Managing User Metadata

You may have read the Automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage user metadata with this particular use case, you should use ABAC.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use tag lineage described in the next topic), especially in a data ecosystem with constant change. This means that your users will need to have facts about them that drive policy decisions (ABAC) rather than single variables that drive access (as in orchestrated-RBAC).

Understanding that, read the ABAC section in the automate data access control decisions use case's Managing user metadata guide.

Managing Data Metadata

Prerequisites

Your schema metadata is registered using either of the Detect use cases:

Monitor and secure sensitive data platform query activity use case (Snowflake or Databricks)
General Immuta configuration use case (if using neither Snowflake nor Databricks).

Enriching your data metadata with tags

You may have read the Automate data access control decisions use case already. If so, you are aware of the two paths you must choose between: orchestrated-RBAC vs ABAC. To manage data metadata with this particular use case, you should use ABAC.

This is because you must know the contents and sensitivity of every column in your data ecosystem to follow this use case. With orchestrated RBAC, you tag your columns with access logic baked in. ABAC means you tag your columns with facts: what is in the column. It is feasible to do the latter, extremely hard to do the former (unless you use tag lineage, described below), especially in a data ecosystem with constant change.

Understanding that, read the Managing data metadata guide.

However, there are some considerations specific to this use case with regard to data metadata.

Automated data tagging

Automated data tagging with Immuta sensitive data discovery (SDD) is recommended with this use case because it significantly reduces the manual effort involved in classifying and managing data. It ensures that data is consistently and accurately tagged, which is crucial for implementing effective data policies. Moreover, it allows for real-time tagging of data, ensuring that new or updated data is immediately protected by the appropriate policies. This is critical when all columns need to be considered for policy immediately, which is the case with this use case.

Schema monitoring enabled

While not directly related to data tagging, schema monitoring is a feature in Immuta that allows organizations to keep track of changes in their data environments. When enabled, Immuta actively monitors the data platform to find when new tables or columns are created or deleted. It then automatically registers or disables these tables in Immuta and updates the tags. This feature ensures that any global policies set in Immuta are applied to these newly created or updated data sources. It is assumed you are using schema monitoring when also using SDD. Without this feature, data will remain in data downtime while waiting for the policies to update.

Data tags are facts

As discussed above, with ABAC your data tags need to be facts. Otherwise, a human must be involved to bake access logic into your tags. As an example, it's easy for SDD to find an address, but it's harder for a human to decide if that address should be sensitive and who should have access to it - all defined with a single tag - for every column!

If you focus on tagging with facts, and use Immuta's frameworks to build higher level logic on those tags, then you are set up to build data policies in a scalable and error proof manner with limited data downtime.

Tag lineage

There is one way you can accomplish this use case using orchestrated RBAC: lineage. Immuta's lineage feature, currently in private preview and for Snowflake only, is able to propagate tags based on transform lineage. This means that when a downstream table is derived from upstream tables, the tags on the upstream tables' columns are carried through to the corresponding columns of the new downstream table. This is powerful because you can tag the root table(s) once, with your policy logic tags as described in Managing data metadata, and they will carry through to all downstream tables' columns without any additional work. Be aware that tag lineage only works for tables; for views, the tags will not be propagated. However, policies on backing tables will be enforced on the views, which is why the tags are not propagated to them. Also be aware that if a table exists after some series of views, the tags will propagate to that table.
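
The propagation behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Immuta's implementation: the lineage graph, object names, and column lists are hypothetical, and matching upstream and downstream columns by name is an assumption made for the example.

```python
from collections import deque

# Hypothetical lineage: each object maps to the objects derived directly from it.
DOWNSTREAM = {
    "raw.customers": ["analytics.v_customers"],
    "analytics.v_customers": ["analytics.customer_summary"],
    "analytics.customer_summary": [],
}
# Hypothetical object kinds and column lists.
KIND = {
    "raw.customers": "table",
    "analytics.v_customers": "view",
    "analytics.customer_summary": "table",
}
COLUMNS = {
    "raw.customers": ["email", "country", "signup_date"],
    "analytics.v_customers": ["email", "country"],
    "analytics.customer_summary": ["email"],
}


def propagate_tags(root, root_column_tags):
    """Carry a root table's column tags to downstream tables, skipping views."""
    propagated = {root: dict(root_column_tags)}
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in DOWNSTREAM.get(current, []):
            if child in seen:
                continue
            seen.add(child)
            queue.append(child)  # keep walking, even through views
            if KIND[child] == "table":
                # Assumption for this sketch: columns are matched by name.
                propagated[child] = {
                    col: root_column_tags[col]
                    for col in COLUMNS[child]
                    if col in root_column_tags
                }
    return propagated


result = propagate_tags(
    "raw.customers",
    {"email": ["PII"], "country": ["Discovered.Country"]},
)
```

In this sketch the view receives no tags of its own, while the table that sits after the view still inherits the tag on its email column.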

Author Policy

In the Automate data access control decisions use case, we covered subscription policies at length. Subscription policies control table-level access.

In this use case, we are focused one level deeper: columns and rows within a table that can be protected in a more granular manner with global data policies.

Global data policy: column masking

With Immuta, you are able to mask entire columns, or even mask individual cells within a given column on a per-row basis (also termed conditional masking).

This is important for this use case to granularly mask or grant unmasked access to specific columns to specific users. This granularity of policy allows more access to data because you no longer have to roll up coarse access decisions to the table level and can instead make them at the individual column, row, or cell level.

Using tags

The concept of masking columns has been mentioned a few times already, but as you already know from the Managing data metadata guide, your masking policies will actually target tags instead of physical columns. This abstraction is powerful because it allows you to build a single policy that may apply to many different columns across many different tables.
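
As a rough illustration of why targeting tags scales, the sketch below resolves one tag to every column that carries it. The catalog contents, table names, and the helper name columns_targeted_by are hypothetical and exist only for this example.

```python
# Hypothetical catalog: (table, column) -> set of tags on that column.
COLUMN_TAGS = {
    ("crm.contacts", "full_name"): {"Discovered.name", "PII"},
    ("sales.orders", "cust_nm"): {"Discovered.name"},
    ("sales.orders", "order_total"): set(),
}


def columns_targeted_by(tag, column_tags):
    """Return every (table, column) pair that carries the given tag."""
    return sorted(key for key, tags in column_tags.items() if tag in tags)


# One policy written against "Discovered.name" covers both differently named columns.
targets = columns_targeted_by("Discovered.name", COLUMN_TAGS)
```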

Masking conflicts

If you build policy using tags, isn't there a good chance that multiple masking policies could land on the same column? Yes.

This can be resolved through tag depth. Immuta supports tag hierarchy, so if the tag targeted by one policy is deeper than the tag targeted by a conflicting policy, the deeper (more specific) one wins. As an example, the policy "mask by making null columns tagged PII" is less specific (depth of 1) than the policy "mask using hashing columns tagged Discovered.name" (depth of 2), so the hashing masking policy, rather than the null one, would apply to columns tagged both Discovered.name and PII.
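
The depth rule can be sketched as follows. This is a simplified model rather than Immuta's actual resolution logic: the MaskingPolicy class and helper names are hypothetical, depth is assumed to be the number of dot-separated segments in the tag name, and tie-breaking between policies at equal depth is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingPolicy:
    name: str
    target_tag: str


def tag_depth(tag):
    """Assumed depth: number of dot-separated segments, e.g. Discovered.name -> 2."""
    return len(tag.split("."))


def winning_policy(column_tags, policies):
    """Pick the applicable policy whose target tag is deepest (most specific)."""
    applicable = [p for p in policies if p.target_tag in column_tags]
    if not applicable:
        return None
    return max(applicable, key=lambda p: tag_depth(p.target_tag))


policies = [
    MaskingPolicy("mask by making null", "PII"),
    MaskingPolicy("mask using hashing", "Discovered.name"),
]
winner = winning_policy({"PII", "Discovered.name"}, policies)
```

For a column tagged both PII and Discovered.name, the hashing policy is selected; a column tagged only PII falls back to the null policy.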

Principle of least privilege

Immuta meets the principle of least privilege by following an exception-based approach to policy authoring. This means that if you mask a column, that mask will apply to everyone except [some list of exceptions]. This uni-directional approach avoids policy conflicts, makes change management easier, makes authoring policy less complex, and (most importantly) avoids data leaks.
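
A minimal sketch of the "everyone except" pattern is shown below. The function name, the group names, and the choice of nulling as the mask are assumptions for illustration only.

```python
def apply_mask(value, user_groups, exception_groups):
    """Mask for everyone; only members of an exception group see the raw value."""
    if set(user_groups) & set(exception_groups):
        return value
    return None  # default outcome: masked (nulled in this sketch)


EXCEPTIONS = ["fraud-investigators"]  # hypothetical exception list

analyst_view = apply_mask("123-45-6789", ["analysts"], EXCEPTIONS)
investigator_view = apply_mask("123-45-6789", ["fraud-investigators"], EXCEPTIONS)
```

Because the default outcome is the masked value, a user who is missing metadata or belongs to no listed group never sees the raw data.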

Masking techniques

There are many different approaches you can take to masking a column. Some masks render the column completely useless to the querying user, such as nulling it, while other masking techniques can provide some level of utility from the column while at the same time maintaining a level of privacy and security. These advanced masking techniques are sometimes termed privacy enhancing technologies (PETs). Immuta provides a range of masking techniques that allow for privacy-vs-utility trade-off decisions when authoring masking policies.
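
To make the privacy-vs-utility trade-off concrete, the sketch below contrasts nulling with a salted hash. It uses Python's standard hashlib and is not a description of how Immuta implements either mask; the salt value and function names are assumptions.

```python
import hashlib


def mask_null(value):
    """Removes all utility: every value becomes None."""
    return None


def mask_hash(value, salt="example-salt"):
    """Hides the raw value but keeps equality, so counts and joins still work."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


emails = ["a@example.com", "b@example.com", "a@example.com"]
nulled = [mask_null(e) for e in emails]
hashed = [mask_hash(e) for e in emails]

# The hashed column still shows that there are two distinct customers;
# the nulled column cannot answer that question at all.
distinct_customers = len(set(hashed))
```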

Dealing with data types

If you were to build masking policies natively in your data platform, you would need to build a masking policy for each data type it could mask. This makes sense because, for example, a varchar column can't display numerics, or vice versa. Furthermore, when building masking policies that target tags instead of physical columns, a policy may target many differing data types, or even new, unforeseen data types in the future when new columns appear.

This adds a great deal of effort to policy authoring. With Immuta, you are not required to explicitly cover all possible data types when building a masking policy. Instead, Immuta applies the masking policy by changing, as minimally as possible, the masking type to something that is possible against that data type, while still maintaining the required privacy level provided by the original masking type.

Global data policy: row-level

You may hear this policy called a row filter or row access policy. The idea is to redact rows at query time based on the user running the query.

Without this capability, you would need a transform process that segments your data across different tables, and you would then need to manage access to those tables. This introduces extra compute costs and, at some point, when dealing with large tables and many differing permutations of access, it may become impossible to maintain as those tables grow.

Immuta allows you to author row-level policies at a global scope by leveraging tags, just like with masking policies.

Using tags

Unlike masking policies, the use of tags for redacting rows not only impacts which tables are targeted by the policy, but also the logic of the policy. When authoring a row-level policy, you must choose the column to "compare against" when deciding whether the row should be visible to the user at query time. The tag can drive this decision. This allows you to author a single row-level policy that targets many different tables.

Advanced functions

Sometimes customization is required for scenarios that require more complex logic. In the policy builder, these are called custom WHERE policies. Within your custom WHERE, Immuta also provides several functions you can leverage for powerful and scalable policies:

@groupsContains('SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any groups with a matching name (case sensitive).
@attributeValuesContains('Attribute Name', 'SQL Column or Expression') allows you to compare the value of a column to see if the user possesses any attribute values under a specific key with a matching name (case sensitive).
@purposesContains('SQL Column or Expression') allows you to compare the value of a column to see if the user is acting under a matching purpose (case sensitive).
@columnTagged('Tag Name') allows you to target tags instead of physical columns when using one of the above functions.

Here's a simple example that targets a physical column COUNTRY comparing it to the querying user's country attribute key's values:

@attributeValuesContains('country', 'COUNTRY')

This could instead be written using the @columnTagged function in place of the physical column name:

@attributeValuesContains('country', @columnTagged('Discovered.Country'))

This allows the policy to be reused across many different tables that may spell the COUNTRY column name differently.
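
The semantics of the tag-based expression can be approximated in Python. This is only a model of the behavior described above, not Immuta code: the helper functions, sample tables, and user attributes are hypothetical, and raising an error when a tag does not resolve to exactly one column is an assumption of this sketch.

```python
def attribute_values_contains(user_attributes, attribute_name, value):
    """True if the user holds the value under the given attribute key (case sensitive)."""
    return value in user_attributes.get(attribute_name, [])


def column_tagged(column_tags, tag_name):
    """Resolve a tag to the physical column that carries it in this table."""
    matches = [col for col, tags in column_tags.items() if tag_name in tags]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one column tagged {tag_name!r}")
    return matches[0]


def visible_rows(rows, column_tags, user_attributes):
    """Keep only rows whose tagged country column matches the user's attribute values."""
    column = column_tagged(column_tags, "Discovered.Country")
    return [
        row for row in rows
        if attribute_values_contains(user_attributes, "country", row[column])
    ]


user = {"country": ["US", "CA"]}

orders = [{"COUNTRY": "US", "id": 1}, {"COUNTRY": "FR", "id": 2}, {"COUNTRY": "us", "id": 3}]
orders_tags = {"COUNTRY": {"Discovered.Country"}, "id": set()}

customers = [{"CNTRY": "CA", "name": "a"}, {"CNTRY": "DE", "name": "b"}]
customers_tags = {"CNTRY": {"Discovered.Country"}, "name": set()}

visible_orders = visible_rows(orders, orders_tags, user)
visible_customers = visible_rows(customers, customers_tags, user)
```

The same filter logic works against both tables even though the country column is named differently in each, and the lowercase "us" row is excluded because the comparison is case sensitive.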

Data policy temporary overrides

You may want to override a global data policy and grant an individual access to unmasked data for a very specific reason. Our recommendation is to use purposes to create exceptions to global data policies.

There is some up-front work that needs to occur to make this possible. A user with the appropriate permission would need to create purposes for access to different data types unmasked. As part of creating the purposes, they may want to alter the statement the user must agree to when acting under that purpose.

Then, the masking or row-level policies would need to be updated to add those purposes as exceptions to the policy, for example:

...for everyone except users acting under purpose [some legitimate purpose(s)]

Once that is done, users can create a project and add to the project both of the following:

Data sources with the policies they want to be excluded from and
Purposes

However, that project does nothing until approved by a user with the appropriate permission. Once that approval is complete, the user wanting the exception must acknowledge that they will only use the data for that purpose and then, using the Immuta UI, switch to that purpose. Once switched to that purpose, the exception(s) will occur for the user.

This can be made temporary by deleting the project once access is no longer needed or revoking approval for the project after the need for access is gone.
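
The conditions under which a purpose-based exception takes effect can be summarized in a small sketch. The Project class, field names, and purpose and data source names below are hypothetical; this models the workflow described above rather than any Immuta API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    purposes: set = field(default_factory=set)
    data_sources: set = field(default_factory=set)
    approved: bool = False


def exception_applies(project, active_purpose, data_source, exception_purposes):
    """The exception holds only for an approved project, an active matching purpose,
    and a data source that was added to the project."""
    return (
        project.approved
        and active_purpose in project.purposes
        and active_purpose in exception_purposes
        and data_source in project.data_sources
    )


policy_exceptions = {"Fraud Investigation"}  # purposes named in the policy's exception
project = Project(purposes={"Fraud Investigation"}, data_sources={"sales.orders"})

before_approval = exception_applies(project, "Fraud Investigation", "sales.orders", policy_exceptions)
project.approved = True
after_approval = exception_applies(project, "Fraud Investigation", "sales.orders", policy_exceptions)
project.approved = False  # revoking approval ends the temporary override
after_revocation = exception_applies(project, "Fraud Investigation", "sales.orders", policy_exceptions)
```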

Subscription policies

After you've created global data policies as described above, how do you actually give access to the tables?

This is a good question, and it really depends on whether you already have a table-access process you are happy with. If you do, keep using it. If you don't, we recommend you create a subscription policy that opens all tables up to anyone, as the point of this use case is not to focus on coarse-grained table-level access but instead on fine-grained access using global data policies.

When creating new tables, you should follow the best practices to ensure there's while you wait for policy uptime through Immuta schema monitoring.

