
Projects and Purpose-Based Access Control

Projects combine users and data sources under a common purpose, which can then be used to restrict access to data and streamline collaboration.

There are three main use cases Immuta projects can help with:
- Purpose-based access control
- Equalized access
- Masked joins

For any of these use cases, project workspaces can be created to allow users to write data to the project.

Projects and purpose controls: This section includes conceptual, how-to, and reference guides for using projects to enforce purpose-based access controls on your data.

Equalized access: This section includes conceptual, how-to, and reference guides that explain how equalized entitlements work in Immuta and how you can use them to collaborate effectively across your department without risk of data leaks.

Masked joins: This section includes conceptual and how-to guides that explain how to use masked joins effectively for your business use case.

Writing data to projects: This section includes conceptual, how-to, and reference guides for writing data to projects and sharing that data with other Immuta users with proper access controls enforced.

Projects and Purpose Controls

Purpose-based access control is a method of data access control that makes access decisions based on the reason a user or tool intends to use the data. This provides the flexibility data governance teams need to build a high-powered, granular access control model. In addition, most regulations (such as GDPR and HIPAA) include purpose clauses requiring that sensitive data be collected and used only for precise reasons.

For example, the GDPR’s Purpose Limitation Principle states that “Personal data should only be collected and processed for a legitimate specific purpose.” The regulation further states that the specific purpose “should be expressed in an unambiguous, transparent, and simple manner” in order to be compliant. The goal of this clause is to ensure that organizations do not unnecessarily collect, store, and expose sensitive information to risk. With purpose-based access control, organizations can demonstrate the granular, purpose-based control over data access that compliance with these standards requires.

Immuta projects allow you to connect purposes to data sources and users to enforce purpose-based access controls on your data.

Getting started with purpose-based access control

This getting started guide outlines how to quickly implement purpose-based access controls for your business use case using Immuta projects, purposes, and global data policies.

How-to guides

The how-to guides in this section illustrate how to create and manage projects and purposes.

Create a project: Create a project to group data sources and users.
Create a purpose: Create a purpose for your project to enforce access restrictions on the project data sources.
Adjust a policy: Redistribute the noise of k-anonymization across multiple columns of a data source within a project to make specific columns more useful for analysis.
Project management: This section guides project owners, governors, and project managers through managing project settings, data sources, and members.

Reference guides

Projects and purposes: Immuta projects allow you to connect purposes to data sources and users to enforce purpose-based access controls on your data.
Policy adjustments: Project owners can use policy adjustments to increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis.

Concept guide

Why use purposes?: This explanatory guide contains a conceptual overview of purposes in Immuta and the business value achieved by using purpose-based access controls.

Getting Started

Purpose-based access control

Purpose-based access control makes access decisions based on the purpose for which a given user or tool intends to use the data. This method of data access also provides flexibility for you to override policies and grant access to unmasked data to an individual for a very specific reason. Immuta recommends using purposes to create exceptions to global data policies.

There is some up-front work that needs to occur to make this possible.

A user with the GOVERNANCE Immuta permission creates legitimate purposes for accessing different types of data unmasked. As part of creating the purposes, they may want to customize the acknowledgement statement users must agree to when acting under that purpose.
A data owner or governor updates the masking or row-level policies to include those purposes as exceptions to the policy.
Users create a project and connect the project to both the policy and the purpose by
- adding the data sources whose policies they want users to be exempted from and
- adding the purposes to the project.
However, the project has no effect until the purpose is approved by a user with the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission.
Once the purpose is approved, the user who needs the exception must acknowledge that they will only use the data for that purpose.
In the Immuta UI, the user switches to that project context. Once they are working in that project, the approved exceptions apply to them.

These exceptions can be made temporary: once access is no longer needed, delete the project or revoke approval of the project's purpose.
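The workflow above amounts to a set of conditions that must all hold before a purpose exception takes effect. The following is a minimal Python sketch of that logic under stated assumptions, not Immuta's implementation or API: the `Project` and `MaskingPolicy` classes, the `sees_unmasked` function, and the example names (`claims`, `Fraud Detection`, `dev_fraud_review`) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a project; not an Immuta API."""
    name: str
    data_sources: set[str]
    purposes: set[str]
    approved: bool = False  # set by a GOVERNANCE or PROJECT_MANAGEMENT user
    acknowledged_by: set[str] = field(default_factory=set)


@dataclass
class MaskingPolicy:
    """Hypothetical masking policy that lists purposes as exceptions."""
    data_source: str
    exception_purposes: set[str]


def sees_unmasked(user: str, policy: MaskingPolicy, context: Project | None) -> bool:
    """Return True if the policy's purpose exception applies to this user."""
    if context is None:
        # No project context selected: the policy applies as written.
        return False
    return (
        context.approved                                        # purpose approved
        and user in context.acknowledged_by                     # statement accepted
        and policy.data_source in context.data_sources          # source in project
        and bool(context.purposes & policy.exception_purposes)  # purpose is an exception
    )


policy = MaskingPolicy("claims", {"Fraud Detection"})
project = Project("dev_fraud_review", {"claims"}, {"Fraud Detection"})
project.acknowledged_by.add("morgan")
print(sees_unmasked("morgan", policy, project))  # False: purpose not yet approved
project.approved = True
print(sees_unmasked("morgan", policy, project))  # True
```

In this model, setting `approved` back to `False` corresponds to un-approving the purpose, which ends the temporary exception.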

How-to Guides

Create a Project

Requirement: CREATE_PROJECT Immuta permission

Create a new project

Navigate to the Projects tab under Data in the navigation menu, and click the New Project button.
Fill out the Basic Information:
1. Enter a name for your project in the Project Name field.
2. Opt to complete the Project Description field to help identify your project.
3. Opt to enter project Documentation to provide context for members.
Select the purposes and any policy adjustments:
1. Choose to select a purpose from the list of purposes or create a new purpose for the project.
2. To create a new purpose, click Create Purpose and complete the prompts in the modal.
Note that all purposes added to a project must be created or approved by a user with the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission. Once purposes have been applied to a project, only these users can add data sources to the project.
Add a workspace configuration: Select your workspace configuration from the Workspace Configuration dropdown menu: Databricks or Snowflake.
- Databricks: Opt to edit the sub-directory in the Workspace Directory field (this sub-directory auto-populates as the project name) and enter the Workspace Database Name.
- Snowflake: Name the Workspace Schema. By default, the schema name is based on the project name, but you can change it here. Your project workspace will exist within this schema, in the Snowflake database configured by the application admin.
  1. Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
  2. Select one or more Warehouses to be available to project members when they are working in the workspace.
Add data sources to the project using the dropdown menu.
Click Affirm and Create.

Project naming convention

Use a naming convention for projects that reflects the naming convention for databases. For example, if the project in Dev is called my_project, name the project dev_my_project. The data will end up in a database carrying the project's prefix, so you can trace it back to its source and make edits upstream in that project as necessary.
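As an illustration of this convention, the following hypothetical Python helper (not part of Immuta) derives a project name from an environment prefix and a base name:

```python
def project_name(environment: str, base_name: str) -> str:
    """Prefix a project name with its lowercased environment (hypothetical helper)."""
    if not environment or not base_name:
        raise ValueError("environment and base_name must be non-empty")
    return f"{environment.lower()}_{base_name}"


print(project_name("Dev", "my_project"))  # dev_my_project
```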

Set the project subscription policy

Projects are private by default but can be made public and shared with other users by changing the subscription policies setting. Governors are the only users who can manage subscription policies for projects with purposes.

In the project, click the Policies tab.
Click Edit Subscription Policy.
Select the group of users who will have access:
- Allow anyone: Selecting this option makes the project visible to everyone. Opt to require manual subscription by selecting the checkbox. This will require the users to manually subscribe to the project to gain access.
- Allow anyone who asks (and is approved): Selecting this option makes the project visible in search results, but users must request access and be granted permission. This restriction supports multiple approving parties, so project owners can allow more than one approver or users with specified permission types to approve other users who request access to the project.
  1. Click Anyone or An individual selected by user from the first dropdown menu.
    Note: If you choose An individual selected by user, when users request access to a project they will be prompted to identify an approver with the permission specified in the policy.
  2. Select the USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu. You can add more than one approving party by selecting + Add Another Approver.
- Allow users with specific groups/attributes: Selecting this option allows users with the specified groups and attributes to join the project.
  1. Choose whether to build the policy off user groups or user attributes:
    - is a member of group: Type the group name and select the group.
    - possesses attribute: Type the attribute and select it. Then select the value from the dropdown menu.
  2. Opt to + Add Another Condition. When adding another condition, choose how the conditions will be required. If you select or, only one of the conditions must apply to a user for them to subscribe to the project. If you select and, all of the conditions must apply.
  3. Opt to allow users who do not meet the restrictions defined in the policy to still be able to discover the project by selecting the Allow Project Discovery checkbox.
  4. Once saved, users with the proper authorizations will be automatically subscribed. Opt to require users to manually subscribe to the project by selecting the Require Manual Subscription checkbox.
- Allow individually selected users: Selecting this option hides the project from the search results. Project owners must manually add and remove users, and the Private label will appear next to the project name.
Click Save to finish your policy.
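The and/or semantics of the groups/attributes option can be modeled as follows. This is a hypothetical Python sketch for reasoning about which users qualify; the `Condition` and `User` classes and the example group and attribute names are assumptions, not Immuta objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    """One row of a hypothetical group/attribute subscription policy."""
    kind: str                 # "group" or "attribute"
    name: str                 # group name or attribute name
    value: str | None = None  # required attribute value (attributes only)


@dataclass
class User:
    groups: set[str] = field(default_factory=set)
    attributes: dict[str, set[str]] = field(default_factory=dict)


def meets(user: User, cond: Condition) -> bool:
    if cond.kind == "group":
        return cond.name in user.groups  # "is a member of group"
    if cond.kind == "attribute":
        return cond.value in user.attributes.get(cond.name, set())  # "possesses attribute"
    raise ValueError(f"unknown condition kind: {cond.kind}")


def may_subscribe(user: User, conditions: list[Condition], combinator: str = "or") -> bool:
    """'or': any one condition suffices; 'and': every condition must hold."""
    results = [meets(user, c) for c in conditions]
    if combinator == "or":
        return any(results)
    if combinator == "and":
        return all(results)
    raise ValueError("combinator must be 'or' or 'and'")


analyst = User(groups={"Analysts"}, attributes={"Department": {"HR"}})
conditions = [Condition("group", "Analysts"), Condition("attribute", "Department", "Finance")]
print(may_subscribe(analyst, conditions, "or"))   # True
print(may_subscribe(analyst, conditions, "and"))  # False
```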

Add users or groups to the project

In the project, click the Members tab.
Click the Add Members button.
Start typing a user's or group's name in the Add Members modal and select it from the dropdown that appears.
Opt to add an expiration to the subscription by entering the number of days until the access will expire.
Select the role.
Click Add.

Current project members will receive notifications that new users have been added to the project. A similar entry will be posted to the project's activity pane.

Create and Manage Purposes

Requirement: GOVERNANCE or PROJECT_MANAGEMENT Immuta permission

Create a purpose

Click the Governance icon and select Purposes in the navigation menu.
Click + Add Purpose.
Complete the Purpose Name field, and opt to customize the acknowledgement statement or add a description.
Click Create.

Create sub-purposes

Click the Governance icon and select Purposes in the navigation menu.
Open the dropdown menu and click View or Edit in the actions column of the purpose you want to add sub-purposes to.
Click Add Sub-Purposes.
Enter a name in the Enter nested purpose field in the Sub-Purpose Builder.
Click the arrow of the purpose or sub-purpose(s) to continue adding nested purpose fields.
Click Save.

A list of sub-purposes will populate. You can manage these sub-purposes by clicking Edit in the Actions column at any time.

Add a purpose to an existing project

Click the Data icon in the navigation menu and select Projects.
Go to the project overview page and click Add Purposes.
Select the purpose from the dropdown menu or click Create Purpose.
Complete the prompts in the modal and then click Save.

For the purpose to go into effect, it must be approved by a user with the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission.

Customize acknowledgement statements

Click the Governance icon and select Purposes in the navigation menu.
Open the dropdown menu and click Edit in the Actions column of the purpose you would like to customize.
Click Edit above the acknowledgement statement, customize the text, and then click Confirm.

The page displays the updated statement, which now will be used by all projects and purposes. The updated statement will also be used by any new members joining existing projects containing purposes with default statements.

By default, sub-purposes will inherit the acknowledgement statements of their parent purposes. However, you can customize the acknowledgement statement for an individual sub-purpose as well by following the process above.

Manage purpose requests

Navigate to your user profile page and click the Requests tab.
Approve or deny the purpose request in the actions column.

Delete a purpose

Click the Governance icon and select Purposes in the navigation menu.
Click the more actions icon in the Actions column of the purpose or sub-purpose you want to delete.
Select Delete, and then click Confirm to approve the deletion.

Adjust a Policy

Requirements

You must own the project
The project must contain Snowflake data sources, as k-anonymization policies are only supported in the Snowflake integration.

Equalize the project

Navigate to the project.
Click the Policies tab.
In the Project Equalization section, click the toggle button to On.

Add a purpose with noise reduction

Navigate to the project Overview tab.
Click the Add Purposes button in the center of the page.
Select the desired purpose(s) from the dropdown menu. The project must have one purpose with noise reduction for policy adjustments to function. Noise reduction is indicated by the policy adjustment amount highlighted in gray; if no policy adjustment is shown, the purpose uses the default, none.
Click Save.
The purpose will be staged until a user with the GOVERNANCE or PROJECT_MANAGEMENT permission approves and activates it. If you have one of these permissions, the staging will be skipped.
After the purpose has been approved, click I Agree to agree to the terms of the purpose.
Note: All members of the project must agree to the terms of the purpose; if they decline, they will be removed from the project.

Add data sources with k-anonymization policies

Navigate to the Policy Adjustment tab.
Select the data source from the dropdown menu. You will receive a No Adjustments Available message if there are no columns in the data sources that are associated with adjustable policies.
Select the columns from the dropdown menu.

Adjust column weights and percent null

In the priority ranking window, assign weights to the columns from most to least important. The higher a column's weight, the more usable that column will be, while de-identification is still provided through the other columns.
Opt to select Keep Fields in the Clear for specific columns.
After assigning the weight and ensuring that the remaining weight is zero, click the Adjust button.
Check that the percent NULL is acceptable for your analysis. If it is, move on to the next step; if not, repeat the previous steps until it is.
Click Apply after you have made the acceptable adjustments.
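The weighting rule can be checked with a small sketch. It assumes a hypothetical total weight budget of 100 (the actual budget shown in the UI may differ) and treats columns kept in the clear as carrying no weight. It only models the remaining-weight check; it does not compute percent NULL, which Immuta determines.

```python
from __future__ import annotations

TOTAL_WEIGHT = 100  # assumption: hypothetical budget; the UI's actual total may differ


def remaining_weight(weights: dict[str, int], keep_in_clear: set[str] | None = None,
                     total: int = TOTAL_WEIGHT) -> int:
    """Return the weight still unassigned; columns kept in the clear carry no weight."""
    clear = keep_in_clear or set()
    assigned = sum(w for col, w in weights.items() if col not in clear)
    return total - assigned


def can_adjust(weights: dict[str, int], keep_in_clear: set[str] | None = None,
               total: int = TOTAL_WEIGHT) -> bool:
    """Mirror the rule that Adjust requires a remaining weight of exactly zero."""
    if any(w < 0 for w in weights.values()):
        return False
    return remaining_weight(weights, keep_in_clear, total) == 0


# Favor EmploymentStatus, as in the policy adjustment example.
weights = {"EmploymentStatus": 60, "Income": 10, "Education": 10,
           "Gender": 10, "Location Code": 10}
print(can_adjust(weights))                                   # True
print(remaining_weight(weights, keep_in_clear={"Gender"}))   # 10
```

In this model, keeping Gender in the clear frees 10 units that must be reassigned to other columns before Adjust is allowed.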

Keep fields in the clear

Ensure that your data source has two columns with k-anonymization policies applied, and then complete these steps:

Navigate to your project and click the Overview tab.
Click Add Purposes and select a purpose with the Noise Reduction and Fields in Clear tags, or create a new purpose with Noise Reduction and Fields in Clear enabled.
Click Save, and then click I Agree.
Navigate to the Policy Adjustment tab.
Select the data source(s) and columns from the subsequent dropdown menus.
Select the Keep in the Clear checkbox next to columns to be kept in the clear.
Adjust the weight for the remaining columns. The Remaining Weight must equal 0.
Click Adjust and then Apply.

The values for the fields you selected will now be visible to users acting under the project.

Project Management

Manage Projects and Project Settings

Manage project purposes

Requirement: You must own the project or have the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission

Add a purpose to a project

Navigate to the Project Overview tab.
Click + Add Purposes.
Create a new purpose in the modal or select purposes from the dropdown menu.
Click Save, and then click I Agree.

Remove a purpose from a project

Navigate to the Project Overview tab.
Scroll to the purposes section and click Remove.
Click I Agree.

Enable project equalization

See the Manage project equalization guide.

Enable masked joins

See the Enable masked joins guide.

Manage project documentation

Requirement: CREATE_PROJECT Immuta permission (and you must own the project)

Click the Project Overview tab.
Click the Edit button in the Documentation section.
Document the details of your project in the text box that appears, and then click Save.

Manage project tags

Requirement: CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission

Select a project and navigate to the Project Overview tab.
Scroll to the Tags section and click the Add Tags button.
Begin typing the tag name in the window that appears, and then select the tag from the dropdown menu.
After selecting all relevant tags, click the Add button.

Remove tags from a project

Requirement: CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission

Navigate to the Project Overview tab.
Scroll to the Tags section and click on the tag that you want to remove to open its side sheet.
Click Remove.
Click Confirm to delete the tag.

Disable, enable, or delete a project

Requirements:

CREATE_PROJECT (and you must own the project), GOVERNANCE, or PROJECT_MANAGEMENT Immuta permission to disable or enable a project
CREATE_PROJECT Immuta permission to delete a project

Disable a project

Click the Data icon in the navigation menu and select Projects.
Select the My Projects tab.
Click the more options icon next to the project and select Disable.

Enable a project

Click the Data icon in the navigation menu and select Projects.
Select the My Projects tab.
Click the more options icon next to the project and select Enable.

Delete a project

Deleting a project permanently removes it from Immuta. Projects must first be disabled before they can be deleted.

Click the Data icon in the navigation menu and select Projects.
Select the My Projects tab.
Click the more options icon next to the disabled project and select Delete.
Click Confirm.
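The enable, disable, and delete rules form a small state machine. The following hypothetical Python sketch (not an Immuta API) encodes the constraint that a project must be disabled before it can be deleted:

```python
class ProjectLifecycle:
    """Hypothetical state model: enabled <-> disabled -> deleted."""

    def __init__(self) -> None:
        self.state = "enabled"

    def disable(self) -> None:
        if self.state != "enabled":
            raise ValueError(f"cannot disable a project that is {self.state}")
        self.state = "disabled"

    def enable(self) -> None:
        if self.state != "disabled":
            raise ValueError(f"cannot enable a project that is {self.state}")
        self.state = "enabled"

    def delete(self) -> None:
        # Projects must be disabled first; deletion is permanent.
        if self.state != "disabled":
            raise ValueError("disable the project before deleting it")
        self.state = "deleted"


project = ProjectLifecycle()
project.disable()
project.delete()
print(project.state)  # deleted
```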

Manage Project Data Sources

Requirements:

If project equalization or masked joins is enabled, you must own the project.
If a purpose has been added to and approved for the project, you must have the GOVERNANCE or PROJECT_MANAGEMENT Immuta permission.
Otherwise, you must be a project member.

Add data sources to a project

Navigate to the Project Overview tab.
Click the Add Data Sources button.
Start typing the name of a data source you'd like to include in the project.
Select the data source from the list of auto-completed options in the dropdown menu.
Repeat this process to add additional data sources to the list. You can remove them using the more options icon.
Opt to re-equalize the project by clicking the toggle on.
When complete, click the Save button.

Add data sources by purpose

You can automatically add to a project all data sources that contain a Limit usage to purpose policy matching the project's purpose.

Select a project, and click the Add Data Sources button.
Click Add By Purpose.
All data sources matching the project's purpose(s) will populate. Review this list, and then click Save.
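Conceptually, Add By Purpose filters data sources by the purposes named in their Limit usage to purpose(s) policies. A minimal Python sketch, assuming exact purpose-name matching and a hypothetical mapping from data source names to policy purposes (the data source and purpose names are illustrative):

```python
from __future__ import annotations


def sources_for_purposes(limit_policies: dict[str, set[str]],
                         project_purposes: set[str]) -> list[str]:
    """Return data sources whose Limit usage to purpose(s) policy names a project purpose.

    limit_policies maps a data source name to the purposes listed in its policy;
    sources without such a policy map to an empty set (or are absent).
    """
    return sorted(ds for ds, purposes in limit_policies.items()
                  if purposes & project_purposes)


limit_policies = {
    "claims": {"Fraud Detection"},
    "survey_results": {"Employee Retention"},
    "web_logs": set(),
}
print(sources_for_purposes(limit_policies, {"Fraud Detection"}))  # ['claims']
```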

Bulk add data sources to a project

Navigate to the Data Sources page.
Select the checkboxes for the data sources you want in a project.
Select the bulk action Add To Current Project to add the data sources to your current project, or Add To New Project to add them to a new project.
If you select Add To New Project, create the project on the next page:
1. Enter the project name and description in the text boxes.
2. Opt to add a purpose to the project using the dropdown.
3. Select any additional data sources to include from the dropdown.
4. Click Create.

Manage Project Members

Requirement: You must own the project

Edit a member role

On the Members tab, click the Role of the member whose role you want to change.
Select a different role: subscribed or owner.

Remove members from a project

On the Members tab, click the Deny button next to the user or group you want to remove.
Complete the Reasoning field in the window that appears, and then click Submit.

Reference Guides

Projects and Purposes

Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is simply for a single user to organize their data sources, or to control an entire schema of data sources from a single project screen; most often, however, it is an Immuta purpose for which the data has been approved, and it serves to restrict access to data and streamline team collaboration.

Project components

Project members: The users who will create, manage, and use projects.
Purposes: Purposes allow governors to define exceptions to policies based on how a user will use the data.
Data sources: Projects create an environment within Immuta where users see only the data sources within the project, even if they are subscribed to other data sources.
Subscription policy: Project users with the appropriate permissions can set a subscription policy on a project to control the users who can join.
Documentation: Project users can create documentation within the Immuta project page, allowing for an easy and consistent trail of communication.
Equalization: Project equalization ensures that the data in the project looks identical to all members, regardless of their level of access to data.
Workspaces: With equalization enabled, project users can create Snowflake or Databricks workspaces where users can view and write data.
Derived data sources: Derived data sources are the data sources that come from workspaces. Once they are created, they automatically inherit policies from their parent sources.
Masked joins: Within a project, users can join masked columns.

Members

The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.

Project role

capabilities

Project owner

Users with the CREATE_PROJECT permission can

Project governor

Users with the GOVERNANCE permission can

Project manager

Users with the PROJECT_MANAGEMENT permission can

Project member

Once subscribed to a project, all users can

remove data sources they’ve added to the project

Purposes

Best practice: purposes

Consider purposes as attributes. Attributes identify a user, and purposes identify why that user should have access.

Data governors and users with the PROJECT_MANAGEMENT permission are responsible for configuring and approving project purposes and acknowledgement statements.

Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive data policies. Project owners can also create a purpose; however, it remains in a staged state until a governor or project manager approves the purpose request.

Sub-purposes

Purposes can be constructed as a hierarchy, containing nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.

For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a governor could write the following global policy:

Limit usage to purpose(s) Research for everyone on data sources tagged PHI.

This hierarchy lets you reference a single purpose in the policy instead of creating separate purposes, each of which would then need to be added to policies as they evolve.

Now, any user acting under the purpose Research or one of its sub-purposes, such as Research.Marketing or Research.Onboarding, will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a governor to rewrite these global policies when sub-purposes are added or removed. If new projects with new Research sub-purposes are added, the relevant global policy will automatically be enforced.
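Assuming the dotted notation used above (Research.Marketing), the hierarchy check can be sketched in Python as a prefix match on whole path segments. The function names are hypothetical and this is not Immuta's implementation:

```python
from __future__ import annotations


def satisfies(acting_purpose: str, policy_purpose: str) -> bool:
    """True if acting_purpose is policy_purpose or is nested anywhere beneath it."""
    return acting_purpose == policy_purpose or acting_purpose.startswith(policy_purpose + ".")


def limit_usage_allows(acting_purposes: set[str], policy_purposes: set[str]) -> bool:
    """'Limit usage to purpose(s)': any acting purpose satisfies any listed purpose."""
    return any(satisfies(a, p) for a in acting_purposes for p in policy_purposes)


policy = {"Research"}
print(limit_usage_allows({"Research.Marketing"}, policy))   # True
print(limit_usage_allows({"Research.Onboarding"}, policy))  # True
print(limit_usage_allows({"ResearchOps"}, policy))          # False
```

Matching on `policy_purpose + "."` rather than a bare prefix keeps an unrelated purpose such as ResearchOps from satisfying a Research policy.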

Acknowledgement statements

Projects with purposes require owners and subscribers to acknowledge that they will only use the data for those purposes by affirming or rejecting acknowledgement statements. Each purpose has its own acknowledgement statement, and a project with multiple purposes requires users to accept more than one acknowledgement statement. Immuta keeps a record of each project member's response to the acknowledgement statement(s) and records the purpose, the time of the acknowledgement, and the text of the acknowledgement. Purposes can use Immuta's default acknowledgement statement or one customized by a data governor.

If users accept the statement, they become project members. If they reject it, they are denied access to the project.
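The record-keeping described above can be modeled as an append-only log. In this hypothetical Python sketch (not Immuta's data model), each response stores the purpose, time, and statement text, and a user counts as a member only if their most recent response for every project purpose is an acceptance. The statement texts are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AcknowledgementRecord:
    """Hypothetical audit entry for one response to one purpose's statement."""
    user: str
    purpose: str
    statement_text: str
    accepted: bool
    recorded_at: datetime


def record_response(log: list[AcknowledgementRecord], user: str, purpose: str,
                    statement_text: str, accepted: bool) -> AcknowledgementRecord:
    entry = AcknowledgementRecord(user, purpose, statement_text, accepted,
                                  datetime.now(timezone.utc))
    log.append(entry)  # every response is kept, accepted or rejected
    return entry


def is_member(log: list[AcknowledgementRecord], user: str, project_purposes: set[str]) -> bool:
    """Member only if the latest response for every project purpose is an acceptance."""
    latest: dict[str, bool] = {}
    for entry in log:
        if entry.user == user:
            latest[entry.purpose] = entry.accepted
    return all(latest.get(purpose, False) for purpose in project_purposes)


log = []
record_response(log, "morgan", "Research", "I will only use this data for Research.", True)
record_response(log, "morgan", "Fraud Detection", "I will only use this data for Fraud Detection.", False)
print(is_member(log, "morgan", {"Research"}))                     # True
print(is_member(log, "morgan", {"Research", "Fraud Detection"}))  # False
```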

Project contexts

When a user is working within the context of a project, they will only see the data in that project. This helps to prevent data leaks when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.

When users change project contexts, queries reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
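A project context can be thought of as session state that scopes which data sources are in view and which purposes are attached to each query. This hypothetical Python sketch models only those two effects; visibility outside a project (subscriptions and policies) is not modeled, and the class, field, and data source names are assumptions:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical query session tracking the user's current project context."""
    user: str
    project: str | None = None  # by default, no project is selected
    project_sources: frozenset[str] = frozenset()
    purposes: frozenset[str] = frozenset()

    def switch_context(self, project: str | None, sources=(), purposes=()) -> None:
        """Switch to a project (or to None); sources and purposes follow the project."""
        self.project = project
        self.project_sources = frozenset(sources) if project else frozenset()
        self.purposes = frozenset(purposes) if project else frozenset()

    def in_scope(self, data_source: str) -> bool:
        """Inside a project, only its data sources are in scope; outside, not modeled."""
        return self.project is None or data_source in self.project_sources

    def audit_entry(self, data_source: str) -> dict:
        """Record which data source was queried and why (the purposes in effect)."""
        return {"user": self.user, "data_source": data_source,
                "project": self.project, "purposes": sorted(self.purposes)}


session = Session("morgan")
session.switch_context("dev_fraud_review", sources=["claims"], purposes=["Fraud Detection"])
print(session.in_scope("claims"), session.in_scope("web_logs"))  # True False
print(session.audit_entry("claims")["purposes"])                 # ['Fraud Detection']
```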

Schema projects

Schema projects are a collection of data sources that Immuta creates when multiple data sources belong to the same schema. For information about schema projects, see the Schema project guide.

Policy Adjustments

Public preview

This feature is in public preview. It is available to all customers and can be enabled on the Immuta app settings page.

Prerequisites

Project owners can use policy adjustments to increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis. Since these adjustments only occur within the project and do not change the individual data policies, data users must be acting under the project to see the adjustments in the data source.

Policy adjustment example

For example, a policy might mask these data source columns with k-anonymization: Income, Education, EmploymentStatus, Gender, and Location Code. When the analyst examines the data, Immuta has predetermined the percent NULL by weighting all of these columns equally. However, if the analyst's work hinges on the EmploymentStatus column, the project owner can adjust the weights on the Policy Adjustment tab in the project to lower the percent NULL in that column.

For columns that are already well-disclosed (meaning they already have a low percent NULL), the same percent NULL will display even if you drastically change the weight distribution.

Increasing the weight of a column that is already well-disclosed will not change the outcome. Generally, the biggest impact comes from increasing the weight of the column with the largest percent NULL. The only exception is a column that already contains many nulls in the remote database.

Keep fields in the clear

This feature provides an option to allow fields in the clear when creating a purpose, permitting specified analysts to bypass k-anonymization in specific circumstances.

When any purpose with the allow fields in the clear property enabled is approved for use within a project, a project member can proceed through the policy adjustment workflow and specify columns to be unmasked.

Concept Guide

Why Use Purposes?

In today’s world of modern privacy regulations, deciding what a single user can see is not just about who they are, but what they are doing. For example, the same user may not be able to see credit card information normally, but if they are doing fraud detection work, they are allowed to.

This may sound silly because it’s the same person doing the analysis, so why should we make this distinction? This gets into a larger discussion about controls. When most people think about controls, they think about data controls: how do we hide enough information (hide rows, mask columns) to lower our risk? There’s a second type called contextual controls, which amounts to having a user agree they will only use data for a certain purpose and not step beyond that purpose. Combining contextual controls with data controls is the most effective way to reduce your overall risk.

In addition to the data controls you’ve seen, Immuta is also able to enforce contextual controls through what we term purposes. You are able to assign exceptions to policies, and those exceptions can be the purpose of the analysis in addition to who the user is (and what attributes they have). This is done through Immuta projects; projects contain data sources, have members, and are also protected by policy, but most importantly, projects can also have a purpose which can act as an exception to data policy. Projects can also be a self-service mechanism for users to access data for predetermined purposes without having to involve humans for ad hoc approvals.

Create exceptions to policies

The goal of Immuta is to modernize the management of data policies in organizations. One key aspect of modernization is removing day-to-day human involvement in policy decision making, which is fragile and subjective. One such decision is how and when to make exceptions to policies.

This decision process could be something like: “because Morgan is analyzing employee attrition and retention, they should be able to see employee satisfaction survey data.” Notice it is not “because Morgan is HR, they should be able to see employee satisfaction survey data.” It’s not who Morgan is, but what Morgan is doing that should allow them heightened access. The purpose for data access drives the policy decision and can be approved objectively because you are not approving who can see data, but what can be done with it. However, if Morgan has to ask permission every time there is a new survey, this becomes a subjective and time-consuming process for the organization.

This is where purposes and purpose-based access control can help. Purposes allow you to define exceptions to rules as the purpose for which the user is acting. The key point here is you can define these purposes ahead of time, before any user actually tries to get an exception. Immuta projects are valuable not just for purpose-based access control, but because of the documentation trail they provide, the collaboration they allow, and the data access process they automate.

Additionally, to ensure that purposes are used correctly and applied accurately, project purposes must be approved before they are active in a project. The approver sees the list of tables, the project members, and the project documentation and decides whether to approve the purpose. Immuta records this approval, creating a documentation trail. After a purpose is approved within a project, any change to the members or data sources requires the purpose to be re-approved, ensuring that the project continues to comply with the organization's requirements.
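The re-approval rule can be sketched as a small piece of state that resets whenever the reviewed members or data sources change. This is a hypothetical Python model, not Immuta's implementation; the member and data source names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PurposeApproval:
    """Hypothetical approval state for the purposes of one project."""
    members: frozenset[str]
    data_sources: frozenset[str]
    approved: bool = False

    def approve(self) -> None:
        # An approver reviews tables, members, and documentation, then approves.
        self.approved = True

    def update(self, members: Optional[Iterable[str]] = None,
               data_sources: Optional[Iterable[str]] = None) -> None:
        """Apply changes; any change to members or data sources revokes approval."""
        new_members = frozenset(members) if members is not None else self.members
        new_sources = frozenset(data_sources) if data_sources is not None else self.data_sources
        if new_members != self.members or new_sources != self.data_sources:
            self.approved = False  # the purpose must be re-approved
        self.members, self.data_sources = new_members, new_sources


approval = PurposeApproval(frozenset({"morgan"}), frozenset({"claims"}))
approval.approve()
approval.update(members={"morgan"})  # no actual change
print(approval.approved)             # True
approval.update(data_sources={"claims", "payments"})
print(approval.approved)             # False
```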

Workflow

Purpose-based access control makes access decisions based on the purpose for which a given user or tool intends to use the data, which reduces risk and aligns with many privacy regulations, such as GDPR and CCPA. This method of data access also provides flexibility for you to override policies and grant access to unmasked data to an individual for a very specific reason, without time-consuming and ad hoc manual approvals.

Immuta recommends using to create exceptions to global data policies.

See this for instructions on how to implement purpose-based access control for your organization.

Equalized Access

Project equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access.

How-to guide

Manage project equalization: Enable project equalization and manage equalized entitlements to create a common level of access for all project members.

Reference guide

Equalized access reference guide: This guide describes the design and behavior of project equalization.

Concept guide

Why use project equalization?: This explanatory guide offers an example use case for project equalization to highlight its business value.

Manage Project Equalization How-to Guide

Use project equalization so that all project members see the same data, and re-equalize projects if new members or data sources are added to the project.

Requirement: You must own the project

Enable project equalization

In the project, click the Policies tab.
In the Project Equalization section, click the toggle button to On.
Note: Only project owners can add data sources to the project if this feature is enabled.

Manage equalized entitlements

Click Edit next to Equalized Entitlements.
In the Equalized Entitlements Builder, select either is a member of a group or possesses attribute from the user condition dropdown menu.
- If you selected is a member of a group, select the appropriate group from the resulting dropdown.
- If you selected possesses attribute, select the appropriate key and value from the subsequent dropdown menus.
Click Save.

Use the recommended equalized entitlements

Use Immuta's recommended equalized entitlements to protect your data in projects. Changing these entitlements creates two potential disadvantages:

If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.

To view members' compliance status after changing the equalized entitlements,

Navigate to the Members tab from the Project Overview page.
Click the Not In Compliance text to view the details about the user's status.

Users who are not in compliance will be unable to view data sources within the project until the compliance issues are resolved.

To revert entitlements to those recommended by Immuta,

Click Edit next to Equalized Entitlements.
Click Use Recommended.
Click Confirm.

Manage validation frequency

Click Edit in the Validation Frequency section.
Enter an integer in the first field of the Validation Frequency modal that appears.
Select Days or Hours in the next dropdown.
Click Save.

Disable project equalization

Navigate to the Policies tab.
In the Project Equalization section, click the toggle button to Off.
Click Yes, Turn Off in the confirmation window.

Equalized Access Reference Guide

Project equalization

The same security restrictions regarding data sources are applied to projects: project members still need to be subscribed to data sources to access data, and only users with appropriate attributes and credentials can see the data if it contains any row-level or masking security.

However, project equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access. For a tutorial on enabling equalization, navigate to the Manage equalization guide. Note: Only project owners can add data sources to the project if this feature is enabled.

Once project equalization is enabled, the subscription policy for the project is locked and can only be adjusted by the project owner by changing the equalized entitlements. For users to access data sources within the project (and for the equalization to take effect), users must switch their context to the project.

Equalized entitlements

This setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. When project equalization is enabled, equalized entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:

If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the members tab within the project.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
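The compliance rule can be sketched as a set difference: a member is compliant when they hold every equalized entitlement. This is a minimal Python illustration, assuming entitlements reduce to group names and (key, value) attribute pairs; `Entitlements`, `missing_entitlements`, `is_compliant`, and the example users are hypothetical and are not Immuta objects.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Entitlements:
    """Hypothetical model of a user's groups and (key, value) attributes."""
    groups: Set[str] = field(default_factory=set)
    attributes: Set[Tuple[str, str]] = field(default_factory=set)


def missing_entitlements(user: Entitlements, required: Entitlements) -> Entitlements:
    """Return the equalized entitlements the user does not hold."""
    return Entitlements(
        groups=required.groups - user.groups,
        attributes=required.attributes - user.attributes,
    )


def is_compliant(user: Entitlements, required: Entitlements) -> bool:
    # A member is compliant only if they hold every equalized entitlement.
    gap = missing_entitlements(user, required)
    return not gap.groups and not gap.attributes


# Hypothetical project: the recommended entitlement is a single group.
recommended = Entitlements(groups={"Accounting"})
# The owner adds an attribute requirement on top of the recommendation.
stricter = Entitlements(groups={"Accounting"}, attributes={("Country", "US")})

members = {
    "alice": Entitlements(groups={"Accounting"}, attributes={("Country", "US")}),
    "bob": Entitlements(groups={"Accounting"}),
}
for name, ents in members.items():
    print(name, is_compliant(ents, recommended), is_compliant(ents, stricter))
# alice True True
# bob True False
```

Adding the attribute requirement leaves bob out of compliance, which mirrors the first disadvantage listed above.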

Validation frequency

This setting determines how often user credentials are validated, which is critical if users share data with project members outside of Immuta, as they need a way to verify that those members' permissions are still valid.
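As an illustration of how an integer-plus-unit Validation Frequency setting could translate into a revalidation schedule, here is a small Python sketch. It assumes that credentials are re-checked once a full interval has elapsed since the last validation; that scheduling rule and the helper names `validation_interval` and `is_revalidation_due` are assumptions for illustration, not documented Immuta behavior.

```python
from datetime import datetime, timedelta

# Maps the Days/Hours choice to timedelta keyword arguments (hypothetical mapping).
_UNIT_TO_KWARG = {"Hours": "hours", "Days": "days"}


def validation_interval(amount: int, unit: str) -> timedelta:
    """Convert a Validation Frequency setting (integer + unit) to a timedelta."""
    if amount <= 0:
        raise ValueError("validation frequency must be a positive integer")
    if unit not in _UNIT_TO_KWARG:
        raise ValueError(f"unsupported unit: {unit!r}")
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})


def is_revalidation_due(last_validated: datetime, amount: int, unit: str, now: datetime) -> bool:
    # Assumed rule: credentials are re-checked once a full interval has elapsed.
    return now - last_validated >= validation_interval(amount, unit)


last = datetime(2024, 1, 1, 9, 0)
print(is_revalidation_due(last, 12, "Hours", datetime(2024, 1, 1, 20, 59)))  # False
print(is_revalidation_due(last, 12, "Hours", datetime(2024, 1, 1, 21, 0)))   # True
```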

Project equalization and subscription policies

Once project equalization is enabled, the project subscription policy builder locks and can only be adjusted by manually editing the equalized entitlements. Then, the subscription policy will combine with the entitlement settings, depending on the policy type.

The way entitlements and approvals combine differs depending on the policy type; for clarity, the table below illustrates various scenarios for each type. Every row demonstrates how a specific project subscription policy changes after project equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if project equalization is subsequently disabled.

Original policy

Equalized policy (example entitlement: member of group Accounting)

Equalized policy (no entitlement)

Policy after disabling equalization

Anyone

Allow user to subscribe when user is a member of group Accounting

Individual users you select

Allow users to subscribe when approved by anyone with permission owner (of this project)

Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission owner (of this project)

Allow users to subscribe when approved by anyone with permission owner (of this project)

Allow users to subscribe to the project when user is a member of group Legal

Allow users to subscribe to the project when user is a member of group Accounting

Individual users you select

Allow users to subscribe to the project when user is a member of group Accounting

Individual users you select

For example, consider the subscription policy of the following sample project, Fraud Prevention, before project equalization is enabled:

Fraud prevention
Subscription policy: Allow users to subscribe when approved by anyone with permission owner (of this project).

After enabling project equalization, the following equalized entitlement is recommended by Immuta: User is a member of group Claims and Billing Department.

In this particular example, the equalized subscription policy contains the equalized entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:

the user must be a member of the group Claims and Billing Department.
the user must be approved by anyone with permission owner (of this project).
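The combined policy behaves like a logical AND of the equalized entitlement and the original approval condition. The following Python sketch models subscription conditions as predicates; `User`, `member_of`, `approved_by_owner`, and `all_of` are hypothetical names, and the boolean approval flag is a simplification of Immuta's approval workflow.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]
    approved_by_project_owner: bool  # hypothetical stand-in for an owner approval


Condition = Callable[[User], bool]


def member_of(group: str) -> Condition:
    return lambda user: group in user.groups


def approved_by_owner() -> Condition:
    return lambda user: user.approved_by_project_owner


def all_of(*conditions: Condition) -> Condition:
    # "Allow users to subscribe when they satisfy all of the following"
    return lambda user: all(condition(user) for condition in conditions)


original_policy = approved_by_owner()
equalized_entitlement = member_of("Claims and Billing Department")
equalized_policy = all_of(equalized_entitlement, original_policy)

candidates = [
    User("ana", frozenset({"Claims and Billing Department"}), True),
    User("ben", frozenset({"Claims and Billing Department"}), False),
    User("cy", frozenset({"Legal"}), True),
]
for user in candidates:
    print(user.name, equalized_policy(user))
# ana True
# ben False
# cy False
```

Under the original policy alone, cy would be allowed to subscribe; once the equalized entitlement is added, both conditions must hold.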

Why Use Project Equalization?

Let’s say Sally and Bob are working together on a project, and Sally has more data access than Bob. She can see PII and Bob cannot, but they both can see credit card numbers. Without Immuta, an admin would have to know what tables they intend to use, scrub all those tables of PII, and then give Sally and Bob access to those new tables in a place where they can safely work on the data and save any output they create.

With Immuta, this is a lot easier. Sally or Bob could create the project, add the tables as data sources, invite the other person to be a project member, and equalize it. The equalization will compare the members of the project to the data policies on the tables in the project and find the intersection: Sally and Bob both can see credit card numbers. That intersection becomes the equalization setting. Once the project is equalized in this example, Sally will lose access to PII, but both Sally and Bob will retain access to credit card numbers.
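The "find the intersection" step can be illustrated with plain Python sets. This is a simplified sketch, assuming each member's access is reduced to a set of labels for what they can see unmasked (the labels and the helpers `equalize` and `access_lost` are hypothetical); Immuta itself derives equalization from the data policies on the project's tables and the members' entitlements.

```python
from typing import Dict, Set


def equalize(access_by_member: Dict[str, Set[str]]) -> Set[str]:
    """Return the access every member shares (the least common level)."""
    if not access_by_member:
        return set()
    # set.intersection called with one or more sets returns a new set.
    return set.intersection(*access_by_member.values())


def access_lost(access_by_member: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """What each member can no longer see inside the equalized project."""
    shared = equalize(access_by_member)
    return {member: access - shared for member, access in access_by_member.items()}


# Hypothetical labels for what each member may see unmasked.
access = {
    "Sally": {"pii", "credit_card_number"},
    "Bob": {"credit_card_number"},
}
print(equalize(access))     # {'credit_card_number'}
print(access_lost(access))  # {'Sally': {'pii'}, 'Bob': set()}
```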

Now Sally and Bob want to transform the data and write it somewhere. This is where project workspaces come into play. Once workspaces are configured in a Snowflake or Databricks Spark integration, Immuta creates a schema in the remote database dedicated to this project where Sally and Bob can write their output. Within this workspace:

Only project members can read from or write to that schema. (For example, in Snowflake, Immuta limits access to a particular role it creates in Snowflake.) This is important because if someone who can’t see credit card numbers somehow had access to where Sally and Bob were writing, they would gain access to data (the credit card numbers) they shouldn’t see.
Project members can only access the tables that are in the project. (This may be critical if someone approved the purpose for only those tables.)

When determining where you should give analysts WRITE access, you need to consider the entire universe of where they have READ access, and that universe is constantly changing. This is an impossible proposition for you to manage without Immuta projects.

Key takeaways

Users at different levels of access can work together without help from an admin (to scrub the data).
Customers can avoid data leaks on analyst writes.

Masked Joins

Masked joins allow masked columns to be joined within the context of a project.

How-to guide

Enable masked joins: Enable masked joins for data sources within your project.

Concept guide

Why use masked joins?: This explanatory guide offers an example use case for implementing masked joins to highlight their business value.

Enable Masked Joins How-to Guide

While masked joins are enabled, only project owners can add data sources to the project. Additionally, masked columns can only be joined if they are masked using hashing.

To join masked columns across data sources, those data sources must be linked by a project.

Requirement: You must own the project

Create a project or select an existing project.
Navigate to the Overview tab.
Click the Allow Masked Joins toggle.
Click Confirm.

Why Use Masked Joins?

Sometimes, accessing two tables separately doesn't violate compliance regulations, but accessing these two tables when they are joined creates a serious privacy problem. Immuta can help avoid creating these toxic combinations of data.

Tables are joined on a key: a column in one table that matches a column in the other table. To prevent joining those tables, you mask the keys so they no longer match one another. This is an important distinction: you might anonymize a column not only because it is directly sensitive (a direct identifier) or indirectly sensitive (an indirect identifier), but also because it is a join key that you do not want used for joining.

Because of this risk, when Immuta masks a column with hashing, it uses a unique salt per table by default. This breaks referential integrity, so the masked values can't be joined. You can't join on two masked columns unless you tell Immuta to allow it: add the tables with the masked keys to a project and enable masked joins. When masked joins are enabled in a project, Immuta uses a consistent salt across all data sources in that project, which restores referential integrity and allows joining.
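The effect of per-table salts versus a shared project salt can be demonstrated with keyed hashing in Python. This sketch uses HMAC-SHA256 from the standard library as a stand-in for Immuta's hashing mask; the actual algorithm, the salts, the helper names `mask` and `joinable_keys`, and the sample key values are assumptions for illustration only.

```python
import hashlib
import hmac
from typing import List, Set


def mask(value: str, salt: bytes) -> str:
    """Keyed hash of a join-key value (HMAC-SHA256 is an assumption here)."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def joinable_keys(left: List[str], right: List[str]) -> Set[str]:
    """Masked values that would match in an equi-join."""
    return set(left) & set(right)


customers = ["c-001", "c-002", "c-003"]
claims = ["c-002", "c-003", "c-004"]

# Illustrative fixed salts; real salts would be secret and randomly generated.
customer_salt, claims_salt, project_salt = b"salt-customers", b"salt-claims", b"salt-project"

# Default: a unique salt per table breaks referential integrity.
per_table = joinable_keys(
    [mask(v, customer_salt) for v in customers],
    [mask(v, claims_salt) for v in claims],
)

# Masked joins enabled: one salt for every data source in the project.
shared_salt = joinable_keys(
    [mask(v, project_salt) for v in customers],
    [mask(v, project_salt) for v in claims],
)

print(len(per_table), len(shared_salt))  # 0 2
```

The two shared customers (c-002 and c-003) only line up when both tables are hashed with the same salt, which is why masked joins are scoped to a project.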

Key takeaway

Projects give you control over toxic data combinations.

Writing to Projects

With project equalization enabled, project users can create Snowflake or Databricks Spark project workspaces where users can view and write data.

How-to guides

Create and manage Snowflake project workspaces: Create a project workspace to allow Snowflake users subscribed to the project to write data to the project.
Create and manage Databricks Spark project workspaces: Create a project workspace to allow Databricks users subscribed to the project to write data to the project.
Write data to the workspace: Write data to a project when working in the context of a Snowflake or Databricks project workspace.
Create a derived data source: Create a derived data source to share the data you've written with other Immuta users.

Reference guides

Writing to projects: This reference guide describes the components and design of project workspaces for Snowflake and Databricks and defines derived data sources.
Project UDFs (Databricks): This reference guide lists the available functions for switching your project context in Databricks.

How-to Guides

Create and Manage Snowflake Project Workspaces

After workspaces are configured, project owners can enable workspaces within their projects. This feature allows project members to write data to the project and share this data with other users as derived data sources.

Requirement: You must own the project

Prerequisites:

Snowflake integration configured with workspaces enabled.
Snowflake tables are registered in Immuta.
External IDs have been connected with an IAM or mapped manually for Snowflake.
Data sources registered by excepted roles: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an excepted role so that policies applied to the backing tables are not applied to the project workspace views.

Create a Snowflake workspace

Navigate to the Policies tab and enable project equalization by clicking the Project Equalization slider to on.
Scroll to the Workspace section and click Create.
Select Snowflake from the Workspace Configuration dropdown menu.
Name the Workspace Schema. By default, the schema name is based on the project name, but you can change it here. Your project workspace will exist in this schema, within the Snowflake database configured by the application admin.
Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
Select one or more Warehouses to be available to project members when they are working in the Snowflake workspace.
Click Create to enable the workspace.

Delete a workspace

Scroll to the Workspace section on the policies tab and click the toggle to disable the workspace.
Click Delete in the workspace section.
Choose one of the following options in the modal:
- Purge Generic Workspace Data: Permanently delete data, while the data used by derived data sources is preserved. Note: If you created a derived data source that references a view on top of a table in Snowflake that isn't a derived data source, that table will be deleted and break the derived data source.
- Purge Everything & Delete Derived Data Sources: Permanently delete data and purge all derived data sources.
Click Delete.

Create and Manage Databricks Spark Project Workspaces

After workspaces are configured, project owners can enable workspaces within their projects. This feature allows project members to write data to the project and share this data with other users as derived data sources.

Requirement: You must own the project

Prerequisites:

Create a Databricks Spark workspace

Databricks cluster configuration

Before you create a workspace, the cluster must send its configuration to Immuta. To do this, run a simple query on the cluster (e.g., show tables). Otherwise, you will get an error when you attempt to create a workspace.

Navigate to the Policies tab and enable Project Equalization by clicking the Project Equalization slider to on.
Scroll to the Workspace section and click Create.
Select Databricks from the Workspace Configuration dropdown menu.
Optionally, edit the subdirectory in the Workspace Directory field; it defaults to the project name.
Enter the Workspace Database Name.
Click Create to enable the workspace.

Delete a workspace

Scroll to the Workspace section on the policies tab and click the toggle to disable the workspace.
Click Delete in the workspace section.
Choose one of the following options in the modal:
- Purge Generic Workspace Data: Permanently delete data, while the data used by derived data sources is preserved.
- Purge Everything & Delete Derived Data Sources: Permanently delete data and purge all derived data sources.
Click Delete.

Write Data to the Workspace

Once the workspace is created, project members will see relevant data sources when working under the project context.

Switch your project context.
Write data to the project workspace in Snowflake or Databricks:
- Snowflake: Select the role created by the project workspace. The role created will be a combination of the database name (configured by the application admin) and the schema name. Then, write data to this location.
- Databricks: Write data to the directory and database created in Databricks for the project workspace.

Now that data has been written to the workspace, users can share this data with others by making it a derived data source in Immuta.

Create a derived data source

Deprecation notice: Support for this feature has been deprecated.

Select a project.
Select the data source from which the new data was created.
Select Table for the virtual population option.
Click Edit and select the tables you created, and then click Apply.
Optionally, edit the Basic Information fields, and then click Create.

Reference Guides

Writing to Projects

Project workspaces

With equalization enabled, project users can create Snowflake or Databricks workspaces where users can view and write data. Then, those users can create derived data sources to share this data with other users.

Snowflake project workspaces

Combining Immuta projects and Snowflake workspaces allows users to access and write data directly in Snowflake.

With Snowflake workspaces, Immuta enforces policy logic on registered tables and represents them as secure views in Snowflake. Since secure views are static, creating a secure view for every unique user in your organization for every table in your organization would result in secure view bloat; however, Immuta addresses this problem by virtually grouping users and tables and equalizing users to the same level of access, ensuring that all members of the project see the same view of the data. Consequently, all members share one secure view.

While interacting directly with Snowflake secure views in these workspaces, users can write within Snowflake and create derived data sources, all the while collaborating with other project members at a common access level. Because these derived data sources will inherit all of the appropriate policies, that data can then be shared outside the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.

Snowflake workspaces can be used on their own or with the Snowflake integration.

Policy enforcement

Immuta enforces policy logic on data and represents it as secure views in Snowflake. Because projects group users and tables and equalize members to the same level of access, all members will see the same view of the data and, consequently, will only need one secure view. Changes to policies immediately propagate to relevant secure views.

Snowflake project workspace workflow

An Immuta user with the CREATE_PROJECT permission creates a new project with Snowflake data sources.
The Immuta project owner enables project equalization, which sets every project member's access to the data to the same level.
The Immuta project owner creates a Snowflake project workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
Project members can access data sources within the project and use WRITE to create derived tables. To ensure equalization, users will only see data sources within their project as long as they are working in the Snowflake Context.
The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can expose their derived tables in the Immuta project; the derived tables will inherit the policies, and then the data can be shared outside the project.
If a project member leaves a project or a project is deleted, that Snowflake Context will be removed from the user's Snowflake account.

Root Directory Details

Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.

Mapping projects to secure views

Immuta projects are represented as Session Contexts within Snowflake. Because they are linked to Snowflake, projects automatically create the following corresponding objects:

roles in Snowflake: IMMUTA_[project name]
schemas in the Snowflake IMMUTA database: [project name]
secure views in the project schema for any table in the project

To switch projects, users have to change their Snowflake Session Context to the appropriate Immuta project. If users are not entitled to a data source contained by the project, they will not be able to access the Context in Snowflake until they have access to all tables in the project. If changes are made to a user's attributes and access level, the changes will immediately propagate to the Snowflake Context.

Because users access data only through secure views in Snowflake, administrators have significantly less role management to do in Snowflake. Organizations should also consider having a user in Snowflake who is able to create databases and make GRANTs on those databases, and separate users who are able to read from and write to those tables.

Benefits

Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
Self-service access to data based on data policies.
Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
All policies are enforced natively in Snowflake without performance impact.
- Security is maintained through Snowflake primitives (roles and secure views).
- Performance and scalability are maintained (no proxy).
Policies can be driven by metadata, allowing massive scale policy enforcement with only a small set of actual policies.
Derived tables can be shared back out through Immuta, improving collaboration.
User access and removal are immediately reflected in secure views.

Databricks Spark project workspaces

Using Immuta projects and project equalization, Databricks Spark project workspaces are a space where every project member has the same level of access to data. This equalized access allows collaboration without worries about data leaks. Not only can project members collaborate on data, but they can also write protected data to the project.

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data to the project, they should use the SparkSQL session to copy data into the workspace.

Databricks project workspace workflow

An Immuta user with the CREATE_PROJECT permission creates a new project with Databricks data sources.
The Immuta project owner enables project equalization, which sets every project member's access to the data to the same level.
The Immuta project owner creates a Databricks project workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
The Immuta project members query equalized data within the context of the project, collaborate, and write data, all within Databricks.
The Immuta project members use their newly written derived data and register the derived tables in Immuta as derived data sources. These derived data sources inherit the necessary Immuta policies to be securely shared outside of the project.

Root directory details

Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.

Read and write data

When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table (rather than a directory) in the workspace when creating the derived data source.

Supported cloud providers

Microsoft Azure

Immuta currently supports the abfss schema for Azure General Purpose V2 Storage Accounts, including Azure Data Lake Gen 2. When configuring Immuta workspaces for Databricks on Azure, you must provide the Azure Databricks workspace ID; the Databricks documentation explains how to determine the workspace ID for your workspace. Any cluster that uses Immuta workspaces must also include the additional configuration file, with credentials for the Azure Storage container that holds the Immuta workspaces.

Google Cloud Platform

Immuta currently supports the gs schema for Google Cloud Platform. The primary difference between Databricks on Google Cloud Platform and Databricks on AWS or Azure is that it is deployed to Google Kubernetes Engine. Databricks automatically provisions and autoscales drivers and executors as pods on Google Kubernetes Engine, so Google Cloud Platform admin users can view and monitor those Kubernetes resources in Google Cloud Platform.

Caveats and limitations

Stage Immuta installation artifacts in Google Storage, not DBFS: The DBFS FUSE mount is unavailable, and the IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED property cannot be set to true to expose the DBFS FUSE mount.
Stage the Immuta init script in Google Storage: Init scripts in DBFS are not supported.
Stage third party libraries in DBFS: Installing libraries from Google Storage is not supported.
Install third party libraries as cluster-scoped: Notebook-scoped libraries have limited support. See the Databricks Libraries page for more details.
Maven library installation is only supported in Databricks Runtime 8.1+.
/databricks/spark/conf/spark-env.sh is mounted as read-only:
- Set sensitive Immuta configuration values directly in immuta_conf.xml: Do not use environment variables to set sensitive Immuta properties. Immuta is unable to edit the spark-env.sh file because it is read-only; therefore, remove environment variables and keep them from being visible to end users.
- Use /immuta-scratch directly: The IMMUTA_LOCAL_SCRATCH_DIR property is unavailable.
Allow the Kubernetes resource to spin down before submitting another job: Job clusters with init scripts fail on subsequent runs.
The DBFS CLI is unavailable: Other non-DBFS Databricks CLI functions will still work as expected.

Writing data to projects: supported metastore providers for Databricks

To write data to a table in Databricks through an Immuta workspace, use one of the following supported provider types for your table format:

avro
csv
delta
orc
parquet
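If you want to fail fast on an unsupported format before writing, a small guard like the following could help. It is a hypothetical Python helper (Python being a common notebook language in Databricks) built only from the list above; `SUPPORTED_WORKSPACE_FORMATS` and `workspace_format` are not Immuta or Databricks APIs, and the table name in the usage comment is a placeholder.

```python
SUPPORTED_WORKSPACE_FORMATS = frozenset({"avro", "csv", "delta", "orc", "parquet"})


def workspace_format(fmt: str) -> str:
    """Normalize a table format name and reject formats not listed as supported."""
    normalized = fmt.strip().lower()
    if normalized not in SUPPORTED_WORKSPACE_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_WORKSPACE_FORMATS))
        raise ValueError(f"{fmt!r} is not supported for workspace writes; use one of: {supported}")
    return normalized


# Hypothetical usage in a Databricks notebook while acting under the project
# (the database and table names are placeholders):
# df.write.format(workspace_format("delta")).saveAsTable("my_workspace_db.claims_summary")
```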

Derived data sources

Deprecation notice: Support for this feature has been deprecated.

A derived data source is a data source that is created within an equalized project and contains data from its parent sources. Consequently, when the derived data source is created, it will inherit the data policies from its parent data sources to keep the data secure.

Policy inheritance for derived data sources is a feature unique to the environment that an equalized project creates. Within the equalized project, every user sees the same data and work can be shared and collaborated on without any risk of a user viewing more than they should. When a derived data source is created, it inherits the data policies from its parent sources and a subscription policy is created from the equalized entitlements on the project, allowing project members to safely share secure data.

Example

Consider these data sources, within an equalized Project 1, that each contain subscription and data policies:

Data source A
- Subscription policy: Allow users to subscribe to the data source when user is a member of group Medical Claims
- Data policies:
  - Mask by making null the value in the column(s) city except for members of group Legal
  - Mask by making null the value in the column(s) gender for everyone
Data source B
- Subscription policy: Allow users to subscribe to the data source when user is approved by anyone with permission owner and anyone with permission governance
- Data policy: Limit usage to purpose(s) Research for everyone

If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit these policies, which will be unchangeable:

Data source C
- Subscription policy: Allow user to subscribe when they satisfy all of the following:
  - is a member of group Legal and is a member of group Medical Claims
  - is approved by anyone with permission owner (of data source B) and anyone with permission governance
- Data policy: Limit usage to purpose(s) Research for everyone

Derived data sources inherit policies from parent sources

Sensitive data discovery applies Discovered tags to derived data sources; however, because they inherit policies from their parent sources, the global policies that contain these tags will not apply to derived data sources.

Behavior

Notice that one of the data policies in Data Source A, mask by making null the value in the column(s) gender for everyone, is not included in Data Source C. Because the creator could not have seen the gender values in the parent source, the derived data source contains no gender values to mask.
Most local data policies do not need to be carried over to the derived data source; the exception is limit usage to purpose(s) policies. No global policies are added to a derived data source.
Data Source C's policies depend on which groups are in the project, and they change as those groups change.
For example, if a data user in the project were not in the Legal group, membership in Legal would not be needed in the subscription policy. With equalization, the unmasked city values in the parent data source would not be visible to any project member, so they could not appear in the derived data source.
The subscription and data policies in the derived data source will always be the minimum required permissions and traits because of project equalization.
Derived data source policies do not change along with the parent data sources. Any changes to the parent data source policies are logged in the Relationships tab of the derived data source page, but they are not applied to the derived data source policies.
The data owner may choose to add new local data policies to the derived data source to keep up with any changes, but the inherited policies are not adjustable.
Changes to the data in the parent data sources do not propagate to the derived data source. After the derived data source is created, it stays connected to its parents for auditing and relationship tracking, not for updating content.
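The snapshot behavior described in these points can be sketched as follows. This is a hypothetical Python model, not Immuta code. DerivedSource, its methods, and the "zip" column are illustrative assumptions, and relationships_log only stands in for the Relationships tab. Inherited policies are fixed at creation, parent policy changes are only recorded, and the data owner can still layer local policies on top.

```python
from dataclasses import dataclass, field


@dataclass
class DerivedSource:
    """Hypothetical model: inherited policies are fixed when the source is created."""
    name: str
    inherited: tuple  # stored as a tuple so it cannot be edited in place
    local: list = field(default_factory=list)  # policies the data owner adds later
    relationships_log: list = field(default_factory=list)  # stand-in for the Relationships tab

    def on_parent_policy_change(self, parent, description):
        # Parent policy changes are only recorded; inherited policies stay as they were.
        self.relationships_log.append(f"{parent}: {description}")

    def add_local_policy(self, policy):
        # The data owner may add local policies to keep up with parent changes.
        self.local.append(policy)

    def effective_policies(self):
        return list(self.inherited) + self.local


c = DerivedSource("C", inherited=("Limit usage to purpose(s) Research for everyone",))
c.on_parent_policy_change("A", "new masking policy on column zip")
print(len(c.effective_policies()))  # 1: the parent change was logged, not applied
c.add_local_policy("Mask by making null the value in the column(s) zip for everyone")
print(len(c.effective_policies()))  # 2
```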

Using data outside the project

If members use data outside the project to create their data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.
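The rule above can be sketched as a simple pre-check. The following Python helper is hypothetical and is not part of Immuta. It assumes you know which sources the derived data actually came from and lists the ones that must be added to the project before the data source is re-derived.

```python
def sources_to_add(selected_parents, project_sources):
    """Return parent sources that must be added to the project before re-deriving.

    Hypothetical helper: selected_parents are the sources the derived data actually
    came from; project_sources are the data sources already in the project.
    """
    return sorted(set(selected_parents) - set(project_sources))


print(sources_to_add(["A", "B", "external_sales"], ["A", "B"]))  # ['external_sales']
```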

For detailed instructions on creating a derived data source, navigate to Create a derived data source.

Project UDFs (Databricks)

In Spark, you can switch project contexts with UDFs and view your current project or the projects available to you by querying virtual tables.

Available functions

- immuta.set_current_project(id): Sets the user's current project to the project ID given by the id parameter. This UDF must be called in its own notebook cell to ensure the change takes effect.
- immuta.set_current_project() (no parameters): Sets the user's current project to None.
- immuta.clear_caches(): Clears all client caches for the current user's ImmutaClient instance. Use it to invalidate cached items, such as data source subscription information, or when the state of Immuta has changed and the cache is outdated. For backward compatibility, this UDF is also available as default.immuta_clear_caches().
- default.immuta_clear_metastore_cache(): Clears the cluster-wide Metastore cache. Only a privileged user can run this UDF.
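The following sketch shows one way to call these UDFs from a Python notebook cell. It assumes that spark is the notebook's SparkSession, that the UDFs are invoked with a SELECT statement, and that project IDs are numeric. All three are assumptions, not confirmed by this page. The wrapper names set_current_project and clear_caches are hypothetical.

```python
def set_current_project(spark, project_id=None):
    """Run immuta.set_current_project through Spark SQL (assumed invocation style).

    Passing None calls the no-argument form, which sets the current project to None.
    Run this in its own notebook cell so that the change takes effect.
    """
    if project_id is None:
        query = "SELECT immuta.set_current_project()"
    else:
        # Assumes a numeric project ID; int() rejects anything else.
        query = f"SELECT immuta.set_current_project({int(project_id)})"
    spark.sql(query).collect()  # collect() forces the lazy query to execute
    return query


def clear_caches(spark):
    """Invalidate the current user's cached Immuta state, such as subscriptions."""
    spark.sql("SELECT immuta.clear_caches()").collect()
```

In a notebook cell of its own, set_current_project(spark, 12) would switch to a (hypothetical) project with ID 12.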

Virtual tables

To view a list of your current project or available projects in a Spark job, you can query these virtual tables.

- immuta.get_current_project
  - Query: select * from immuta.get_current_project
  - Returns: a single row with "name" and "id" columns that show your currently selected project.
- immuta.list_projects
  - Query: select * from immuta.list_projects
  - Returns: one row for each project you are subscribed to (and can therefore use as your current project), with "name", "id", and "current_project" columns. The "current_project" column is true only for the row of the project you have set as your current project.
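To pick out the current project from the output of immuta.list_projects, you can scan for the row whose current_project column is true. The following is a minimal Python sketch. The helper name current_project is hypothetical, and the commented notebook lines assume spark is a SparkSession. PySpark Row objects and plain dicts both support row["column"] lookups, so the helper works with either.

```python
def current_project(rows):
    """Return (name, id) for the row flagged as the current project, or None.

    Each row is expected to expose "name", "id", and "current_project".
    """
    for row in rows:
        if row["current_project"]:
            return row["name"], row["id"]
    return None  # no project is currently set


# Assumed notebook usage:
# rows = spark.sql("select * from immuta.list_projects").collect()
# print(current_project(rows))
```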

Writing to Projects

Project workspaces

Snowflake project workspaces

Combining Immuta projects and Snowflake workspaces allows users to access and write data directly in Snowflake.

Snowflake workspaces can be used on their own or with the Snowflake integration.

Policy enforcement

Snowflake project workspace workflow

An Immuta user with the CREATE_PROJECT permission creates a new project with Snowflake data sources.
The Immuta project owner enables project equalization, which gives every project member the same level of access to the data.
The Immuta project owner creates a Snowflake project workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
Project members can access data sources within the project and use WRITE to create derived tables. To ensure equalization, users will only see data sources within their project as long as they are working in the Snowflake Context.
The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can expose their derived tables in the Immuta project; the derived tables will inherit the policies, and then the data can be shared outside the project.
If a project member leaves a project or a project is deleted, that Snowflake Context will be removed from the user's Snowflake account.

Root directory details

Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.

Mapping projects to secure views

Immuta projects are represented as Session Contexts within Snowflake. Because they are linked to Snowflake, projects automatically create the following corresponding objects:

roles in Snowflake: IMMUTA_[project name]
schemas in the Snowflake IMMUTA database: [project name]
secure views in the project schema for any table in the project
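The following sketch makes the naming in this mapping concrete. It is a hypothetical Python illustration that follows only the patterns listed here: an IMMUTA_[project name] role and a [project name] schema in the IMMUTA database. It does not model any case or character normalization Immuta may apply. Naming each secure view after its source table is an assumption, and the project and table names in the usage line are made up.

```python
def snowflake_objects(project_name, tables):
    """Hypothetical sketch of the Snowflake objects an Immuta project maps to.

    Assumes no name normalization and that each secure view is named after its table.
    """
    return {
        "role": f"IMMUTA_{project_name}",
        "schema": f"IMMUTA.{project_name}",
        "secure_views": [f"IMMUTA.{project_name}.{table}" for table in tables],
    }


print(snowflake_objects("claims_analysis", ["patients", "visits"])["role"])
# IMMUTA_claims_analysis
```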

Benefits

Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
Self-service access to data based on data policies.
Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
All policies are enforced natively in Snowflake without performance impact.
- Security is maintained through Snowflake primitives (roles and secure views).
- Performance and scalability are maintained (no proxy).
Policies can be driven by metadata, allowing massive scale policy enforcement with only a small set of actual policies.
Derived tables can be shared back out through Immuta, improving collaboration.
User access and removal are immediately reflected in secure views.

Databricks Spark project workspaces

Databricks project workspace workflow

An Immuta user with the CREATE_PROJECT permission creates a new project with Databricks data sources.
The Immuta project owner enables project equalization, which gives every project member the same level of access to the data.
The Immuta project owner creates a Databricks project workspace, which automatically generates a subfolder in the root path specified by the application admin and a remote database associated with the project.
The Immuta project members query equalized data within the context of the project, collaborate, and write data, all within Databricks.
The Immuta project members use their newly written derived data and register the derived tables in Immuta as derived data sources. These derived data sources inherit the necessary Immuta policies to be securely shared outside of the project.

Root directory details

Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
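The root-directory rules above can be modeled in a few lines. This is a hypothetical Python model, not Immuta's implementation, and the paths in the usage lines are made up. Every project workspace is a subdirectory of one root, and the root can no longer change once any workspace exists.

```python
import posixpath


class WorkspaceRoot:
    """Hypothetical model of the single workspace root location."""

    def __init__(self, root):
        self.root = root
        self.projects = []

    def set_root(self, new_root):
        # The root can only change before any workspace has been created.
        # (The Immuta user would also need full access to the new directory.)
        if self.projects:
            raise RuntimeError("root directory is locked once a workspace exists")
        self.root = new_root

    def create_workspace(self, project):
        # Every project writes to its own subdirectory of the single root.
        self.projects.append(project)
        return posixpath.join(self.root, project)


root = WorkspaceRoot("/immuta-workspaces")
root.set_root("/data/workspaces")  # allowed: no workspaces yet
print(root.create_workspace("project1"))  # /data/workspaces/project1
```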

Read and write data

When working in the project workspace, users can read data with calls such as spark.read.parquet("immuta:///some/path/to/a/workspace").
To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table in the workspace (rather than a directory) when creating the derived data source.
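The following PySpark sketch shows both directions. It assumes spark is the notebook's SparkSession and df is a DataFrame you have built in the project. The workspace path is the placeholder from the text, and any table name you pass, such as workspace_db.derived_claims, is hypothetical. spark.read.parquet, DataFrameWriter.format, and DataFrameWriter.saveAsTable are standard PySpark calls.

```python
# Placeholder workspace path from the text above; substitute your own.
WORKSPACE_PATH = "immuta:///some/path/to/a/workspace"


def read_workspace(spark, path=WORKSPACE_PATH):
    """Read Parquet data from the project workspace while acting in the project."""
    return spark.read.parquet(path)


def save_delta_table(df, table_name):
    """Write a DataFrame as a named Delta table rather than a bare directory.

    A named table is what you select when creating the derived data source.
    The default save mode fails if the table already exists.
    """
    df.write.format("delta").saveAsTable(table_name)
```

For example, save_delta_table(df, "workspace_db.derived_claims") would create the table that you then register as a derived data source.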

Supported cloud providers

Microsoft Azure

Google Cloud Platform

Caveats and limitations

Stage Immuta installation artifacts in Google Storage, not DBFS: The DBFS FUSE mount is unavailable, and the IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED property cannot be set to true to expose the DBFS FUSE mount.
Stage the Immuta init script in Google Storage: Init scripts in DBFS are not supported.
Stage third-party libraries in DBFS: Installing libraries from Google Storage is not supported.
Install third-party libraries as cluster-scoped: Notebook-scoped libraries have limited support. See the Databricks Libraries page for more details.
Maven library installation is only supported in Databricks Runtime 8.1+.
/databricks/spark/conf/spark-env.sh is mounted as read-only:
- Set sensitive Immuta configuration values directly in immuta_conf.xml: Do not use environment variables to set sensitive Immuta properties. Because spark-env.sh is read-only, Immuta cannot edit it; remove these environment variables so that they are not visible to end users.
- Use /immuta-scratch directly: The IMMUTA_LOCAL_SCRATCH_DIR property is unavailable.
Allow the Kubernetes resource to spin down before submitting another job: Job clusters with init scripts fail on subsequent runs.
The DBFS CLI is unavailable: Other non-DBFS Databricks CLI functions will still work as expected.

Writing data to projects: supported metastore providers for Databricks

To write data to a table in Databricks through an Immuta workspace, use one of the following supported provider types for your table format:

avro
csv
delta
orc
parquet
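A small guard can catch an unsupported format before a workspace write fails. The following Python sketch is a hypothetical helper that checks a provider name against the list above. Lowercasing the name before comparison is a convenience assumption of this sketch.

```python
SUPPORTED_PROVIDERS = {"avro", "csv", "delta", "orc", "parquet"}


def check_provider(provider):
    """Fail fast if a table format cannot be written through an Immuta workspace."""
    normalized = provider.strip().lower()  # normalized for comparison (an assumption)
    if normalized not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unsupported provider {provider!r}; use one of {sorted(SUPPORTED_PROVIDERS)}"
        )
    return normalized


print(check_provider("Delta"))  # delta
```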

Derived data sources

Deprecation notice: Support for this feature has been deprecated.
