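There are three main use cases Immuta projects can help with: building exceptions to policies with purposes, letting users with different levels of access collaborate and write data safely, and preventing toxic combinations of data.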
The goal of Immuta is to modernize the management of data policies in organizations. One key aspect of modernization is to remove day-to-day human involvement in policy decision making, which is fragile and subjective. One decision process is how and when to make exceptions to policies.
This decision process could be something like: “because Morgan is analyzing employee attrition and retention, they should be able to see employee satisfaction survey data.” Notice it is not “because Morgan is HR, they should be able to see employee satisfaction survey data.” It’s not who Morgan is, but what Morgan is doing that should allow them heightened access. The purpose for data access drives the policy decision and can be approved objectively because you are not approving who can see data, but what can be done with it. However, if Morgan has to ask permission every time there is a new survey, this becomes a subjective and time-consuming process for the organization.
This is where purposes and purpose-based access control can help. Purposes allow you to define exceptions to rules based on the purpose for which the user is acting. The key point is that you can define these purposes ahead of time, before any user actually requests an exception. Immuta projects are valuable not just for purpose-based access control, but also for the documentation trail they provide, the collaboration they enable, and the data access process they automate.
Additionally, to ensure that purposes are used correctly and applied accurately, project purposes must be approved before they become active in a project. The approver reviews the list of tables, the project members, and the project documentation, and then decides whether to approve the purpose. Immuta records this approval, creating a documentation trail. After a purpose is approved within a project, any change to the project's members or data sources requires the purpose to be re-approved, ensuring that the project remains compliant with the organization's requirements.
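The re-approval rule can be pictured as a small state model. This is a minimal sketch under stated assumptions, not Immuta's implementation: `ProjectPurposeApproval`, `approve`, and `is_purpose_active` are hypothetical names, and the model assumes a purpose stays active only while the project's members and data sources exactly match what the approver reviewed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProjectPurposeApproval:
    """Hypothetical model of a project's purpose approval (not an Immuta API)."""

    members: frozenset[str]
    data_sources: frozenset[str]
    # Snapshot of (members, data_sources) taken at the last approval, if any.
    approved_snapshot: tuple[frozenset[str], frozenset[str]] | None = None

    def approve(self) -> None:
        # Record exactly what the approver reviewed.
        self.approved_snapshot = (self.members, self.data_sources)

    def is_purpose_active(self) -> bool:
        # Any change to members or data sources since approval deactivates the purpose.
        return self.approved_snapshot == (self.members, self.data_sources)


project = ProjectPurposeApproval(frozenset({"sally", "bob"}), frozenset({"hr.survey"}))
project.approve()
print(project.is_purpose_active())  # True
project.members = project.members | {"morgan"}
print(project.is_purpose_active())  # False until the purpose is re-approved
```

Storing a snapshot, rather than a flag that must be reset by hand, means every membership or data source change is caught automatically.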
Create a project with a purpose and add users, data sources, and documentation.
A Governor or Project manager approves the purpose.
Query data while acting in the context of the project.
Exceptions to policies can be built ahead of time with purposes.
Users can independently create projects to use purposes as exceptions to policies.
A permissioned user can then approve that purpose for a specific project, when asked, based on the project details.
Let’s say Sally and Bob are working together on a project, and Sally has more data access than Bob. She can see PII and Bob cannot, but they both can see credit card numbers. Without Immuta, an admin would have to know what tables they intend to use, scrub all those tables of PII, and then give Sally and Bob access to those new tables in a place where they can safely work on the data and save any output they create.
With Immuta, this is a lot easier. Sally or Bob could create the project, add the tables as data sources, invite the other person to be a project member, and equalize it. The equalization will compare the members of the project to the data policies on the tables in the project and find the intersection: Sally and Bob both can see credit card numbers. That intersection becomes the equalization setting. Once the project is equalized in this example, Sally will lose access to PII, but both Sally and Bob will retain access to credit card numbers.
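Equalization, as described above, amounts to taking the intersection of what each member can see. The sketch below is a simplified illustration that models each member's access as a set of labels; the labels `PII` and `CREDIT_CARD` and the function `equalize` are hypothetical, and Immuta's actual evaluation works against data policies rather than plain sets.

```python
from __future__ import annotations


def equalize(member_entitlements: dict[str, set[str]]) -> set[str]:
    """Return the entitlements shared by every project member (their intersection)."""
    if not member_entitlements:
        return set()
    return set.intersection(*(set(e) for e in member_entitlements.values()))


members = {
    "sally": {"PII", "CREDIT_CARD"},  # Sally can see PII and credit card numbers
    "bob": {"CREDIT_CARD"},           # Bob can only see credit card numbers
}
print(equalize(members))  # {'CREDIT_CARD'}
```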
Now Sally and Bob want to transform the data and write the results somewhere. This is where project workspaces come into play. Once workspaces are configured in a Snowflake or Databricks integration, Immuta creates a schema in the native database, dedicated to this project, where Sally and Bob can write their output. Within this workspace:
Only members of the project will have access to that schema to write to or read from. (For example, in Snowflake Immuta limits it to a particular role it creates in Snowflake.) This is important, because if someone who can’t see credit card numbers somehow had access to where Sally and Bob were writing, they would gain access to data (the credit card numbers) they shouldn’t see.
They will only be able to access the tables that are in the project. (This may be critical if someone approved the purpose for only those tables.)
After reading and writing data within the workspace, users can create derived data sources. With derived data sources, Sally and Bob can securely share the data they have written by registering the derived table or view they created within the project and expose that to people outside of the project. Immuta will ensure the proper policy is in place on that shared table based on the equalization setting of the project.
When determining where you should give analysts WRITE access, you need to consider the entire universe of where they have READ access, and that universe is constantly changing. This is an impossible proposition for you to manage without Immuta projects.
Users at different levels of access can work together without help from an admin (to scrub the data).
Customers can avoid data leaks when analysts write data.
Sometimes, accessing two tables separately doesn't violate compliance regulations, but accessing these two tables when they are joined creates a serious privacy problem. Immuta can help avoid creating these toxic combinations of data.
Tables are joined on a key: a column in one table that matches a column in the other table and allows the join. To prevent joining those tables, you would mask the keys so they no longer match one another. This is an important distinction: you may mask a column not only because it is directly sensitive (a direct identifier) or indirectly sensitive (an indirect identifier), but also because it is a join key that you do not want used for joining.
Because of this, when masking a column (for the policy types that support it) Immuta breaks referential integrity by default, making sure the masked values aren’t able to join. This means you can’t join on two masked columns unless you tell Immuta you want to allow that. To do so, you have to add those tables with the masked keys to a project and enable Masked Joins.
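To see why masked keys stop matching, consider masking with a keyed hash. This is a conceptual illustration only: HMAC-SHA256 with per-table versus shared keys is a swapped-in technique, not Immuta's documented masking algorithm, and the key values and the helpers `mask` and `masked_join` are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def mask(value: str, key: bytes) -> str:
    """Deterministically mask a value with HMAC-SHA256 under the given key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def masked_join(left: dict[str, str], right: dict[str, str],
                left_key: bytes, right_key: bytes) -> list[tuple[str, str]]:
    """Inner-join two {join_key: payload} tables after masking their keys."""
    masked_right = {mask(k, right_key): v for k, v in right.items()}
    rows = []
    for k, v in left.items():
        masked = mask(k, left_key)
        if masked in masked_right:
            rows.append((v, masked_right[masked]))
    return rows


customers = {"c1": "Alice", "c2": "Bob"}
orders = {"c1": "order-17"}

# Default: each table is masked with its own key, so the masked keys never match.
print(masked_join(customers, orders, b"customers-key", b"orders-key"))  # []
# Masked joins enabled in a project: both tables share one key, so the join works.
print(masked_join(customers, orders, b"project-key", b"project-key"))  # [('Alice', 'order-17')]
```

The masked values stay unreadable in both cases; only the choice of key decides whether referential integrity survives.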
Projects give you control over toxic data combinations.
For details about the Immuta project features and components, see the Projects and Purposes page.
Audience: Project members
Content Summary: This page explains Snowflake project workspaces, which allow users to access protected data in and write data to Snowflake.
See the Pre-Configuration Checklist for details on prerequisites and the Configuration page for installation instructions.
Combining Immuta projects and Snowflake workspaces allows users to access and write data directly in Snowflake.
With Snowflake workspaces, Immuta enforces policy logic on registered tables and represents them as secure views in Snowflake. Since secure views are static, creating a secure view for every unique user in your organization for every table in your organization would result in secure view bloat; however, Immuta addresses this problem by virtually grouping users and tables and equalizing users to the same level of access, ensuring that all members of the project see the same view of the data. Consequently, all members share one secure view.
While interacting directly with Snowflake secure views in these workspaces, users can write within Snowflake and create derived data sources, all the while collaborating with other project members at a common access level. Because these derived data sources will inherit all of the appropriate policies, that data can then be shared outside the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.
Snowflake workspaces can be used on their own or with the Snowflake integration.
Immuta enforces policy logic on data and represents it as secure views in Snowflake. Because projects group users and tables and equalize members to the same level of access, all members will see the same view of the data and, consequently, will only need one secure view. Changes to policies immediately propagate to relevant secure views.
Immuta projects are represented as Session Contexts within Snowflake. Because they are linked to Snowflake, projects automatically create the following corresponding objects:
roles in Snowflake: IMMUTA_[project name]
schemas in the Snowflake IMMUTA database: [project name]
secure views in the project schema for any table in the project
To switch projects, users have to change their Snowflake Session Context to the appropriate Immuta project. If users are not entitled to a data source contained by the project, they will not be able to access the Context in Snowflake until they have access to all tables in the project. If changes are made to a user's attributes and access level, the changes will immediately propagate to the Snowflake Context.
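The access rule for a project context is a subset check: a user can use the context only when entitled to every data source in the project. This is a minimal sketch assuming entitlements can be modeled as sets of data source names; the function names and data source names are hypothetical.

```python
from __future__ import annotations


def missing_for_context(user_sources: set[str], project_sources: set[str]) -> set[str]:
    """Project data sources the user is not yet entitled to."""
    return project_sources - user_sources


def can_use_project_context(user_sources: set[str], project_sources: set[str]) -> bool:
    """True only if the user is entitled to every data source in the project."""
    return not missing_for_context(user_sources, project_sources)


project = {"hr.survey", "hr.attrition"}
print(can_use_project_context({"hr.survey"}, project))  # False
print(missing_for_context({"hr.survey"}, project))      # {'hr.attrition'}
print(can_use_project_context({"hr.survey", "hr.attrition", "x"}, project))  # True
```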
Because users access data only through secure views in Snowflake, role management for administrators in Snowflake is significantly reduced. Organizations should also consider having one Snowflake user who can create databases and make GRANTs on those databases, and separate users who can read from and write to those tables.
Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
Self-service access to data based on data policies.
Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
All policies are enforced natively in Snowflake without performance impact.
Security is maintained through Snowflake primitives (roles and secure views).
Performance and scalability are maintained (no proxy).
Policies can be driven by metadata, allowing massive scale policy enforcement with only a small set of actual policies.
Derived tables can be shared back out through Immuta, improving collaboration.
User access and removal are immediately reflected in secure views.
Projects combine users and data sources under a common purpose, which can then be used to restrict access to data and streamline collaboration. When project equalization is enabled, for example, users working under the same project will see the same data, regardless of their varying levels of access. Additionally, project workspaces allow users to write data back to Immuta and share their analysis with other users.
Best practices for using Immuta projects
Tutorials contain call-outs with best practices throughout this section; however, here is an outline of the best practices when using projects.
Use a naming convention for projects that reflects the naming convention for databases. (For example, if the project in Dev is called “my_project,” name the project “dev_my_project.”) The data will carry the project database prefix, so you can trace the source and make edits upstream in that project as necessary.
Use project equalization so that all project members see the same data, and re-equalize projects if new members or data sources are added to the project.
Use Immuta's recommended equalized entitlements to protect your data in projects.
Use project workspaces to allow users to write data back to Immuta.
Consider purposes as attributes. Attributes identify a user, and purposes identify why that user should have access.
This section includes conceptual, reference, and how-to guides for projects. Some of these guides are provided below. See the left navigation for a complete list of resources.
Immuta project workspaces:
Audience: Data Governors
Content Summary: Data Governors are responsible for configuring settings and purposes for projects in Immuta. This guide details the process of managing purposes, viewing purpose-based restriction details, and configuring acknowledgement statements.
Additional Tutorials Contents:
Customizing Acknowledgement Statements
Viewing Purpose-Based Restrictions Details
Deleting Purpose-Based Restrictions
Manage Purpose Requests
Use Case
Compliance Requirement: Users can only WRITE to specified locations in Dev and share this data with other Dev users.
In addition to creating a project to meet this requirement, the compliance team needs to ensure that Dev data is only being used for specific purposes. To connect purposes to data, the compliance team must create purposes and acknowledgement statements that can be applied to projects and their respective data sources.
Click the Governance icon in the left sidebar, and then click the Purposes tab.
Click the Add Purpose button.
Type in the name of the new purpose in the new empty text box, opt to customize the acknowledgement statement or add a description, and then click Create.
If a description is added to a purpose, it is visible to users when they select the View Details icon on a policy or on the project page.
Click the Governance icon in the left sidebar and select the Purposes tab.
Open the dropdown menu and click Edit in the Actions column of the purpose you would like to add sub-purposes to.
Scroll to the bottom of this page and click Add Sub-Purposes.
Enter a name in the Enter nested purpose field in the Sub-Purpose Builder.
Click the arrow to the right of the purpose or sub-purpose(s) to continue adding nested purpose fields.
Click Save.
A list of sub-purposes will populate at the bottom of the page. You can manage these sub-purposes by clicking Edit in the Actions column at any time.
Customizing Acknowledgement Statements
Click the Governance icon in the left sidebar and select the Purposes tab.
Open the dropdown menu and click Edit in the Actions column of the purpose you would like to customize.
Click View or Edit above the acknowledgement statement, customize the text, and then click Confirm.
The page displays the updated statement, which now will be used by all projects and purposes. The updated statement will also be used by any new members joining existing projects containing purposes with default statements.
Customizing Acknowledgement Statements for Sub-Purposes
By default, sub-purposes will inherit the acknowledgment statements of their parent purposes.
To customize the acknowledgement statement for an individual sub-purpose,
Click the Governance icon in the left sidebar and select the Purposes tab.
Click Edit in the Actions column of the parent purpose containing the sub-purpose you would like to customize.
In the Sub-Purposes section, click Edit in the Actions column of the sub-purpose you would like to manage.
Customize the Acknowledgement Statement in the dialog that appears, opt to require users to reacknowledge the sub-purpose, and then click Confirm.
Viewing Purpose-Based Restrictions Details
Navigate to the Governance page, and then click the Purposes tab.
A list of all purposes is displayed. To filter the list, enter the name of the purpose in the Filter Purposes By Name text box.
Click a purpose from the results to view its details.
Details about the selected purpose include the date the purpose was created, when it was last modified, and the number of projects that purpose is associated with.
Deleting Purpose-Based Restrictions
Navigate to the Governance page, and then click on the Purposes tab.
Click the dropdown menu icon in the Actions column of the purpose or sub-purpose you want to delete.
Select Delete, and then click Confirm to approve the deletion.
Any project that contained the deleted purpose will be treated as no longer compliant, so the project's name will display in red on the My Projects and Project Overview pages. Project owners can get details for fixing the issue by hovering their cursor over the project name.
Manage Purpose Requests
Navigate to your user profile page and click the Requests tab.
Approve or deny the purpose request in the Actions column.
Requirement: Immuta permission CREATE_PROJECT
Best practice: project naming convention
Use a naming convention for projects that reflects the naming convention for databases. (For example, if the project in Dev is called “my_project,” name the project “dev_my_project.”) The data will carry the project database prefix, so you can trace the source and make edits upstream in that project as necessary.
Navigate to the Projects tab under Data in the sidebar, and click the New Projects button.
Fill out the Basic Information:
Enter a name for your project in the Project Name field.
Opt to complete the Project Description field to help identify your project.
Opt to enter project Documentation to provide context for members.
Select the purposes and any policy adjustments:
Choose to select a purpose from the list of purposes or create a new purpose for the project.
To create a new purpose, click Create Purpose and fill out the modal.
Note that all purposes added to a project will need to be created by a data governor or a user with the PROJECT_MANAGEMENT permission, and once purposes have been applied to a project, only these users can add data sources to the project.
Add a native workspace configuration: Select your workspace configuration from the Workspace Configuration dropdown menu: Databricks or Snowflake.
Databricks: Opt to edit the sub-directory in the Workspace Directory field (this sub-directory auto-populates as the project name) and enter the Workspace Database Name.
Snowflake: Name the Workspace Schema. By default, the schema name is based on the project name, but you can change it here. Your project workspace will exist within this schema, in the Snowflake database configured by the application admin.
Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
Select one or more Warehouses to be available to project members when they are working in the native workspace.
Add data sources to the project using the dropdown menu. Data sources can also be added after the project is created.
Click Affirm and Create.
Projects are private by default but can be made public and shared with other users by changing the subscription policies setting. Governors are the only users who can manage subscription policies for projects with purposes.
In the project, click the Policies tab.
Click Edit Subscription Policy.
Select the group of users who will have access. Each option and its specific instructions are described below:
Anyone: Selecting this option makes the project visible to everyone. Opt to require manual subscription by selecting the checkbox. This will require users to manually subscribe to the project to gain access.
Anyone Who Asks (and is Approved): Selecting this option makes the project visible in search results, but users must request access and be granted permission. This restriction supports multiple approving parties, so project owners can allow more than one approver, or users with specified permission types, to approve other users who request access to the project.
Click anyone or an individual selected by user from the first dropdown menu.
Note: If you choose an individual selected by user, when users request access to a project they will be prompted to identify an approver with the permission specified in the policy.
Select the USER_ADMIN, GOVERNANCE, or AUDIT permission from the subsequent dropdown menu. You can add more than one approving party by selecting + Add Another Approver.
Users with Specific Groups/Attributes: Choose whether to build the policy based on user groups or user attributes:
is a member of group: Type the group name and select the group.
possesses attribute: Type the attribute and select it. Then select the value from the dropdown menu.
Opt to + Add Another Condition. When adding another condition, choose how the conditions will be required. If you select or, only one of the conditions must apply to a user for them to subscribe to the project. If you select and, all of the conditions must apply.
Opt to allow users who do not meet the restrictions defined in the policy to still be able to discover the project by selecting the Allow Project Discovery checkbox.
Once saved, users with the proper authorizations will be automatically subscribed. Opt to require users to manually subscribe to the project by selecting the Require Manual Subscription checkbox.
Individual Users You Select: Selecting this option hides the project from the search results. Project owners must manually add and remove users, and the Private label will appear next to the project name.
Click Save to finish your policy.
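The or/and choice for group and attribute conditions behaves like `any` versus `all`. This is a minimal sketch of that evaluation only, assuming a user is modeled as a set of groups plus a mapping of attribute keys to values; the classes, function, and the names `dev`, `Department`, and `Research` are hypothetical, not Immuta's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    groups: set[str] = field(default_factory=set)
    attributes: dict[str, set[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Condition:
    """One builder row: "is a member of group" or "possesses attribute" (key, value)."""

    group: str | None = None
    attribute: tuple[str, str] | None = None

    def __post_init__(self) -> None:
        # Exactly one of group or attribute must be set.
        if (self.group is None) == (self.attribute is None):
            raise ValueError("set exactly one of group or attribute")

    def matches(self, user: User) -> bool:
        if self.group is not None:
            return self.group in user.groups
        key, value = self.attribute
        return value in user.attributes.get(key, set())


def meets_policy(user: User, conditions: list[Condition], combinator: str = "or") -> bool:
    """"or": any condition must apply; "and": all conditions must apply."""
    if combinator == "or":
        return any(c.matches(user) for c in conditions)
    if combinator == "and":
        return all(c.matches(user) for c in conditions)
    raise ValueError(f"unknown combinator: {combinator!r}")


conditions = [Condition(group="dev"), Condition(attribute=("Department", "Research"))]
dev_user = User(groups={"dev"})
print(meets_policy(dev_user, conditions, "or"))   # True
print(meets_policy(dev_user, conditions, "and"))  # False
```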
In the project, click the Members tab.
Click the Add Members button.
Start typing a user's or group's name in the Add Members modal and select it from the dropdown that appears.
Opt to add an expiration to the subscription by entering the number of days until the access will expire.
Select the role.
Click Add.
Current project members will receive notifications that new users have been added to the project. A similar entry will be posted to the project's activity pane.
Use project equalization so that all project members see the same data, and re-equalize projects if new members or data sources are added to the project.
In the project, click the Policies tab.
In the Project Equalization section, click the toggle button to On.
Note: Only project owners can add data sources to the project if this feature is enabled.
Best practice: use the recommended equalized entitlements
Use Immuta's recommended equalized entitlements to protect your data in projects. Changing these entitlements creates two potential disadvantages:
If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
Click Edit next to Equalized Entitlements.
In the Equalized Entitlements Builder, select either is a member of a group or possesses attribute from the user condition dropdown menu.
If you selected is a member of a group, select the appropriate group from the resulting dropdown.
If you selected possesses attribute, select the appropriate key and value from the subsequent dropdown menus.
Click Save.
To view members' compliance status after changing the equalized entitlements,
Navigate to the Members tab from the Project Overview page.
Click the Not In Compliance text to view the details about the user's status.
Users who are not in compliance will be unable to view data sources within the project until the compliance issues are resolved.
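The compliance check above can be sketched by comparing each member's access against the equalized entitlements. This sketch assumes a member is compliant only when they hold every equalized entitlement; the labels and the function `compliance_report` are hypothetical. The second call shows the earlier warning in action: adding an entitlement pushes a member out of compliance.

```python
from __future__ import annotations


def compliance_report(members: dict[str, set[str]], equalized: set[str]) -> dict[str, set[str]]:
    """Map each non-compliant member to the equalized entitlements they lack."""
    return {
        name: equalized - entitlements
        for name, entitlements in members.items()
        if not equalized <= entitlements
    }


members = {"sally": {"PII", "CREDIT_CARD"}, "bob": {"CREDIT_CARD"}}
print(compliance_report(members, {"CREDIT_CARD"}))         # {}
print(compliance_report(members, {"CREDIT_CARD", "PII"}))  # {'bob': {'PII'}}
```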
To revert entitlements to those recommended by Immuta,
Click Edit next to Equalized Entitlements.
Click Use Recommended.
Click Confirm.
Update the validation frequency to specify how often users must log into Immuta to retain access to the project.
Click Edit in the Validation Frequency section.
Enter an integer in the first field of the Validation Frequency modal that appears.
Select Days or Hours in the next dropdown.
Click Save.
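The validation frequency can be read as a time window measured back from the present. This is a minimal sketch assuming access is retained while the user's most recent Immuta login falls within that window; `access_is_valid` is a hypothetical name and the dates are illustrative.

```python
from datetime import datetime, timedelta, timezone


def access_is_valid(last_login: datetime, frequency: int, unit: str, now: datetime) -> bool:
    """True if the user's last Immuta login falls within the validation window."""
    if frequency <= 0:
        raise ValueError("frequency must be a positive integer")
    if unit == "Days":
        window = timedelta(days=frequency)
    elif unit == "Hours":
        window = timedelta(hours=frequency)
    else:
        raise ValueError(f"unit must be 'Days' or 'Hours', got {unit!r}")
    return now - last_login <= window


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(access_is_valid(datetime(2024, 1, 5, tzinfo=timezone.utc), 7, "Days", now))  # True
print(access_is_valid(datetime(2024, 1, 2, tzinfo=timezone.utc), 7, "Days", now))  # False
```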
Navigate to the Policies tab.
In the Project Equalization section, click the toggle button to Off.
Click Yes, Turn Off in the confirmation window.
Project owners and governors can disable a project, which hides it from everyone but the project owner, and can re-enable a disabled project. However, only the project owner can delete a project. After a project is deleted, it cannot be enabled again.
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the three-dot menu icon next to the project and select Disable.
A label will appear next to the project indicating it has been disabled, and a notification will be sent out to all subscribers.
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the three-dot menu icon next to the project and select Enable.
The label indicating the project was disabled will disappear, and a notification will be sent out to all subscribers.
Deleting a project permanently removes it from Immuta. Projects must first be disabled before they can be deleted.
Click the Data icon and select Projects in the sidebar.
Select the My Projects tab.
Click the three-dot menu icon next to the disabled project and select Delete.
Click Confirm.
The project is now removed from Immuta, and a notification will be sent out to all subscribers.
Enabling this feature allows masked columns to be joined within a project.
Navigate to the Project Overview tab.
Click the Allow Masked Joins toggle on.
Click Confirm.
Note: While masked joins are allowed, only project owners can add data sources to the project. Additionally, Immuta does not allow joining on columns masked by rounding, by making null, with a constant, or with a regex, or on columns that have conditional masking policies applied to them.
Any project member can add data sources to a project, unless project equalization or masked joins is enabled; in those cases, only project owners can add data sources.
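The rule above reduces to a small decision table. This sketch models only that rule, using the two project roles named elsewhere on this page (subscribed and owner); other restrictions, such as those on projects with purposes, are not covered, and `can_add_data_sources` is a hypothetical name.

```python
def can_add_data_sources(role: str, equalization_enabled: bool, masked_joins_enabled: bool) -> bool:
    """Owners can always add data sources; subscribers only when neither feature is on."""
    if role not in ("owner", "subscribed"):
        raise ValueError(f"unknown project role: {role!r}")
    if equalization_enabled or masked_joins_enabled:
        return role == "owner"
    return True


print(can_add_data_sources("subscribed", False, False))  # True
print(can_add_data_sources("subscribed", True, False))   # False
print(can_add_data_sources("owner", True, True))         # True
```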
Set your current project to the project you want to add the data sources to.
Navigate to the Data Sources page.
Select the checkboxes for the data sources you want in a project.
Select the bulk actions three-dot menu icon in the top right corner.
Click Add To Current Project.
Navigate to the Project Overview tab.
Click the Add Data Sources button.
Start typing the name of a data source you'd like to include in the project.
Select the data source from the list of auto-completed options in the dropdown menu.
Repeat this process to add additional data sources to the list. You can remove them using the three-dot menu icon.
Opt to re-equalize the project by clicking the toggle on.
When complete, click the Save button at the bottom of the list.
You can automatically add to a project all data sources that have a Limit usage to purpose policy matching the project's purpose.
Select a Project, and click the Add Data Sources button.
Click Add By Purpose.
All data sources matching the project's purpose(s) will populate at the bottom of the dialog. Review this list, and then click Save.
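Add By Purpose can be thought of as a filter over data sources and their purpose policies. This sketch assumes a data source matches when its Limit usage to purpose policy names at least one of the project's purposes by exact name (sub-purpose hierarchies are ignored here); the data source names, purposes, and the function are hypothetical.

```python
from __future__ import annotations


def sources_matching_purposes(purpose_policies: dict[str, set[str]],
                              project_purposes: set[str]) -> list[str]:
    """Names of data sources whose purpose policy names at least one project purpose."""
    return sorted(
        name for name, purposes in purpose_policies.items() if purposes & project_purposes
    )


policies = {
    "hr.survey": {"Attrition Analysis"},
    "sales.orders": {"Forecasting"},
    "public.calendar": set(),  # no Limit usage to purpose policy
}
print(sources_matching_purposes(policies, {"Attrition Analysis"}))  # ['hr.survey']
```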
Deprecation notice
Support for this feature has been deprecated.
Project owners can create, reply to, and delete project discussions.
Navigate to the Discussions tab and click New Discussion.
Enter your text in the Start Discussion box, and then click Save.
Navigate to the Discussions tab and view open and resolved discussions by clicking the Open or Resolved button, respectively.
Click a discussion thread and enter your response in the Enter Reply field.
Click Reply to post your response.
Navigate to the Discussions tab and click the Open button to view all open discussions.
Click a discussion thread.
Click the Mark Resolved button.
This discussion thread will now be saved with other resolved threads, and users will still be able to reply to it by clicking the Resolved button on the Discussions tab.
To permanently delete a discussion thread,
Navigate to the Discussions tab and view open and resolved discussions by clicking the Open or Resolved button, respectively.
Click the Delete button for the discussion you want to delete.
Click Delete in the confirmation window that appears.
The discussion thread and all of its comments are now deleted.
To delete a single reply,
Select a discussion thread.
Click the Delete button of the reply or comment you want to delete.
Click Delete to permanently delete the comment.
Project owners can update the documentation for a project at any time. If no documentation is entered, the project name is displayed in this section of the Project Overview tab by default.
Click the Project Overview tab.
Click the Edit button in the Documentation section.
Document the details of your project in the text box that appears, and then click Save.
Note: Styling with Markdown is supported.
Project owners can update user roles and remove users from a project.
On the Members tab, click the Role of the member whose role you want to change.
Select a different role: subscribed or owner.
Notifications will be sent to the affected members and project owners, and a similar entry will be posted in the project's activity pane.
On the Members tab, click the Deny button next to the user or group you want to remove.
Complete the Reasoning field in the window that appears, and then click Submit.
Notifications will be sent to the affected users and other project members, and a similar entry will be added to the project's activity pane.
Tags can be added to projects to drive search results and governor reports.
Select a project and navigate to the Project Overview tab.
Scroll to the Tags section and click the Add Tags button.
Begin typing the tag name in the window that appears, and then select the tag from the dropdown menu. A list of chosen tags will populate at the bottom of this window.
After selecting all relevant tags, click the Add button.
Navigate to the Project Overview tab.
Scroll to the Tags section and click on the tag that you want to remove to open its side sheet.
Click Remove.
Click Confirm to delete the tag.
Immuta projects combine users and data sources under a common purpose. Sometimes this purpose is simply for a single user to organize their data sources or to control an entire schema of data sources through a single project screen; most often, however, it is an Immuta purpose for which the data has been approved, which restricts access to data and streamlines team collaboration.
For more information on the value projects can add to your organization, see the Project Use Cases page.
Project Members: The users who will create, manage, and use projects.
Purposes: Purposes allow Governors to define exceptions to policies based on how a user will use the data.
Data Sources: Projects create an environment within Immuta where the user will only see the data sources within the project, despite their subscription to other data sources.
Subscription Policy: Project users with the appropriate permissions can set a subscription policy on a project to control the users who can join.
Documentation and Discussions: Project users can create documentation and have discussions within the Immuta project page, allowing for an easy and consistent trail of communication.
Equalization: Project Equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data.
Workspaces: With equalization enabled, project users can create Snowflake or Databricks workspaces where users can view and write back data to Immuta.
Derived Data Sources: Derived data sources are the data sources that come from workspaces. Once they are created, they automatically inherit policies from their parent sources.
Masked Joins: Within a project, users can join masked columns.
The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.
Project Role | Capabilities |
---|---|
Best Practice: Purposes
Consider purposes as attributes. Attributes identify a user, and purposes identify why that user should have access.
Data Governors and users with the PROJECT_MANAGEMENT permission are responsible for configuring and approving project purposes and acknowledgement statements.
Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies. Project owners can also create a purpose; however, it remains in a staged state until a Governor or Project Manager approves the purpose request.
Purposes can be constructed as a hierarchy, containing nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
For example, if the purpose Research included Marketing, Product, and Onboarding as sub-purposes, a Governor could write the following global policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
This hierarchy lets you create a single purpose instead of separate purposes that would each have to be added to policies as they evolve.
Now, any user acting under the purpose Research or one of its sub-purposes, such as Research.Marketing or Research.Onboarding, will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a Governor to rewrite these Global Policies when sub-purposes are added or removed. If new projects with new Research purposes are added, the relevant Global Policy will automatically be enforced.
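The matching rule behind purpose hierarchies can be sketched in Python. This is a minimal illustration, assuming purposes are stored as dot-separated paths such as Research.Marketing; satisfies_purpose is a hypothetical helper, not part of any Immuta API. Comparing whole path segments keeps a similarly named purpose like ResearchOps from matching Research.

```python
def satisfies_purpose(acting_purpose: str, allowed_purposes: list[str]) -> bool:
    """Hypothetical check: does acting_purpose equal an allowed purpose or sit beneath it?

    Purposes are modeled as dot-separated paths, e.g. "Research.Marketing".
    """
    acting = acting_purpose.split(".")
    for allowed in allowed_purposes:
        prefix = allowed.split(".")
        # Compare whole segments so "ResearchOps" does not match "Research".
        if acting[: len(prefix)] == prefix:
            return True
    return False


# Limit usage to purpose(s) Research for everyone on data sources tagged PHI
policy_purposes = ["Research"]
for purpose in ["Research", "Research.Marketing", "Research.Onboarding", "Marketing"]:
    print(purpose, satisfies_purpose(purpose, policy_purposes))
```

Because a restriction on Research is a prefix check, adding a new sub-purpose such as Research.Product needs no change to the policy.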
Projects with purposes require owners and subscribers to acknowledge that they will only use the data for those purposes by affirming or rejecting acknowledgement statements. Each purpose has its own acknowledgement statement, and a project with multiple purposes requires users to accept more than one acknowledgement statement. Immuta keeps a record of each project member's response to the acknowledgement statement(s) and records the purpose, the time of the acknowledgement, and the text of the acknowledgement. Purposes can use Immuta's default acknowledgement statement or one customized by a Data Governor.
If users accept the statements, they become project members. If they reject an acknowledgement statement, they are denied access to the project.
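The acknowledgement record described above (purpose, time, and statement text for each response) can be modeled as a small data structure. This is a hypothetical sketch, not Immuta's storage format; the purpose names, statement strings, and the rule that an unanswered statement counts as a rejection are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AcknowledgementRecord:
    """One stored response: who answered, for which purpose, what text, and when."""
    user: str
    purpose: str
    statement: str
    accepted: bool
    timestamp: datetime


def respond_to_acknowledgements(user, statements, responses, log):
    """Record a response per purpose; return True only if every statement was accepted.

    statements: purpose -> acknowledgement text shown to the user
    responses: purpose -> True (affirm) or False (reject)
    log: list that receives one AcknowledgementRecord per purpose
    """
    now = datetime.now(timezone.utc)
    all_accepted = True
    for purpose, text in statements.items():
        # Assumption of this sketch: an unanswered statement counts as a rejection.
        accepted = responses.get(purpose, False)
        log.append(AcknowledgementRecord(user, purpose, text, accepted, now))
        all_accepted = all_accepted and accepted
    return all_accepted


audit_log = []
statements = {
    "Research": "Acknowledgement text for Research",  # hypothetical purposes and text
    "Fraud Detection": "Acknowledgement text for Fraud Detection",
}
joined = respond_to_acknowledgements(
    "morgan", statements, {"Research": True, "Fraud Detection": True}, audit_log
)
print(joined, len(audit_log))
```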
When a user is working within the context of a project, they will only see the data in that project. This helps to prevent data leakage when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.
When users change project contexts, all SQL queries or blob fetches that run through Immuta will reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
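What a user can query under a project context can be sketched as a set intersection. This is a simplified, hypothetical model: it assumes a user with no project selected sees all of their subscriptions, while a user acting under a project sees only that project's data sources, and, as this section states, must still be subscribed to each one. The data source names are invented.

```python
def visible_data_sources(subscriptions, current_project_sources=None):
    """Return the data sources a user can query (hypothetical model).

    subscriptions: data sources the user is subscribed to
    current_project_sources: data sources in the selected project, or None when
        no project is selected (the default)
    """
    if current_project_sources is None:
        return set(subscriptions)
    # Inside a project only its data sources are visible, and the user must
    # still be subscribed to each of them.
    return set(subscriptions) & set(current_project_sources)


subs = {"claims", "payments", "hr_survey"}
print(sorted(visible_data_sources(subs)))
print(sorted(visible_data_sources(subs, {"claims", "fraud_scores"})))
```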
The same security restrictions regarding data sources are applied to projects; project members still need to be subscribed to data sources to access data, and only users with appropriate attributes and credentials can see the data if it contains any row-level or masking security.
However, Project Equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access. For a tutorial on enabling equalization, navigate to Create a Project.
Note: Only project owners can add data sources to the project if this feature is enabled.
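The equalization rule ("no project member has more access to data than the member with the least access") can be illustrated for masking policies of the form "mask ... except for members of group G". This is a simplified, hypothetical model rather than Immuta's implementation: a column stays unmasked for the project only if every member belongs to the exempt group. The member and column names are invented.

```python
def equalized_clear_columns(member_groups, masking_exceptions, columns):
    """Columns that every project member sees unmasked once the project is equalized.

    member_groups: member -> set of groups
    masking_exceptions: column -> group exempt from masking, or None if the
        column is masked for everyone
    columns: all columns of the data source
    """
    clear = set()
    for column in columns:
        if column not in masking_exceptions:
            clear.add(column)  # no masking policy on this column
            continue
        exempt_group = masking_exceptions[column]
        # The column stays clear only if the least-privileged member would also
        # see it clear, i.e. every member belongs to the exempt group.
        if exempt_group is not None and all(
            exempt_group in groups for groups in member_groups.values()
        ):
            clear.add(column)
    return clear


masking = {"city": "Legal", "gender": None}
members = {"ana": {"Legal", "Medical Claims"}, "ben": {"Medical Claims"}}
print(sorted(equalized_clear_columns(members, masking, ["claim_id", "city", "gender"])))
```

Here ben is not in Legal, so city is masked for the whole project, even for ana.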
Once Project Equalization is enabled, the Subscription Policy for the project is locked and can only be adjusted by the project owner by changing the Equalized Entitlements.
The Equalized Entitlements setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. When Project Equalization is enabled, Equalized Entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:
If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the Members tab within the project.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
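The compliance effect of adding entitlements can be sketched as a subset check. In this hypothetical model, entitlements are plain strings and a member is out of compliance when they lack any equalized entitlement; the member names and the extra Legal entitlement are invented for illustration, and the check is not Immuta's API.

```python
def out_of_compliance(member_entitlements, equalized_entitlements):
    """Map each non-compliant member to the equalized entitlements they lack."""
    required = set(equalized_entitlements)
    return {
        member: required - set(held)
        for member, held in member_entitlements.items()
        if not required <= set(held)
    }


members = {
    "ana": {"group:Claims and Billing Department", "group:Legal"},
    "ben": {"group:Claims and Billing Department"},
}
recommended = {"group:Claims and Billing Department"}
print(out_of_compliance(members, recommended))
# Adding an entitlement can push existing members out of compliance.
print(out_of_compliance(members, recommended | {"group:Legal"}))
```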
The validation frequency setting determines how often user credentials are validated. This is critical if users share data with project members outside of Immuta, because they need a way to verify that those members' permissions are still valid.
Once Project Equalization is enabled, the project Subscription Policy builder locks and can only be adjusted by manually editing the Equalized Entitlements. The Subscription Policy then combines with the entitlement settings in a way that depends on the policy type.
Combinations by Policy Type
The way entitlements and approvals combine differs depending on the policy type; for clarity, the table below illustrates various scenarios for each type. Every row demonstrates how a specific project Subscription Policy changes after Project Equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if Project Equalization is subsequently disabled.
Example
For example, consider the Subscription Policy of the following sample project, Fraud Prevention, before Project Equalization is enabled:
Fraud Prevention
Subscription Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project).
After enabling Project Equalization, the following Equalized Entitlement is recommended by Immuta: User is a member of group Claims and Billing Department.
In this particular example, the Equalized Subscription Policy contains the Equalized Entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:
the user must be a member of the group Claims and Billing Department
the user must be approved by anyone with permission Owner (of this project).
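The Fraud Prevention policy is a conjunction of two conditions, which can be sketched with small composable predicates. This is a hypothetical model for illustration: users are plain dictionaries with a groups set and an approved_by_owner flag, and member_of, approved_by_owner, and all_of are invented helpers, not Immuta functions.

```python
from typing import Callable, Dict

User = Dict[str, object]          # hypothetical user record
Condition = Callable[[User], bool]


def member_of(group: str) -> Condition:
    # Equalized Entitlement: the user belongs to the given group.
    return lambda user: group in user["groups"]


def approved_by_owner() -> Condition:
    # Approval carried over from the original Subscription Policy.
    return lambda user: bool(user["approved_by_owner"])


def all_of(*conditions: Condition) -> Condition:
    # The Equalized Subscription Policy requires every condition to hold.
    return lambda user: all(condition(user) for condition in conditions)


fraud_prevention = all_of(member_of("Claims and Billing Department"), approved_by_owner())

print(fraud_prevention({"groups": {"Claims and Billing Department"}, "approved_by_owner": True}))
print(fraud_prevention({"groups": {"Claims and Billing Department"}, "approved_by_owner": False}))
print(fraud_prevention({"groups": {"Marketing"}, "approved_by_owner": True}))
```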
The Masked Joins feature allows masked columns to be joined within the context of a project. However, joins on columns masked by rounding, by making null, with a constant, or with a regular expression, and joins on columns that have conditional masking policies applied, are not supported and will be blocked.
Note: Masked columns cannot be joined across data sources that are not linked by a project.
For instructions on enabling Masked Joins, navigate to Create a Project.
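The join restrictions above can be summarized as a small decision function. This is a hypothetical sketch that assumes Masked Joins is already enabled on the project; the masking type labels are names chosen for this example, and "hashing" simply stands for a masking type that this section does not list as blocked.

```python
from dataclasses import dataclass
from typing import Optional

# Masking types this section lists as unsupported for masked joins.
UNSUPPORTED_JOIN_MASKS = {"rounding", "null", "constant", "regex"}


@dataclass(frozen=True)
class JoinColumn:
    source: str
    masking: Optional[str] = None  # masking type, or None if the column is unmasked
    conditional: bool = False      # True if a conditional masking policy applies


def is_masked(column: JoinColumn) -> bool:
    return column.masking is not None or column.conditional


def masked_join_allowed(left: JoinColumn, right: JoinColumn, project_sources: set) -> bool:
    """Decide whether a join between two columns would be permitted (hypothetical model)."""
    if not (is_masked(left) or is_masked(right)):
        return True  # no masked columns, so the Masked Joins rules do not apply
    # Masked columns can only be joined across data sources linked by a project.
    if not {left.source, right.source} <= project_sources:
        return False
    for column in (left, right):
        if column.masking in UNSUPPORTED_JOIN_MASKS or column.conditional:
            return False
    return True


project = {"claims", "payments"}
print(masked_join_allowed(JoinColumn("claims", "hashing"), JoinColumn("payments", "hashing"), project))
print(masked_join_allowed(JoinColumn("claims", "rounding"), JoinColumn("payments", "hashing"), project))
```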
Schema projects are a collection of data sources that Immuta creates when multiple data sources belong to the same schema. For information about schema projects, see the Schema Project page.
Audience: Project members
Content Summary: This page outlines prerequisites and provides an overview of the integration process for Snowflake project workspaces.
See the page for information on the utility of project workspaces and the page for installation instructions.
External IDs have been connected with an IAM or in for Snowflake.
Data sources registered by an Excepted Role: Snowflake workspaces generate static views with the credentials used to register the table as an Immuta data source. Those tables must be registered in Immuta by an Excepted Role so that policies applied to the backing tables are not applied to the project workspace views.
An Immuta User with the CREATE_PROJECT permission creates a project with Snowflake data sources.
The Immuta Project Owner enables Project Equalization, which balances every project member's access to data so that it is the same.
The Immuta Project Owner creates a Snowflake workspace, which automatically generates a subfolder in the root path specified by the Application Admin and a remote database associated with the project.
Project members can access data sources within the project and write derived tables. To ensure equalization, users will only see data sources within their project while they are working in the Snowflake Context.
The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can create derived data sources; the derived data sources inherit the policies of their parent sources, so the data can then be shared outside the project.
If a project member leaves a project or a project is deleted, that Snowflake Context will be removed from the user's Snowflake account.
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
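The single-root rule can be sketched as a small class: every project writes to a subdirectory of one root, and the root is frozen once any workspace exists. This is a hypothetical model; WorkspaceRoot, the example paths, and the caller-supplied subdirectory name are illustrative assumptions, since the section does not say how Immuta names project subdirectories.

```python
class WorkspaceRoot:
    """Hypothetical model of the single workspace root shared by all projects."""

    def __init__(self, root: str):
        self._root = root.rstrip("/")
        self._workspaces: dict[str, str] = {}

    def set_root(self, new_root: str) -> None:
        # The root is frozen once any workspace has been created.
        if self._workspaces:
            raise RuntimeError("workspace root cannot change after a workspace exists")
        self._root = new_root.rstrip("/")

    def create_workspace(self, project: str, subdirectory: str) -> str:
        # Every project writes to its own subdirectory under the single root.
        path = f"{self._root}/{subdirectory}"
        self._workspaces[project] = path
        return path


root = WorkspaceRoot("/immuta/workspaces")
root.set_root("/data/immuta-workspaces/")  # allowed: no workspace exists yet
path = root.create_workspace("Fraud Prevention", "fraud_prevention")
print(path)
```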
Audience: Project Owners and members
Content Summary: This tutorial configures a Snowflake project workspace.
Use Case
Compliance Requirement: Users can only WRITE to specified locations in Dev, and these users need to share this data with other Dev users.
After Dev users have analyzed data, they need to write content back to Immuta and share it with other Dev users. To allow them to write data back to Immuta, project owners need to create workspaces for their projects. Then, users can share the data they've written to Immuta with other users as derived data sources.
Workspaces can be enabled in the New Project modal when creating a new project, but project owners can enable this feature at any point on the project's Policies tab.
Navigate to the Policies tab and enable Project Equalization by clicking the Project Equalization slider to on.
Scroll to the Native Workspace section and click Create.
Select Snowflake from the Workspace Configuration dropdown menu.
Name the Workspace Schema. By default, the schema name is based on the project name, but you can change it here. Your project workspace will exist within this schema in Snowflake, under the database configured by the Application Admin.
Use the dropdown menu to select the Hostname. Projects can only be configured to use one Snowflake host.
Select one or more Warehouses to be available to project members when they are working in the Snowflake workspace.
Click Create to enable the workspace.
Once the workspace is created, project members will see relevant data sources in the Snowflake UI when working under the project context.
Select the Role created by the project workspace. The role created will be a combination of the database name (configured by the Application Admin) and the schema name (set in the previous section by the project owner).
Create a table in Snowflake.
Now that data has been written to the workspace, users can share this data with others by making it a derived data source in Immuta.
Scroll to the Native Workspace section on the Policies tab and click the toggle to disable the workspace.
Click Delete in the Native Workspace section.
Choose one of the following options in the modal:
Purge Generic Workspace Data: permanently delete data, while the data used by derived data sources is preserved. Note: If you created a derived data source that references a view on top of a table in Snowflake that isn't a derived data source, that table will be deleted and break the derived data source.
Purge Everything & Delete Derived Data Sources: permanently delete data and purge all derived data sources.
Click Delete.
If you had project workspaces that were created before Immuta 2022.1.0, you need to perform this migration.
Navigate to the Policies tab of your project.
Toggle the switch to disable the workspace, and choose from the purge options.
Refresh the page.
Toggle the switch to enable the workspace, and fill out the modal.
Audience: Project members
Content Summary: This page explains Databricks workspaces, which allow users to access and write to protected data directly in Databricks.
See the for details on prerequisites and see the page for installation instructions.
Databricks project workspaces allow users to access data on cluster without having to go through the . Using and , Databricks project workspaces are a space where every project member has the same level of access to data. This equalized access allows collaboration without worries about data leaks. Not only can project members collaborate on data, but they can also write protected data back to Immuta.
Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.
Immuta currently supports the s3a scheme for Amazon S3. When using Databricks on Amazon S3, either an S3 key pair with access to the workspace bucket/prefix must be specified in the additional configuration, or an instance role with that access must be applied to the cluster.
Immuta currently supports the abfss scheme for Azure General Purpose V2 Storage Accounts, which includes support for Azure Data Lake Gen 2. When configuring Immuta workspaces for Databricks on Azure, the Azure Databricks Workspace ID must be provided. More information about how to determine the Workspace ID for your workspace can be found in the . Any cluster that will use Immuta workspaces must also include the additional configuration file, with credentials for the Azure Storage container that holds the Immuta workspaces.
Immuta currently supports the gs scheme for Google Cloud Platform. The primary difference between Databricks on Google Cloud Platform and Databricks on AWS or Azure is that it is deployed to Google Kubernetes Engine. Databricks automatically provisions and autoscales drivers and executors as pods on Google Kubernetes Engine, so Google Cloud Platform admin users can view and monitor these Kubernetes resources in Google Cloud Platform.
Stage Immuta installation artifacts in Google Storage, not DBFS: The DBFS FUSE mount is unavailable, and the IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED property cannot be set to true to expose the DBFS FUSE mount.
Stage the Immuta init script in Google Storage: Init scripts in DBFS are not supported.
Stage third party libraries in DBFS: Installing libraries from Google Storage is not supported.
Maven library installation is only supported in Databricks Runtime 8.1+.
/databricks/spark/conf/spark-env.sh is mounted as read-only:
Set sensitive Immuta configuration values directly in immuta_conf.xml: Do not use environment variables to set sensitive Immuta properties. Because spark-env.sh is read-only, Immuta cannot edit it to remove those environment variables and keep them hidden from end users.
Use /immuta-scratch directly: The IMMUTA_LOCAL_SCRATCH_DIR property is unavailable.
Allow the Kubernetes resource to spin down before submitting another job: Job clusters with init scripts fail on subsequent runs.
The DBFS CLI is unavailable: Other non-DBFS Databricks CLI functions will still work as expected.
To write data back to a table in Databricks through an Immuta workspace, use one of the following supported provider types for your table format:
avro
csv
delta
orc
parquet
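A job can validate its output format against the supported provider types before writing. This is a hypothetical helper; normalizing case and whitespace is a choice of this sketch, not a documented Immuta behavior.

```python
SUPPORTED_WRITE_FORMATS = ("avro", "csv", "delta", "orc", "parquet")


def checked_write_format(fmt: str) -> str:
    """Return the provider name if it is a supported workspace format, else raise."""
    normalized = fmt.strip().lower()  # this sketch normalizes case and whitespace
    if normalized not in SUPPORTED_WRITE_FORMATS:
        raise ValueError(
            f"{fmt!r} is not supported for workspace writes; "
            f"use one of: {', '.join(SUPPORTED_WRITE_FORMATS)}"
        )
    return normalized


print(checked_write_format("Delta"))
```

In a PySpark job, the returned name could be passed to DataFrameWriter.format, for example df.write.format(checked_write_format("delta")), where df and the target path are your own.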
Audience: Project members
Content Summary: This page outlines prerequisites and provides an overview of the integration process for Databricks project workspaces.
See the page for information on the utility of project workspaces and see the page for installation instructions.
External IDs have been mapped in for Databricks.
Cluster configuration: Before creating a workspace, the cluster must send its configuration to Immuta; to do this, run a simple query on the cluster (e.g., show tables). Otherwise, an error will occur when you attempt to create a workspace.
An Immuta User with the CREATE_PROJECT permission creates a project with Databricks data sources.
The Immuta Project Owner enables Project Equalization, which balances every project member's access to data so that it is the same.
The Immuta Project Owner creates a Databricks workspace, which automatically generates a subfolder in the root path specified by the Application Admin and a remote database associated with the project.
The Immuta Project Members query equalized data within the context of the project, collaborate, and write data back to Immuta, all within Databricks.
The Immuta Project Members use their newly written derived data to create derived data sources. These derived data sources inherit the necessary Immuta policies to be securely shared outside of the project.
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table in the workspace (rather than a directory) when creating the derived data source.
Audience: Data Owners and Data Users
Content Summary: This page describes how derived data sources inherit policies from their parent sources. For a tutorial, see .
A derived data source is a data source that is created within an equalized project and contains data from its parent sources. Consequently, when the derived data source is created, it will inherit the needed Data Policies from those parent data sources to keep the data secure.
Policy inheritance for derived data sources is a feature unique to the environment that an equalized project creates. Within the equalized project, every data user sees the same data, and work can be shared and collaborated on without any risk of a user viewing more than they should. When a derived data source is created, it inherits the Data Policies from its parent sources and a Subscription Policy is created from the equalized entitlements on the project, allowing project members to safely share secure data.
Let's look at an example.
Consider these data sources, within an equalized Project 1, that each contain Subscription and Data Policies:
Data Source A
Subscription Policy: Allow users to subscribe to the data source when user is a member of group Medical Claims
Data Policies:
Mask by making null the value in the column(s) city except for members of group Legal
Mask by making null the value in the column(s) gender for everyone
Data Source B
Subscription Policy: Allow users to subscribe to the data source when user is approved by anyone with permission Owner and anyone with permission Governance
Data Policy: Limit usage to purpose(s) Research for everyone
If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit these policies, which will be unchangeable:
Data Source C
Subscription Policy: Allow user to subscribe when they satisfy all of the following:
is a member of group Legal and is a member of group Medical Claims
is approved by anyone with permission Owner (of Data Source B) and anyone with permission Governance
Data Policy: Limit usage to purpose(s) Research for everyone
Derived Data Sources Inherit Policies from Parent Sources
Sensitive Data Discovery applies Discovered tags to derived data sources; however, because they inherit policies from their parent sources, the Global Policies that contain these tags will not apply to derived data sources.
Notice that one of the Data Policies in Data Source A, mask by making null the value in the column(s) gender for everyone, is not included in Data Source C. This is because the creator could not have seen the values in the parent sources; therefore, there are no values in the derived data source to be masked.
Most Local Data Policies do not need to be present in the derived data source, with the exception of Limit usage to purpose(s) policies, and no Global Policies are added to a derived data source.
Data Source C's policies are reliant on which groups are in the project, and as the groups change so do the policies.
For example, if there were a data user in the project who was not in the Legal group, then that trait would not be needed in the Subscription Policy because, with equalization, those values would not be visible to the project members in the parent data source.
All of the project members are in the groups Medical Claims and Legal, so those groups are part of the Subscription Policy.
Some project members have both the Medical Claims and Legal groups, but one project member only has the group Medical Claims. Once the project has been equalized, everyone sees the same data as the member with the least permissions, so the Subscription Policy only needs the traits of that member.
The Subscription and Data Policies in the derived data source will always be the minimum required permissions and traits because of project equalization.
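The Data Source A/B/C example can be reproduced with a simplified model of inheritance. This is a hypothetical sketch, not Immuta's algorithm: it assumes parent subscription groups, approvals, and purpose restrictions carry over; a column masked for everyone is dropped because no clear values reached the derived source; and a "mask except for group G" policy becomes a group requirement only when every project member is in G, otherwise equalization already masked the column for everyone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ParentSource:
    name: str
    subscription_groups: Set[str] = field(default_factory=set)
    approvals: List[str] = field(default_factory=list)
    # column -> group exempt from masking, or None if masked for everyone
    masking: Dict[str, Optional[str]] = field(default_factory=dict)
    purposes: Set[str] = field(default_factory=set)


def derive_policies(parents, member_groups):
    """Simplified derivation of a derived data source's inherited policies."""
    groups, approvals, purposes = set(), [], set()
    for parent in parents:
        groups |= parent.subscription_groups
        approvals.extend(parent.approvals)
        purposes |= parent.purposes  # Limit usage to purpose(s) policies carry over
        for exempt_group in parent.masking.values():
            if exempt_group is None:
                continue  # masked for everyone: no clear values to protect
            if all(exempt_group in g for g in member_groups.values()):
                # Every member saw clear values, so subscribers need the exemption too.
                groups.add(exempt_group)
            # Otherwise equalization masked the column for every member.
    return {"groups": groups, "approvals": approvals, "purposes": purposes}


source_a = ParentSource(
    "Data Source A",
    subscription_groups={"Medical Claims"},
    masking={"city": "Legal", "gender": None},
)
source_b = ParentSource(
    "Data Source B",
    approvals=["Owner (of Data Source B)", "Governance"],
    purposes={"Research"},
)
members = {"ana": {"Legal", "Medical Claims"}, "ben": {"Legal", "Medical Claims"}}
print(derive_policies([source_a, source_b], members))
```

If ben were only in Medical Claims, the Legal requirement would drop out, matching the scenario described above.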
Data Source C's policies will not adapt with the parent data sources. Any changes in the parent data source policies will be logged in the Relationships tab of the derived data source page but will not be changed in the derived data source policies.
The data owner may choose to add new Local Data Policies to the derived data source to keep up with any changes, but the inherited policies are not adjustable.
Any changes within the parent data source's data will not trickle down into the derived data source. After the creation of the derived data source, they stay connected for auditing and relationships, not for updating content.
If members use data outside the project to create their data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.
Original Policy | Equalized Policy (Example Entitlement: member of group Accounting) | Equalized Policy (No Entitlement) | Policy After Disabling Equalization |
---|---|---|---|
Install third party libraries as cluster-scoped: Notebook-scoped libraries have limited support. See the page for more details.
For detailed instructions on creating a derived data source, navigate to .
Project Owner | Users with the CREATE_PROJECT permission can: enable Project Equalization and Masked Joins; disable, delete, and enable projects; create Snowflake or Databricks workspaces. |
Project Governor | Users with the GOVERNANCE permission can: manage project members, project tags, and project subscription policies; add and remove project data sources. |
Project Manager | Users with the PROJECT_MANAGEMENT permission can add and remove project data sources. |
Project Member | Once subscribed to a project, all users can add data sources to the project (unless Project Equalization or Masked Joins is enabled). |
Anyone | Allow users to subscribe when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
Allow users to subscribe when approved by anyone with permission Owner (of this project) | Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission Owner (of this project) | Allow users to subscribe when approved by anyone with permission Owner (of this project) | Allow users to subscribe when approved by anyone with permission Owner (of this project) |
Allow users to subscribe to the project when user is a member of group Legal | Allow users to subscribe to the project when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
Individual Users You Select | Allow users to subscribe to the project when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
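The four rows of the Combinations by Policy Type table can be encoded as a small lookup. This is a hypothetical Python model of that table only; the policy type labels ("anyone", "approval", "group", "individual") and condition strings are names chosen for this sketch, not Immuta identifiers.

```python
OWNER_APPROVAL = "is approved by anyone with permission Owner (of this project)"
INDIVIDUAL_USERS = "Individual Users You Select"
POLICY_TYPES = {"anyone", "approval", "group", "individual"}


def equalized_policy(original_type, entitlement=None):
    """Conditions of the equalized Subscription Policy (all must be satisfied)."""
    if original_type not in POLICY_TYPES:
        raise ValueError(f"unknown policy type: {original_type!r}")
    if original_type == "approval":
        # Approvals survive equalization and are combined with the entitlement.
        return [entitlement, OWNER_APPROVAL] if entitlement else [OWNER_APPROVAL]
    # For every other type the entitlement replaces the original rule; with no
    # entitlement, access falls back to individually selected users.
    return [entitlement] if entitlement else [INDIVIDUAL_USERS]


def policy_after_disabling(original_type):
    """Policy the project reverts to when Project Equalization is disabled."""
    if original_type not in POLICY_TYPES:
        raise ValueError(f"unknown policy type: {original_type!r}")
    return [OWNER_APPROVAL] if original_type == "approval" else [INDIVIDUAL_USERS]


ACCOUNTING = "is a member of group Accounting"
for kind in ("anyone", "approval", "group", "individual"):
    print(kind, equalized_policy(kind, ACCOUNTING), equalized_policy(kind), policy_after_disabling(kind))
```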
Audience: Project Owners and members
Content Summary: This page outlines how to create a derived data source to share data you have written back to Immuta.
After workspaces are configured by a System Administrator, users with the CREATE_PROJECT permission can enable workspaces within their projects. This feature allows project members to write data back to Immuta and share this data with other users as derived data sources.
Prerequisites:
project workspaces have been configured by an Application Admin.
data has been written back to a project workspace.
Additional Tutorials Contents:
Change Project Contexts
Comment on General Project Discussions
Create a Project-based API Key
Create Project-based SQL Connections
Leave a Project
Manage Project Data Sources
Search for Projects
Use Case
Compliance Requirement: Users can only WRITE to specified locations in Dev, and these users need to share this data with other Dev users.
After Dev users have analyzed data and have written data back to the native workspace in dev_my_project, they need to be able to share it with other Dev users. To do so, these users will create a derived data source. The steps below use Snowflake to illustrate this process, but the same steps can be followed to create derived data sources in Hadoop and Databricks native workspaces.
From the My Projects page, select dev_my_project and click Add Derived Data Source on the Overview or Data Sources tab.
Select the data source from which the new data was created (in this scenario, the Taxi_Timeshift table). Note: You will not need to enter connection information when creating a data source within a native workspace.
Select Table for the Virtual Population option. Note: For a derived data source, you cannot change the backing data platform from the parent source.
Click Edit and select the table(s) you created in Snowflake, and click Apply.
Opt to edit the Basic Information fields, and then click Create.
You may need to take other actions within a project as a member. See the sections below for specific tutorials on other project member capabilities.
Click the dropdown menu in the top right corner of the console.
Select a project. Once selected, the current project will display at all times in the top right corner of the console.
If you unsubscribe from the project, this display will default to No Current Project.
Deprecation notice
Support for this feature has been deprecated.
Select a project and click the Discussions tab.
Click Open to view open discussions or Resolved to view resolved discussions.
Click a discussion to view comments.
Click New Discussion to create a new discussion.
Type in your comment or question and click Save.
A notification will be sent to all project members so that they can view your comment or question and reply.
As a project member, you can only delete a discussion thread or reply that you have written.
To permanently delete your own discussion thread,
Navigate to the Discussions tab and view open and/or resolved discussions by clicking the Open or Resolved button, respectively.
Click the discussion thread you would like to delete, and then click Delete in the upper right corner of the discussion window.
Click Delete in the confirmation window that appears.
The discussion thread and all of its comments are now deleted.
To delete your own reply,
Select a discussion thread.
Click Delete in the upper right corner of the reply or comment you would like to delete.
Click Confirm to permanently delete the comment.
Any project member can create project-based API keys, which are used for authenticating external tools with Immuta.
To create a project-based API key,
Navigate to the Project Overview tab.
Click the Get API Key button at the bottom of the left panel.
An API Key modal will display with your requested information. Please store these credentials somewhere secure. If you misplace this information, you will have to generate a new key and re-authenticate all services connected to Immuta via this key.
Click the Close button.
Project SQL accounts are unique to each project and only provide access to the data sources in that project. Project SQL credentials cannot be retrieved from Immuta if they are lost. Credentials can only be re-generated using the instructions below. When a user generates new SQL credentials for a project, any existing SQL credentials for that project the user may have had are revoked.
To create a project-based SQL connection,
Navigate to the Project Overview tab.
Click SQL Connection in the upper right corner.
A window will display with the connection information. Store these credentials somewhere secure. If you misplace them, you will have to generate a new account and re-authenticate all services connected to Immuta via this account.
Click Close.
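The credential rules for project SQL accounts (one active credential per user and project, no retrieval, regeneration revokes the old one) can be modeled in a few lines. This is a hypothetical sketch using Python's secrets module; ProjectSqlCredentials is an invented class, not how Immuta stores credentials.

```python
import secrets


class ProjectSqlCredentials:
    """Hypothetical model: one active SQL password per (user, project)."""

    def __init__(self):
        self._active = {}  # (user, project) -> password

    def generate(self, user, project):
        password = secrets.token_urlsafe(24)
        # Generating new credentials replaces, and so revokes, any existing
        # credentials this user had for the same project.
        self._active[(user, project)] = password
        return password

    def is_valid(self, user, project, password):
        # There is deliberately no method to read a password back; lost
        # credentials can only be regenerated.
        return secrets.compare_digest(self._active.get((user, project), ""), password)


store = ProjectSqlCredentials()
first = store.generate("morgan", "Fraud Prevention")
second = store.generate("morgan", "Fraud Prevention")
print(store.is_valid("morgan", "Fraud Prevention", first))
print(store.is_valid("morgan", "Fraud Prevention", second))
```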
Navigate to the Project Overview tab.
Click the Leave Project button in the upper right corner, and then click Confirm.
After leaving a project, that project is removed from the list in the My Projects tab.
Any project member can add data sources to a project, unless the Project Equalization or Allow Masked Joins features are enabled; in those cases only project owners can add data sources to the project.
Select the project, and then navigate to the Project Overview tab.
Click the Add Data Sources button in the Data Sources section in the center pane.
Start typing the name of a data source you'd like to include in the project.
Select the data source from the list of auto-completed options in the dropdown menu.
Repeat this process to add additional data sources to the list. Click Remove to remove them.
When complete, click the Add button at the bottom of the list.
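The rule for who can add data sources can be written as a small permission check. This hypothetical sketch covers only the two roles named in this paragraph (owner and member); the Governor and Project Manager permissions from the roles table are deliberately left out.

```python
def can_add_data_source(role, equalization_enabled=False, masked_joins_enabled=False):
    """Hypothetical check for adding a data source to a project.

    role: "owner" or "member"; other roles are outside this sketch.
    """
    if role == "owner":
        return True
    if role == "member":
        # Members lose this ability when either feature is enabled.
        return not (equalization_enabled or masked_joins_enabled)
    raise ValueError(f"role not covered by this sketch: {role!r}")


print(can_add_data_source("member"))
print(can_add_data_source("member", equalization_enabled=True))
print(can_add_data_source("owner", masked_joins_enabled=True))
```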
You can automatically add all data sources to a project that contain a Limit usage to purpose policy that matches the purpose of that project.
Select a Project, and click the Add Data Sources button on the Data Sources tab.
Click Add By Purpose in the top right of the dialog.
All data sources matching the project's purpose(s) will populate at the bottom of the dialog. Review this list, and then click Save.
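The Add By Purpose selection can be sketched as a filter. This hypothetical example assumes a data source matches when any purpose named in its Limit usage to purpose policy is also a project purpose, using exact name matches; whether Immuta also matches sub-purposes here is not stated in this section. The data source names are invented.

```python
def sources_matching_purpose(purpose_restrictions, project_purposes):
    """Data sources whose Limit usage to purpose policy matches a project purpose.

    purpose_restrictions: data source -> purposes named in its policy
        (an empty collection means the source has no such policy)
    """
    wanted = set(project_purposes)
    return sorted(
        name for name, purposes in purpose_restrictions.items() if wanted & set(purposes)
    )


restrictions = {
    "claims": {"Research"},
    "payroll": {"Payroll Audit"},
    "public_events": set(),
}
print(sources_matching_purpose(restrictions, ["Research"]))
```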
Select a project and navigate to the Overview or Data Sources tab.
Click Add Derived Data Source.
Begin typing in the Search by Name or Description text box, and then select the data source(s) from which your new data source will be derived.
Click Save.
Follow the instructions for creating a data source.
As a project member, you can only delete data sources you've added to the project.
To remove a data source you've added,
Select a project, and then click the Data Sources tab.
Click the Remove Data Source icon in the Actions column of the data source you want to remove.
Click Confirm in the window that appears.
Immuta's UI provides a list of all projects, excluding those that have been set to private. Users can search for projects by keyword, tag, or data source.
To access a list of all projects, click the projects icon in the left sidebar, and then click the All Projects tab.
To filter projects by keyword, type one or more keywords into the search box at the top of the page, and select a keyword from the auto-completed results. If a list does not display, then no keywords matching that text currently exist.
To filter projects by tags, type one or more tag names into the search box at the top of the page and select a tag from the list of auto-completed results. If a list of tags does not display, then no tags matching that text currently exist.
To filter projects by data sources, type one or more data source names into the search box at the top of the page and select a data source from the list of auto-completed results. If a list of data sources does not display, then no data sources matching that text currently exist.
Audience: Data Users
Content Summary: This page outlines the available functions to view and switch your projects using UDFs in Databricks. A tutorial is also provided to illustrate how to switch project contexts.
Use Project UDFs in Databricks
Currently, not all caches are invalidated outside of Databricks, because Immuta caches information about a user's current project in the NameNode plugin and in Vulcan. Consequently, use this feature only in Databricks.
You can switch project contexts through UDFs in Spark. To view your current project or your available projects in a Spark job, query the immuta.get_current_project and immuta.list_projects virtual tables.
View your available projects by running the following query in Spark: select * from immuta.list_projects. In the resulting table, note the value in the id column for the project you want to use; you will pass it as the parameter in the next step.
Run select immuta.set_current_project(<id>). This UDF must be called in its own notebook cell to ensure the changes take effect.
Your project context will be switched, and that project's data sources and workspaces will now be visible. To set your project context to None, run select immuta.set_current_project() with no parameters.
Note: Because these UDFs are not registered with the FunctionRegistry, calling DESCRIBE FUNCTION immuta.set_current_project does not return the documentation for the UDF.
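If you drive these steps from a Python notebook cell instead of a SQL cell, a small helper can build the statements for you. This is a minimal sketch, not part of Immuta: the helper names set_current_project_sql and switch_project are hypothetical, and it assumes project IDs are integers as read from the id column of immuta.list_projects. The run_sql parameter stands in for whatever executes SQL in your notebook, for example lambda s: spark.sql(s).collect(), where collect() forces the query to run.

```python
from typing import Callable, Optional


def set_current_project_sql(project_id: Optional[int] = None) -> str:
    """Build the statement that switches the Immuta project context.

    With no project_id, the statement sets the project context to None.
    Assumption: project IDs are integers, as read from the id column of
    immuta.list_projects.
    """
    if project_id is None:
        return "select immuta.set_current_project()"
    return f"select immuta.set_current_project({int(project_id)})"


def switch_project(run_sql: Callable[[str], object],
                   project_id: Optional[int] = None) -> str:
    """Execute the switch through run_sql and return the statement used.

    run_sql is whatever runs SQL in your notebook, for example
    lambda s: spark.sql(s).collect(). Call this alone in its own
    notebook cell so the change takes effect.
    """
    statement = set_current_project_sql(project_id)
    run_sql(statement)
    return statement


if __name__ == "__main__":
    executed = []  # records statements instead of calling Spark
    switch_project(executed.append, 42)
    switch_project(executed.append)
    print(executed)
```

Because the UDF must run in its own notebook cell, call switch_project alone in a cell, then query immuta.get_current_project in a later cell to confirm the switch.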
Public preview
This feature is in public preview. It is available to all customers and can be enabled on the App Settings page.
Project owners can use policy adjustments to increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis. Since these adjustments only occur within the project and do not change the individual data policies, data users must be acting under the project to see the adjustments in the data source.
Navigate to for a tutorial.
For example, a policy might mask these data source columns with k-anonymization: Income, Education, EmploymentStatus, Gender, and Location Code. When the analyst examines the data, Immuta has already set the percent NULL by weighting all of these columns equally. However, if the analyst's work hinges on the EmploymentStatus column, the project owner can adjust the weights on the Policy Adjustment tab in the project so that fewer EmploymentStatus values are NULL.
For columns that are already well-disclosed (meaning they already have a low percent NULL), the same percent NULL will display even when you drastically change the weight distribution, so increasing the weight of such a column does not change the outcome. Generally, the biggest impact comes from increasing the weight of the column with the largest percent NULL. (The only exception is when that column already has many native NULLs in the remote database.)
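Percent NULL is the share of a column's values that come back NULL. To compare results before and after adjusting weights, you could compute it yourself from rows fetched while acting under the project. This is a minimal sketch with a hypothetical percent_null helper and made-up rows; it treats a missing key the same as NULL. It cannot tell masked NULLs from native NULLs already present in the remote database, which is the exception described above.

```python
from typing import Dict, Iterable, Mapping, Sequence


def percent_null(rows: Iterable[Mapping[str, object]],
                 columns: Sequence[str]) -> Dict[str, float]:
    """Return the percentage of NULL values in each column.

    A value of None, or a missing key, counts as NULL. Masked NULLs and
    native NULLs from the remote database are indistinguishable here.
    """
    total = 0
    nulls = {col: 0 for col in columns}
    for row in rows:
        total += 1
        for col in columns:
            if row.get(col) is None:
                nulls[col] += 1
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: 100.0 * nulls[col] / total for col in columns}


if __name__ == "__main__":
    # Toy rows standing in for query results fetched under the project.
    rows = [
        {"EmploymentStatus": "Employed", "Gender": None},
        {"EmploymentStatus": None, "Gender": None},
        {"EmploymentStatus": "Retired", "Gender": "F"},
        {"EmploymentStatus": "Employed", "Gender": None},
    ]
    print(percent_null(rows, ["EmploymentStatus", "Gender"]))
```

Running it once before and once after clicking Adjust gives a quick per-column view of where the redistributed noise landed.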
This feature provides an option to allow fields in the clear when creating a purpose, permitting specified analysts to bypass k-anonymization in specific circumstances.
When any purpose with the allow fields in the clear property enabled is approved for use within a project, a project member can proceed through the workflow and specify columns to be unmasked. The seeded Re-identification Prohibited.Expert Determination.DUAM purpose, which is specific to HIPAA Expert Determination, has the allow fields in the clear functionality enabled automatically.
Navigate to for a tutorial.
Deprecation notice
Support for this feature has been deprecated.
This feature must be enabled on the App Settings page.
Like policy adjustments, Expert Determination lets project owners increase a data set's utility while retaining the amount of k-anonymization that upholds de-identification requirements. With this feature enabled, users can redistribute the noise across multiple columns of a data source within a project to make specific columns more useful for their analysis.
While Policy Adjustments are available on all equalized projects with a noise reduction purpose applied, Expert Determination is only available on projects with Re-identification Prohibited.HIPAA De-identification or Re-identification Prohibited.Expert Determination purposes applied, since Expert Determination is specific to the HIPAA De-identification Global Policy.
Once a policy has been adjusted, Expert Determination provides a downloadable report that contains a statistical analysis of the data source to assess the very small re-identification probability indicated by the purpose.
Navigate to for a tutorial.
This feature provides an Allow Fields in the Clear option in the create purpose modal, permitting specified analysts to bypass k-anonymization in specific circumstances. The seeded purpose Re-identification Prohibited.Expert Determination.DUAM is specific to HIPAA Expert Determination and automatically has the Allow Fields in the Clear functionality enabled.
Navigate to for a tutorial.
Audience: Project Owners and members
Content Summary: This tutorial configures a .
Databricks Cluster Configuration
Before creating a workspace, the cluster must send its configuration to Immuta; to do this, run a simple query on the cluster (e.g., show tables). Otherwise, you will receive an error when you attempt to create a workspace.
Navigate to the Policies tab and enable Project Equalization by clicking the Project Equalization slider to on.
Scroll to the Native Workspace section and click Create.
Select Databricks from the Workspace Configuration dropdown menu.
Opt to edit the sub-directory in the Workspace Directory field; this sub-directory auto-populates as the project name.
Enter the Workspace Database Name.
Click Create to enable the workspace.
Scroll to the Native Workspace section on the Policies tab and click the toggle to disable the workspace.
Click Delete in the Native Workspace section.
Choose one of the following options in the modal:
Purge Generic Workspace Data: permanently delete data while preserving the data used by derived data sources. Note: If you created a derived data source that references a view on top of a Snowflake table that isn't itself a derived data source, that table will be deleted, breaking the derived data source.
Purge Everything & Delete Derived Data Sources: permanently delete data and purge all derived data sources.
Click Delete.
Audience: Project Owners
Content Summary: This page details how to adjust policies and in projects.
This feature must be enabled on the App Settings page before completing this tutorial.
Policy Adjustments Prerequisites:
The left side of the Policy Adjustment tab has a checklist of requirements. Each requirement changes from a yellow yield icon to a green check mark as the user completes it.
Navigate to the project.
Click the Policies tab.
In the Project Equalization section, click the toggle button on the far right to On.
Recommended Purposes
When using Expert Determination, use the Re-identification Prohibited.Expert Determination and Re-identification Prohibited.HIPAA De-identification purposes.
Navigate to the project Overview tab.
Click the Add Purposes button in the center of the page.
Select the desired purpose(s) from the dropdown menu. The project must have a purpose with noise reduction for policy adjustments to function. Noise reduction is indicated by a policy adjustment amount highlighted in gray; if no policy adjustment amount is shown, the purpose uses the default, none.
Click Save.
The purpose will be staged until a user with the GOVERNANCE or PROJECT_MANAGEMENT permission approves and activates it. If you have one of these permissions, the staging will be skipped.
After the purpose has been approved, click I Agree to agree to the terms of the purpose.
Note: All members of the project must agree to the terms of the purpose; if they decline, they will be removed from the project.
Navigate to the Policy Adjustment tab.
Select the data source from the dropdown menu. You will receive a No Adjustments Available message if there are no columns in the data sources that are associated with adjustable policies.
Select the columns from the dropdown menu.
Once the columns have been selected, both the Add Data Sources with K-Anonymization Policies and the Select Data Source and Columns requirements will be fulfilled and change to a green check mark.
In the priority ranking window, assign weights to the columns from most to least important. The higher a column's weight, the more usable that column becomes, while the other columns continue to provide de-identification.
After assigning the weight and ensuring that the remaining weight is zero, click the Adjust button.
Check that the percent NULL meets your usability needs. If it does, move on to the next step; if it does not, repeat the previous steps until it does.
Click Apply after you have made the acceptable adjustments.
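The weighting rule in these steps (rank columns by weight, and click Adjust only when the remaining weight is zero) can be checked with a short script before you enter the values. This is a minimal sketch with hypothetical helpers remaining_weight and check_weights. It takes the total weight budget as a parameter because this page does not state it; the example uses 100 purely for illustration, and it assumes weights are non-negative.

```python
import math
from typing import Dict, List, Tuple


def remaining_weight(budget: float, weights: Dict[str, float]) -> float:
    """Weight still unassigned; the steps require this to be 0."""
    return budget - sum(weights.values())


def check_weights(budget: float,
                  weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Validate a weight assignment and rank columns by weight.

    Assumption: weights are non-negative. Returns (column, weight) pairs
    from most to least important, i.e. highest weight first.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    left = remaining_weight(budget, weights)
    if not math.isclose(left, 0.0, abs_tol=1e-9):
        raise ValueError(f"remaining weight is {left}, expected 0")
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical budget of 100, favoring EmploymentStatus.
    weights = {"Income": 20, "Education": 10, "EmploymentStatus": 60,
               "Gender": 5, "Location Code": 5}
    print(check_weights(100, weights)[0])
```

The tolerance in math.isclose keeps fractional weights such as 0.1 and 0.2 from failing the zero check because of floating-point rounding.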
After you ensure that your data source has two columns with k-anonymization policies applied,
Navigate to your project and click the Overview tab.
Click Add Purposes and select a purpose with the Noise Reduction and Fields in Clear tags, or create a new purpose with Noise Reduction and Fields in Clear enabled.
Click Save, and then click I Agree.
Navigate to the Policy Adjustment tab.
Select the data source(s) and columns from the subsequent dropdown menus.
Select the Keep in the Clear checkbox next to columns to be kept in the clear.
Adjust the weight for the remaining columns. The Remaining Weight must equal 0.
Click Adjust and then Apply.
The values for the fields you selected will now be visible to users acting under the project.
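With fields kept in the clear, the weights apply only to the remaining columns. A hypothetical plan_fields_in_clear helper can check a plan before you enter it: every column kept in the clear must be one of the selected columns, weights go to exactly the columns not kept in the clear, and the Remaining Weight comes to 0. The default budget of 100 and the requirement that each weighted column has an explicit entry are assumptions for illustration, not documented Immuta behavior.

```python
import math
from typing import Dict, List, Sequence, Set


def plan_fields_in_clear(selected: Sequence[str],
                         keep_in_clear: Set[str],
                         weights: Dict[str, float],
                         budget: float = 100) -> Dict[str, List[str]]:
    """Check a Keep in the Clear plan and split the selected columns.

    Assumptions: budget defaults to a hypothetical 100, and every column
    not kept in the clear needs an explicit weight (which may be 0).
    """
    not_selected = keep_in_clear - set(selected)
    if not_selected:
        raise ValueError(f"kept in the clear but not selected: {sorted(not_selected)}")
    weighted = [col for col in selected if col not in keep_in_clear]
    if set(weights) != set(weighted):
        raise ValueError("weight exactly the columns not kept in the clear")
    remaining = budget - sum(weights.values())
    if not math.isclose(remaining, 0.0, abs_tol=1e-9):
        raise ValueError(f"Remaining Weight is {remaining}, must be 0")
    return {
        "in_clear": [col for col in selected if col in keep_in_clear],
        "weighted": weighted,
    }


if __name__ == "__main__":
    plan = plan_fields_in_clear(
        selected=["Income", "Education", "EmploymentStatus"],
        keep_in_clear={"EmploymentStatus"},
        weights={"Income": 70, "Education": 30},
    )
    print(plan)
```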
Opt to select Keep Fields in the Clear for specific columns.
UDF | Description |
---|---|
immuta.set_current_project(id) | Sets the user's current project to the project ID denoted by the id parameter. This UDF must be called in its own notebook cell to ensure the changes take effect. |
immuta.set_current_project() (no parameters) | Sets the user's current project to None. |
immuta.clear_caches() | Clears all client caches for the current user's ImmutaClient instance. Use it to invalidate cached items, such as data source subscription information, or when the state of Immuta has changed and the cache is outdated. For backward compatibility, this UDF is also available as default.immuta_clear_caches(). |
default.immuta_clear_metastore_cache() | Clears the cluster-wide Metastore cache. Only a privileged user can run this UDF. |
Virtual Table | Query | Return |
---|---|---|
immuta.get_current_project | select * from immuta.get_current_project | A single row with "name" and "id" columns that show your currently selected project. |
immuta.list_projects | select * from immuta.list_projects | One row for each project to which you are subscribed (and can use as your current project), with "name," "id," and "current_project" columns. The "current_project" column is true for the row of the project you have set as your current project. |
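To find your current project programmatically, you can scan the rows of immuta.list_projects for the one whose current_project column is true. This is a minimal sketch with a hypothetical find_current_project helper and made-up example rows. In a Databricks Python cell, the rows could come from [r.asDict() for r in spark.sql("select * from immuta.list_projects").collect()], which uses standard PySpark calls.

```python
from typing import Iterable, Mapping, Optional


def find_current_project(
        rows: Iterable[Mapping[str, object]]) -> Optional[Mapping[str, object]]:
    """Return the list_projects row whose current_project column is true.

    Returns None when no row is marked, which presumably means the
    project context is None.
    """
    matches = [row for row in rows if row.get("current_project") is True]
    if len(matches) > 1:
        raise ValueError("more than one row is marked as the current project")
    return matches[0] if matches else None


if __name__ == "__main__":
    # Made-up rows shaped like immuta.list_projects output.
    rows = [
        {"name": "Attrition Analysis", "id": 1, "current_project": False},
        {"name": "Survey Review", "id": 2, "current_project": True},
    ]
    current = find_current_project(rows)
    print(current["name"] if current else None)
```

Querying immuta.get_current_project returns the same information directly; scanning list_projects is useful when you already have the full list in hand, for example to offer a choice of projects to switch to.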
Deprecation notice
Support for this policy has been deprecated.
This feature must be enabled on the App Settings page before completing this tutorial.
Expert Determination Prerequisites:
The left side of the Expert Determination tab has a checklist of requirements; each requirement changes from a yellow yield icon to a green check mark as the user completes it.
Navigate to the project.
Click the Policies tab.
In the Project Equalization section, click the toggle button on the far right to On.
Navigate to the project Overview tab.
Click the Add Purposes button in the center of the page.
Select the Re-identification Prohibited.Expert Determination purpose.
Click Save.
The purpose will be staged until a user with the GOVERNANCE or PROJECT_MANAGEMENT permission approves and activates it. If you have one of these permissions, the staging will be skipped.
After the purpose has been approved, click I Agree to agree to the terms of the purpose.
Note: All members of the project must agree to the terms of the purpose; if they decline, they will be removed from the project.
Navigate to the Expert Determination tab.
Select the data source from the dropdown menu.
Select the columns from the dropdown menu.
Once the columns have been selected, both the Add Data Sources with K-Anonymization Policies and the Select Data Source and Columns requirements will be fulfilled and change to a green check mark.
No Adjustments Available
If you receive the No Adjustments Available message, then there are no columns that are associated with adjustable policies under the HIPAA De-identification Global Policy. If you believe there are,
ensure that the HIPAA De-identification Global Policy is active.
ensure that the HIPAA De-identification Global Policy has been certified on that data source and that there are no policy conflicts.
verify that at least two columns have been correctly tagged Discovered.Identifier Indirect in the Data Dictionary.
In the priority ranking window, assign weights to the columns from most to least important. The higher a column's weight, the more usable that column becomes, while the other columns continue to provide de-identification.
Opt to select Keep Fields in the Clear for specific columns.
After assigning the weight and ensuring that the remaining weight is zero, click the Adjust button.
Check that the percent NULL meets your usability needs. If it does, move on to the next step; if it does not, repeat the previous steps until it does.
Click Certify after you have made the acceptable adjustments.
Click Download Report. This will open a new tab in your browser with the report and the option to Save as PDF.
Save the report.
After you ensure that the HIPAA De-identification Global policy is Active, certified, and has no conflicts on the data source in your project,
Navigate to your project and click the Overview tab.
Click Add Purposes and select Re-identification Prohibited.Expert Determination.DUAM from the dropdown menu in the modal.
Click Save, and then click I Agree.
Navigate to the Expert Determination tab.
Select the data source and columns from the subsequent dropdown menus.
Select the Keep in the Clear checkbox next to columns to be kept in the clear.
Adjust the weight for the remaining columns. The Remaining Weight must equal 0.
Click Adjust and then Certify.
Click Confirm to save your adjustments.
The values for the fields you selected will now be visible to users acting under the project.