Projects and Purposes
Audience: Data Owners, Data Users, and Data Governors
Content Summary: This overview describes concepts related to and major features of projects.
The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.
Project Purposes and Acknowledgement Statements
Data Governors and users with the PROJECT_MANAGEMENT permission are responsible for configuring and approving project purposes and acknowledgement statements.
Best Practice: Purposes
Consider purposes as attributes. Attributes identify a user, and purposes identify why that user should have access.
Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies. If project owners create a purpose, they remain in a staged state until a Governor or Project Manager approves the purpose request.
Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.
For example, consider this organization of the sub-purposes of Research:
Instead of creating separate purposes, which must then each be added to policies as they evolve, a Governor could write the following Global Policy:
Limit usage to purpose(s) Research for everyone on data sources tagged PHI.
Now, any user acting under the purpose or sub-purpose of
Research - whether
Research.MedicalClaims - will meet the criteria of this policy. Consequently,
purpose hierarchies eliminate
the need for a Governor to re-write these Global Policies when sub-purposes are added or removed. Furthermore, if new
projects with new Research purposes are added, for example, the relevant Global Policy will automatically be enforced.
Projects with purposes require owners and subscribers to acknowledge that they will only use the data for those purposes by affirming or rejecting acknowledgement statements. This practice ensures that project members are aware of and agree to all purpose-based restrictions before accessing the project's content. Each purpose is associated with its own acknowledgement statement, so a project with multiple purposes requires users to accept more than one acknowledgement statement. Immuta keeps a record of whether each project member has agreed to the acknowledgement statement(s) and records the purpose associated to the acknowledgement, the time of the acknowledgement, and the text of the acknowledgement. All purposes are associated with the default acknowledgement statement unless their statement has been customized by a Data Governor.
If users accept the statement, they become a project member. If they reject the acknowledgement statement, they are denied access to the project.
Switching Project Contexts
When a user is working within the context of a project, they will only see the data in that project. This helps to prevent data leakage when users collaborate. Users can switch project contexts to access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.
When users change project contexts, all SQL queries or blob fetches that run through Immuta will reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
Use Project UDFs in Databricks
Currently, caches are not all invalidated outside of Databricks because Immuta caches information pertaining to a user's current project in the NameNode plugin and in Vulcan. Consequently, this feature should only be used in Databricks.
You can also switch project contexts, and view a list of your current project or available projects, through UDFs in Spark.
To view a list of your current project or available projects in a Spark job, you can query these virtual tables.
|immuta.get_current_project||select * from immuta.get_current_project||This virtual table returns a single row with "name" and "id" columns that show your currently selected project.|
|immuta.list_projects||select * from immuta.list_projects||This virtual table returns rows with "name," "id," and "current_project" columns. Each row is a different project to which you are subscribed (and can use as your current project). The "current_project" row will be true for the row defining the project that you have set as your current project.|
To see how to use the functions below, navigate to the Switching Project Contexts with UDFs tutorial.
|immuta.set_current_project(id)||Sets the user's current project to the project ID denoted by the id parameter. This UDF must be called in its own notebook cell to ensure the changes take effect.|
|immuta.set_current_project() (no parameters)||Sets the user's current project to None.|
|immuta.clear_caches()||Clears all client caches for the current user's ImmutaClient instance. This can be used when a user would like to invalidate cached items, like data source subscription information or if the state of Immuta has changed and the cache is outdated. For backward compatibility, this UDF is also available at default.immuta_clear_caches()|
|default.immuta_clear_metastore_cache()||Clears the cluster-wide Metastore cache. This UDF can only be run by a privileged user.|
The same security restrictions regarding data sources are applied to projects; project members still need to be subscribed to data sources to access data, and only users with appropriate attributes and credentials can see the data if it contains any row-level or masking security.
However, Project Equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access. For a Project Equalization tutorial, navigate to Create a Project.
Note: Only project owners can add data sources to the project if this feature is enabled.
Once Project Equalization is enabled, the Subscription Policy for the project is locked and can only be adjusted by the project owner by changing the Equalized Entitlements. Click the tabs below for more information about the settings related to Project Equalization.
This setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. When Project Equalization is enabled, Equalized Entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:
If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the Members tab within the project.
If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
This setting determines how often user credentials are validated, which is critical if users share data with project members outside of Immuta, as they need a way to verify that those members' permissions are still valid. Validation Frequency provides those means of verification.
Project Equalization and Subscription Policies
Once Project Equalization is enabled, the project Subscription Policy builder locks and can only be adjusted by manually editing the Equalized Entitlements. Then, the Subscription Policy will combine with the entitlement settings, depending on the policy type.
Combinations by Policy Type
The way entitlements and approvals combine differs depending on the policy type; for clarity, the table below illustrates various scenarios for each type. Every row demonstrates how a specific project Subscription Policy changes after Project Equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if Project Equalization is subsequently disabled.
|Original Policy||Equalized Policy (Example Entitlement: member of group Accounting)||Equalized Policy (No Entitlement)||Policy After Disabling Equalization|
|Anyone||Allow user to subscribe when user is a member of group Accounting||Individual Users You Select||Individual Users You Select|
|Allow users to subscribe when approved by anyone with permission Owner (of this project)||Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission Owner (of this Project)||Allow users to subscribe when approved by anyone with permission Owner (of this project)||Allow users to subscribe when approved by anyone with permission Owner (of this project)|
|Allow users to subscribe to the project when user is a member of group Legal||Allow users to subscribe to the project when user is a member of group Accounting||Individual Users You Select||Individual Users You Select|
|Individual Users You Select||Allow users to subscribe to the project when user is a member of group Accounting||Individual Users You Select||Individual Users You Select|
For example, consider the Subscription Policy of the following sample project, Fraud Prevention, before Project Equalization is enabled:
Subscription Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project).
After enabling Project Equalization, the following Equalized Entitlement is recommended by Immuta: User is a member of group Accounting.
In this particular example, the Equalized Subscription Policy contains the Equalized Entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:
- the user must be a member of the group Accounting and
- the user must be approved by anyone with permission Owner (of this project).
This feature allows masked columns to be joined within the context of a project. However, joining on columns masked by rounding, by making null, with a constant or regex or on columns that have conditional masking policies applied to them is not supported and will be blocked.
Note: Masked columns cannot be joined across data sources that are not linked by a project.
For instructions on enabling Masked Joins, navigate to Create a Project.
Schema projects are different from user created projects in several ways, but mainly in that they are automatically created and managed by Immuta. They group all the data sources of the schema, and when new data sources are created, manually or with schema detection, they are automatically added to the schema project.
Schema projects are created with any table-backed data source; when you create the data source, you choose the project name at that time. The user creating the data source does not need the CREATE_PROJECT permission to have the project auto-create because no data sources can be added by the owner. Instead, new data sources are managed by Immuta. However, the user can manage Subscription policies for schema projects, but they cannot apply Data policies or purposes to them.
The schema settings, such as schema evolution and connection information, can be edited from the project overview tab. Note: Deleting the project will delete all of the data sources within it as well.