Projects

Audience: Data Owners, Data Users, and Data Governors

Content Summary: Projects allow users to logically group work by linking data sources. They can be created by Data Users who want to organize their work efficiently or by Data Owners who want to give specific users special access to data. Additionally, Data Governors can act as project owners for any project in their organization.

This overview describes the concepts and major features of projects; the Project Owner, Project Member, and Project Governor guides provide tutorials for each of these user types.

Project Roles

The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.

Project Owner Capabilities

Users with the CREATE_PROJECT permission are considered owners of the projects they create and have the following capabilities:

Governor Capabilities

Governors have the following capabilities for any project in their organization, even for projects that are private or that they are not members of:

Project Member Capabilities

Once subscribed to a project, all Immuta users have the following capabilities as project members:

Project Purposes and Acknowledgement Statements

The Data Governor is responsible for configuring project purposes and acknowledgement statements.

  • Purposes: Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.

    Purposes Tab

  • Acknowledgement Statements: Projects containing purposes require owners and subscribers to affirm or reject acknowledgement statements, which confirm that they will use the data only for those purposes. Users who accept the statement become project members; users who reject it are denied access to the project. Once the statement is acknowledged, data accessed under the provision of a project will be audited and the purposes will be noted. Immuta provides default acknowledgement statements, but Data Governors can customize these statements in the Purposes tab.

    Project Member Acknowledgement

Sub-Purposes

Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.

For example, consider this organization of the sub-purposes of Research:

Sub-Purpose Builder

Instead of creating separate purposes, which must then each be added to policies as they evolve, a Governor could write the following Global Policy:

Limit usage to purpose(s) Research for everyone on data sources tagged PHI.

Now, any user acting under the purpose Research or any of its sub-purposes (whether Research.Marketing, Research.Onboarding.Customer, or Research.MedicalClaims) will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a Governor to rewrite these Global Policies when sub-purposes are added or removed. Furthermore, if new projects with new Research sub-purposes are added, the relevant Global Policy will automatically be enforced.
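The hierarchical matching described above can be illustrated with a minimal sketch. It assumes that purposes are written as dot-separated paths, as in the example names; the function satisfies_purpose and the purpose name ResearchOps are hypothetical and are not part of Immuta's API.

```python
def satisfies_purpose(acting_purpose: str, required_purpose: str) -> bool:
    """Return True if acting_purpose is required_purpose or one of its sub-purposes.

    Purposes are compared segment by segment, so "ResearchOps" does not
    count as a sub-purpose of "Research" just because the strings share a prefix.
    """
    acting = acting_purpose.split(".")
    required = required_purpose.split(".")
    return acting[: len(required)] == required


# A policy restricted to "Research" is met by Research and all of its sub-purposes.
for purpose in (
    "Research",
    "Research.Marketing",
    "Research.Onboarding.Customer",
    "Research.MedicalClaims",
):
    assert satisfies_purpose(purpose, "Research")

# A parent purpose does not satisfy a policy that names one of its sub-purposes.
assert not satisfies_purpose("Research", "Research.Marketing")
```

Comparing whole path segments rather than raw string prefixes keeps unrelated purposes with similar names from matching.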

Switching Project Contexts

The Immuta UI provides a simple way to switch project contexts so that users can access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.

When users change project contexts, all SQL queries or blob fetches that run through Immuta will reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.

Project Equalization

The same security restrictions regarding data sources are applied to projects; project members still need to be subscribed to data sources in order to access data, and only users with appropriate authorizations and credentials will be able to see the data if it contains any row-level or masking security.

However, Project Equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access.
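One way to picture "no more access than the member with the least access" is as the set of entitlements that every member holds. The sketch below makes that simplifying assumption; it is not Immuta's actual algorithm, and the member names and the entitlement strings such as "group:Accounting" are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def equalized_entitlements(members: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return the entitlements that every project member holds."""
    if not members:
        return frozenset()
    # Intersect all members' entitlement sets to find the common denominator.
    return frozenset(set.intersection(*(set(held) for held in members.values())))


# Hypothetical project membership.
members = {
    "alice": {"group:Accounting", "group:Legal"},
    "bob": {"group:Accounting"},
    "carol": {"group:Accounting", "auth:Country=US"},
}

common = equalized_entitlements(members)
assert common == frozenset({"group:Accounting"})
```

In this example every member sees project data as if they held only the Accounting group membership, even though alice and carol hold additional entitlements.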

Note: Only project owners can add data sources to the project if this feature is enabled.

For instructions on enabling Project Equalization, navigate to the Project Owner guide.

Project Equalization and Subscription Policies

Once Project Equalization is enabled, the project Subscription Policy builder locks, and the policy can only be adjusted by manually editing the Equalized Entitlements. The Subscription Policy then combines with the entitlement settings in a way that depends on the policy type.

For example, consider the Subscription Policy of the following sample project, Fraud Prevention, before Project Equalization is enabled:

Fraud Prevention

Subscription Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project).

After enabling Project Equalization, the following Equalized Entitlement is recommended by Immuta: User is a member of group Accounting.

Equalized Entitlements

In this particular example, the Equalized Subscription Policy contains the Equalized Entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:

  • the user must be a member of the group Accounting and
  • the user must be approved by anyone with permission Owner (of this project).

Combined Subscription and Entitlements

However, the way entitlements and approvals combine differs depending on the policy type; for clarity, the table below illustrates various scenarios for each type. Every row demonstrates how a specific project Subscription Policy changes after Project Equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if Project Equalization is subsequently disabled.

| Original Policy | Equalized Policy (Example Entitlement: member of group Accounting) | Equalized Policy (No Entitlement) | Policy After Disabling Equalization |
| --- | --- | --- | --- |
| Anyone | Allow user to subscribe when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
| Allow users to subscribe when approved by anyone with permission Owner (of this project) | Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission Owner (of this project) | Allow users to subscribe when approved by anyone with permission Owner (of this project) | Allow users to subscribe when approved by anyone with permission Owner (of this project) |
| Allow users to subscribe to the project when user is a member of group Legal | Allow users to subscribe to the project when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
| Individual Users You Select | Allow users to subscribe to the project when user is a member of group Accounting | Individual Users You Select | Individual Users You Select |
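The four scenarios in the table follow a small set of rules, which the sketch below encodes. It is an illustration of the table only: the Policy class, the kind labels, and the functions equalize and disable_equalization are hypothetical names, not Immuta's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    kind: str                      # "anyone", "approval", "entitlement", "individual", ...
    detail: Optional[str] = None   # approver or entitlement text, when applicable


INDIVIDUAL = Policy("individual")  # "Individual Users You Select"


def equalize(original: Policy, entitlement: Optional[str]) -> Policy:
    """Policy in effect after Project Equalization is enabled."""
    if original.kind == "approval":
        if entitlement is None:
            return original  # the approval policy is kept unchanged
        # Approval policies are combined with the entitlement: both must hold.
        return Policy("entitlement_and_approval", f"{entitlement} and {original.detail}")
    if entitlement is None:
        return INDIVIDUAL
    # Any other policy type is replaced by the Equalized Entitlement.
    return Policy("entitlement", entitlement)


def disable_equalization(original: Policy) -> Policy:
    """Policy in effect after Project Equalization is subsequently disabled."""
    return original if original.kind == "approval" else INDIVIDUAL


accounting = "is a member of group Accounting"
approval = Policy("approval", "is approved by anyone with permission Owner (of this project)")

assert equalize(Policy("entitlement", "is a member of group Legal"), accounting) == Policy(
    "entitlement", accounting
)
assert disable_equalization(approval) == approval
```

Read this way, only approval-based policies survive equalization; every other policy type is replaced by the entitlement or falls back to Individual Users You Select.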

Equalized Entitlements

This setting adjusts the minimum entitlements (i.e., users' groups and authorizations) required to join the project and to access data within the project. When Project Equalization is enabled, Equalized Entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:

  • If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the Members tab within the project.

    Compliance Status

  • If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
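The trade-off between adding and removing entitlements can be sketched as a subset check. The example assumes that a member is compliant when they hold every required entitlement; the function compliance_status, the member names, and the entitlement strings are hypothetical, and the actual compliance logic is not described here.

```python
from typing import Dict, Set


def compliance_status(required: Set[str], members: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Map each member to True if they hold every required entitlement."""
    return {user: required <= held for user, held in members.items()}


# Hypothetical project membership.
members = {
    "alice": {"group:Accounting", "group:Legal"},
    "bob": {"group:Accounting"},
}

# With the recommended entitlements, every member is compliant.
recommended = {"group:Accounting"}
assert all(compliance_status(recommended, members).values())

# Adding an entitlement may expose more data but puts bob out of compliance.
stricter = recommended | {"group:Legal"}
status = compliance_status(stricter, members)
assert status == {"alice": True, "bob": False}
```

Removing an entitlement has the opposite effect in this model: more users qualify, but the shared view of the data is limited to what the smaller set of entitlements allows.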

Validation Frequency

This setting determines how often users' credentials are validated. Regular validation is critical when users share data with project members outside of Immuta, because it gives them a way to verify that those members' permissions are still valid.

Masked Joins

This feature allows masked columns to be joined within the context of a project.

Note: Masked columns cannot be joined across data sources that are not linked by a project.

For instructions on enabling Masked Joins, navigate to the Project Owner guide.

Derived Data Sources

Demo: Automated Policy Inheritance

When Project Equalization is enabled, members can use data sources within the project to create a derived data source, which dynamically inherits the Subscription Policies and purpose restriction Data Policies from the parent source(s).

For example, consider these data sources, each of which has a Subscription Policy and a Data Policy:

Data Source A

Subscription Policy: Allow users to subscribe to the data source when user is a member of group Medical Claims

Data Policy: Mask by making null the value in the column(s) address except for members of group Legal

Data Source B

Subscription Policy: Allow users to subscribe to the data source when user is approved by anyone with permission Owner and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone

If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit the following policies, which cannot be changed:

Data Source C

Subscription Policy: Allow user to subscribe when they satisfy all of the following:

  • is a member of group Legal and is a member of group Medical Claims
  • is approved by anyone with permission Owner (of Data Source B) and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone
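The inheritance in this example can be sketched as merging the parents' subscription conditions and purpose restrictions. The sketch is illustrative only: the DataSource class and the derive function are hypothetical names, and the "member of group Legal" condition is passed in by the caller as an extra entitlement because how Immuta arrives at it is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class DataSource:
    name: str
    subscription_conditions: List[str]
    purpose_restrictions: List[str] = field(default_factory=list)


def _append_unique(target: List[str], items: Iterable[str]) -> None:
    """Append items to target, preserving order and skipping duplicates."""
    for item in items:
        if item not in target:
            target.append(item)


def derive(name: str, parents: Iterable[DataSource],
           extra_entitlements: Iterable[str] = ()) -> DataSource:
    """Build a derived source whose subscribers must satisfy all listed conditions."""
    conditions: List[str] = []
    purposes: List[str] = []
    _append_unique(conditions, extra_entitlements)  # assumption: supplied by the caller
    for parent in parents:
        _append_unique(conditions, parent.subscription_conditions)
        _append_unique(purposes, parent.purpose_restrictions)
    return DataSource(name, conditions, purposes)


source_a = DataSource("Data Source A", ["is a member of group Medical Claims"])
source_b = DataSource(
    "Data Source B",
    ["is approved by anyone with permission Owner (of Data Source B) "
     "and anyone with permission Governance"],
    purpose_restrictions=["Research"],
)

source_c = derive("Data Source C", [source_a, source_b],
                  extra_entitlements=["is a member of group Legal"])
assert source_c.purpose_restrictions == ["Research"]
```

Only purpose restrictions are carried over as Data Policies in this sketch, which matches the example: the masking policy on Data Source A does not appear on Data Source C.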

Note: If members use data outside the project to create their data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.

For detailed instructions on creating a derived data source, navigate to the Project Owner guide.

HDFS Native Workspace

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine. Within a project, users can enable HDFS Native Workspace, which creates a workspace directory in HDFS (and a corresponding database in the Hive metastore) where users can write files.

After a Project Owner creates a workspace, users will only be able to access this HDFS directory and database when acting under the project, and they should use the Immuta Spark SQL Session to copy data into the workspace. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access.

Once derived data is ready to be shared outside the workspace, it can be exposed as a derived data source in Immuta. At that point, the derived data source will inherit policies appropriately, and it will then be available through Immuta outside the project and can be used in future project workspaces by different teams in a compliant way.

Workspace Requirements

Administrators

  • Administrators can opt to configure where all Immuta projects are kept in HDFS (default is /user/immuta/workspace). Note: If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
  • Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.

Project Owners

  • Once a project is equalized, Project Owners can enable a workspace for the project.
    • If more than one cluster is configured, Immuta will prompt for which to use.
    • Once enabled, the full URI of where that workspace is located will display on the project page.
    • Project Owners can also add connection information for Hive and/or Impala to allow Hive or Impala workspace sources to be created. The connection information provided and the Kerberos credentials configured for Immuta will be used for each derived Hive or Impala data source. The connection string for Hive or Impala will be displayed on the project page with the full URI.
  • Project Owners can disable the workspace at any time.
    • When the workspace is disabled, project members can no longer read from or write to it.
    • Data sources living in this directory will still exist and their access will not be changed. (Subscribed users will still have access as usual.)
    • All data in this directory will still exist, regardless of whether it belongs to a data source or not.
    • After the workspace has been disabled, Project Owners can purge its data. They can choose one of the following options:
      • Purge all non-data-source data only.
      • Purge all data (including data source data).
        • When purging all data source data, sources can either be disabled or fully deleted.

Project Members

  • When a user is acting under the project context, Immuta will provide them read/write access to the project HDFS directory (using HDFS ACLs). If there are Immuta data sources already exposed in that directory, a user acting under the project will bypass the NameNode plugin for the data in that directory.
  • Once a user is no longer acting under the project, all access to that directory is revoked, and the user can access data in that project only as official Immuta data sources, if any exist.
  • When users with the CREATE_DATA_SOURCE_IN_PROJECT permission create a derived data source with workspace enabled, they will be prompted with a modified create data source workflow:
    • The user will select the directory (starting with the project root directory) of the data they are exposing.
    • If the directory contains Parquet or ORC files, then Hive, Impala, and HDFS will be available as data source types; otherwise, only HDFS will be available.
    • Users will not be asked for the connection information because the Immuta user connection will be used to create the data source, which will ensure join pushdown and that the data source will work even when the user isn’t acting in the project. Note: Hive or Impala workspace sources are only available if the Project Owner added Hive or Impala connection information to the workspace.
    • If Hive or Impala is selected as the data source type, Immuta will infer schema/partitions from files and generate create table statements for Hive.
    • Once the data source is created, policy inheritance will take effect.
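The data source type options described in the workflow above can be sketched as a small decision function. It assumes that Parquet and ORC files can be recognized by their file extensions, which is a simplification; the function available_source_types, its parameters, and the file names are hypothetical and not part of Immuta.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: columnar files are identified by extension only.
COLUMNAR_SUFFIXES = {".parquet", ".orc"}


def available_source_types(file_names: Iterable[str],
                           hive_configured: bool,
                           impala_configured: bool) -> List[str]:
    """Return the data source types offered for a workspace directory."""
    has_columnar = any(
        PurePosixPath(name).suffix.lower() in COLUMNAR_SUFFIXES for name in file_names
    )
    types = ["HDFS"]  # HDFS is always available
    if has_columnar:
        # Hive/Impala require connection information added by the Project Owner.
        if hive_configured:
            types.append("Hive")
        if impala_configured:
            types.append("Impala")
    return types


assert available_source_types(["part-0000.snappy.parquet"], True, True) == [
    "HDFS", "Hive", "Impala"
]
assert available_source_types(["claims.csv"], True, True) == ["HDFS"]
```

The two boolean flags reflect the note that Hive or Impala workspace sources are only available when the Project Owner has added the corresponding connection information.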

Note: To avoid data source collisions, Immuta will not allow HDFS and Hive/Impala data sources to be backed by the same location in HDFS.