
Projects

Audience: Data Owners, Data Users, and Data Governors

Content Summary: Projects allow users to logically group their work by linking data sources. Data Users can create projects to organize their work efficiently, and Data Owners can create them to give specific users special access to data. Data Governors can also act as project owners for any project in their organization.

This overview describes concepts related to and major features of projects; the Project Owner, Project Member, and Project Governor guides provide tutorials for each of these user types.

Project Roles

The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.

Project Owner Capabilities

Users with the CREATE_PROJECT permission are considered owners of the projects they create and have the following capabilities:

Governor Capabilities

Governors have the following capabilities for any project in their organization, even for projects that are private or that they are not members of:

Project Member Capabilities

Once subscribed to a project, all Immuta users have the following capabilities as project members:

Project Purposes and Acknowledgement Statements

The Data Governor is responsible for configuring project purposes and acknowledgement statements.

  • Purposes: Purposes help define the scope and use of data within a project and allow users to meet purpose restrictions on policies. Governors create and manage purposes and their sub-purposes, which project owners then add to their project(s) and use to drive Data Policies.


  • Acknowledgement Statements: Projects containing purposes require owners and subscribers to accept an acknowledgement statement affirming that they will use the data only for those purposes. Users who accept the statement become project members; users who reject it are denied access to the project. Once the statement is acknowledged, data accessed under the provision of the project is audited and the project's purposes are noted. Immuta provides default acknowledgement statements, but Data Governors can customize them in the Purposes tab.

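The acknowledgement flow can be modeled as a small state object: accepting makes the user a member, rejecting denies access, and data accessed under the project is audited along with the project's purposes. This is a hypothetical sketch of the behavior described above, not an Immuta API; `ProjectAccess`, `respond`, and `audit` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectAccess:
    purposes: set[str]
    members: set[str] = field(default_factory=set)
    denied: set[str] = field(default_factory=set)

    def respond(self, user: str, accepted: bool) -> bool:
        """Record a user's answer to the project's acknowledgement statement."""
        if accepted:
            self.members.add(user)      # accepting makes the user a project member
            self.denied.discard(user)
        else:
            self.denied.add(user)       # rejecting denies access to the project
            self.members.discard(user)
        return accepted

    def audit(self, user: str, source: str) -> dict:
        """Audit entry for data accessed under the project, noting its purposes."""
        if user not in self.members:
            raise PermissionError(f"{user} has not accepted the acknowledgement statement")
        return {"user": user, "source": source, "purposes": sorted(self.purposes)}


project = ProjectAccess({"Research.MedicalClaims"})
project.respond("ana", accepted=True)
project.respond("ben", accepted=False)
print(project.audit("ana", "claims_2023"))
```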

Sub-Purposes

Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.

For example, consider a Research purpose organized into sub-purposes such as Research.Marketing, Research.Onboarding.Customer, and Research.MedicalClaims.

Instead of creating separate purposes, which must then each be added to policies as they evolve, a Governor could write the following Global Policy:

Limit usage to purpose(s) Research for everyone on data sources tagged PHI.

Now, any user acting under the purpose Research or any of its sub-purposes (whether Research.Marketing, Research.Onboarding.Customer, or Research.MedicalClaims) will meet the criteria of this policy. Consequently, purpose hierarchies eliminate the need for a Governor to rewrite these Global Policies when sub-purposes are added or removed. Furthermore, if new projects with new Research sub-purposes are added, the relevant Global Policy will automatically be enforced.
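The matching rule behind purpose hierarchies can be sketched in a few lines. This is a minimal model, not Immuta's implementation: it assumes sub-purposes are written as dot-separated paths (as in Research.Onboarding.Customer) and that a policy's purpose is met by that purpose or any purpose nested under it. The function names are hypothetical.

```python
def satisfies_purpose(acting_purpose: str, required_purpose: str) -> bool:
    """True if acting_purpose equals required_purpose or is nested under it."""
    acting = acting_purpose.split(".")
    required = required_purpose.split(".")
    # Compare whole path segments, so "ResearchOps" is NOT under "Research".
    return acting[: len(required)] == required


def meets_purpose_restriction(acting_purposes: set[str], allowed: set[str]) -> bool:
    """True if any purpose the user acts under falls within any allowed purpose."""
    return any(satisfies_purpose(a, r) for a in acting_purposes for r in allowed)


# The Global Policy "Limit usage to purpose(s) Research" from the text.
policy_purposes = {"Research"}
for purpose in ["Research.Marketing", "Research.Onboarding.Customer",
                "Research.MedicalClaims", "ResearchOps", "Marketing"]:
    print(purpose, meets_purpose_restriction({purpose}, policy_purposes))
```

Because the check walks the hierarchy at evaluation time, adding a new sub-purpose such as a hypothetical Research.Genomics requires no change to the policy.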

Switching Project Contexts

The Immuta UI provides a simple way to switch project contexts so that users can access various data sources while acting under the appropriate purpose. By default, there will be no project selected, even if the user belongs to one or more projects in Immuta.

When users change project contexts, all SQL queries or blob fetches that run through Immuta will reflect users as acting under the purposes of that project, which may allow additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
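A minimal model of how the selected project context feeds purpose checks follows, assuming hierarchical purpose matching as described under Sub-Purposes. It covers only the purpose restriction (subscriptions and other policies still apply), and the `Project` type, the function names, and the sample project are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    name: str
    purposes: set[str] = field(default_factory=set)


def within(purpose: str, allowed: str) -> bool:
    # Hierarchical match: "Research.MedicalClaims" is within "Research".
    return purpose.split(".")[: len(allowed.split("."))] == allowed.split(".")


def purpose_check(restricted_to: set[str], current: Optional[Project]) -> bool:
    """Decide whether a purpose-restricted data source is readable.

    `current` is the project the user has switched to; None models the
    default state in which no project is selected.
    """
    if not restricted_to:
        return True   # the source has no purpose restriction
    if current is None:
        return False  # no project selected, so the user acts under no purpose
    return any(within(p, r) for p in current.purposes for r in restricted_to)


claims_study = Project("Claims Study", {"Research.MedicalClaims"})
print(purpose_check({"Research"}, None))          # default: no project selected
print(purpose_check({"Research"}, claims_study))  # acting under the project
```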

Project Equalization

The same security restrictions regarding data sources are applied to projects: project members still need to be subscribed to data sources in order to access data, and if row-level security or masking policies apply to the data, only users with the appropriate attributes and credentials will be able to see it.

However, Project Equalization improves collaboration by ensuring that the data in the project looks identical to all members, regardless of their level of access to data. When enabled, this feature automatically equalizes all permissions so that no project member has more access to data than the member with the least access.
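One way to picture "no project member has more access to data than the member with the least access" is to treat each member's groups and attributes as a set and evaluate policies against the intersection of those sets. This is a simplified, hypothetical model (Immuta's recommended Equalized Entitlements may be computed differently), and the entitlement labels are made up.

```python
def equalized_entitlements(members: dict[str, set[str]]) -> set[str]:
    """Entitlements held by every project member (the intersection of their sets).

    Evaluating policies against only this set means no member sees more data
    than the member with the least access.
    """
    if not members:
        return set()
    return set.intersection(*members.values())


members = {
    "ana": {"group:Accounting", "group:Legal"},
    "ben": {"group:Accounting"},
    "cho": {"group:Accounting", "attr:Country=US"},
}
print(equalized_entitlements(members))  # {'group:Accounting'}
```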


Note: Only project owners can add data sources to the project if this feature is enabled.

For instructions on enabling Project Equalization, navigate to the Project Owner guide.

Project Equalization and Subscription Policies

Once Project Equalization is enabled, the project's Subscription Policy builder locks, and the policy can only be adjusted by manually editing the Equalized Entitlements. The Subscription Policy then combines with the entitlement settings in a way that depends on the policy type.

For example, consider the Subscription Policy of the following sample project, Fraud Prevention, before Project Equalization is enabled:

Fraud Prevention

Subscription Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project).

After enabling Project Equalization, the following Equalized Entitlement is recommended by Immuta: User is a member of group Accounting.


In this particular example, the Equalized Subscription Policy contains the Equalized Entitlement and the approval of the original policy, so users must satisfy both conditions to subscribe:

  • the user must be a member of the group Accounting and
  • the user must be approved by anyone with permission Owner (of this project).


However, the way entitlements and approvals combine differs depending on the policy type; for clarity, the list below illustrates various scenarios for each type. Each entry shows how a specific project Subscription Policy changes after Project Equalization is enabled (when an equalized entitlement is set and when no entitlement is set) and how the policy reverts if Project Equalization is subsequently disabled.

  • Original policy: Anyone
    • Equalized policy (example entitlement: member of group Accounting): Allow user to subscribe when user is a member of group Accounting
    • Equalized policy (no entitlement): Individual Users You Select
    • Policy after disabling equalization: Individual Users You Select
  • Original policy: Allow users to subscribe when approved by anyone with permission Owner (of this project)
    • Equalized policy (example entitlement: member of group Accounting): Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission Owner (of this Project)
    • Equalized policy (no entitlement): Allow users to subscribe when approved by anyone with permission Owner (of this project)
    • Policy after disabling equalization: Allow users to subscribe when approved by anyone with permission Owner (of this project)
  • Original policy: Allow users to subscribe to the project when user is a member of group Legal
    • Equalized policy (example entitlement: member of group Accounting): Allow users to subscribe to the project when user is a member of group Accounting
    • Equalized policy (no entitlement): Individual Users You Select
    • Policy after disabling equalization: Individual Users You Select
  • Original policy: Individual Users You Select
    • Equalized policy (example entitlement: member of group Accounting): Allow users to subscribe to the project when user is a member of group Accounting
    • Equalized policy (no entitlement): Individual Users You Select
    • Policy after disabling equalization: Individual Users You Select
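Read as rules, the four scenarios above reduce to a small function: approval policies keep their approval (gaining the entitlement as a second condition), while every other policy type is replaced by the entitlement or, without one, by Individual Users You Select. The sketch below only models those documented outcomes and is not Immuta's code; the `Policy` type, its `kind` labels, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    kind: str                          # "anyone", "approval", "entitlement", or "individual"
    approval: Optional[str] = None     # who must approve, for approval policies
    entitlement: Optional[str] = None  # group/attribute condition, if any


def equalize(original: Policy, entitlement: Optional[str]) -> Policy:
    """Subscription Policy after Project Equalization is enabled."""
    if original.kind == "approval":
        # Approvals are kept; a set entitlement becomes a second required condition.
        return Policy("approval", approval=original.approval, entitlement=entitlement)
    if entitlement is not None:
        # Every other policy type is replaced by the equalized entitlement.
        return Policy("entitlement", entitlement=entitlement)
    return Policy("individual")  # "Individual Users You Select"


def disable_equalization(original: Policy) -> Policy:
    """Subscription Policy after Project Equalization is disabled again."""
    if original.kind == "approval":
        return Policy("approval", approval=original.approval)
    return Policy("individual")


owner = "anyone with permission Owner (of this project)"
print(equalize(Policy("approval", approval=owner), "member of group Accounting"))
print(equalize(Policy("anyone"), None))
```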

Equalized Entitlements

This setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. When Project Equalization is enabled, Equalized Entitlements default to Immuta's recommended settings, but project owners can edit these settings by adding or removing parts of the entitlements. However, making these changes entails two potential disadvantages:

  • If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the Members tab within the project.


  • If you remove entitlements, the project will be open to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
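Compliance against a given set of Equalized Entitlements can be modeled as a subset check, which makes the first trade-off above visible. The function name and entitlement labels are hypothetical; in Immuta, compliance is reported in the project's Members tab.

```python
def compliance(members: dict[str, set[str]], equalized: set[str]) -> dict[str, bool]:
    """Map each member to whether they hold every Equalized Entitlement."""
    return {name: equalized <= held for name, held in members.items()}


members = {
    "ana": {"group:Accounting", "group:Legal"},
    "ben": {"group:Accounting"},
}
# Adding group:Legal raises the bar: ben falls out of compliance.
print(compliance(members, {"group:Accounting", "group:Legal"}))
# Removing group:Accounting lowers the bar: both members remain compliant.
print(compliance(members, set()))
```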

Validation Frequency

This setting determines how often user credentials are validated. Regular validation is critical if users share data with project members outside of Immuta, because they need a way to verify that those members' permissions are still valid; Validation Frequency provides that verification.
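As a rough illustration, a validation-frequency check compares each member's last validation time with the configured interval. The function names, the use of `timedelta` for the frequency, and the sample dates are assumptions for this sketch; the text does not say how Immuta schedules validation.

```python
from datetime import datetime, timedelta, timezone


def needs_revalidation(last_validated: datetime, frequency: timedelta,
                       now: datetime) -> bool:
    """True once a member's credentials are at least `frequency` old."""
    return now - last_validated >= frequency


def due_for_validation(last_checks: dict[str, datetime], frequency: timedelta,
                       now: datetime) -> list[str]:
    """Members whose credentials should be re-validated at time `now`."""
    return sorted(m for m, t in last_checks.items()
                  if needs_revalidation(t, frequency, now))


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
checks = {
    "ana": datetime(2024, 1, 1, tzinfo=timezone.utc),  # seven days ago
    "ben": datetime(2024, 1, 7, tzinfo=timezone.utc),  # one day ago
}
print(due_for_validation(checks, timedelta(days=7), now))  # ['ana']
```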

Masked Joins

This feature allows masked columns to be joined within the context of a project.


Note: Masked columns cannot be joined across data sources that are not linked by a project.

For instructions on enabling Masked Joins, navigate to the Project Owner guide.
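The text does not describe how Immuta masks joinable columns. One common technique that produces the documented behavior (joinable within a scope, not across scopes) is deterministic keyed hashing with a per-scope secret; the sketch assumes that technique purely for illustration, and the keys, sample rows, and helper names are hypothetical.

```python
import hashlib
import hmac


def mask(value: str, key: bytes) -> str:
    """Deterministic keyed hash (HMAC-SHA256): equal inputs give equal masks."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def masked_join(left: list[tuple[str, object]], right: list[tuple[str, object]],
                key: bytes) -> list[tuple[object, object]]:
    """Inner-join two row lists on a column that is masked with `key`."""
    index = {mask(k, key): payload for k, payload in left}
    return [(index[mask(k, key)], payload)
            for k, payload in right if mask(k, key) in index]


claims = [("alice@example.com", 120.0), ("bob@example.com", 75.5)]
accounts = [("alice@example.com", "ACME")]

project_key = b"hypothetical-key-for-project-1"
print(masked_join(claims, accounts, project_key))  # [(120.0, 'ACME')]

# A different key (another project, or no project) produces unrelated masks.
print(mask("alice@example.com", project_key) == mask("alice@example.com", b"other-key"))
```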

Derived Data Sources


When Project Equalization is enabled, members can use data sources within the project to create a derived data source, which dynamically inherits the Subscription Policies and purpose restriction Data Policies from the parent source(s).

For example, consider these data sources, which each contain a Subscription and Data Policy:

Data Source A

Subscription Policy: Allow users to subscribe to the data source when user is a member of group Medical Claims

Data Policy: Mask by making null the value in the column(s) address except for members of group Legal

Data Source B

Subscription Policy: Allow users to subscribe to the data source when user is approved by anyone with permission Owner and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone

If a user creates a derived data source, Data Source C, from these two data sources, Data Source C will inherit the following policies, which cannot be changed:

Data Source C

Subscription Policy: Allow user to subscribe when they satisfy all of the following:

  • is a member of group Legal and is a member of group Medical Claims
  • is approved by anyone with permission Owner (of Data Source B) and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone
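The inheritance in this example can be modeled as combining every parent's conditions with a logical AND. In this hypothetical sketch, the Legal requirement on Data Source C is treated as coming from Data Source A's masking exception (the derived data may contain addresses left unmasked for Legal); that mapping is an assumption made to reproduce the example, not a documented rule. `SourcePolicies` and `derive` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class SourcePolicies:
    name: str
    required_groups: set[str] = field(default_factory=set)       # Subscription Policy groups
    approvers: set[str] = field(default_factory=set)             # Subscription Policy approvals
    unmasked_for: set[str] = field(default_factory=set)          # masking-policy exception groups
    purpose_limits: list[set[str]] = field(default_factory=list)  # each set is one restriction


def derive(name: str, parents: list[SourcePolicies]) -> SourcePolicies:
    """Combine parent policies into the fixed policies of a derived data source.

    Every inherited condition must hold, so conditions are unioned. Assumption:
    groups exempted from a parent's masking policy become required groups.
    """
    child = SourcePolicies(name)
    for parent in parents:
        child.required_groups |= parent.required_groups | parent.unmasked_for
        child.approvers |= parent.approvers
        child.purpose_limits.extend(set(p) for p in parent.purpose_limits)
    return child


a = SourcePolicies("Data Source A", required_groups={"Medical Claims"}, unmasked_for={"Legal"})
b = SourcePolicies("Data Source B",
                   approvers={"Owner (of Data Source B)", "Governance"},
                   purpose_limits=[{"Research"}])
c = derive("Data Source C", [a, b])
print(sorted(c.required_groups))  # ['Legal', 'Medical Claims']
print(sorted(c.approvers))        # ['Governance', 'Owner (of Data Source B)']
print(c.purpose_limits)           # [{'Research'}]
```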

Note: If members use data outside the project to create their data source, they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.

For detailed instructions on creating a derived data source, navigate to the Project Owner guide.

Native Workspaces

HDFS

This workspace allows native access to data on the cluster without going through the Immuta SparkSession or Immuta Query Engine. Within an equalized project, Project Owners can enable an HDFS Native Workspace, which creates a workspace directory in HDFS (and a corresponding database in the Hive metastore) where project members can write files.

After a Project Owner creates a workspace, users can access its HDFS directory and database only when acting under the project, and they should use the Immuta Spark SQL Session to copy data into the workspace. Because this session applies policies to the data, any data written to the workspace will already comply with the restrictions of the equalized project, where all members see data at the same level of access.

Once derived data is ready to be shared outside the workspace, it can be exposed as a derived data source in Immuta. At that point, the derived data source will inherit policies appropriately, and it will then be available through Immuta outside the project and can be used in future project workspaces by different teams in a compliant way.

Requirements

Administrators
  • Administrators can opt to configure where all Immuta projects are kept in HDFS (default is /user/immuta/workspace). Note: If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
  • Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
Project Owners
  • Once a project is equalized, Project Owners can enable a workspace for the project.
    • If more than one cluster is configured, Immuta will prompt for which to use.
    • Once enabled, the full URI of where that workspace is located will display on the project page.
    • Project Owners can also add connection information for Hive and/or Impala to allow Hive or Impala workspace sources to be created. The connection information provided and the Kerberos credentials configured for Immuta will be used for each derived Hive or Impala data source. The connection string for Hive or Impala will be displayed on the project page with the full URI.
  • Project Owners can disable the workspace at any time.
    • When disabled, the workspace no longer allows project members to read from or write to it.
    • Data sources living in this directory will still exist and their access will not be changed. (Subscribed users will still have access as usual.)
    • All data in this directory will still exist, regardless of whether it belongs to a data source or not.
    • After the workspace has been disabled, Project Owners can purge its data in one of two ways:
      • Purge all non-data-source data only.
      • Purge all data (including data source data).
        • When purging all data source data, sources can either be disabled or fully deleted.
Project Members
  • When a user is acting under the project context, Immuta will provide them read/write access to the project HDFS directory (using HDFS ACLs). If Immuta data sources are already exposed in that directory, the user will bypass the NameNode plugin for that directory's data while acting under the project.
  • Once a user is not acting under the project, all access to that directory will be revoked and they can only access data in that project as official Immuta data sources, if any exist.
  • When users with the CREATE_DATA_SOURCE_IN_PROJECT permission create a derived data source with workspace enabled, they will be prompted with a modified create data source workflow:
    • The user will select the directory (starting with the project root directory) of the data they are exposing.
    • If the directory contains Parquet or ORC files, Hive, Impala, and HDFS will be available as data source types; otherwise, only HDFS will be available.
    • Users will not be asked for the connection information because the Immuta user connection will be used to create the data source, which will ensure join pushdown and that the data source will work even when the user isn’t acting in the project. Note: Hive or Impala workspace sources are only available if the Project Owner added Hive or Impala connection information to the workspace.
    • If Hive or Impala is selected as the data source type, Immuta will infer schema/partitions from files and generate create table statements for Hive.
    • Once the data source is created, policy inheritance will take effect.
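The data source types offered by the modified create workflow can be summarized as a small decision function. This sketch assumes format detection by file extension (.parquet, .orc), whereas Immuta may inspect files differently; the function name and parameters are hypothetical.

```python
from pathlib import PurePosixPath

COLUMNAR_SUFFIXES = {".parquet", ".orc"}


def source_type_options(file_names: list[str], hive_configured: bool,
                        impala_configured: bool) -> list[str]:
    """Data source types offered for a workspace directory.

    HDFS is always offered. Hive/Impala are offered only when the directory
    holds Parquet or ORC files AND the Project Owner added that connection.
    """
    options = ["HDFS"]
    columnar = any(PurePosixPath(f).suffix.lower() in COLUMNAR_SUFFIXES
                   for f in file_names)
    if columnar:
        if hive_configured:
            options.append("Hive")
        if impala_configured:
            options.append("Impala")
    return options


print(source_type_options(["part-0000.parquet"], True, False))
print(source_type_options(["events.csv"], True, True))
```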

Note: To avoid data source collisions, Immuta will not allow HDFS and Hive/Impala data sources to be backed from the same location in HDFS.
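A registry that enforces this rule might look like the following hypothetical sketch. It checks only exact directory matches and only the stated HDFS versus Hive/Impala conflict; because it treats Hive and Impala as one family, it does not block them from sharing a location, which the text neither allows nor forbids. The example paths under the default workspace root are made up.

```python
from pathlib import PurePosixPath

# HDFS sources vs. Hive/Impala table sources, per the collision rule.
FAMILY = {"HDFS": "hdfs", "Hive": "table", "Impala": "table"}


class WorkspaceRegistry:
    """Records which family of data source backs each workspace directory."""

    def __init__(self) -> None:
        self._backing: dict[PurePosixPath, str] = {}

    def register(self, location: str, source_type: str) -> None:
        path = PurePosixPath(location)  # normalizes a trailing slash
        family = FAMILY[source_type]
        existing = self._backing.get(path)
        if existing is not None and existing != family:
            raise ValueError(f"{location} already backs a {existing} data source")
        self._backing[path] = family


registry = WorkspaceRegistry()
registry.register("/user/immuta/workspace/fraud/claims", "Hive")
try:
    registry.register("/user/immuta/workspace/fraud/claims/", "HDFS")
except ValueError as err:
    print("rejected:", err)
```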

Snowflake

Snowflake workspaces allow users to access protected data directly in Snowflake without having to go through the Immuta SparkSession or Immuta Query Engine. Within these workspaces, users can interact directly with Snowflake secure views, create derived data sources, and collaborate with other project members at a common access level. Because these derived data sources will inherit all appropriate policies, that data can then be shared outside the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.

Policy Enforcement

Immuta enforces policy logic on data and represents it as secure views in Snowflake. Because projects group users and tables and equalize members to the same level of access, all members will see the same view of the data and, consequently, will only need one secure view. Changes to policies immediately propagate to relevant secure views.

Mapping Projects to Secure Views

Immuta projects are represented as Session Contexts within Snowflake. When projects are linked to Snowflake, they automatically create the corresponding

  • roles in Snowflake: IMMUTA_[project name]
  • schemas in the Snowflake IMMUTA database: [project name]
  • secure views in the project schema for any table in the project

If users switch projects, they simply change their Snowflake Session Context to the appropriate Immuta project. If users are not entitled to every data source in the project, they will not be able to access the project's context in Snowflake until they have access to all of its tables. If changes are made to a user's attributes, the changes will immediately propagate to the Snowflake context.
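The project-to-Snowflake mapping and the context switch can be sketched as follows. Assumptions: project names are upper-cased with runs of non-alphanumeric characters replaced by underscores (Immuta's real normalization may differ), and switching context is expressed with Snowflake's standard USE ROLE, USE DATABASE, and USE SCHEMA commands. The helper names are hypothetical.

```python
import re


def snowflake_objects(project_name: str) -> dict[str, str]:
    """Role, database, and schema names corresponding to an Immuta project."""
    # Assumed normalization: non-alphanumeric runs -> "_", then upper-case.
    ident = re.sub(r"[^A-Za-z0-9]+", "_", project_name).strip("_").upper()
    return {"role": f"IMMUTA_{ident}", "database": "IMMUTA", "schema": ident}


def switch_context_sql(project_name: str) -> list[str]:
    """Snowflake statements that switch a session into a project's context."""
    obj = snowflake_objects(project_name)
    return [
        f"USE ROLE {obj['role']};",
        f"USE DATABASE {obj['database']};",
        f"USE SCHEMA {obj['schema']};",
    ]


def can_use_context(user_sources: set[str], project_sources: set[str]) -> bool:
    """A member can use the context only once entitled to every table in the project."""
    return project_sources <= user_sources


for stmt in switch_context_sql("Fraud Prevention"):
    print(stmt)
```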

Using Immuta with an Existing Snowflake Account

The following steps allow Immuta to be used with existing Snowflake accounts.

  1. Immuta is configured to integrate with the organization’s Snowflake account and (optionally) share a single sign-on provider (such as Okta), allowing users in Immuta to map to the same users in Snowflake. (Alternatively, that mapping can be inferred by using the same usernames in both Snowflake and Immuta.)

  2. CREATE_DATA_SOURCE permissions are granted to specific users to allow them to expose Snowflake table metadata and enforce policies.

  3. If tags are used to drive policies, users can manually add tags when tables are imported, Immuta can automatically tag sensitive data (if Sensitive Data Detection is enabled), or users can pull tags from external catalogs that are mapped to the tables being exposed.

  4. Policies are created and enforced on tables.

  5. The CREATE_PROJECT permission is granted to specific users so they can create their own Immuta projects and the corresponding Snowflake contexts. These users determine which projects, and hence which Snowflake contexts, exist. Note: When users leave a project or a project is deleted, that Snowflake context will be removed from their Snowflake accounts.

  6. The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can expose their derived tables in the project; the derived tables will inherit the policies, and then the data can be shared outside the project.

  7. Users access data only through secure views in Snowflake (via Immuta projects), which significantly decreases the amount of role management for administrators in Snowflake. Organizations should also consider having a user in Snowflake who is able to create databases and make GRANTs on those databases and having separate users who are able to read and write from those tables.

Benefits

  • Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
  • A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
  • Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
  • Self-service access to data based on data policies.
  • Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
  • All policies are enforced natively in Snowflake without performance impact:
    • Security is maintained through Snowflake primitives (roles and secure views).
    • Performance and scalability are maintained (no proxy).
  • Policies can be driven by metadata, allowing policy enforcement at massive scale with only a small set of actual policies.
  • Derived tables can be shared back out through Immuta, improving collaboration.
  • User access and removal are immediately reflected in secure views.

Limitations

  • Snowflake workspaces do not support differential privacy policies. Any Snowflake sources with differential privacy policies applied will not be created within the native Snowflake workspace.
  • Native derived data sources can't be query-backed.
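Both limitations can be expressed as pre-checks when setting up a Snowflake workspace. The `Source` type, the "differential privacy" policy label, and the function names are hypothetical; the sketch only restates the two documented rules.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    policy_types: set[str] = field(default_factory=set)
    query_backed: bool = False


def plan_workspace(sources: list[Source]) -> tuple[list[str], list[str]]:
    """Split project sources into those the workspace creates and those it skips."""
    created = [s.name for s in sources if "differential privacy" not in s.policy_types]
    skipped = [s.name for s in sources if "differential privacy" in s.policy_types]
    return created, skipped


def check_native_derived(source: Source) -> None:
    """Reject a native derived data source that is query-backed."""
    if source.query_backed:
        raise ValueError(f"{source.name}: native derived data sources can't be query-backed")


created, skipped = plan_workspace([
    Source("claims", {"masking"}),
    Source("census", {"differential privacy"}),
])
print(created, skipped)  # ['claims'] ['census']
```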