
Projects

Audience: Data Owners, Data Users, and Data Governors

Content Summary: Projects allow users to logically group work by linking data sources in one place. These projects can be created by Data Users who want to efficiently organize their work, by Data Owners who want to provide special data access to specific users, or by Data Governors.

This overview describes concepts and major features of projects; the Project Owner, Project Member, and Project Governor guides provide tutorials for each of these user types.

Project Roles

The features and capabilities of each user differ based on the user's role within the project and within Immuta. Roles and their capabilities are outlined below.

Project Owner Capabilities

Users with the CREATE_PROJECT permission are considered owners of the projects they create and have the following capabilities:

Governor Capabilities

Governors have the following capabilities for any project in their organization, even for projects that are private or that they are not members of:

Project Member Capabilities

Once subscribed to a project, all Immuta users have the following capabilities:

Project Purposes and Acknowledgement Statements

The Data Governor is responsible for configuring project purposes and acknowledgement statements.

  • Purposes: Purposes help define the scope and use of data within a project. Governors create and manage purposes and sub-purposes; then project owners can add these purposes to their project(s), using them to drive Data Policies.


  • Acknowledgement Statements: Projects containing purposes use acknowledgement statements. These require owners and subscribers to affirm or reject a statement that they will use the data only for the project's purposes. Users who accept the statement become project members, and data they access under the project can be audited with the purposes noted. Users who reject the statement are denied access to the project.

    Immuta provides default acknowledgement statements, but Data Governors can customize these statements in the Purposes tab.


Sub-Purposes

Purposes can be constructed as a hierarchy, meaning that purposes can contain nested sub-purposes, much like tags in Immuta. This design allows more flexibility in managing purpose-based restriction policies and transparency in the relationships among purposes.

For example, consider a Research purpose organized in the Sub-Purpose Builder into sub-purposes such as Research.Marketing, Research.Onboarding.Customer, and Research.MedicalClaims.

Instead of creating separate purposes that must each be managed individually, a Governor could write the following Global Policy:

Limit usage to purpose(s) Research for everyone on data sources tagged PHI.

Now, any user acting under the purpose or a sub-purpose of Research (whether Research.Marketing, Research.Onboarding.Customer, or Research.MedicalClaims) meets the criteria of this policy. Purpose hierarchies eliminate the need for a Governor to rewrite these Global Policies when sub-purposes are added or removed. Furthermore, if new projects are added with new sub-purposes under an existing parent purpose, the relevant Global Policy is enforced on them automatically.
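The matching rule described above can be sketched as follows. This is a minimal conceptual model, not Immuta's implementation: it assumes purposes are dot-separated paths (as in Research.Onboarding.Customer) and that a purpose satisfies a restriction if it equals an allowed purpose or is nested beneath one. The function names are hypothetical.

```python
def is_same_or_sub_purpose(purpose: str, allowed: str) -> bool:
    """Return True if `purpose` equals `allowed` or is nested beneath it.

    Assumes purposes are dot-separated paths such as "Research.Marketing".
    """
    return purpose == allowed or purpose.startswith(allowed + ".")


def satisfies_purpose_restriction(active_purposes, allowed_purposes) -> bool:
    """Hypothetical check for a 'Limit usage to purpose(s) ...' policy.

    The user passes if any purpose they act under falls under any allowed purpose.
    """
    return any(
        is_same_or_sub_purpose(active, allowed)
        for active in active_purposes
        for allowed in allowed_purposes
    )


# The Global Policy example: limit usage to purpose Research.
policy = ["Research"]
for purpose in ["Research.Marketing", "Research.Onboarding.Customer",
                "Research.MedicalClaims", "ResearchOps", "Marketing"]:
    print(purpose, satisfies_purpose_restriction([purpose], policy))
```

The check on the "." separator matters: a hypothetical sibling purpose such as ResearchOps should not match Research merely because it shares a prefix.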

Switching Project Contexts

The Immuta UI provides a simple way to switch project contexts, giving users access to various data sources while acting under the appropriate purpose. By default, no project is selected, even if the user belongs to one or more projects in Immuta.

When users change project contexts, all SQL queries or blob fetches that run through Immuta reflect users as acting under the purposes of that project; this allows additional access to data if there are purpose restrictions on the data source(s). This process also allows organizations to track not just whether a specific data source is being used, but why.
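A minimal model of project contexts, assuming each project carries a set of purposes and a user acts under at most one project at a time (none by default). The class names, fields, and the `can_query` check are hypothetical; they only illustrate why selecting a project can unlock a purpose-restricted data source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    name: str
    purposes: frozenset  # purposes configured by a Governor, e.g. {"Research"}


@dataclass
class UserSession:
    username: str
    projects: dict = field(default_factory=dict)  # projects the user belongs to
    current: Optional[Project] = None             # no project selected by default

    def switch_project(self, name: Optional[str]) -> None:
        """Act under a project the user belongs to, or pass None to clear the context."""
        if name is not None and name not in self.projects:
            raise PermissionError(f"{self.username} is not a member of {name}")
        self.current = self.projects[name] if name is not None else None

    def active_purposes(self) -> frozenset:
        return self.current.purposes if self.current else frozenset()


def can_query(session: UserSession, required_purpose: str) -> bool:
    """Purpose-restricted source: allowed only under a matching purpose or sub-purpose."""
    return any(p == required_purpose or p.startswith(required_purpose + ".")
               for p in session.active_purposes())


alice = UserSession("alice", {
    "Claims Study": Project("Claims Study", frozenset({"Research.MedicalClaims"})),
})
print(can_query(alice, "Research"))  # False: no project selected by default
alice.switch_project("Claims Study")
print(can_query(alice, "Research"))  # True: acting under Research.MedicalClaims
```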

Project Equalization

The same security restrictions that apply to data sources also apply to projects. Project members must be subscribed to the data sources in order to access their data, and if a data source has policies applied, only users with the appropriate attributes and credentials can see all of its data.

Project Equalization improves collaboration by accounting for these differences in access and ensuring that the data in the project looks identical to all members, regardless of their individual level of access. This feature automatically equalizes permissions so that no project member has more access to data than the member with the least access.
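One way to picture equalization is as the intersection of members' entitlements. The sketch below is a simplification that assumes a member's access is fully described by a set of groups and attributes; it is not how Immuta computes its recommendation, and the member names and entitlements are hypothetical.

```python
from functools import reduce


def equalized_entitlements(member_entitlements: dict) -> frozenset:
    """Entitlements held by every member: the least-privileged common baseline.

    member_entitlements maps a member name to the set of groups/attributes
    that member holds (a simplifying assumption for illustration).
    """
    if not member_entitlements:
        return frozenset()
    sets = (frozenset(e) for e in member_entitlements.values())
    return reduce(frozenset.intersection, sets)


members = {
    "ana":  {"group:Accounting", "group:Legal", "attr:Region=US"},
    "ben":  {"group:Accounting", "attr:Region=US"},
    "cleo": {"group:Accounting", "group:Legal"},
}
print(sorted(equalized_entitlements(members)))  # ['group:Accounting']
```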


Note: Only project owners can add data sources to the project if this feature is enabled.

For instructions on enabling Project Equalization, navigate to the Project Owner guide.

Project Equalization and Subscription Policies

Once Project Equalization is enabled, the project's Subscription Policy builder is locked, and the policy can only be adjusted by manually editing the Equalized Entitlements. The Subscription Policy is then combined with the entitlement settings in a way that depends on the policy type.

For example, consider the Subscription Policy of the following sample project, Fraud Prevention, before Project Equalization is enabled:

Fraud Prevention

Subscription Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project).

After enabling Project Equalization, the following Equalized Entitlement is recommended by Immuta: User is a member of group Accounting.


In this particular example, the Equalized Subscription Policy contains both the Equalized Entitlement and the approval from the original policy, so users must satisfy both conditions to subscribe:

  • the user must be a member of the group Accounting
  • the user must be approved by anyone with permission Owner (of this project).


However, the way entitlements and approvals combine differs depending on the policy type; the breakdown below illustrates various scenarios for each type. Each entry shows how a specific project Subscription Policy changes after Project Equalization is enabled (with and without an equalized entitlement set) and how the policy reverts if Project Equalization is subsequently disabled.

  • Original Policy: Anyone
    • Equalized Policy (Example Entitlement: member of group Accounting): Allow user to subscribe when user is a member of group Accounting
    • Equalized Policy (No Entitlement): Individual Users You Select
    • Policy After Disabling Equalization: Individual Users You Select
  • Original Policy: Allow users to subscribe when approved by anyone with permission Owner (of this project)
    • Equalized Policy (Example Entitlement: member of group Accounting): Allow users to subscribe when they satisfy all of the following: is a member of group Accounting and is approved by anyone with permission Owner (of this project)
    • Equalized Policy (No Entitlement): Allow users to subscribe when approved by anyone with permission Owner (of this project)
    • Policy After Disabling Equalization: Allow users to subscribe when approved by anyone with permission Owner (of this project)
  • Original Policy: Allow users to subscribe to the project when user is a member of group Legal
    • Equalized Policy (Example Entitlement: member of group Accounting): Allow users to subscribe to the project when user is a member of group Accounting
    • Equalized Policy (No Entitlement): Individual Users You Select
    • Policy After Disabling Equalization: Individual Users You Select
  • Original Policy: Individual Users You Select
    • Equalized Policy (Example Entitlement: member of group Accounting): Allow users to subscribe to the project when user is a member of group Accounting
    • Equalized Policy (No Entitlement): Individual Users You Select
    • Policy After Disabling Equalization: Individual Users You Select
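The scenarios above can be encoded as a small lookup to make the pattern explicit: only the approval-based policy keeps its original condition; every other policy type is replaced by the equalized entitlement, or falls back to Individual Users You Select when no entitlement is set. The policy-kind labels and function names are hypothetical shorthand for the four original policies.

```python
from typing import Optional

INDIVIDUAL = "Individual Users You Select"
OWNER_APPROVAL = "approved by anyone with permission Owner (of this project)"


def equalized_policy(kind: str, entitlement: Optional[str]) -> str:
    """Subscription Policy after enabling Project Equalization.

    kind is one of "anyone", "approval", "entitlement", "individual"
    (shorthand for the four original policies listed above).
    """
    if kind == "approval":
        if entitlement:
            # The approval is kept and the entitlement is added: both must hold.
            return f"all of: {entitlement} and {OWNER_APPROVAL}"
        return OWNER_APPROVAL
    if kind in ("anyone", "entitlement", "individual"):
        # Other policies are replaced by the equalized entitlement, if one is set.
        return entitlement if entitlement else INDIVIDUAL
    raise ValueError(f"unknown policy kind: {kind}")


def policy_after_disabling(kind: str) -> str:
    """Subscription Policy after Project Equalization is disabled again."""
    return OWNER_APPROVAL if kind == "approval" else INDIVIDUAL


for kind in ("anyone", "approval", "entitlement", "individual"):
    print(kind, "|", equalized_policy(kind, "member of group Accounting"),
          "|", equalized_policy(kind, None), "|", policy_after_disabling(kind))
```

Note that in the third case the original Legal entitlement is not carried over; it is replaced by the equalized entitlement.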

Equalized Entitlements

This setting adjusts the minimum entitlements (i.e., users' groups and attributes) required to join the project and to access data within the project. With Project Equalization enabled, Equalized Entitlements default to Immuta's recommended settings. Project owners can then edit them by adding or removing entitlements; however, these changes have two potential drawbacks:

  • If you add entitlements, members might see more data as a whole, but at least some members of the project will be out of compliance. The status of users' compliance is visible from the Members tab within the project.


  • If you remove entitlements, the project opens to users with fewer privileges, but this change might make less data visible to all project members. Removing entitlements is only recommended if you foresee new users joining with less access to data than the current members.
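The trade-off can be illustrated with a compliance check: a member is compliant when they hold every equalized entitlement. This sketch assumes entitlements are plain strings and uses hypothetical member data; it mirrors the idea behind the compliance status on the Members tab, not its implementation.

```python
def compliance_report(required: set, member_entitlements: dict) -> dict:
    """Map each member to True (compliant) or False (missing an entitlement)."""
    return {name: required <= set(held) for name, held in member_entitlements.items()}


members = {
    "ana": {"group:Accounting", "group:Legal"},
    "ben": {"group:Accounting"},
}
print(compliance_report({"group:Accounting"}, members))
# Adding an entitlement may expose more data, but ben falls out of compliance.
print(compliance_report({"group:Accounting", "group:Legal"}, members))
```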

Validation Frequency

This setting determines how often user credentials are validated. It is critical if users share data with project members outside of Immuta, because they need a way to verify that those members' permissions are still valid.
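To illustrate what a validation frequency controls, the sketch below decides whether a member's credentials are due for re-validation. The interval values, the `last_validated` timestamp, and the function name are hypothetical; this page does not describe how Immuta schedules validation.

```python
from datetime import datetime, timedelta, timezone


def needs_revalidation(last_validated: datetime, frequency: timedelta,
                       now: datetime) -> bool:
    """True once `frequency` has elapsed since the member was last validated."""
    return now - last_validated >= frequency


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_revalidation(last, timedelta(days=7), now))   # True: a week has passed
print(needs_revalidation(last, timedelta(days=30), now))  # False
```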

Masked Joins

This feature allows masked columns to be joined within the context of a project.


Note: Masked columns cannot be joined across data sources that are not linked by a project.

For instructions on enabling Masked Joins, navigate to the Project Owner guide.
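Why can masked columns be joined at all? One common technique, assumed here purely for illustration and not confirmed as Immuta's internals, is deterministic keyed hashing: if both columns are masked with the same secret key, equal inputs produce equal masked values, so the join still matches. Using a separate key per project (the keys below are hypothetical) is one way to confine joinability to sources linked by the same project.

```python
import hashlib
import hmac


def mask(value: str, project_key: bytes) -> str:
    """Deterministically mask a value with HMAC-SHA256 under a project key."""
    return hmac.new(project_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_column(rows, project_key: bytes):
    """Replace the join key of each (key, payload) row with its masked value."""
    return [(mask(key, project_key), payload) for key, payload in rows]


def inner_join(left, right):
    """Inner-join two lists of (key, payload) rows on equal keys."""
    index = {}
    for key, payload in right:
        index.setdefault(key, []).append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


claims = [("alice@example.com", "claim-1"), ("bob@example.com", "claim-2")]
accounts = [("alice@example.com", "acct-9")]

same_key = b"hypothetical-project-a-key"
other_key = b"hypothetical-project-b-key"

within_project = inner_join(mask_column(claims, same_key), mask_column(accounts, same_key))
across_projects = inner_join(mask_column(claims, same_key), mask_column(accounts, other_key))
print(len(within_project), len(across_projects))
```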

Derived Data Sources


When Project Equalization is enabled, members can use data sources within the project to create a derived data source. This derived data source dynamically inherits the Subscription Policies and purpose restriction Data Policies from the parent source(s).

For example, consider these data sources, each containing a Subscription and Data Policy:

Data Source A

Subscription Policy: Allow users to subscribe to the data source when user is a member of group Medical Claims

Data Policy: Mask by making null the value in the column(s) address except for members of group Legal

Data Source B

Subscription Policy: Allow users to subscribe to the data source when user is approved by anyone with permission Owner and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone

If a user creates a derived data source from these two data sources, then that source (Data Source C) will inherit these policies, and these inherited policies are unchangeable:

Data Source C

Subscription Policy: Allow user to subscribe when they satisfy all of the following:

  • is a member of group Legal and is a member of group Medical Claims
  • is approved by anyone with permission Owner (of Data Source B) and anyone with permission Governance

Data Policy: Limit usage to purpose(s) Research for everyone
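The inheritance in this example can be modeled as a merge of the parents' policies: every parent's subscription conditions must hold, the exception group of Data Source A's masking policy becomes a subscription requirement (which appears to be how Legal enters Data Source C's policy), and purpose restrictions carry over unchanged. This rule set is inferred from the single example above; it is a hypothetical sketch, not Immuta's policy engine.

```python
from dataclasses import dataclass, field


@dataclass
class SourcePolicies:
    required_groups: set = field(default_factory=set)         # "member of group ..."
    approvers: set = field(default_factory=set)               # "approved by anyone with permission ..."
    mask_exception_groups: set = field(default_factory=set)   # masking "except for members of group ..."
    purpose_restrictions: list = field(default_factory=list)  # each entry: a set of allowed purposes


def derive(*parents: SourcePolicies) -> SourcePolicies:
    """Hypothetical merge of parent policies into a derived data source's policies."""
    child = SourcePolicies()
    for parent in parents:
        # Every parent's subscription conditions must hold (AND).
        child.required_groups |= parent.required_groups
        # Masking exceptions become subscription requirements, as Legal does above.
        child.required_groups |= parent.mask_exception_groups
        child.approvers |= parent.approvers
        # Keep each purpose restriction as-is; all of them apply to the child.
        child.purpose_restrictions.extend(parent.purpose_restrictions)
    return child


source_a = SourcePolicies(required_groups={"Medical Claims"}, mask_exception_groups={"Legal"})
source_b = SourcePolicies(approvers={"Owner (of Data Source B)", "Governance"},
                          purpose_restrictions=[{"Research"}])
source_c = derive(source_a, source_b)
print(sorted(source_c.required_groups))  # ['Legal', 'Medical Claims']
print(source_c.purpose_restrictions)     # [{'Research'}]
```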

Note: If members use data outside the project to create their data source, then they must first add that data to the project and re-derive the data source through the project connection. When creating a derived data source, members are prompted to certify that their data is derived from the parent data sources they selected upon creation.

For detailed instructions on creating a derived data source, navigate to the Project Owner guide.

Native Workspaces

HDFS

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine. Within a project, users can enable an HDFS Native Workspace, which creates a workspace directory in HDFS (and a corresponding database in the Hive metastore) where users can write files.

Accessing Data

After a Project Owner creates a workspace, users can access its HDFS directory and database only while acting under the project, and they should use the Spark SQL session to copy data into the workspace. The Immuta Spark SQL Session applies policies to the data, so any data written to the workspace is already compliant with the restrictions of the equalized project, and all members see data at the same level of access.

Once derived data is ready to be shared outside the workspace, it can be exposed as a derived data source in Immuta. At that point, the derived data source inherits policies appropriately and becomes available through Immuta outside the project, where different teams can use it in future project workspaces in a compliant way.

Requirements

Administrators
  • Administrators can opt to configure where all Immuta projects are kept in HDFS. The default is /user/immuta/workspace. Note: If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
  • Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
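The administrator setting can be pictured as a root directory under which each project's workspace directory is created. The sketch below assumes one subdirectory per project, named from a normalized project name, and enforces the rule that the root is locked once any workspace exists. The naming scheme and the `WorkspaceRoot` class are hypothetical, since this page does not describe how Immuta names workspace directories.

```python
import posixpath
import re

DEFAULT_ROOT = "/user/immuta/workspace"


class WorkspaceRoot:
    """Hypothetical model of the administrator-configured workspace root."""

    def __init__(self, root: str = DEFAULT_ROOT):
        self._root = root
        self._workspaces = {}  # project name -> workspace directory

    def set_root(self, root: str) -> None:
        # Once any workspace is created, the root can no longer be modified.
        if self._workspaces:
            raise RuntimeError("workspace root is locked: a workspace already exists")
        self._root = root

    def create_workspace(self, project: str) -> str:
        # Assumed naming scheme: lowercase, runs of non-alphanumerics become "_".
        slug = re.sub(r"[^a-z0-9]+", "_", project.lower()).strip("_")
        path = posixpath.join(self._root, slug)
        self._workspaces[project] = path
        return path


workspaces = WorkspaceRoot()
print(workspaces.create_workspace("Fraud Prevention"))  # /user/immuta/workspace/fraud_prevention
```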
Project Owners
  • Once a project is equalized, Project Owners can enable a workspace for the project.
    • If more than one cluster is configured, Immuta prompts for which to use.
    • Once enabled, the full URI of where that workspace is located displays on the project page.
    • Project Owners can also add connection information for Hive or Impala to allow a workspace source to be created. The connection information provided and the Kerberos credentials configured for Immuta are used for each derived Hive or Impala data source. The connection string for Hive or Impala is displayed on the project page with the full URI.
  • Project Owners can disable the workspace at any time.
    • When disabled, project members can no longer read from or write to the workspace.
    • Data sources living in this directory still exist, and their access is not changed. Subscribed users still have access as usual.
    • All data in this directory still exists, regardless of whether it belongs to a data source or not.
    • After it has been disabled, Project Owners can purge all data in the workspace. They can purge all non-data-source data only or purge all data (including data source data).
      • When purging all data source data, sources can either be disabled or fully deleted.
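The disable and purge options above behave like a small state machine: purging is possible only after the workspace is disabled, and purging data source data also requires choosing whether those sources are disabled or deleted. The enums and the `purge_plan` function are hypothetical, written only to make the allowed combinations explicit.

```python
from enum import Enum
from typing import Optional


class PurgeScope(Enum):
    NON_DATA_SOURCE_ONLY = "purge all non-data-source data only"
    ALL_DATA = "purge all data, including data source data"


class SourceAction(Enum):
    DISABLE = "disable"
    DELETE = "delete"


def purge_plan(workspace_enabled: bool, scope: PurgeScope,
               source_action: Optional[SourceAction] = None) -> str:
    """Validate a purge request against the rules in the Project Owners list."""
    if workspace_enabled:
        raise ValueError("disable the workspace before purging")
    if scope is PurgeScope.ALL_DATA:
        if source_action is None:
            raise ValueError("choose whether to disable or delete the data sources")
        return f"{scope.value}; data sources will be {source_action.value}d"
    if source_action is not None:
        raise ValueError("data sources are untouched when purging non-data-source data")
    return scope.value


print(purge_plan(False, PurgeScope.ALL_DATA, SourceAction.DELETE))
```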
Project Members
  • When a user is acting under the project context, Immuta provides them read/write access to the project HDFS directory (using HDFS ACLs). If there are Immuta data sources already exposed in that directory and the user is acting under the project for the data in that directory, then the user bypasses the namenode plugin.
  • Once a user is not acting under the project, all access to that directory is revoked, and they can only access data in that project as official Immuta data sources.
  • When users with the CREATE_DATA_SOURCE_IN_PROJECT permission create a derived data source with workspace enabled, they are prompted with a modified workflow:
    • The user selects the directory (starting with the project root directory) of the data they are exposing.
    • If the directory contains Parquet or ORC files, then the data source options are Hive, Impala, and HDFS. If it does not contain Parquet or ORC files, then only HDFS is available.
    • The Immuta user connection is used to create the data source. This ensures join pushdown and that the data source works even when the user is not acting in the project. Note: Hive or Impala workspace sources are only available if the Project Owner added Hive or Impala connection information to the workspace.
    • If Hive or Impala is selected as the data source type, then Immuta infers schema/partitions from files and generates create table statements for Hive.
    • Once the data source is created, policy inheritance takes effect.
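The type options offered in this workflow can be expressed as a function of the directory contents and of whether the Project Owner added Hive or Impala connection information. This is a sketch under assumptions: Parquet and ORC files are detected by file extension only, Hive and Impala connections are tracked separately, and the function name is hypothetical.

```python
def available_source_types(filenames, hive_connection: bool,
                           impala_connection: bool) -> list:
    """Data source types offered for a workspace directory (hypothetical helper)."""
    # Assumption: Parquet/ORC detection by file extension only.
    columnar = any(name.lower().endswith((".parquet", ".orc")) for name in filenames)
    types = []
    if columnar and hive_connection:
        types.append("Hive")
    if columnar and impala_connection:
        types.append("Impala")
    types.append("HDFS")  # HDFS is offered regardless of file format
    return types


print(available_source_types(["part-0000.parquet"], True, True))  # ['Hive', 'Impala', 'HDFS']
print(available_source_types(["events.csv"], True, True))         # ['HDFS']
```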

Note: To avoid data source collisions, Immuta does not allow HDFS and Hive/Impala data sources to be backed from the same location in HDFS.
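The collision rule can be sketched as a registry keyed by HDFS location that refuses to back an HDFS source and a Hive/Impala source from the same path. Normalizing paths with `posixpath.normpath` and grouping Hive with Impala into one "table" family are assumptions of this sketch; the `LocationRegistry` class is hypothetical.

```python
import posixpath

# Assumed grouping: Hive and Impala sources are both table-backed.
FAMILY = {"HDFS": "hdfs", "Hive": "table", "Impala": "table"}


class LocationRegistry:
    def __init__(self):
        self._families = {}  # normalized location -> set of source families backed there

    def register(self, location: str, source_type: str) -> None:
        path = posixpath.normpath(location)
        family = FAMILY[source_type]
        existing = self._families.setdefault(path, set())
        if existing - {family}:
            raise ValueError(f"{path} already backs a source of another type")
        existing.add(family)


registry = LocationRegistry()
registry.register("/user/immuta/workspace/fraud_prevention/claims", "Hive")
```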

Snowflake

Snowflake workspaces allow users to access protected data directly in Snowflake without having to go through the Immuta Query Engine.

Accessing Data

Typically, Immuta applies policies by forcing users to query through the Query Engine, which acts like a proxy in front of the database Immuta is protecting. However, this process becomes unnecessary with Snowflake's secure views: Immuta enforces policy logic on the data and represents it as secure views in Snowflake. Secure views are static, so creating a secure view for every unique user in your organization for every table in your organization would result in secure view bloat. Immuta projects address this problem by virtually grouping users and tables and equalizing users to the same level of access; this ensures that all members of the project see the same view of the data, so all members can share one secure view.
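The scale argument is easy to quantify. Assuming one secure view per user per table without projects, versus one shared secure view per table in each project with them, the sketch below compares the counts; the organization sizes are made-up numbers for illustration.

```python
def views_per_user(num_users: int, num_tables: int) -> int:
    """One secure view for every user for every table."""
    return num_users * num_tables


def views_per_project(tables_per_project) -> int:
    """One shared secure view per table in each project."""
    return sum(tables_per_project)


print(views_per_user(2_000, 500))           # 1000000
print(views_per_project([20, 35, 12, 40]))  # 107
```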

Beyond interacting directly with Snowflake secure views in these workspaces, users can create derived data sources and collaborate with other project members at a common access level. These derived data sources inherit all appropriate policies, making that data shareable outside of the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which allows them to persist after a workspace is disconnected.


Policy Enforcement

Immuta enforces policy logic on data and represents it as secure views in Snowflake. All members see the same view of the data because projects group users and tables to equalize members to the same level of access. Consequently, only one secure view is needed per table in a project, and changes to policies immediately propagate to the relevant secure views.

Mapping Projects to Secure Views

Immuta projects are represented as Session Contexts within Snowflake. When linked to Snowflake, projects automatically create the following corresponding objects:

  • roles in Snowflake: IMMUTA_[project name]
  • schemas in the Snowflake IMMUTA database: [project name]
  • secure views in the project schema for any table in the project
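The naming convention above can be sketched as a function that derives the Snowflake object names for a project. How Immuta normalizes project names (spaces, case, special characters) and names the views is not stated here, so the uppercase-and-underscore normalization and the one-view-per-table naming below are assumptions. `snowflake_objects_for_project` is a hypothetical helper that only builds names; it does not connect to Snowflake.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SnowflakeObjects:
    role: str
    schema: str          # schema inside the Snowflake IMMUTA database
    secure_views: tuple  # fully qualified view names, one per project table


def normalize(name: str) -> str:
    # Assumption: runs of non-alphanumerics become "_" and names are uppercased.
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").upper()


def snowflake_objects_for_project(project: str, tables) -> SnowflakeObjects:
    schema = normalize(project)
    return SnowflakeObjects(
        role=f"IMMUTA_{schema}",
        schema=schema,
        secure_views=tuple(f"IMMUTA.{schema}.{normalize(t)}" for t in tables),
    )


objs = snowflake_objects_for_project("Fraud Prevention", ["claims", "accounts"])
print(objs.role)          # IMMUTA_FRAUD_PREVENTION
print(objs.secure_views)  # ('IMMUTA.FRAUD_PREVENTION.CLAIMS', 'IMMUTA.FRAUD_PREVENTION.ACCOUNTS')
```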

If users switch projects, their Snowflake Session Context changes to the appropriate Immuta project. If users are not entitled to a data source contained in the project, they cannot access the project's context in Snowflake until they have access to all tables in the project. If changes are made to a user's attributes, the changes immediately propagate to the Snowflake context.
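The access rule for a project's Snowflake context reduces to a subset check: a user can use the context only when entitled to every data source in the project. The function name and entitlement data below are hypothetical.

```python
def can_use_context(user_entitled_sources: set, project_sources: set) -> bool:
    """All-or-nothing: missing access to any one project table blocks the context."""
    return project_sources <= user_entitled_sources


project = {"claims", "accounts"}
print(can_use_context({"claims", "accounts", "hr"}, project))  # True
print(can_use_context({"claims"}, project))                    # False
```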

Using Immuta with an Existing Snowflake Account

The following steps allow Immuta to be used with existing Snowflake accounts.

  1. Immuta is configured to integrate with the organization’s Snowflake account and to share a single sign-on provider (such as Okta), allowing users in Immuta to map to the same users in Snowflake. Alternatively, that mapping can be inferred by using the same usernames in both Snowflake and Immuta.

  2. CREATE_DATA_SOURCE permissions are granted to specific users to allow them to expose Snowflake table metadata and enforce policies.

  3. Tags can be used to drive policies. Users can add tags manually when tables are imported, Immuta can tag sensitive data automatically (if Sensitive Data Detection is enabled), or users can pull tags from external catalogs that are mapped to the tables being exposed.

  4. Policies are created and enforced on tables.

  5. The CREATE_PROJECT permission is granted to specific users so they can create their own Immuta projects and the corresponding Snowflake contexts. These users determine which projects, and therefore which Snowflake contexts, exist. Note: When users leave a project or a project is deleted, that Snowflake context is removed from their Snowflake accounts.

  6. The CREATE_DATA_SOURCE_IN_PROJECT permission is given to specific users so they can expose their derived tables in the project. The derived tables inherit the policies, and then the data can be shared outside the project.

  7. Users access data only through secure views in Snowflake (via Immuta projects), significantly decreasing the amount of role management for administrators in Snowflake. Organizations should also consider having a user in Snowflake who is able to create databases and make GRANTs on those databases, and separate users who are able to read from and write to those tables.
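For step 1, when no shared single sign-on provider is used, the user mapping relies on identical usernames. The sketch below pairs users by name and reports those who cannot be mapped. Comparing case-insensitively is an assumption (Snowflake resolves unquoted identifiers case-insensitively, but this page does not say how Immuta compares usernames), and `map_users_by_name` is a hypothetical helper.

```python
def map_users_by_name(immuta_users, snowflake_users):
    """Pair Immuta and Snowflake users with the same (case-insensitive) username."""
    by_key = {u.casefold(): u for u in snowflake_users}
    mapped, unmapped = {}, []
    for user in immuta_users:
        match = by_key.get(user.casefold())
        if match is None:
            unmapped.append(user)
        else:
            mapped[user] = match
    return mapped, unmapped


mapped, unmapped = map_users_by_name(["alice", "bob", "carol"], ["ALICE", "BOB"])
print(mapped)    # {'alice': 'ALICE', 'bob': 'BOB'}
print(unmapped)  # ['carol']
```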

Benefits

  • Few roles to manage in Snowflake: That complexity is pushed to Immuta, which is designed to simplify it.
  • A small set of users has direct access to raw tables: Most users go through secure views only, but raw database access can be segmented across departments.
  • Policies are built by the individual database administrators within Immuta and are managed in a single location: This lets changes to policies automatically propagate across thousands of tables’ secure views.
  • Self-service access to data based on data policies.
  • Users work in various contexts in Snowflake natively: Their contexts are based on their collaborators and their purpose, without fear of leaking data.
  • All policies are enforced natively in Snowflake without performance impact: Security is maintained through Snowflake primitives of roles and secure views, and performance and scalability are maintained with no proxy.
  • Policies can be driven by metadata: This allows massive scale policy enforcement with only a small set of actual policies.
  • Derived tables can be shared back out through Immuta: This improves user collaboration.
  • User access and removal are immediately reflected in secure views.

Limitations

  • Snowflake workspaces do not support differential privacy policies. Snowflake sources with differential privacy policies applied cannot be created within the native Snowflake workspace.
  • Native derived data sources cannot be query-backed.
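The two limitations can be captured as a pre-check before a source is used in a Snowflake workspace. The policy-type labels, the `CandidateSource` fields, and the function are hypothetical; they restate the constraints listed above rather than any Immuta API.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateSource:
    name: str
    policy_types: set = field(default_factory=set)  # e.g. {"masking", "differential privacy"}
    derived: bool = False
    query_backed: bool = False


def snowflake_workspace_problems(source: CandidateSource) -> list:
    """Return the reasons a source cannot be used, or an empty list if none apply."""
    problems = []
    if "differential privacy" in source.policy_types:
        problems.append("differential privacy policies are not supported")
    if source.derived and source.query_backed:
        problems.append("native derived data sources cannot be query-backed")
    return problems


print(snowflake_workspace_problems(CandidateSource("claims", {"masking"})))  # []
print(snowflake_workspace_problems(CandidateSource("stats", {"differential privacy"})))
```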

Cloudera

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

Workspace Configuration Options

  • Cloudera HDFS
  • Cloudera S3A

Available Data Source Types

  • Amazon S3 (Cloudera S3A)
  • Apache Hive
  • Apache HDFS (Cloudera HDFS)
  • Apache Impala

Databricks

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").

To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table in the workspace (rather than a directory) when creating the derived data source.
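The Delta Lake rule can be expressed as a validation step when describing a derived data source. The request shape, field names, and table name below are hypothetical; they only encode the requirement that a Delta source names a workspace table rather than a directory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DerivedSourceRequest:
    """Hypothetical description of a derived data source in a Databricks workspace."""
    data_format: str                 # e.g. "parquet" or "delta"
    directory: Optional[str] = None  # workspace directory backing the source
    table: Optional[str] = None      # workspace table backing the source


def validate(request: DerivedSourceRequest) -> None:
    """Raise ValueError if the request breaks the rule described above."""
    if request.data_format.lower() == "delta" and not request.table:
        raise ValueError("Delta data must be exposed by table, not by directory")
    if not (request.table or request.directory):
        raise ValueError("specify a table or a directory in the workspace")


validate(DerivedSourceRequest("delta", table="fraud_prevention.claims"))
validate(DerivedSourceRequest("parquet", directory="immuta:///some/path/to/a/workspace"))
```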

Workspace Configuration Options

  • AWS S3
  • Microsoft Azure

EMR

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

Workspace Configuration Options

  • EMR HDFS
  • EMR S3

Available Data Source Types

  • Apache Hive
  • Apache HDFS (EMR HDFS)
  • Amazon S3 (EMR S3)