Native Workspaces

Audience: Project members

Content Summary: This page provides an overview of native workspaces in projects.

Overview

Native workspaces allow project members to write data back to Immuta.

Once Project Equalization is enabled (ensuring that all members see the same data at the same level of access), project owners can create a native workspace. Then, project members can read data in the project and write to that location when acting under the project. Restricting data access to the equalized project guarantees that no data written to the project workspace will leak information.

Once derived data is ready to be shared outside the workspace, it can be exposed as a derived data source in Immuta. The derived data source will inherit policies from its parent source(s), and it will then be available through Immuta outside the project.

Configuring Workspaces

An Application Admin must configure each of the workspaces outlined below before a project owner can enable it. To do so, the Application Admin points Immuta at a root location where all data will be written. Then, when a user creates an Immuta project, Immuta automatically generates a subfolder under that root path and a remote database associated with the project. Immuta supports only a single root location, so every project writes to a subdirectory under it.

Note:

  • If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
  • Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.

Immuta Native Workspace Types

Each of the Immuta native workspace types is described below and includes details about the configuration options and available data source types, when applicable. To learn how to create a native workspace, navigate to Create a Native Workspace.

Cloudera

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.
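The copy step described above could look roughly like the following sketch. It assumes the session object exposes a `sql` method, as Spark sessions do, and that a CREATE TABLE ... AS SELECT statement is an acceptable way to copy the data; the helper name `copy_into_workspace` and the table names are hypothetical and not part of Immuta.

```python
def copy_into_workspace(session, source_table, workspace_table):
    """Copy a policy-protected table into the project workspace database.

    Assumptions (not taken from the Immuta documentation):
    - `session` is the Spark SQL session used while acting under the project.
    - `source_table` and `workspace_table` are fully qualified names that
      the caller already knows; both are hypothetical placeholders here.
    """
    statement = (
        f"CREATE TABLE {workspace_table} AS SELECT * FROM {source_table}"
    )
    # The session applies policies when reading, so the copied rows are
    # already restricted to what the equalized project is allowed to see.
    session.sql(statement)
    return statement
```

A call such as `copy_into_workspace(session, "source_db.source_table", "workspace_db.derived_table")` would issue a single statement; the database names depend on how the workspace was configured.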

Workspace Configuration Options

  • Cloudera HDFS
  • Cloudera S3A

Available Data Source Types

  • Amazon S3 (Cloudera S3A)
  • Apache Hive
  • Apache HDFS (Cloudera HDFS)
  • Apache Impala

Databricks

Databricks Cluster Configuration

Before you create a workspace, the cluster must send its configuration to Immuta. To trigger this, run a simple query on the cluster (e.g., show tables). Otherwise, an error will occur when you attempt to create the workspace.
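As a minimal sketch of that step, assuming a notebook attached to the cluster where a `spark` session object is already available (the helper name `send_cluster_configuration` is hypothetical and not an Immuta API):

```python
def send_cluster_configuration(spark):
    """Run a trivial query so the cluster sends its configuration to Immuta.

    `spark` is assumed to be the session object the notebook provides.
    SHOW TABLES is the simple query suggested in the text.
    """
    # collect() forces the query to execute instead of staying lazy.
    return spark.sql("SHOW TABLES").collect()
```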

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
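The read call above can be wrapped in a small helper. This is only a sketch: the `immuta:///` scheme comes from the example call, while the helper names and the idea of passing a path relative to the workspace are assumptions made for illustration.

```python
WORKSPACE_SCHEME = "immuta:///"


def workspace_uri(relative_path):
    """Build an immuta:/// URI for a path inside the project workspace."""
    # Strip leading slashes so the scheme's own slashes are not duplicated.
    return WORKSPACE_SCHEME + relative_path.lstrip("/")


def read_workspace_parquet(spark, relative_path):
    """Read Parquet data from the workspace while acting under the project.

    `spark` is assumed to be the session object available on the cluster.
    """
    return spark.read.parquet(workspace_uri(relative_path))
```

For example, `read_workspace_parquet(spark, "some/path/to/a/workspace")` issues the same call shown in the text.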

To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table in the workspace (rather than a directory) when creating the derived data source.

Workspace Configuration Options

  • AWS S3
  • Microsoft Azure

EMR

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

Workspace Configuration Options

  • EMR HDFS
  • EMR S3

Available Data Source Types

  • Apache Hive
  • Apache HDFS (EMR HDFS)
  • Amazon S3 (EMR S3)

Snowflake

Snowflake workspaces allow users to access protected data directly in Snowflake without having to go through the Immuta Query Engine.

Users can interact directly with Snowflake secure views in these workspaces, create derived data sources, and collaborate with other project members at a common access level. Because these derived data sources will inherit all appropriate policies, that data can then be shared outside the project. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.

Immuta projects are represented as Session Contexts within Snowflake. Once an Immuta native workspace is created for Snowflake, the project automatically creates the corresponding:

  • roles in Snowflake: IMMUTA_[project name]
  • schemas in the Snowflake IMMUTA database: [project name]
  • secure views in the project schema for any table in the project

When users switch projects, they should change their Snowflake Session Context to the appropriate Immuta project. Users who are not entitled to every data source in the project cannot access the Context in Snowflake until they have access to all tables in the project. Changes to a user's attributes propagate immediately to the Snowflake Context.
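Switching the Session Context could be sketched as generating the Snowflake `USE ROLE` and `USE SCHEMA` statements for a project. The role and schema patterns follow the naming listed above; the helper name is hypothetical, and how Immuta maps a project's display name to a Snowflake identifier is not covered here, so the function assumes the caller passes the name exactly as it appears in Snowflake.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore first, then
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def project_context_statements(project_name):
    """Return the statements that switch a session to a project's context.

    Assumption: `project_name` is the project name exactly as it appears
    in the Snowflake role and schema names.
    """
    if not _IDENTIFIER.fullmatch(project_name):
        raise ValueError(f"not a simple Snowflake identifier: {project_name!r}")
    return [
        f"USE ROLE IMMUTA_{project_name}",    # role pattern: IMMUTA_[project name]
        f"USE SCHEMA IMMUTA.{project_name}",  # schema in the IMMUTA database
    ]
```

The returned statements can be run in a worksheet or through any Snowflake client. Rejecting anything that is not a simple identifier keeps the generated SQL safe to paste or execute.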

For more details, see Native Snowflake Workspaces.

Disabling Immuta Native Workspaces

Workspaces can be temporarily disconnected by disabling the project.

Alternatively, workspaces can be permanently deleted in one of two ways:

  • with the data used by derived data sources preserved. Note: If you created a derived data source that references a view on top of a table in Snowflake that isn't a derived data source, that table will be deleted, which breaks the derived data source.
  • with all data purged.
