Writing to Projects
Project workspaces
With project equalization enabled, users can create project workspaces for Snowflake or Databricks where they can view and write data.
Snowflake project workspaces
Deprecation notice
Support for this feature has been deprecated. See the Deprecations page for EOL dates.
Snowflake project workspaces allow users to access and write data directly in Snowflake.
With Snowflake project workspaces, Immuta enforces policy logic on registered tables and represents them as secure views in Snowflake. Since secure views are static, creating a secure view for every unique user in your organization for every table in your organization would result in secure view bloat; however, Immuta addresses this problem by virtually grouping users and tables and equalizing users to the same level of access, ensuring that all members of the project see the same view of the data. Consequently, all members share one secure view.
By interacting directly with Snowflake secure views in these workspaces, users can write within Snowflake while collaborating with other project members at a common access level.
Policy enforcement
Immuta enforces policy logic on data and represents it as secure views in Snowflake. Because projects group users and tables and equalize members to the same level of access, all members will see the same view of the data and, consequently, will only need one secure view. Changes to policies immediately propagate to relevant secure views.
Snowflake project workspace workflow
An Immuta user with the CREATE_PROJECT permission creates a new project with Snowflake data sources.
The Immuta project owner enables project equalization, which sets every project member's access to the data to the same level.
The Immuta project owner creates a Snowflake project workspace which automatically generates a subfolder in the root path specified by the application admin and remote database associated with the project.
Project members can access data sources within the project. To ensure equalization, users will only see data sources within their project as long as they are working in the Snowflake Context.
If a project member leaves a project or a project is deleted, that Snowflake Context will be removed from the user's Snowflake account.
Root directory details
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
Mapping projects to secure views
Immuta projects are represented as Session Contexts within Snowflake. When a project is linked to Snowflake, Immuta automatically creates corresponding
roles in Snowflake: IMMUTA_[project name]
schemas in the Snowflake IMMUTA database: [project name]
secure views in the project schema for any table in the project
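The naming convention above can be sketched as a small helper. This is an illustrative sketch only: it assumes the project name is used verbatim, and any normalization Immuta applies to project names (casing, spaces, special characters) is not modeled here.

```python
def snowflake_objects(project_name: str) -> dict:
    """Sketch of the Snowflake objects Immuta creates for a project.

    Assumes the project name is used verbatim; any normalization
    Immuta applies is not modeled here.
    """
    return {
        "role": f"IMMUTA_{project_name}",  # role created in Snowflake
        "database": "IMMUTA",              # database that holds project schemas
        "schema": project_name,            # schema containing the project's secure views
    }
```

Secure views for every table in the project are then created inside that schema.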
To switch projects, users have to change their Snowflake Session Context to the appropriate Immuta project. If users are not entitled to a data source contained by the project, they will not be able to access the Context in Snowflake until they have access to all tables in the project. If changes are made to a user's attributes and access level, the changes will immediately propagate to the Snowflake Context.
Because users access data only through secure views, administrators have significantly less role management to do in Snowflake. Organizations should also consider having one Snowflake user who can create databases and make GRANTs on those databases, and separate users who can read and write from those tables.
Benefits
Few roles to manage in Snowflake; that complexity is pushed to Immuta, which is designed to simplify it.
A small set of users has direct access to raw tables; most users go through secure views only, but raw database access can be segmented across departments.
Policies are built by the individual database administrators within Immuta and are managed in a single location, and changes to policies are automatically propagated across thousands of tables’ secure views.
Self-service access to data based on data policies.
Users work in various contexts in Snowflake natively, based on their collaborators and their purpose, without fear of leaking data.
All policies are enforced natively in Snowflake without performance impact.
Security is maintained through Snowflake primitives (roles and secure views).
Performance and scalability are maintained (no proxy).
Policies can be driven by metadata, allowing massive scale policy enforcement with only a small set of actual policies.
User access and removal are immediately reflected in secure views.
Databricks Spark project workspaces
Using Immuta projects and project equalization, Databricks Spark project workspaces are a space where every project member has the same level of access to data. This equalized access allows collaboration without worries about data leaks. Not only can project members collaborate on data, but they can also write protected data to the project.
Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data to the project, they should use the SparkSQL session to copy data into the workspace.
Databricks project workspace workflow
An Immuta user with the CREATE_PROJECT permission creates a new project with Databricks data sources.
The Immuta project owner enables project equalization, which sets every project member's access to the data to the same level.
The Immuta project owner creates a Databricks project workspace which automatically generates a subfolder in the root path specified by the application admin and remote database associated with the project.
The Immuta project members query equalized data within the context of the project, collaborate, and write data, all within Databricks.
Root directory details
Immuta only supports a single root location, so all projects will write to a subdirectory under this single root location.
If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
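As a rough sketch of the resulting layout, each project's workspace lands in its own subdirectory of the single root location. The subfolder name is generated by Immuta when the workspace is created, so the literal value used below is only an example:

```python
import posixpath

def workspace_path(root: str, project_subfolder: str) -> str:
    """Join the admin-configured root location with a project's subfolder.

    The subfolder name is generated by Immuta when the workspace is
    created; the example value passed in below is hypothetical.
    """
    return posixpath.join(root, project_subfolder)
```

For instance, with a root of `s3://analytics/immuta-workspaces` and a generated subfolder `my_project`, the workspace would sit at `s3://analytics/immuta-workspaces/my_project`.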
Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
Read data
When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
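The immuta:/// scheme in the call above is resolved by Immuta's Spark integration. As a quick illustration of how such a URI breaks down, using only Python's standard library:

```python
from urllib.parse import urlparse

# Example workspace URI, as used in the spark.read.parquet call above
uri = urlparse("immuta:///some/path/to/a/workspace")

scheme = uri.scheme  # "immuta" -- handled by Immuta's Spark integration
path = uri.path      # "/some/path/to/a/workspace" -- resolved within the workspace
```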
Supported cloud providers
Microsoft Azure
Immuta currently supports the abfss schema for Azure General Purpose V2 Storage Accounts, including Azure Data Lake Gen 2. When configuring Immuta workspaces for Databricks on Azure, the Azure Databricks workspace ID must be provided; see the Databricks documentation for how to determine the workspace ID for your workspace. Any cluster that will use Immuta workspaces must also include the additional configuration file, which contains credentials for the Azure Storage container that holds Immuta workspaces.
Google Cloud Platform
Immuta currently supports the gs schema for Google Cloud Platform. The primary difference between Databricks on Google Cloud Platform and Databricks on AWS or Azure is that it is deployed to Google Kubernetes Engine. Databricks automatically provisions and autoscales drivers and executors as pods on Google Kubernetes Engine, so Google Cloud Platform admin users can view and monitor those Google Kubernetes Engine resources in Google Cloud Platform.
Caveats and limitations
Stage Immuta installation artifacts in Google Storage, not DBFS: The DBFS FUSE mount is unavailable, and the IMMUTA_SPARK_DATABRICKS_DBFS_MOUNT_ENABLED property cannot be set to true to expose the DBFS FUSE mount.
Stage the Immuta init script in Google Storage: Init scripts in DBFS are not supported.
Stage third-party libraries in DBFS: Installing libraries from Google Storage is not supported.
Install third-party libraries as cluster-scoped: Notebook-scoped libraries have limited support. See the Databricks trusted libraries section for more details.
Maven library installation is only supported in Databricks Runtime 8.1+.
/databricks/spark/conf/spark-env.sh is mounted as read-only: Set sensitive Immuta configuration values directly in immuta_conf.xml. Do not use environment variables to set sensitive Immuta properties; Immuta is unable to edit the spark-env.sh file because it is read-only, so remove those environment variables and keep them from being visible to end users.
Use /immuta-scratch directly: The IMMUTA_LOCAL_SCRATCH_DIR property is unavailable.
Allow the Kubernetes resource to spin down before submitting another job: Job clusters with init scripts fail on subsequent runs.
The DBFS CLI is unavailable: Other non-DBFS Databricks CLI functions will still work as expected.
Supported metastore providers for Databricks
To write data to a table in Databricks through an Immuta workspace, use one of the following supported provider types for your table format:
avro
csv
delta
orc
parquet
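A guard like the following can make that constraint explicit before attempting a write. The function name is hypothetical and not part of Immuta's API; it simply encodes the supported provider list above:

```python
# Provider types supported for writes through an Immuta workspace
SUPPORTED_PROVIDERS = {"avro", "csv", "delta", "orc", "parquet"}

def check_provider(provider: str) -> str:
    """Validate a table format against the providers Immuta workspaces support.

    Hypothetical helper for illustration; raises if the format is unsupported.
    """
    normalized = provider.lower()
    if normalized not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider for Immuta workspaces: {provider}")
    return normalized
```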