Native Workspace Configuration for Databricks
Audience: System Administrators
Content Summary: This page describes how to configure Native Workspaces for Immuta-enabled Databricks clusters. For more information about Databricks deployments, please see the main installation guide.
Configuration
Tip
Before creating a workspace, the cluster must send its configuration to Immuta. To do this, run a simple query on the cluster (e.g., `show tables`). Otherwise, you will receive an error message when you attempt to create a workspace.
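For example, running a lightweight query from a notebook cell attached to the Immuta-enabled cluster is enough to trigger this handshake. The snippet below is a minimal sketch using the Spark session that Databricks notebooks provide as `spark`:

```python
# Any simple query forces the cluster to report its configuration to Immuta.
# Run this in a notebook cell attached to the Immuta-enabled cluster.
spark.sql("SHOW TABLES").show()
```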
Amazon Web Services
- Navigate to the App Settings page in Immuta Console.
- Select Native Workspace in the left sidebar.
- Select Add Workspace.
- For Workspace Type, select Databricks.
- For Scheme, select `s3a` (an example URI form follows these steps).
- Fill out the modal.
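For reference, a workspace remote path using the `s3a` scheme follows the standard S3A URI form shown below; the bucket and prefix are placeholders, not required values:

```
s3a://<your-workspace-bucket>/<optional/prefix>
```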
Microsoft Azure
- Navigate to the App Settings page in Immuta Console.
- Select Native Workspace in the left sidebar.
- Select Add Workspace.
- For Workspace Type, select Databricks.
- For Scheme, select `abfss` (an example URI form follows these steps).
- Fill out the modal.
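Likewise, a workspace remote path using the `abfss` scheme follows the standard ABFS URI form for an Azure Data Lake Storage Gen2 container; the container and storage account names are placeholders:

```
abfss://<container>@<storage-account>.dfs.core.windows.net/<optional/path>
```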
Notes
General
- When acting in the workspace project, users can read data using calls like `spark.read.parquet("immuta:///some/path/to/a/workspace")` (see the sketch after this list).
- If you wish to write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table (rather than a directory) in the workspace when creating the derived data source.
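The sketch below illustrates both patterns from a notebook in the workspace project; the paths and table name are hypothetical, and the layout of your workspace directory will differ:

```python
# Read data from the workspace through the immuta:// scheme.
df = spark.read.parquet("immuta:///some/path/to/a/workspace")

# Write Delta Lake data to a location inside the workspace ...
df.write.format("delta").save("immuta:///some/path/to/a/workspace/derived")

# ... and register a table over that location, since the derived data source
# in Immuta must point at a table rather than a directory.
spark.sql("""
  CREATE TABLE IF NOT EXISTS derived_table
  USING DELTA
  LOCATION 'immuta:///some/path/to/a/workspace/derived'
""")
```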
Amazon Web Services
- Immuta currently supports the `s3a` scheme for Amazon S3.
- Either a key pair with access to the workspace bucket/prefix must be specified in the additional configuration, or an instance role with that access must be applied to the cluster (a sketch of the key-pair option follows this list).
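If you use a key pair rather than an instance role, one common way to supply it is through the standard Hadoop S3A credential properties in the cluster's Spark configuration. This is a hedged sketch; the property names come from the stock S3A connector, the values are placeholders, and where exactly Immuta expects them (the cluster's Spark configuration or its additional configuration file) is covered in the installation guide:

```
spark.hadoop.fs.s3a.access.key <access-key-id>
spark.hadoop.fs.s3a.secret.key <secret-access-key>
```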
Microsoft Azure
- Immuta currently supports the `abfss` scheme for Azure General Purpose V2 Storage Accounts. This includes support for Azure Data Lake Storage Gen2.
- When configuring Immuta workspaces for Azure Databricks, the Azure Databricks Workspace ID must be provided. More information about how to determine the Workspace ID for your Azure Databricks workspace can be found in the Databricks documentation.
- In Azure Databricks, any cluster that will use Immuta workspaces must include an additional configuration file containing credentials for the Azure Storage container that holds the Immuta workspaces (a sketch of the relevant property follows this list).
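As a hedged sketch of that credential, the stock ABFS driver accepts a storage account key through the Hadoop property below; the storage account name and key are placeholders, and the configuration file in which Immuta expects this value is described in the installation guide:

```
fs.azure.account.key.<storage-account>.dfs.core.windows.net <account-key>
```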