
Databricks Access Pattern

Audience: Data Owners and Data Users

Content Summary: This page provides an overview of the Databricks access pattern. For installation details, see the Databricks Installation Guide.

Overview

This native integration makes Databricks data sources exposed in Immuta available as tables in the immuta database on the Databricks cluster, and users can then query these data sources from their notebooks. As with other integrations, policies are applied to the plan that Spark builds for a user's query, and all data access is native.

Using Immuta with Databricks

The Databricks integration is not a significant departure from the other Spark integrations. The biggest difference from CDH/EMR is that in Databricks users do not need an "ImmutaSparkSession" object; they use the normal SparkSession.

With Immuta installed, the cluster can be used to access data through Immuta. Note: Immuta recommends removing all Immuta users' direct access to the underlying sensitive data so that they must go through the immuta database to reach it.

Once Immuta-enabled clusters are created, users can expose Databricks data sources and then query those sources from an Immuta-enabled cluster. Data sources on-cluster are exposed via the immuta database in Spark, and users can query tables in the immuta database either via SQL statements or with Python using the SparkSession provided in the notebook.
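As a minimal sketch of the Python path described above, the helpers below build a query against a table in the immuta database and run it through the regular SparkSession. The helper names (qualified_name, preview) and the table name customer_orders are hypothetical and used only for illustration; the code assumes the spark object that a Databricks notebook provides.

```python
IMMUTA_DATABASE = "immuta"  # database name taken from the text above


def qualified_name(table, database=IMMUTA_DATABASE):
    """Return a backtick-quoted `database`.`table` name for Spark SQL."""
    def quote(identifier):
        # Spark SQL escapes a backtick inside a quoted identifier by doubling it.
        return "`" + identifier.replace("`", "``") + "`"

    return quote(database) + "." + quote(table)


def preview(spark, table, limit=10):
    """Select the first rows of an Immuta-exposed table.

    `spark` is the normal SparkSession; no ImmutaSparkSession is involved.
    """
    query = "SELECT * FROM {} LIMIT {}".format(qualified_name(table), int(limit))
    return spark.sql(query)
```

In a notebook attached to an Immuta-enabled cluster, this could be used as preview(spark, "customer_orders").show(), where customer_orders stands in for whatever data source has been exposed. The equivalent SQL cell would simply select from immuta.customer_orders.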

Limitations

  • Databricks Connect is not currently supported.
  • Before users can query data in the immuta database, they must be granted access to it with a GRANT statement. Databricks does not currently provide a way to GRANT access to everyone at once, so access must be granted to each user individually.
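
Because access has to be granted user by user, one way to reduce the manual work is to generate the statements from a list of users. The sketch below is illustrative only: the function name grant_statements is hypothetical, the statement shape follows the general Databricks table access control form (GRANT privilege ON DATABASE name TO principal), and the privilege to pass is an assumption that should be checked against the Databricks Installation Guide.

```python
def grant_statements(users, privilege, database="immuta"):
    """Build one GRANT statement per user for the given database.

    `privilege` is deliberately required: which privilege Immuta users
    need is not stated on this page, so the caller must choose it.
    """
    statements = []
    for user in users:
        # Principals such as e-mail addresses are backtick-quoted.
        principal = "`" + user.replace("`", "``") + "`"
        statements.append(
            "GRANT {} ON DATABASE {} TO {}".format(privilege, database, principal)
        )
    return statements
```

An administrator could then run each statement with the notebook's SparkSession, for example: for stmt in grant_statements(["alice@example.com", "bob@example.com"], "SELECT"): spark.sql(stmt). The user names and the SELECT privilege here are placeholders.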