Databricks Access Pattern
Audience: Data Owners and Data Users
Content Summary: This page provides an overview of the Databricks access pattern. For installation details, see the Databricks Installation Guide.
This native integration makes Databricks data sources exposed in Immuta available as tables in the 'immuta' database on a Databricks cluster, and users can then query these data sources from their notebooks. As with other integrations, policies are applied to the plan that Spark builds for a user's query, and all data access is native.
Using Immuta with Databricks
The Databricks integration is not a significant departure from the Spark integrations. The biggest difference between the CDH/EMR integrations and Databricks is that in Databricks users do not need an "ImmutaSparkSession" object; they use the standard SparkSession.
With Immuta installed, the cluster can be used to access data via Immuta. Note: Immuta recommends removing all Immuta users' direct access to the underlying sensitive data so that they must go through the immuta database to access it.
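For clusters that use Databricks table access control, the recommendation above could be sketched as a set of REVOKE statements. This is a minimal sketch under stated assumptions, not Immuta's procedure: the helper names and the database name `raw_sensitive` are hypothetical, and revoking table privileges does not remove access a user still has through group membership or cloud storage credentials.

```python
from __future__ import annotations

from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Backtick-quote a Spark SQL identifier; a literal backtick is escaped by doubling it."""
    return "`" + name.replace("`", "``") + "`"


def revoke_statements(users: Iterable[str], database: str) -> List[str]:
    """Hypothetical helper: one REVOKE per user for the raw (non-Immuta) database.

    Assumes table access control is enabled; it only removes privileges granted
    directly to each user, not privileges inherited through groups.
    """
    db = quote_identifier(database)
    return [f"REVOKE ALL PRIVILEGES ON DATABASE {db} FROM {quote_identifier(u)}" for u in users]
```

In a notebook, an administrator might run `for stmt in revoke_statements(["alice@example.com"], "raw_sensitive"): spark.sql(stmt)`. Generating the statements separately from executing them makes it easy to review them first.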
Once Immuta-enabled clusters are created, users can expose Databricks data sources and then query those sources from an Immuta-enabled cluster. Data sources on-cluster are exposed via the immuta database in Spark, and users can query tables in the immuta database either via SQL statements or with Python using the SparkSession provided in the notebook.
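The two query styles (SQL and Python) can be sketched as follows. This assumes `spark` is the standard SparkSession that Databricks provides in the notebook; `customers` is a hypothetical data source name, and the helper functions are illustrative, not part of Immuta.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # pyspark is only needed for type hints; it is present on the cluster
    from pyspark.sql import DataFrame, SparkSession


def immuta_table_name(source: str) -> str:
    """Fully qualified name of a data source in the immuta database (hypothetical helper)."""
    # Backtick-quote the name; a literal backtick is escaped by doubling it.
    return "immuta.`" + source.replace("`", "``") + "`"


def preview_sql(source: str, limit: int = 10) -> str:
    """SQL text for a simple preview query against an Immuta-exposed table."""
    return f"SELECT * FROM {immuta_table_name(source)} LIMIT {int(limit)}"


def preview_with_sql(spark: SparkSession, source: str, limit: int = 10) -> DataFrame:
    # SQL route: Immuta policies are applied to the plan Spark builds for this query.
    return spark.sql(preview_sql(source, limit))


def preview_with_python(spark: SparkSession, source: str, limit: int = 10) -> DataFrame:
    # DataFrame API route through the same SparkSession.
    return spark.table(immuta_table_name(source)).limit(limit)
```

In a notebook cell, `preview_with_sql(spark, "customers").show()` and `preview_with_python(spark, "customers").show()` should both read the same Immuta-exposed table; no ImmutaSparkSession is involved.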
- Databricks Connect is not currently supported.
- Before a user can query data in the immuta database, they must be granted access to the database with a GRANT statement. Databricks currently has no way to GRANT access to everyone, so this must be done for each user.
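Because access has to be granted user by user, the GRANT statements can be generated from a list of users. This is a minimal sketch assuming Databricks table access control; the privilege set (`USAGE`, `SELECT`) and all helper names are assumptions, so confirm the required privileges in the Databricks Installation Guide.

```python
from __future__ import annotations

from typing import Any, Iterable, List

# Assumed privilege set; confirm against the Databricks Installation Guide.
DEFAULT_PRIVILEGES = ("USAGE", "SELECT")


def quote_identifier(name: str) -> str:
    """Backtick-quote a Spark SQL identifier such as a user's email address."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(users: Iterable[str], database: str = "immuta",
                     privileges: Iterable[str] = DEFAULT_PRIVILEGES) -> List[str]:
    """One GRANT statement per user, since access cannot be granted to everyone at once."""
    privs = ", ".join(privileges)
    db = quote_identifier(database)
    return [f"GRANT {privs} ON DATABASE {db} TO {quote_identifier(u)}" for u in users]


def apply_grants(spark: Any, users: Iterable[str], database: str = "immuta") -> None:
    """Run each statement through the notebook's SparkSession.

    Presumably this must be run by a workspace admin or the database owner.
    """
    for statement in grant_statements(users, database):
        spark.sql(statement)
```

For example, `apply_grants(spark, ["alice@example.com", "bob@example.com"])` would issue two GRANT statements, one per user.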