
You are viewing documentation for Immuta version 2020.2.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Databricks Access Pattern

Audience: Data Owners and Data Users

Content Summary: This page provides an overview of the Databricks access pattern. For installation instructions, see the Databricks Installation Guide.


The Immuta Databricks integration allows you to protect access to tables and manage row-, column-, and cell-level controls without enabling table ACLs or credential passthrough. As with other integrations, Immuta applies policies to the plan that Spark builds for a user's query and enforces them live on-cluster.

Using Immuta with Databricks

Mapping Users

Usernames in Immuta must match usernames in Databricks. It is best practice to use the same identity manager for Immuta that you use for Databricks (Immuta supports all common identity manager protocols).
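Because usernames must match across the two systems, it can help to compare the user lists before registering tables. The following is a minimal sketch, assuming you have already exported the usernames from each system into plain Python lists. The function name `find_unmatched_users` and the sample usernames are hypothetical, and the exact, case-sensitive comparison is an assumption about how strictly names must agree.

```python
def find_unmatched_users(immuta_users, databricks_users):
    """Return usernames that exist in only one of the two systems.

    Hypothetical helper: both arguments are plain iterables of username
    strings that you exported yourself. Matching is exact and
    case-sensitive (an assumption, not documented Immuta behavior).
    """
    immuta = set(immuta_users)
    databricks = set(databricks_users)
    return {
        "only_in_immuta": sorted(immuta - databricks),
        "only_in_databricks": sorted(databricks - immuta),
    }


# Illustrative data only.
report = find_unmatched_users(
    ["bob@example.com", "emily@example.com"],
    ["bob@example.com", "Emily@example.com"],
)
print(report)
```

Any username that appears in either list of the report would need to be reconciled in the identity manager before that user queries protected tables.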

Configuring Tables

Use a Databricks administrator account to register tables with Immuta through the UI or API. However, do NOT test Immuta policies with a Databricks administrator account, because administrators can bypass controls. See the “Testing the Integration” section below for more details.

Ideally, you should register entire databases and run schema monitoring jobs through the Python script provided during data source registration.

Testing the Integration

Test the integration on an Immuta-enabled cluster with a user that is NOT a Databricks administrator. To illustrate table access and policy controls, we will use two example accounts: Bob (test account) and Emily (administrator account).

Table Access

The administrator, Emily, can control who has access to specific tables in Databricks. The analyst, Bob, will only see the immuta database with no tables in it until he has gained access to tables through Immuta Subscription Policies Emily sets or by being manually added to the data source by Emily. Therefore, if Emily registers a database called fruit with tables banana, kiwi, and apple, once Bob has subscribed to those tables through Immuta, he will see the fruit database and its tables and be able to query them. Note: If Bob tries to query those tables before being subscribed, he will be blocked.

The immuta Database

All tables registered in Immuta also appear in the immuta database, which provides a single database for all tables. In our example, Bob would see fruit.banana, fruit.kiwi, and fruit.apple in the fruit database, and in the immuta database he would see immuta.fruit_banana, immuta.fruit_kiwi, and immuta.fruit_apple.
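The naming and visibility rules described above can be sketched as a small toy model. This is an illustration only, not Immuta code: the functions `immuta_table_name` and `visible_tables` are hypothetical, and the `<database>_<table>` pattern is inferred from the fruit example rather than from a documented naming specification.

```python
def immuta_table_name(database, table):
    """Flattened name of a registered table inside the immuta database.

    Follows the pattern in the example above (fruit.banana ->
    immuta.fruit_banana). Treat it as an illustration: how names with
    unusual characters or collisions are handled is not covered here.
    """
    return f"immuta.{database}_{table}"


def visible_tables(registered, subscribed):
    """List the table names a user could query.

    registered: dict mapping database name -> list of table names
    subscribed: set of (database, table) pairs the user is subscribed to
    """
    names = []
    for database, tables in registered.items():
        for table in tables:
            if (database, table) in subscribed:
                names.append(f"{database}.{table}")
                names.append(immuta_table_name(database, table))
    return sorted(names)


registered = {"fruit": ["banana", "kiwi", "apple"]}

# Before subscribing, Bob sees no tables.
print(visible_tables(registered, set()))

# After subscribing to banana only, Bob sees it under both names.
print(visible_tables(registered, {("fruit", "banana")}))
```

In this model each subscribed table shows up twice, once under its native database and once under the immuta database, which mirrors what Bob sees in the example.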

The immuta database will also contain tables that are not in Databricks. For example, if Emily had Athena tables registered with Immuta, they would appear in the immuta database and be queryable through Databricks. (Immuta automatically configures JDBC.)

Fine-grained Access Control

Once Bob is subscribed to the fruit database tables, Emily can apply fine-grained access controls, such as restricting rows or masking columns with advanced anonymization techniques, to manage what Bob can see in each table. See the data policies documentation for more details.
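To make the effect of row restrictions and column masking concrete, here is a conceptual sketch in plain Python. It is not how Immuta implements policies (Immuta enforces them in the Spark plan on-cluster, not in user code). The function names, the sample rows, and the use of a SHA-256 hash as the masking technique are all assumptions chosen for illustration.

```python
import hashlib


def mask_value(value):
    """Replace a value with a SHA-256 hex digest (one simple masking choice)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def apply_policies(rows, row_filter, masked_columns):
    """Return the rows a user would see after both kinds of policy.

    rows: list of dicts, one per table row
    row_filter: function(row) -> bool; rows returning False are hidden
    masked_columns: set of column names whose values are masked
    """
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level control: drop the whole row
        result.append({
            column: mask_value(value) if column in masked_columns else value
            for column, value in row.items()
        })
    return result


# Hypothetical contents of fruit.banana.
banana = [
    {"farm": "north", "supplier": "Acme", "weight_kg": 12},
    {"farm": "south", "supplier": "Globex", "weight_kg": 9},
]

# Bob may only see the "north" farm, and supplier names are masked.
bob_view = apply_policies(banana, lambda r: r["farm"] == "north", {"supplier"})
for row in bob_view:
    print(row)
```

In this sketch Bob's view contains only the "north" row, with the supplier replaced by a hash and the other columns unchanged.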

Note: Immuta recommends building Global Policies rather than Local Policies, as they allow organizations to easily manage policies as a whole and capture system state in a more deterministic manner.

Accessing Data

All access controls must go through SparkSQL.


Python

# fruit.banana is the example table from above; substitute your own table.
df = spark.sql("select * from fruit.banana")


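Scala

import org.apache.spark.sql.SparkSession

// Build (or reuse) a session; builder() and getOrCreate() are required.
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
// fruit.banana is the example table from above; substitute your own table.
val sqlDF = spark.sql("SELECT * FROM fruit.banana")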


SQL

-- fruit.banana is the example table from above; substitute your own table.
select * from fruit.banana


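R

# Load the SparkR library before accessing the data.
library(SparkR)
# fruit.banana is the example table from above; substitute your own table.
df <- SparkR::sql("SELECT * FROM fruit.banana")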

Note: With R, you must load the SparkR library in a cell before accessing the data.