Databricks Access Pattern
Audience: Data Owners and Data Users
Content Summary: This page provides an overview of the Databricks access pattern. To install this access pattern, see the Databricks Installation Guide.
Overview
This native integration makes Databricks data sources that are exposed in Immuta available as tables directly in Databricks, under the 'immuta' database on the cluster. Users can then query these data sources from their notebooks. As with other integrations, policies are applied to the plan that Spark builds for a user's query and are enforced live on the cluster; no new copy of the data is created.
Using Immuta with Databricks
Mapping Users
Usernames in Immuta must match usernames in Databricks. It is best practice to use the same identity manager for Immuta that you use for Databricks (Immuta supports all common identity manager protocols).
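One way to sanity-check this mapping is to compare the two username lists before registering data sources. The sketch below is plain Python and assumes the usernames have already been exported from each system by some means. The function name and the sample usernames are hypothetical, and the exact, case-sensitive comparison is an assumption about how usernames are matched.

```python
def find_unmapped_users(immuta_users, databricks_users):
    """Return usernames that appear in only one of the two systems.

    Assumes an exact, case-sensitive match is required (an assumption).
    """
    immuta = set(immuta_users)
    databricks = set(databricks_users)
    return {
        "only_in_immuta": sorted(immuta - databricks),
        "only_in_databricks": sorted(databricks - immuta),
    }


# Hypothetical sample data, not taken from a real deployment.
report = find_unmapped_users(
    ["alice@example.com", "bob@example.com"],
    ["alice@example.com", "carol@example.com"],
)
print(report)
```

Any username reported on either side would need to be reconciled, ideally in the shared identity manager, before that user queries data through Immuta.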
Configuring Tables
The cluster must have Databricks table ACLs enabled, with only administrators having access to the original tables or views. For example, with table ACLs in place, a table such as taxi_trip would be available only to the cluster administrator.
Then, using the Immuta UI (or API), users would register this taxi_trip table in Immuta. For more details, see this Databricks Data Source Creation Tutorial.
Note: In the Immuta July 2020 release, we will no longer require table ACLs or the separate immuta database; you will be able to manage and query the tables in place in their original database, without changes to downstream queries and without that first manual GRANT step.
Database Access
After the data source is created in Immuta and you have granted users access (through table ACLs) to the immuta database, users can access that table directly in Databricks. Granting this access is a one-time step that opens the database; actual table controls (described below) are managed through Immuta. For example, to give all users access to the immuta database, you would run a GRANT command on that database; the same can be done for individual users.
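As a rough sketch of that one-time step, the Python helper below builds a Databricks table ACL GRANT statement as a string. The helper name and the sample user are hypothetical, and the default privilege (SELECT only) is an assumption; depending on the Databricks runtime version, additional privileges such as USAGE or READ_METADATA may also be needed.

```python
def grant_database_statement(database, principal, privileges=("SELECT",)):
    """Build a table ACL GRANT statement for a database.

    The default privilege list is an assumption; adjust it for your cluster.
    """
    privilege_list = ", ".join(privileges)
    # Backticks quote the principal, which may contain '@' or '.'.
    return f"GRANT {privilege_list} ON DATABASE {database} TO `{principal}`"


# All users, through the built-in `users` group.
all_users_sql = grant_database_statement("immuta", "users")
print(all_users_sql)

# A single (hypothetical) user.
one_user_sql = grant_database_statement("immuta", "alice@example.com")
print(one_user_sql)
```

An administrator could then run the resulting statement in a `%sql` notebook cell or pass it to `spark.sql(...)` on the cluster.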
Table Access
Users can access tables once they are provided access to the data source through Immuta Subscription Policies configured in the UI. Additionally, users can be manually added to the data source from the Members tab.
Fine-grained Access Control
After the data source has been created in Immuta, you can build other Data Policies, such as column masking techniques or row-level security, on the table to restrict what the user sees in the table.
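To make the effect of these policies concrete, the toy example below shows what a user might see once a column-masking rule and a row-level rule are both in force. It is a conceptual illustration in plain Python, not Immuta's implementation: the function names, the taxi_trip column names, and the use of a truncated SHA-256 digest as the masking technique are all assumptions made for the sketch.

```python
import hashlib


def mask_value(value):
    """Replace a value with a short SHA-256 digest (a stand-in for masking)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_policies(rows, masked_columns, row_filter):
    """Return the rows a user would see: filtered first, then masked."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level security: skip rows the user may not see
        visible.append({
            col: mask_value(val) if col in masked_columns else val
            for col, val in row.items()
        })
    return visible


# Hypothetical taxi_trip rows; the column names are illustrative only.
trips = [
    {"medallion": "A1", "pickup_zone": "Midtown", "fare": 12.5},
    {"medallion": "B2", "pickup_zone": "Harlem", "fare": 8.0},
]

result = apply_policies(
    trips,
    masked_columns={"medallion"},
    row_filter=lambda row: row["pickup_zone"] == "Midtown",
)
print(result)
```

The source rows are left unchanged; only the view returned to the user is filtered and masked, which loosely mirrors how policies are enforced at query time rather than by copying data.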
Limitations
- Databricks Connect is not currently supported.