Databricks SQL
Deprecation notice
This integration has been deprecated and replaced by the Databricks Unity Catalog integration.
This page provides an overview of the Databricks SQL integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide. Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.
Overview
Immuta’s Databricks SQL integration gives users direct access to views in a protected database that Immuta creates inside Databricks SQL when the integration is configured. This protected database includes
several tables and views that Immuta creates to enable policy enforcement (storage of user entitlements, UDFs, etc.).
views that contain the policy logic for each target data source exposed in Immuta by a Data Owner. These views are exposed to all users in Databricks SQL.
Architecture
When an administrator configures the Databricks SQL integration with Immuta, Immuta creates an immuta database and Databricks SQL creates a default database in the SQL Endpoint. Data sources registered in Immuta are added as tables to the default database, and a view is created in the immuta database for each of these tables.
The credentials provided to set up the integration must have the ability to
create an integration database
configure procedures and functions
maintain state between Databricks and Immuta
De-Conflicting Tables
Databricks SQL has a two-level structure of databases and tables, so tables in different databases can share a name. To de-conflict these table names when Immuta creates views in the Immuta-protected database, Immuta prepends each table name with the name of its parent database in Databricks SQL (which is configured in the Immuta UI). The following example illustrates a scenario where multiple Databricks SQL databases are configured in Immuta (whose protected database is named immuta_databricks_sql in the SQL Endpoint):
Data Source A:
parent Databricks SQL database: public
table name: HR_data
Data Source B:
parent Databricks SQL database: default
table name: HR_data
Resulting Immuta views created:
Data Source A: immuta_databricks_sql.public_HR_data
Data Source B: immuta_databricks_sql.default_HR_data
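The naming rule above can be sketched in a few lines of Python. This is only an illustration of the convention shown in the example, not Immuta's implementation; the function name immuta_view_name is hypothetical.

```python
def immuta_view_name(protected_db: str, parent_db: str, table: str) -> str:
    """Build a view name of the form <protected_db>.<parent_db>_<table>.

    Hypothetical helper: it mirrors the naming pattern in the example above
    and is not part of any Immuta or Databricks API.
    """
    return f"{protected_db}.{parent_db}_{table}"


# The two data sources from the example share a table name but not a database.
sources = [("public", "HR_data"), ("default", "HR_data")]
views = [immuta_view_name("immuta_databricks_sql", db, tbl) for db, tbl in sources]

for name in views:
    print(name)
```

Because the parent database name becomes part of the view name, the two HR_data tables map to distinct views in the protected database.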
Policy Enforcement
Immuta uses dynamic views to enforce row- and column-level security in Databricks SQL. These dynamic views allow Immuta to manage which users have access to a view’s rows, columns, or specific records by filtering or masking their values.
When a Data Owner exposes a Databricks SQL table as a data source in Immuta and applies a policy to it, Immuta updates the policy definition in the protected immuta database in Databricks SQL. Then, Immuta creates a dynamic view based on the table in the default database, the querying users' entitlements, and policies that apply to that table. Finally, Databricks SQL users query the view through the protected immuta database.
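To make the idea of a dynamic view concrete, the sketch below renders a CREATE VIEW statement that masks one column and filters rows based on group membership. It is an assumption-laden illustration, not the SQL that Immuta actually generates: the helper render_dynamic_view, the table default.hr_data, its columns, and the group hr_admins are all hypothetical. The only Databricks feature it relies on is the is_member() function, which Databricks provides for use in dynamic views.

```python
def render_dynamic_view(view, source_table, columns, masked, allowed_group, row_filter=None):
    """Render SQL for a dynamic view (illustrative only, not Immuta's output).

    Columns listed in `masked` are returned as NULL unless the querying user
    belongs to `allowed_group`; `row_filter` is an optional WHERE predicate.
    """
    select_items = []
    for col in columns:
        if col in masked:
            # Column-level security: reveal the value only to group members.
            select_items.append(
                f"CASE WHEN is_member('{allowed_group}') THEN {col} ELSE NULL END AS {col}"
            )
        else:
            select_items.append(col)

    sql = (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM {source_table}"
    )
    if row_filter:
        # Row-level security: keep only rows the predicate allows.
        sql += f"\nWHERE {row_filter}"
    return sql


# Hypothetical table, columns, and group, chosen only for this example.
sql = render_dynamic_view(
    view="immuta.default_hr_data",
    source_table="default.hr_data",
    columns=["name", "ssn", "region"],
    masked={"ssn"},
    allowed_group="hr_admins",
    row_filter="is_member('hr_admins') OR region = 'US'",
)
print(sql)
```

Because the membership check is evaluated when the view is queried, the same view returns different results to different users, which is why a single view per table can be exposed to everyone.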
Data Flow
A Databricks SQL Administrator creates a Databricks SQL endpoint.
Databricks creates a default database. Note: Immuta doesn’t lock down access to the default database; an administrator must do that within Databricks SQL itself.
The Databricks Admin creates a table of 10 million people and queries the table.
An Immuta Application Admin configures the Databricks SQL integration.
Immuta creates a protected database inside the Databricks SQL endpoint.
A Data Owner creates data sources in Immuta from the default Databricks database.
A user adds or edits a policy, or adds a user to a group that changes a policy on a data source.
Immuta updates the policy or user profile information in Databricks.
Immuta creates dynamic views based on tables in the default database, users, groups, attributes, and policies.
Users query views in the protected database created by Immuta.