Deprecation notice
This integration has been deprecated and replaced by the Databricks Unity Catalog integration.
This page provides an overview of the Databricks SQL integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide. Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.
Immuta’s Databricks SQL integration provides users direct access to views in a protected database Immuta creates inside Databricks SQL when the integration is configured. This protected database includes
several tables and views Immuta creates to enable policy enforcement (storage of user entitlements, UDFs, etc.).
views that contain policy logic corresponding to the target data sources exposed in Immuta by Data Owners. These views are exposed to all users in Databricks SQL.
When an administrator configures the Databricks SQL integration with Immuta, Immuta creates an immuta database and Databricks SQL creates a default database in the SQL Endpoint. Data sources registered in Immuta are added as tables to the default database, and a view is created in the immuta database for each of these tables.
The credentials provided to set up the integration must have the ability to
create an integration database
configure procedures and functions
maintain state between Databricks and Immuta
Databricks SQL has a two-level structure with databases and tables. To de-conflict table names when Immuta creates views in the Immuta-protected database, Immuta prepends each table name with its parent database name in Databricks SQL (which is configured in the Immuta UI). The following example illustrates a scenario where multiple Databricks SQL databases are configured in Immuta (whose protected database is named immuta_databricks_sql in the SQL Endpoint):
Data Source A:
parent Databricks SQL database: public
table name: HR_data
Data Source B:
parent Databricks SQL database: default
table name: HR_data
Resulting Immuta views created:
Data Source A: immuta_databricks_sql.public_HR_data
Data Source B: immuta_databricks_sql.default_HR_data
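The naming scheme above can be sketched as a small helper; the function name is hypothetical and only illustrates the pattern of prepending the parent database name:

```python
# Illustrative sketch (not Immuta's actual implementation) of how view
# names are de-conflicted by prepending the parent Databricks SQL
# database name. The names below come from the example above.

def immuta_view_name(protected_db: str, parent_db: str, table: str) -> str:
    """Build the fully qualified Immuta view name for a backing table."""
    return f"{protected_db}.{parent_db}_{table}"

# Data Source A: parent database "public", table "HR_data"
print(immuta_view_name("immuta_databricks_sql", "public", "HR_data"))
# → immuta_databricks_sql.public_HR_data

# Data Source B: parent database "default", table "HR_data"
print(immuta_view_name("immuta_databricks_sql", "default", "HR_data"))
# → immuta_databricks_sql.default_HR_data
```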
Immuta uses dynamic views to enforce row- and column-level security in Databricks SQL. These dynamic views allow Immuta to manage which users have access to a view’s rows, columns, or specific records by filtering or masking their values.
When a Data Owner exposes a Databricks SQL table as a data source in Immuta and applies a policy to it, Immuta updates the policy definition in the protected immuta database in Databricks SQL. Then, Immuta creates a dynamic view based on the table in the default database, the querying user's entitlements, and the policies that apply to that table. Finally, Databricks SQL users query the view through the protected immuta database.
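To make the flow concrete, the sketch below assembles the kind of policy-bearing view statement this process produces. The column names, the can_see_* entitlement flag, and the exact SQL shape are invented for illustration; they are not Immuta's actual generated SQL.

```python
# Hypothetical sketch of a generated dynamic view that masks one column
# based on the querying user's entitlements. All identifiers except the
# immuta_databricks_sql/__immuta_user naming are invented examples.

def build_dynamic_view_sql(protected_db: str, parent_db: str,
                           table: str, masked_col: str) -> str:
    """Assemble an illustrative CREATE VIEW statement with column masking."""
    view = f"{protected_db}.{parent_db}_{table}"
    return (
        f"CREATE OR REPLACE VIEW {view} AS "
        f"SELECT CASE WHEN u.can_see_{masked_col} "
        f"THEN t.{masked_col} ELSE NULL END AS {masked_col} "
        f"FROM {parent_db}.{table} AS t "
        f"CROSS JOIN {protected_db}.__immuta_user AS u"
    )

print(build_dynamic_view_sql(
    "immuta_databricks_sql", "default", "HR_data", "salary"))
```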
A Databricks SQL Administrator creates a Databricks SQL endpoint.
Databricks creates a default database. Note: Immuta doesn't lock down access to the default database; an administrator must do that within Databricks SQL itself.
The Databricks Admin creates a table of 10 million people and queries the table.
An Immuta Application Admin configures the Databricks SQL integration.
Immuta creates a protected database inside the Databricks SQL endpoint.
A Data Owner creates data sources in Immuta from the default Databricks database.
A user adds or edits a policy, or adds a user to a group that changes a policy on a data source.
Immuta updates the policy or user profile information in Databricks.
Immuta creates dynamic views based on tables in the default database, users, groups, attributes, and policies.
Users query views in the protected database created by Immuta.
This page provides a tutorial for enabling the native Databricks SQL integration in Immuta. For an overview of the integration, see the Databricks SQL Overview documentation. Native SQL with Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.
A functional Databricks SQL environment: For guidance in setting up and using a Databricks SQL environment, see the Get started with Databricks SQL guide in the Databricks documentation.
Databricks personal access token: Your organization's SQL Analytics administrator must generate a Databricks personal access token that will allow users to authenticate to the Databricks REST API and Immuta to connect to SQL endpoints. Databricks will only display this personal access token once, so be sure to copy and save it. If an administrator does not generate the token, it will not carry appropriate privileges to allow Immuta to create the Immuta database inside Databricks SQL when the integration is enabled and an error will be displayed in the Immuta UI.
Log in to Immuta and click the App Settings icon in the left sidebar.
Click the Integrations tab.
Click + Add Native Integration and select Databricks SQL (Public Preview) from the dropdown menu.
In Databricks, navigate to the Databricks SQL page in your Databricks workspace, click Endpoints, and then click the name of the SQL Analytics endpoint you want to configure in Immuta.
Use the information on the Connection Details page to fill in the following information in the Immuta UI:
Host: Use the Server Hostname from Databricks (e.g., https://company.cloud.databricks.com).
HTTP Path: Use the HTTP Path from Databricks (e.g., /sql/1.0/endpoints/fff6d6eb3a9718cf9).
The value in the Immuta Database field will be the name of the database that Immuta creates in Databricks SQL Analytics. You may change the default name, provided the new name doesn't introduce a naming collision in your Databricks environment.
Enter the personal access token that was generated by a SQL Analytics administrator (not a user), and then click Test Databricks SQL Connection.
Click Save. Note that if you enter a personal access token that was generated by a SQL Analytics user, you won't be able to save the configuration successfully.
In Databricks SQL, revoke all privileges from users on databases that contain the backing tables in your SQL Endpoint. This will force users to go through the protected Immuta database to access data.
Once Databricks SQL has been successfully enabled in Immuta, Immuta will perform the following automated tasks:
Create an Immuta database.
Grant usage and select privileges to users on the Immuta database.
Create a system table on the Immuta database called <immuta_database_name>.__immuta_profiles.
Deny SELECT on <immuta_database_name>.__immuta_profiles to users.
Create a view called <immuta_database_name>.__immuta_user, which is equivalent to SELECT * FROM <immuta_database_name>.__immuta_profiles WHERE immuta__userid = current_user.
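The per-user filtering that the __immuta_user view provides can be modeled as follows; the profile rows and column set are invented sample data, not the real __immuta_profiles schema:

```python
# Sketch of the filtering __immuta_user performs: it exposes only the
# __immuta_profiles row matching the querying user, i.e.
# SELECT * FROM __immuta_profiles WHERE immuta__userid = current_user.
# The sample rows below are invented.

profiles = [
    {"immuta__userid": "alice@example.com", "groups": ["hr"]},
    {"immuta__userid": "bob@example.com", "groups": ["eng"]},
]

def immuta_user_view(rows, current_user):
    """Rows of __immuta_profiles visible to the querying user."""
    return [r for r in rows if r["immuta__userid"] == current_user]

print(immuta_user_view(profiles, "alice@example.com"))
```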
Add your SQL Analytics user accounts in Databricks SQL and give them access to the SQL Analytics endpoint as you normally would in Databricks.
Immuta requires an underlying data source in SQL Analytics to have an owner. To test if an object has an owner, run SHOW GRANT ON <object-name>. If you do not see an entry with ActionType OWN, the object does not have an owner. When table access control is disabled on a cluster or SQL endpoint, owners are not registered when a database, table, or view is created. You must either enable table access control on your cluster and SQL endpoint, or an admin must assign an owner to the object.
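The ownership check above can be sketched as a scan of SHOW GRANT rows for an OWN entry; the dict shape below is a simplification of the real SHOW GRANT output:

```python
# Sketch of checking SHOW GRANT results for an owner entry. Real output
# has more columns; these rows only mimic the principal/action/object
# shape for illustration.

def has_owner(grant_rows):
    """True if any grant row carries ActionType OWN."""
    return any(r["action_type"] == "OWN" for r in grant_rows)

grants_without_owner = [
    {"principal": "users", "action_type": "SELECT", "object": "default.HR_data"},
]
grants_with_owner = grants_without_owner + [
    {"principal": "admin@example.com", "action_type": "OWN", "object": "default.HR_data"},
]

print(has_owner(grants_without_owner))  # False: no owner registered
print(has_owner(grants_with_owner))    # True: object has an owner
```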
To assign an owner to the object, run a command such as ALTER TABLE <object-name> OWNER TO `<username>`; (use the ALTER VIEW or ALTER DATABASE variant for other object types).
Register Databricks securables in Immuta.
Click the App Settings icon in the left sidebar.
Navigate to the Integration Settings section and click the down arrow next to the Databricks SQL Integration.
Edit the field you want to change. Note that any shadowed field is not editable; the integration must be disabled and re-installed to change it.
Click Validate Credentials.
Click Save.
Click Confirm.
Click the App Settings icon in the left sidebar.
Navigate to the Integration Settings section and click the down arrow next to the Databricks SQL Integration.
Click the checkbox to disable the integration.
Enter the username and password that were used to initially configure the integration and click Validate Credentials.
Click Save.
Click Confirm.
To add Databricks data sources in Immuta, follow this tutorial.
This page describes the Databricks SQL integration, its configuration options, and its features. For a tutorial detailing how to enable this integration, see the installation guide. Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.
Before an administrator configures the Databricks SQL integration within Immuta, a Databricks SQL administrator must set up a Databricks SQL environment. For guidance in setting up and using a Databricks SQL environment, see the Get started with Databricks SQL guide in the Databricks documentation.
The Databricks SQL administrator must generate a Databricks personal access token, which will be used to configure Databricks SQL with Immuta. This token allows users to authenticate to the Databricks REST API and Immuta to connect to SQL endpoints and create the Immuta database inside Databricks SQL. Databricks will only display this personal access token once, so be sure to copy and save it.
Note: If a Databricks SQL administrator does not generate the token, it will not carry appropriate privileges to allow Immuta to create this database and an error will be displayed in the Immuta UI.
The Databricks SQL integration supports the following authentication method to install the integration and create data sources:
Privileged User Token: Users can authenticate with a personal access token generated by a privileged user. Note: The access token should not have an expiration date. If it has an expiration date set, the token will need to be updated periodically when the current one expires.
Starting a SQL Analytics endpoint in Databricks SQL can take several minutes to complete. This startup time is inherent in the Databricks SQL product. As a result, for the Databricks SQL and Immuta Native SQL integration to function properly (i.e., for schema changes to be automatically detected, and for other basic functionality), you should ensure Auto Stop is set to OFF for your SQL Analytics endpoint in Databricks SQL Analytics. Please note that this has cost implications for your Databricks usage.
Currently, Databricks SQL does not have support for UDFs. Due to this limitation, Immuta is unable to support format preserving encryption, reversible masking, randomized response, or regex policies.
In some situations where subscription policies are being updated frequently, a bottleneck can occur with respect to showing and hiding view metadata. This will not affect typical use cases.
The Immuta Databricks SQL integration cannot ingest tags from Databricks SQL, but you can connect an external catalog to work with your integration.
Users can configure multiple Databricks SQL integrations with a single Immuta instance.
(Feature availability table: Project Workspaces | User Impersonation | Native Query Audit)