Native Databricks SQL Analytics Integration (Public Preview)
Audience: System Administrators
Content Summary: This page provides an overview of the native Databricks SQL Analytics integration in Immuta. For a tutorial detailing how to enable this native integration, see the installation guide. Native SQL with Databricks SQL Analytics is currently in Public Preview. Please report any issues you encounter and share how you would like this feature to evolve.
Databricks SQL Analytics and Immuta Native SQL
Databricks SQL Analytics provides a simple experience for SQL users who want to run quick ad hoc queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
When the native integration between Databricks SQL Analytics and Immuta is enabled, users access views that Immuta creates in a database inside their Databricks SQL Analytics environment.
An organization's Databricks SQL Analytics administrator must enable and configure the integration within Immuta. As a prerequisite, a functional Databricks SQL Analytics environment must already exist. For guidance in setting up and using a Databricks SQL Analytics environment, see the Get started with Databricks SQL guide in the Databricks documentation.
Authentication and Authorization
A Databricks personal access token, generated by a Databricks SQL Analytics administrator, authenticates requests to the Databricks REST API and allows Immuta to connect to SQL endpoints and create the Immuta database inside Databricks SQL Analytics. If the token is not generated by a Databricks SQL Analytics administrator, it will not carry the privileges Immuta needs to create this database, and an error will be displayed in the Immuta UI.
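To illustrate how a personal access token is used, the sketch below builds an authenticated request that lists SQL endpoints. It assumes the Databricks SQL Endpoints REST path `/api/2.0/sql/endpoints` and bearer-token authentication; the host, the token, and the helper names are hypothetical placeholders, and this is not how Immuta itself issues requests. The builder only constructs the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_list_endpoints_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists SQL endpoints.

    Assumption: the workspace exposes /api/2.0/sql/endpoints and accepts a
    personal access token as a bearer token. `host` and `token` are placeholders.
    """
    url = f"https://{host}/api/2.0/sql/endpoints"
    req = urllib.request.Request(url, method="GET")
    # The personal access token travels in the standard Authorization header.
    req.add_header("Authorization", f"Bearer {token}")
    return req


def list_endpoints(host: str, token: str) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_list_endpoints_request(host, token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

When the request is actually sent, `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so a rejected or under-privileged token surfaces as an exception rather than as an empty result.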
Databricks SQL Analytics Limitations
- Starting a SQL Analytics endpoint in Databricks SQL Analytics can take several minutes, and this startup time is inherent in the Databricks SQL Analytics product. As a result, for the Databricks SQL Analytics and Immuta Native SQL integration to function properly (for example, for schema changes to be detected automatically, and for other basic functionality), set Auto Stop to OFF for your SQL Analytics endpoint in Databricks SQL Analytics. Note that this has cost implications for your Databricks usage, because the endpoint keeps running while idle.
- Currently, Databricks SQL Analytics does not support UDFs. Because of this limitation, Immuta cannot support format preserving encryption, reversible masking, randomized response (sometimes referred to as Local Differential Privacy), or regex policies.
- Unlike in standard Databricks, Immuta does not have access to the Spark plan in Databricks SQL Analytics. As a result, policies in Databricks SQL Analytics are enforced through new views that Immuta generates automatically, rather than directly on the original tables.
- In some situations where subscription policies are updated frequently, a bottleneck can occur when showing and hiding view metadata. This does not affect typical use cases.
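If you prefer to script the Auto Stop setting rather than toggle it in the Databricks SQL Analytics UI, the sketch below builds an edit request that sets `auto_stop_mins` to 0. It assumes the SQL Endpoints API edit path `/api/2.0/sql/endpoints/{id}/edit` and that a value of 0 disables Auto Stop; verify both against the API reference for your workspace. The host, token, endpoint ID, and helper name are hypothetical placeholders.

```python
import json
import urllib.request


def build_disable_auto_stop_request(
    host: str, token: str, endpoint_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that turns Auto Stop off.

    Assumptions: the edit path below exists on this workspace, and
    auto_stop_mins = 0 means "never stop automatically".
    """
    payload = {"auto_stop_mins": 0}
    body = json.dumps(payload).encode("utf-8")
    url = f"https://{host}/api/2.0/sql/endpoints/{endpoint_id}/edit"
    req = urllib.request.Request(url, data=body, method="POST")
    # Authenticate with the administrator's personal access token.
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req
```

Sending the request with `urllib.request.urlopen(req)` applies the change; keep in mind that an endpoint with Auto Stop off continues to accrue cost while idle.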
Integration Workflow
- A Databricks SQL Analytics administrator creates a Databricks SQL Analytics endpoint.
- Databricks creates a default database. Note: Immuta doesn't lock down access to the default database; an administrator must do that within Databricks SQL Analytics itself.
- The Databricks SQL Analytics administrator creates a table of 10 million people and queries it.
- An Immuta Application Admin configures the native Databricks SQL Analytics integration.
- Immuta creates a protected database inside the Databricks SQL Analytics endpoint.
- A Data Owner creates data sources in Immuta from the default database.
- A user adds or edits a policy, or adds a user to a group, which changes the policies that apply to a data source.
- Immuta updates the policy or user profile information in Databricks.
- Immuta creates dynamic views from the tables in the default database, based on users, groups, attributes, and policies.
- Users query views in the protected database created by Immuta.
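Because Immuta cannot apply policies through the Spark plan or through UDFs in Databricks SQL Analytics, a dynamic view has to express each policy with built-in SQL. The sketch below shows the general idea: it generates a view in a protected database that masks selected columns to NULL unless the querying user belongs to a group, and optionally adds a row filter. The `ViewPolicy` model, the helper names, and the `immuta` database name are hypothetical; the views Immuta actually generates will differ. It assumes Databricks SQL's built-in `is_member()` function and backtick-quoted identifiers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Conservative whitelist for group names embedded in SQL string literals.
_SAFE_GROUP = re.compile(r"[A-Za-z0-9_.@ -]+")


@dataclass
class ViewPolicy:
    """Hypothetical, simplified policy model (not Immuta's internal format)."""

    # column name -> group whose members may see the unmasked value
    masked_columns: dict[str, str] = field(default_factory=dict)
    # optional SQL predicate used as a row-level filter (trusted input)
    row_filter: str | None = None


def quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def build_view_sql(
    source_db: str, protected_db: str, table: str, columns: list[str], policy: ViewPolicy
) -> str:
    """Return a CREATE VIEW statement that enforces `policy` with built-in SQL only."""
    select_items = []
    for col in columns:
        q = quote(col)
        group = policy.masked_columns.get(col)
        if group is None:
            select_items.append(q)
            continue
        if not _SAFE_GROUP.fullmatch(group):
            raise ValueError(f"unsupported group name: {group!r}")
        # No UDFs: members of `group` see the value, everyone else sees NULL.
        select_items.append(f"CASE WHEN is_member('{group}') THEN {q} ELSE NULL END AS {q}")
    sql = (
        f"CREATE OR REPLACE VIEW {quote(protected_db)}.{quote(table)} AS "
        f"SELECT {', '.join(select_items)} "
        f"FROM {quote(source_db)}.{quote(table)}"
    )
    if policy.row_filter:
        sql += f" WHERE {policy.row_filter}"
    return sql
```

For example, `build_view_sql("default", "immuta", "people", ["name", "ssn"], ViewPolicy(masked_columns={"ssn": "hr"}))` produces a view over `default.people` in which only members of `hr` see `ssn`. Regenerating the statement whenever a policy changes mirrors the update step in the workflow above, while `is_member()` evaluates group membership at query time.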