You are viewing documentation for Immuta version 2021.5.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Native Databricks SQL Integration (Public Preview)

Audience: System Administrators

Content Summary: This page provides an overview of the native Databricks SQL integration in Immuta. For a tutorial detailing how to enable this native integration, see the installation guide. Native SQL with Databricks SQL is currently in Public Preview. Please provide feedback on any issues you encounter, as well as insight regarding how you would like this feature to evolve in the future.

Databricks SQL and Immuta Native SQL

Databricks SQL provides a simple experience for SQL users who want to run quick ad hoc queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.

Once the native integration between Databricks SQL and Immuta is enabled, users access views created in an Immuta database inside their Databricks SQL environment.

An organization's Databricks SQL administrator must enable and configure the integration within Immuta. As a prerequisite, a functional Databricks SQL environment must already exist. For guidance on setting up and using a Databricks SQL environment, see the Get started with Databricks SQL guide in the Databricks documentation.

Authentication and Authorization

A Databricks personal access token, generated by a Databricks SQL administrator, lets users authenticate to the Databricks REST API and lets Immuta connect to SQL endpoints and create the Immuta database inside Databricks SQL. If the token is generated by anyone other than a Databricks SQL administrator, it will not carry the privileges Immuta needs to create this database, and the Immuta UI will display an error.
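As a rough illustration of token-based authentication, the sketch below builds (but does not send) a request to the Databricks REST API with the personal access token passed as a bearer token. The workspace URL, the token value, the helper name `build_endpoints_request`, and the `/api/2.0/sql/endpoints` path are assumptions made for illustration; check the Databricks REST API reference for the paths available in your workspace.

```python
import urllib.request


def build_endpoints_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build, without sending, a GET request that lists SQL endpoints.

    Assumption: the workspace exposes the SQL endpoints list at
    /api/2.0/sql/endpoints. The path is illustrative, not taken from this page.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/sql/endpoints"
    return urllib.request.Request(
        url,
        # Databricks personal access tokens are sent as a bearer token.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical workspace URL and placeholder token, for illustration only.
req = build_endpoints_request(
    "https://example.cloud.databricks.com/", "dapiEXAMPLETOKEN"
)
```

Passing `req` to `urllib.request.urlopen` would send the call. A token that was not generated by a Databricks SQL administrator may authenticate successfully and still lack the privileges Immuta needs.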

Databricks SQL Limitations

  • Starting a SQL Analytics endpoint in Databricks SQL can take several minutes; this startup time is inherent in the Databricks SQL product. For the Databricks SQL and Immuta Native SQL integration to function properly (including automatic detection of schema changes and other basic functionality), set Auto Stop to OFF for your SQL Analytics endpoint. Note that keeping the endpoint running has cost implications for your Databricks usage.
  • Databricks SQL does not currently support UDFs. As a result, Immuta cannot support format-preserving encryption, reversible masking, randomized response (sometimes referred to as Local Differential Privacy), or regex policies.
  • Unlike standard Databricks, in Databricks SQL Immuta does not have Spark plan access. As a result, policies in Databricks SQL are managed via new views that are automatically generated by Immuta, rather than directly on the original tables.
  • When subscription policies are updated frequently, showing and hiding view metadata can become a bottleneck. This does not affect typical use cases.
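The Auto Stop requirement in the first limitation can be checked programmatically. The sketch below is a minimal example that assumes an endpoint description is available as a dictionary with an `auto_stop_mins` field, where `0` means Auto Stop is off. The field name, the endpoint names, and the helper `auto_stop_disabled` are assumptions for illustration and should be verified against the Databricks SQL endpoints API.

```python
def auto_stop_disabled(endpoint: dict) -> bool:
    """Return True when the endpoint description reports Auto Stop as off.

    Assumption: the description carries an `auto_stop_mins` field and a value
    of 0 means the endpoint never stops automatically.
    """
    return endpoint.get("auto_stop_mins") == 0


# Hypothetical endpoint descriptions, shaped like parsed JSON.
endpoints = [
    {"name": "immuta-endpoint", "auto_stop_mins": 0},
    {"name": "adhoc-endpoint", "auto_stop_mins": 120},
]

# Endpoints that would still stop automatically and may need reconfiguring.
needs_attention = [e["name"] for e in endpoints if not auto_stop_disabled(e)]
```

Here only `adhoc-endpoint` is flagged, because it is configured to stop after an idle period.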

Data Flow

  1. A Databricks SQL administrator creates a Databricks SQL endpoint.
  2. Databricks creates a default database. Note: Immuta doesn’t lock down access to the default database; an administrator must do that within Databricks SQL itself.
  3. The Databricks Admin creates a table containing 10 million person records and queries the table.
  4. An Immuta Application Admin configures the native Databricks SQL integration.
  5. Immuta creates a protected database inside the Databricks SQL endpoint.
  6. A Data Owner creates data sources in Immuta from the default Databricks database.
  7. A user adds or edits a policy, or adds a user to a group that changes a policy on a data source.
  8. Immuta updates the policy or user profile information in Databricks.
  9. Immuta creates dynamic views based on tables in the default database, users, groups, attributes, and policies.
  10. Users query views in the protected database created by Immuta.
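To make steps 9 and 10 more concrete, the sketch below generates the kind of dynamic view definition that could enforce a masking policy without UDFs. It is not the SQL that Immuta actually generates. The database, table, column, and group names and the helper `build_masked_view` are hypothetical, and the use of the built-in `is_member` and `sha2` functions is an assumption about one way such a view could be written.

```python
def build_masked_view(protected_db, source_db, table, columns, masked, allowed_group):
    """Return a CREATE VIEW statement that hashes masked columns for users
    outside `allowed_group`.

    Illustrative only: this is not Immuta's generated SQL.
    """
    select_items = []
    for col in columns:
        if col in masked:
            # Group members see the raw value; everyone else sees a SHA-256 hash.
            select_items.append(
                f"CASE WHEN is_member('{allowed_group}') THEN {col} "
                f"ELSE sha2(CAST({col} AS STRING), 256) END AS {col}"
            )
        else:
            select_items.append(col)
    select_list = ",\n  ".join(select_items)
    return (
        f"CREATE OR REPLACE VIEW {protected_db}.{table} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {source_db}.{table}"
    )


# Hypothetical names: a protected database called "immuta" and a "people" table.
view_sql = build_masked_view(
    protected_db="immuta",
    source_db="default",
    table="people",
    columns=["id", "name", "ssn"],
    masked={"ssn"},
    allowed_group="pii_readers",
)
print(view_sql)
```

Users would then query `immuta.people` rather than `default.people`. Because the check is evaluated at query time, a change in group membership changes what a user sees without the view being recreated.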

Databricks SQL Diagram