You are viewing documentation for Immuta version 2022.1.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Databricks Pre-Configuration Details

Audience: System Administrators, Data Owners, and Data Users

Content Summary: This page describes the Databricks integration, configuration options, and features.

See the Databricks integration page for a tutorial on enabling Databricks and these features through the App Settings page.

Feature Availability

  • Project Workspaces: ✅
  • Databricks Tag Ingestion: ❌
  • User Impersonation: ✅
  • Native Query Audit: ✅
  • Multiple Integrations: ✅

Databricks-Specific Details

Prerequisites

  • Databricks instance: Premium tier workspace and Cluster access control enabled
  • Databricks instance has network level access to Immuta instance
  • Access to Immuta releases
  • Permissions and access to download (outside Internet access) or transfer files to the host machine

Recommended Databricks Workspace Configurations:

Note: Azure Databricks authenticates users with Azure AD. Configure your Immuta instance with an IAM that uses the same user IDs as Azure AD, because Immuta's Spark security plugin matches users between the two systems by user ID. See this Azure Active Directory page for details.

Supported Databricks Runtime Versions

Immuta supports the following Databricks Runtimes:

  • 5.5
  • 6.4
  • 7.3
  • 7.4
  • 7.5
  • 7.6
  • 8.0
  • 8.1
  • 8.2
  • 8.3
  • 8.4
  • 9.1
  • 10.0
  • 10.1
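
As a rough illustration, the list above can be turned into a pre-flight check before attaching Immuta to a cluster. This is a minimal sketch, not part of Immuta or Databricks: the function names are hypothetical, and it assumes the cluster's runtime is reported as a string that begins with "major.minor" (for example "9.1.x-scala2.12").

```python
# Runtime versions listed on this page for Immuta 2022.1.
SUPPORTED_RUNTIMES = {
    "5.5", "6.4", "7.3", "7.4", "7.5", "7.6",
    "8.0", "8.1", "8.2", "8.3", "8.4",
    "9.1", "10.0", "10.1",
}


def runtime_major_minor(version: str) -> str:
    """Return the 'major.minor' prefix of a runtime version string.

    Assumption: the string starts with '<major>.<minor>', e.g. '9.1.x-scala2.12'.
    """
    parts = version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"Unrecognized runtime version string: {version!r}")
    return f"{parts[0]}.{parts[1]}"


def is_supported_runtime(version: str) -> bool:
    """Hypothetical helper: True if the runtime appears in the list above."""
    return runtime_major_minor(version) in SUPPORTED_RUNTIMES
```

For example, `is_supported_runtime("9.1.x-scala2.12")` evaluates to `True`, while a 9.0 runtime is rejected because 9.0 is not in the list.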

Supported Clusters

Immuta supports both high concurrency and standard clusters. However, the languages supported by these cluster types differ.

  • High Concurrency Cluster Supported Languages:

    • Python
    • SQL
    • R (requires advanced configuration; work with your Immuta support professional to use R)
  • Standard Cluster Supported Languages:

    • Python
    • SQL
    • R (requires advanced configuration; work with your Immuta support professional to use R)
    • Scala (requires advanced configuration; work with your Immuta support professional to use Scala)
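
The language support described above can be summarized as a small lookup. The sketch below is illustrative only: the cluster-type keys, status strings, and function name are assumptions made for this example, not Immuta or Databricks identifiers.

```python
# Languages per cluster type, as listed above.
SUPPORTED_LANGUAGES = {
    "high_concurrency": {"python", "sql", "r"},
    "standard": {"python", "sql", "r", "scala"},
}

# Languages that this page says require advanced configuration.
NEEDS_ADVANCED_CONFIG = {"r", "scala"}


def language_support(cluster_type: str, language: str) -> str:
    """Hypothetical helper that classifies a cluster type / language pair."""
    languages = SUPPORTED_LANGUAGES.get(cluster_type)
    if languages is None:
        raise ValueError(f"Unknown cluster type: {cluster_type!r}")
    lang = language.lower()
    if lang not in languages:
        return "unsupported"
    if lang in NEEDS_ADVANCED_CONFIG:
        return "supported with advanced configuration"
    return "supported"
```

With this table, Scala on a high concurrency cluster is classified as unsupported, while Scala on a standard cluster is supported only with advanced configuration.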

Supported Features

The Immuta Databricks integration supports the following Databricks features:

  • Change Data Feed: Databricks users can see the Databricks Change Data Feed on queried tables if they are allowed to read raw data and meet specific qualifications.
  • Databricks Libraries: Users can register their Databricks Libraries with Immuta as trusted libraries, allowing Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries.
  • External Metastores: Immuta supports the use of external metastores in local or remote mode.
  • Spark Direct File Reads: In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths.

Workspaces

Users can gain additional write access in their integration by using project workspaces. One or more workspaces can be integrated with a single Immuta instance. For more details, see the Databricks Project Workspaces page.

Tag Ingestion

The Immuta Databricks integration cannot ingest tags from Databricks, but you can connect any of these supported external catalogs to work with your integration.

User Impersonation

Native impersonation allows users to natively query data as another Immuta user. To enable native user impersonation, see the Integration User Impersonation page.

Native Query Audit

Audit Limitations

Immuta audits queries that come from interactive notebooks, notebook jobs, and JDBC connections, but does not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job unless immuta.spark.audit.all.queries is set to true. For more details about this configuration and auditing all queries in Databricks, see Limited Enforcement in Databricks.
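
The rules in the paragraph above can be read as a simple decision function. The following is a minimal sketch of that reading, not Immuta's implementation: the source labels, the `Query` type, and the function name are hypothetical, and it assumes that setting immuta.spark.audit.all.queries to true does not change the behavior for Scala or R submit jobs.

```python
from dataclasses import dataclass

# Query origins that this page says Immuta audits (labels are hypothetical).
AUDITED_SOURCES = {"interactive_notebook", "notebook_job", "jdbc"}


@dataclass
class Query:
    source: str                 # e.g. "jdbc" or "scala_submit_job"
    triggers_spark_job: bool    # does running it produce a Spark job?
    touches_immuta_table: bool  # is that Spark job associated with Immuta tables?


def is_audited(query: Query, audit_all_queries: bool = False) -> bool:
    """Sketch of the audit rules; audit_all_queries stands in for
    immuta.spark.audit.all.queries."""
    if query.source not in AUDITED_SOURCES:
        # Scala and R submit jobs are not audited (assumed regardless of the flag).
        return False
    if audit_all_queries:
        return True
    # Default: only Spark jobs associated with Immuta tables are audited.
    return query.triggers_spark_job and query.touches_immuta_table
```

Under this reading, a notebook cell that runs plain Python without triggering a Spark job is audited only when `audit_all_queries` is `True`.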

Capturing the code or query that triggers the Spark plan makes audit records more useful in assessing what users are doing.

To audit the code or query that triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs. Examples of a saved cell/query and the resulting audit record are provided on the Databricks JDBC and Notebook Cell Query Audit Logs page.

Multiple Databricks Instances

A user can configure multiple Databricks integrations with a single Immuta instance and use them dynamically or with workspaces.