Databricks Spark

This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for a user's query, and the query is then executed directly against the Databricks tables.

The guides in this section outline how to integrate Databricks Spark with Immuta.

How-to guides

Reference guides

  • Databricks Spark integration reference guide: This guide describes the design and components of the integration.

  • Configuration settings: These guides describe various integration settings that can be configured, including environment variables, cluster policies, and performance.

  • Databricks change data feed: This guide describes Immuta's support of Databricks change data feed.

  • Databricks libraries: The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.

  • Delta Lake API: The Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions cannot protect calls made through it. To ensure that Immuta controls what a user can access, the Delta Lake API is blocked. This reference guide outlines the Spark SQL options that can be substituted for the Delta Lake API.

  • Spark direct file reads: Immuta allows direct file reads in Spark for file paths. This guide describes that process.


Copyright © 2014-2024 Immuta Inc. All rights reserved.