Databricks Spark
This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for a user's query, and the query then executes directly against the Databricks tables.
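As a hedged illustration, the Scala sketch below shows what such a query looks like from a notebook attached to an Immuta-enabled cluster. The table name analytics.claims and its columns are hypothetical; the point is that the query is written exactly as it would be without Immuta, and enforcement happens in the rewritten plan.

```scala
// A minimal sketch, assuming a notebook attached to an Immuta-enabled
// Databricks cluster and a hypothetical registered table "analytics.claims".
// The query is ordinary Spark SQL; the Immuta Spark extensions rewrite the
// logical plan so row- and column-level policies are enforced before any
// data is returned.
val df = spark.sql("SELECT claim_id, state, amount FROM analytics.claims")
df.show()
```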
The guides in this section outline how to integrate Databricks with Immuta to gain value from all three Immuta modules.
Configure the Databricks Spark integration.
Access DBFS in Databricks for non-sensitive data.
Allow Immuta users to access tables that are not protected by Immuta.
Hide the Immuta database from users in Databricks, since user queries do not need to reference it.
Run R and Scala spark-submit jobs on your Databricks cluster.
Increase on-cluster caching and lower the cache timeouts for the Immuta web service to allow the use of project UDFs in Spark jobs.
Use an existing Hive external metastore instead of the built-in metastore.
This guide describes the design and components of the integration.
Configuration settings: These guides describe the integration settings that can be configured, including cluster policies.
This guide describes Immuta's support of Databricks change data feed (a read sketch follows this list).
The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.
When using Delta Lake, the Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions cannot protect it. To ensure that Immuta retains control over what a user can access, the Delta Lake API is blocked. This reference guide outlines the Spark SQL options that can be substituted for the Delta Lake API (a substitution sketch follows this list).
Immuta allows direct file reads in Spark for file paths. This guide describes that process (a path-based read sketch follows this list).
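For the change data feed item above, the following is a minimal sketch of reading a Delta table's change feed, using the standard Databricks CDF read options. The table name sales and the starting version are hypothetical; on an Immuta-enabled cluster such a read is subject to the same policy enforcement as an ordinary table query.

```scala
// A minimal sketch, assuming a hypothetical Delta table "sales" that has
// change data feed enabled. These are the standard Databricks CDF options;
// the linked guide describes how Immuta supports this feature.
val changes = spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 1)
  .table("sales")
changes.show()
```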
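For the Delta Lake API item, the sketch below contrasts a blocked programmatic DeltaTable call with Spark SQL statements that do run through the policy-enforced execution path. The table names sales and sales_updates are hypothetical, and the full list of substitutions lives in the reference guide itself.

```scala
// Blocked on an Immuta-enabled cluster: the programmatic Delta Lake API
// bypasses the normal Spark execution path, so Immuta cannot apply policies.
// io.delta.tables.DeltaTable.forName(spark, "sales").delete("region = 'EU'")

// Substitute the equivalent Spark SQL, which Immuta's extensions do protect.
// The table names "sales" and "sales_updates" are hypothetical.
spark.sql("DELETE FROM sales WHERE region = 'EU'")
spark.sql("""
  MERGE INTO sales AS t
  USING sales_updates AS s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```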
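For the direct file reads item, here is a sketch of a path-based read. The DBFS path and the Parquet format are assumptions for illustration; the linked guide describes how Immuta handles reads that reference a file path rather than a table name.

```scala
// A minimal sketch, assuming the hypothetical path "dbfs:/mnt/data/claims"
// holds Parquet files backing a table registered in Immuta. The data is
// read by path rather than by table name; see the linked guide for how
// Immuta treats this access pattern.
val raw = spark.read
  .format("parquet")
  .load("dbfs:/mnt/data/claims")
raw.show()
```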