Databricks Spark
This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for a user's query, and the query then executes directly against the Databricks tables.
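As a hedged illustration, the Scala sketch below shows what such a query looks like from a notebook attached to an Immuta-enabled cluster. The table name analytics.claims and its columns are hypothetical; the point is that the query is written exactly as it would be without Immuta, and enforcement happens in the rewritten plan.

```scala
// A minimal sketch, assuming a notebook attached to an Immuta-enabled
// Databricks cluster and a hypothetical registered table "analytics.claims".
// The query is ordinary Spark SQL; the Immuta Spark extensions rewrite the
// logical plan so row- and column-level policies are enforced before any
// data is returned.
val df = spark.sql("SELECT claim_id, state, amount FROM analytics.claims")
df.show()
```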
The guides in this section outline how to integrate Databricks with Immuta to gain value from all three Immuta modules.
Configure the Databricks Spark integration.
Access DBFS in Databricks for non-sensitive data.
Allow Immuta users to access tables that are not protected by Immuta.
Hide the Immuta database from users in Databricks, since user queries do not need to reference it.
Run R and Scala spark-submit jobs on your Databricks cluster.
Increase on-cluster caching and lower the cache timeouts for the Immuta web service to allow the use of project UDFs in Spark jobs.
Use an existing Hive external metastore instead of the built-in metastore.
This guide describes the design and components of the integration.
Configuration settings: These guides describe the integration settings that can be configured, including cluster policies.
This guide describes Immuta's support of Databricks change data feed (a read sketch follows this list).
The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.
When using Delta Lake, the Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions cannot protect it. To ensure that Immuta retains control over what a user can access, the Delta Lake API is blocked. This reference guide outlines the Spark SQL options that can be substituted for the Delta Lake API (a substitution sketch follows this list).
Immuta allows direct file reads in Spark for file paths. This guide describes that process (a path-based read sketch follows this list).
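For the change data feed item above, the following is a minimal sketch of reading a Delta table's change feed, using the standard Databricks CDF read options. The table name sales and the starting version are hypothetical; on an Immuta-enabled cluster such a read is subject to the same policy enforcement as an ordinary table query.

```scala
// A minimal sketch, assuming a hypothetical Delta table "sales" that has
// change data feed enabled. These are the standard Databricks CDF options;
// the linked guide describes how Immuta supports this feature.
val changes = spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 1)
  .table("sales")
changes.show()
```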
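For the Delta Lake API item, the sketch below contrasts a blocked programmatic DeltaTable call with Spark SQL statements that do run through the policy-enforced execution path. The table names sales and sales_updates are hypothetical, and the full list of substitutions lives in the reference guide itself.

```scala
// Blocked on an Immuta-enabled cluster: the programmatic Delta Lake API
// bypasses the normal Spark execution path, so Immuta cannot apply policies.
// io.delta.tables.DeltaTable.forName(spark, "sales").delete("region = 'EU'")

// Substitute the equivalent Spark SQL, which Immuta's extensions do protect.
// The table names "sales" and "sales_updates" are hypothetical.
spark.sql("DELETE FROM sales WHERE region = 'EU'")
spark.sql("""
  MERGE INTO sales AS t
  USING sales_updates AS s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```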
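For the direct file reads item, here is a sketch of a path-based read. The DBFS path and the Parquet format are assumptions for illustration; the linked guide describes how Immuta handles reads that reference a file path rather than a table name.

```scala
// A minimal sketch, assuming the hypothetical path "dbfs:/mnt/data/claims"
// holds Parquet files backing a table registered in Immuta. The data is
// read by path rather than by table name; see the linked guide for how
// Immuta treats this access pattern.
val raw = spark.read
  .format("parquet")
  .load("dbfs:/mnt/data/claims")
raw.show()
```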