This integration enforces policies on Databricks tables registered as data sources in Immuta, allowing users to query policy-enforced data on Databricks clusters (including job clusters). Immuta policies are applied to the plan that Spark builds for users' queries, which are then executed directly against Databricks tables.
- Databricks configuration: Configure the Databricks Spark integration.
- DBFS access: Access DBFS in Databricks for non-sensitive data.
- Limited enforcement in Databricks: Allow Immuta users to access tables that are not protected by Immuta.
- Hiding the Immuta database in Databricks: Hide the Immuta database from users in Databricks, since user queries do not need to reference it.
- Run spark-submit jobs on Databricks: Run R and Scala spark-submit jobs on your Databricks cluster.
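As an illustrative sketch of the spark-submit bullet above, a Scala job might be submitted with a standard Spark invocation like the following. The main class and JAR path are placeholders, not Immuta- or Databricks-provided artifacts, and on Databricks such jobs are typically launched through the cluster's job configuration rather than an interactive shell:

```shell
# Sketch only: the class name and JAR path below are hypothetical.
# Queries issued by the job run through the normal Spark plan,
# where Immuta applies its policies before execution.
spark-submit \
  --class com.example.ExampleJob \
  --deploy-mode cluster \
  /dbfs/jobs/example-job.jar
```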
- Project UDFs cache settings: Raise the on-cluster cache settings and lower the cache timeouts for the Immuta web service to allow project UDFs to be used in Spark jobs.
- External metastores: Use an existing Hive external metastore instead of the built-in metastore.
- Databricks Spark integration reference guide: This guide describes the design and components of the integration.
- Configuration settings: These guides describe various integration settings that can be configured, including environment variables, cluster policies, and performance.
- Databricks change data feed: This guide describes Immuta's support of Databricks change data feed.
- Databricks libraries: The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. This guide describes the feature and its configuration.
- Spark direct file reads: Immuta supports direct file reads in Spark, allowing users to read data by file path rather than by table name. This guide describes that process.