Databricks Security Configuration for Performance
Audience: System Administrators
Content Summary: This page describes how the Security Manager is disabled for Databricks clusters that do not allow R or Scala code to be executed. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or
Automatic Disabling of the Security Manager
The Immuta Security Manager is an essential element of the Databricks deployment that ensures users can't perform unauthorized actions when using Scala and R, since those languages have features that allow users to circumvent policies without the Security Manager enabled. However, the Security Manager must inspect the call stack every time a permission check is triggered, which adds overhead to queries. To improve Immuta's query performance on Databricks, Immuta disables the Security Manager when Scala and R are not being used.
The cluster init script checks the cluster’s configuration and automatically removes the Security Manager configuration when
spark.databricks.repl.allowedlanguagesis a subset of
When the cluster is configured this way, Immuta can rely on Databricks' process isolation and Py4J security to prevent user code from performing unauthorized actions.
Note: Immuta still expects the
to be set and pointing at the Security Manager.
Beyond disabling the Security Manager, Immuta will skip several startup tasks that are required to secure the cluster when Scala and R are configured, and fewer permission checks will occur on the Driver and Executors in the Databricks cluster, reducing overhead and improving performance.
There are still cases that require the Security Manager; in those instances, Immuta creates a fallback Security Manager to check the code path, so the
IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URIenvironment variable must always point to a valid calling class file.
dbutils.fsis blocked by their
PY4Jsecurity; therefore, it can’t be used to access scratch paths.