Py4J security disabled: In addition to supporting Python, SQL, and R, this configuration adds support for additional Python libraries and utilities by disabling Databricks-native Py4J security.
This configuration does not rely on Databricks-native Py4J security to secure the cluster, while process isolation is still enabled to secure filesystem and network access from within Python processes. On an Immuta-enabled cluster, once Py4J security is disabled the Immuta SecurityManager is installed to prevent nefarious actions from Python in the JVM. Disabling Py4J security also allows for expanded Python library support, including many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs.
By default, all actions in R execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data). Without iptables restrictions, a user may also freely access the cluster’s cloud storage credentials. To properly support the use of the R language, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user. This user has limited filesystem and network access. The Immuta SecurityManager is also installed to prevent users from bypassing policies and to protect against the above vulnerabilities from within the JVM.
The SecurityManager adds a small performance overhead; average latency varies depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
A homogeneous cluster is recommended for configurations where Py4J security is disabled. If all users have the same level of authorization, there would not be any data leakage, even if a nefarious action was taken.
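As a rough illustration of what "homogeneous" means here, the sketch below checks whether every user on a cluster has the same groups and authorizations. The data structure, the function name `is_homogeneous`, and the sample users are all hypothetical; Immuta does not perform this check itself (the text above notes that homogeneity is enforced externally), so treat this as one way an administrator might reason about it rather than an Immuta API.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: each user maps to (groups, authorizations).
Entitlements = Tuple[FrozenSet[str], FrozenSet[str]]


def is_homogeneous(users: Dict[str, Entitlements]) -> bool:
    """Return True when every user shares identical groups and authorizations."""
    # Collapsing the entitlements into a set leaves one entry per distinct level.
    return len(set(users.values())) <= 1


# Illustrative data only; the names and entitlements are made up.
cluster_a = {
    "alice": (frozenset({"analysts"}), frozenset({"pii"})),
    "bob": (frozenset({"analysts"}), frozenset({"pii"})),
}
cluster_b = {
    "alice": (frozenset({"analysts"}), frozenset({"pii"})),
    "carol": (frozenset({"analysts"}), frozenset()),
}

print(is_homogeneous(cluster_a))
print(is_homogeneous(cluster_b))
```

In this sketch `cluster_a` counts as homogeneous, while `cluster_b` does not because one user lacks an authorization the other holds.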
For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.
The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.
Legend:
Databricks instance has network level access to Immuta tenant
Permissions and access to download (outside Internet access) or transfer files to the host machine
Recommended Databricks Workspace Configurations:
Immuta supports the Custom access mode.
Supported Languages:
Python
SQL
R (requires advanced configuration; work with your Immuta support professional to use R)
Scala (requires advanced configuration; work with your Immuta support professional to use Scala)
The Immuta Databricks integration supports the following Databricks features:
Audit limitations
Capturing the code or query that triggers the Spark plan makes audit records more useful in assessing what users are doing.
A user can configure multiple Databricks integrations with a single Immuta tenant and use them dynamically or with workspaces.
Immuta does not support Databricks clusters with Photon acceleration enabled.
This page describes the Databricks integration, configuration options, and features. See the for a tutorial on enabling Databricks and these features through the App Settings page.
| Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration | Databricks Unity Catalog integration |
|---|---|---|---|---|
The feature or integration is enabled.
The feature or integration is disabled.
Databricks instance: Premium tier workspace and
Access to
Note: Azure Databricks authenticates users with Microsoft Entra ID. Be sure to configure your Immuta tenant with an IAM that uses the same user ID as does Microsoft Entra ID. Immuta's Spark security plugin will look to match this user ID between the two systems. See this for details.
See for a list of Databricks Runtimes Immuta supports.
: Databricks users can see the on queried tables if they are allowed to read raw data and meet specific qualifications.
: Users can register their Databricks Libraries with Immuta as trusted libraries, allowing Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries.
: Immuta supports the use of external metastores in local or remote mode.
: In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths.
Users can have additional write access in their integration using project workspaces. Users can integrate a single or multiple workspaces with a single Immuta tenant. For more details, see the page.
The Immuta Databricks integration cannot ingest tags from Databricks, but you can connect any of these to work with your integration.
Native impersonation allows users to natively query data as another Immuta user. To enable native user impersonation, see the page.
Immuta will audit queries that come from interactive notebooks, notebook jobs, and JDBC connections, but will not audit . Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless immuta.spark.audit.all.queries is set to true; for more details about this configuration and auditing all queries in Databricks, see .
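The audit rules above can be read as a simple decision function. The sketch below is only a model of that description, not Immuta's implementation: the `Query` class, the source labels, and `is_audited` are hypothetical names, and it assumes that setting immuta.spark.audit.all.queries to true causes every query from an audited source to be recorded.

```python
from dataclasses import dataclass

# Sources the text lists as audited; any other source is treated as unaudited
# in this sketch (an assumption, since the excluded source is not named here).
AUDITED_SOURCES = {"interactive_notebook", "notebook_job", "jdbc"}


@dataclass(frozen=True)
class Query:
    source: str               # hypothetical label for where the query came from
    triggers_spark_job: bool  # whether running the cell or query starts a Spark job
    uses_immuta_table: bool   # whether that job is associated with an Immuta table


def is_audited(query: Query, audit_all_queries: bool = False) -> bool:
    """Model the audit rule; audit_all_queries mirrors immuta.spark.audit.all.queries."""
    if query.source not in AUDITED_SOURCES:
        return False
    if audit_all_queries:
        # Assumption: the flag widens auditing to every query from an audited source.
        return True
    # Default behavior described in the text: only Spark jobs tied to Immuta tables.
    return query.triggers_spark_job and query.uses_immuta_table


plain_cell = Query("interactive_notebook", triggers_spark_job=False, uses_immuta_table=False)
immuta_read = Query("jdbc", triggers_spark_job=True, uses_immuta_table=True)

print(is_audited(plain_cell))
print(is_audited(plain_cell, audit_all_queries=True))
print(is_audited(immuta_read))
```

Under these assumptions, a notebook cell that starts no Spark job is skipped by default and recorded only once the flag is enabled, while a JDBC query against an Immuta table is recorded either way.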
To audit the code or query that triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs. Examples of a saved cell/query and the resulting audit record are provided on the page.