Databricks Libraries Introduction
Audience: Databricks Administrators
Content Summary: This page provides an overview of Immuta's Databricks Trusted Libraries feature and support for Notebook-Scoped Libraries on Machine Learning Clusters.
Databricks Libraries and Immuta's Security Manager
The Immuta security manager blocks users from executing code that could give them access to sensitive data by allowing only select code paths to access sensitive files and methods. These code paths give Immuta's own code access to sensitive resources while blocking end users from reaching those resources directly.
Similarly, when users install third-party libraries, those libraries will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
Databricks Trusted Libraries
The trusted libraries feature allows Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries. An administrator can specify an installed library as "trusted," which will enable that library's code to bypass the Immuta security manager. Contact your Immuta support professional for custom security configurations for your libraries.
This feature does not impact Immuta's ability to apply policies; trusting a library only allows through code that the security manager would otherwise have blocked.
Using this feature could create a security vulnerability, depending on the third-party library. For example, if a library exposes a public method named readProtectedFile that displays the contents of a sensitive file, then trusting that library would allow end users access to that file. Work with your Immuta support professional to determine whether this risk applies to your environment or use case.
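As a purely hypothetical illustration (the method body and file path below are invented for this example and are not part of any real package), a trusted library exposing such a method would let end users reach files the security manager would otherwise protect:

```python
# Hypothetical code inside a third-party library that has been trusted.
# Because the library is trusted, this file access passes through the
# Immuta security manager.
def readProtectedFile(path: str) -> str:
    """Return the raw contents of any file the cluster can reach."""
    with open(path) as f:
        return f.read()

# An end user calling the trusted method could then view a sensitive file
# that the security manager would normally block (illustrative path only):
# print(readProtectedFile("/databricks/immuta/protected-config"))
```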
Databricks Libraries API
Installing trusted libraries outside of the Databricks Libraries API (e.g., ADD JAR ...) is not supported.
The following types of libraries are supported when installing a third-party library using the Databricks UI or the Databricks Libraries API:
- JAR libraries (the library source is Upload or DBFS and the library type is Jar)
- Maven artifacts (the library source is Maven)
Databricks installs libraries right after a cluster has started, but there is no guarantee that library installation will complete before a user's code is executed. If a user executes code before a trusted library installation has completed, Immuta will not be able to identify the library as trusted. This can be solved by either
- waiting for library installation to complete before running any third-party library commands or
- executing a Spark query, which forces Immuta to wait for any trusted libraries to finish installing before proceeding (see the example after this list).
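For example, a minimal sketch of the second approach in a notebook cell might look like the following (assuming the standard spark session that Databricks notebooks provide; the library import is a hypothetical placeholder):

```python
# Any Spark query forces Immuta to wait until trusted library
# installation has completed before the query is allowed to run.
spark.sql("SELECT 1").collect()

# After the query above returns, code from the trusted library can be
# used without the security manager misidentifying it as untrusted.
# import my_trusted_library  # hypothetical placeholder
```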
When installing a library using Maven as a library source, Databricks will also install any transitive dependencies for the library. However, those transitive dependencies are installed behind the scenes and will not appear as installed libraries in either the Databricks UI or the Databricks Libraries API. Only libraries specifically listed in the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable will be trusted by Immuta, which does not include installed transitive dependencies. This effectively means that any code path that includes a class from a transitive dependency but does not include a class from a trusted third-party library can still be blocked by the Immuta security manager. For example, if a user installs a trusted third-party library that has a transitive dependency on a file-util library, the user will not be able to use the file-util library directly to read a sensitive file that is normally protected by the Immuta security manager.
In many cases, it is not a problem if dependent libraries aren't trusted because code paths where the trusted library calls down into dependent libraries will still be trusted. However, if the dependent library needs to be trusted, there is a workaround:
Add the transitive dependency jar paths to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable. In the driver log4j logs, Databricks outputs the source jar locations when it installs transitive dependencies. In the cluster driver logs, look for a log message similar to the following:
INFO LibraryDownloadManager: Downloaded library dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar as local file /local_disk0/tmp/addedFile8569165920223626894slf4j_api_1_7_25-784af.jar
In the above example, where slf4j is the transitive dependency, you would add the path dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar to the IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS environment variable and restart your cluster.
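As an illustrative sketch only, the cluster's environment variable might then look like the following (assuming a comma-separated list of URIs; the first jar is a hypothetical trusted library already configured, and the second is the transitive dependency taken from the log message above):

```
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/FileStore/jars/my-trusted-lib.jar,dbfs:/FileStore/jars/maven/org/slf4j/slf4j-api-1.7.25.jar
```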
In case of failure, check the driver logs for details. Some possible causes of failure include:
- One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.
- For trusted Maven artifacts, the URI does not follow the required format; see the installation guide referenced below for the expected Maven URI format.
- Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
For details about configuring trusted libraries, navigate to the installation guide.
Notebook-Scoped Libraries on Machine Learning Clusters
Users on Databricks runtimes 8+ can manage notebook-scoped libraries with %pip commands. However, this functionality differs from Immuta's trusted libraries feature, and Python libraries are still not supported as trusted libraries. The Immuta Security Manager will deny code from libraries installed with %pip access to sensitive resources.
No additional configuration is needed to enable this feature. Users only need to be running on clusters with DBR 8+.
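For example, a user on a DBR 8+ cluster can install a notebook-scoped library directly in a notebook cell; the package name below is only an illustration, and the installed library remains untrusted by the Immuta security manager:

```python
%pip install beautifulsoup4
```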