Troubleshooting
This page provides guidelines for troubleshooting issues with the Databricks Spark integration and resolving Py4J security and Databricks trusted library errors.
Debugging the integration
For easier debugging of the Databricks Spark integration, follow the recommendations below.
Enable cluster init script logging:
On the cluster page in Databricks for the target cluster, navigate to Advanced Options -> Logging. Change the Destination from NONE to DBFS and change the path to the desired output location. Note: The unique cluster ID will be appended to the end of the provided path.
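The same logging destination that the UI steps above configure can also be set programmatically through the Databricks Clusters API (`clusters/edit`), which accepts a `cluster_log_conf` block. This is a minimal sketch of building that request payload; the cluster ID and DBFS path below are hypothetical placeholders.

```python
# Sketch: build the cluster_log_conf fragment for a Databricks
# clusters/edit API request. The cluster ID and path are placeholders.

def cluster_log_payload(cluster_id: str, dbfs_path: str) -> dict:
    """Build the cluster_log_conf portion of a clusters/edit request.

    Databricks appends the unique cluster ID to the destination path,
    so logs land under e.g. dbfs:/cluster-logs/<cluster-id>/.
    """
    return {
        "cluster_id": cluster_id,
        "cluster_log_conf": {
            "dbfs": {"destination": dbfs_path},
        },
    }

payload = cluster_log_payload("0123-456789-abcde", "dbfs:/cluster-logs")
print(payload["cluster_log_conf"]["dbfs"]["destination"])
```

Note that `clusters/edit` replaces the full cluster specification, so in practice this fragment would be merged into the cluster's existing settings before sending.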
View the Spark UI on your target Databricks cluster: On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.
Using the validation and debugging notebook
The validation and debugging notebook is designed to be used by or under the guidance of an Immuta support professional. Reach out to your Immuta representative for assistance.
Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.
Click the arrow next to your name and select Import.
Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.
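The manual File > Export > DBC Archive step can also be scripted with the Databricks Workspace API `export` endpoint, which returns the notebook as base64-encoded content. This is a sketch only; the workspace host and notebook path are hypothetical placeholders.

```python
# Sketch: build a Workspace API export request URL for a DBC archive.
# Host and notebook path below are hypothetical placeholders.
import urllib.parse

def export_request_url(host: str, notebook_path: str) -> str:
    """Build the Workspace API export URL for a DBC-format download."""
    params = urllib.parse.urlencode({"path": notebook_path, "format": "DBC"})
    return f"{host}/api/2.0/workspace/export?{params}"

url = export_request_url("https://example.cloud.databricks.com",
                         "/Users/me@example.com/immuta-debugging")
# The JSON response's "content" field is base64-encoded; decode it with
# base64.b64decode(...) before writing the .dbc file to disk.
print(url)
```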
Py4J security error
Error Message:
py4j.security.Py4JSecurityException: Constructor <> is not allowlisted
Explanation: This error indicates that the operation was blocked by Py4J security rather than the Immuta Security Manager. Py4J security is strict and commonly blocks many ML libraries.
Solution: Turn off Py4J security on the offending cluster by setting
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false
in the environment variables section. Additionally, because there are limitations to the security mechanisms Immuta employs on-cluster when Py4J security is disabled, ensure that all users on the cluster have the same level of access to data, as users could theoretically see (policy-enforced) data that other users have queried.
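As a sketch, the environment variable can be set in the cluster's Environment Variables field (under Advanced Options) or, equivalently, in the `spark_env_vars` block of a cluster JSON specification. The cluster name below is a hypothetical placeholder:

```json
{
  "cluster_name": "immuta-enabled-cluster",
  "spark_env_vars": {
    "IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED": "false"
  }
}
```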
Databricks trusted library errors
Check the driver logs for details. Some possible causes of failure include:
One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library. For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.
Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
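A quick client-side check that a trusted-library URI matches the maven:/group.id:artifact-id:version shape, before configuring it in Immuta, can be sketched with a regular expression. The Maven coordinates used in the example are hypothetical.

```python
# Sketch: validate that a trusted-library URI follows the
# maven:/group.id:artifact-id:version format. Example coordinates
# are hypothetical.
import re

MAVEN_URI = re.compile(r"^maven:/[^:/\s]+:[^:/\s]+:[^:/\s]+$")

def is_valid_maven_uri(uri: str) -> bool:
    """Return True if uri matches maven:/group.id:artifact-id:version."""
    return MAVEN_URI.fullmatch(uri) is not None

print(is_valid_maven_uri("maven:/com.example:example-lib:1.0.0"))  # True
print(is_valid_maven_uri("maven:/com.example:example-lib"))        # False: missing version
```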