Troubleshooting

This page provides guidelines for troubleshooting issues with the Databricks Spark integration and resolving Py4J security and Databricks trusted library errors.

Debugging the integration

For easier debugging of the Databricks Spark integration, follow the recommendations below.

  • Enable cluster init script logging:

    • On the cluster page for the target cluster in Databricks, navigate to Advanced Options -> Logging.

    • Change the Destination from NONE to DBFS and set the path to the desired output location. Note: The unique cluster ID is appended to the end of the provided path.

  • View the Spark UI on your target Databricks cluster: On the cluster page, click the Spark UI tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the JDBC/ODBC Server portion of the Spark UI to see the results of queries sent from Immuta to Databricks.
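Cluster log delivery can also be enabled outside the UI, via the `cluster_log_conf` block of a cluster specification in the Databricks Clusters API. The sketch below (the `dbfs:/cluster-logs` path is an example choice, not a value from this page) builds that fragment:

```python
def cluster_log_conf(dbfs_path: str) -> dict:
    """Build the cluster_log_conf fragment of a Databricks cluster spec."""
    return {"cluster_log_conf": {"dbfs": {"destination": dbfs_path}}}

conf = cluster_log_conf("dbfs:/cluster-logs")
# As noted above, Databricks appends the cluster ID to the destination, so
# logs for cluster 0123-456789-abcde land under dbfs:/cluster-logs/0123-456789-abcde/
print(conf["cluster_log_conf"]["dbfs"]["destination"])
```

Merging this fragment into a cluster's JSON spec (for example via the clusters edit endpoint) has the same effect as changing the Destination in the UI.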

Using the validation and debugging notebook

The validation and debugging notebook is designed to be used by or under the guidance of an Immuta support professional. Reach out to your Immuta representative for assistance.

  1. Import the notebook into a Databricks workspace by navigating to Home in your Databricks instance.

  2. Click the arrow next to your name and select Import.

  3. Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the File menu, selecting Export, and then selecting DBC Archive.
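If you prefer to script the export rather than use the File menu, the Databricks Workspace API exposes an export endpoint that can return a DBC archive. A minimal sketch, assuming a workspace host, token, and notebook path of your own (the values below are placeholders, not from this page):

```python
def export_request(host: str, notebook_path: str) -> tuple[str, dict]:
    """Build the URL and query parameters for a Workspace API DBC export call."""
    url = f"{host}/api/2.0/workspace/export"
    params = {"path": notebook_path, "format": "DBC"}
    return url, params

url, params = export_request("https://example.cloud.databricks.com",
                             "/Users/me@example.com/immuta-debugging")
# Send this as a GET with a bearer token using any HTTP client; the JSON
# response carries the archive base64-encoded in its "content" field.
```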

Py4J security error

  • Error Message: py4j.security.Py4JSecurityException: Constructor <> is not allowlisted

  • Explanation: This error indicates you are being blocked by Py4J security rather than the Immuta Security Manager. Py4J security is strict and commonly blocks many ML libraries.

  • Solution: Turn off Py4J security on the offending cluster by setting IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false in the environment variables section. Because disabling Py4J security limits the security mechanisms Immuta can employ on the cluster, ensure that all users on the cluster have the same level of access to data; otherwise, users could theoretically see policy-enforced data that other users have queried.
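In the cluster's Spark environment variables section, the setting is a single line:

```
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false
```

A cluster restart is generally required for environment variable changes to take effect.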

Databricks trusted library errors

Check the driver logs for details. Some possible causes of failure include:

  • One of the Immuta-configured trusted library URIs does not point to a Databricks library. Check that you have configured the correct URI for the Databricks library.

  • For trusted Maven artifacts, the URI must follow this format: maven:/group.id:artifact-id:version.

  • Databricks failed to install a library. Any Databricks library installation errors will appear in the Databricks UI under the Libraries tab.
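As a quick sanity check on the first two causes above, a trusted Maven artifact URI can be validated against the maven:/group.id:artifact-id:version format before it is configured. A minimal sketch (the sample coordinates are illustrative, not values from this page):

```python
import re

# Three non-empty, colon-separated coordinates after the "maven:/" prefix.
MAVEN_URI = re.compile(r"^maven:/[^:\s]+:[^:\s]+:[^:\s]+$")

def is_valid_maven_uri(uri: str) -> bool:
    """Return True if uri matches maven:/group.id:artifact-id:version."""
    return bool(MAVEN_URI.fullmatch(uri))

print(is_valid_maven_uri("maven:/com.databricks:spark-xml_2.12:0.14.0"))  # True
print(is_valid_maven_uri("maven:/com.databricks:spark-xml_2.12"))         # False: version missing
```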
