Configure a Databricks Spark Integration
Permissions
APPLICATION_ADMIN Immuta permission
CAN MANAGE Databricks privilege on the cluster
Requirements
A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)
A cluster that uses one of these supported Databricks Runtimes:
9.1 LTS
10.4 LTS
11.3 LTS
14.3 (private preview)
Supported languages
Python
R (not supported for Databricks Runtime 14.3)
Scala (not supported for Databricks Runtime 14.3)
SQL
A Databricks cluster that uses one of these supported compute types:
Custom access mode
A Databricks workspace and cluster with the ability to directly make HTTP calls to the Immuta web service. The Immuta web service also must be able to connect to and perform queries on the Databricks cluster, and to call Databricks workspace APIs.
The Databricks Spark integration only works with Spark 3.
Prerequisites
Enable OAuth M2M authentication (recommended) or personal access tokens.
Disable Photon by setting runtime_engine to STANDARD using the Clusters API (see the sketch after this list). Immuta does not support clusters with Photon enabled. Photon is enabled by default on compute running Databricks Runtime 9.1 LTS or newer and must be manually disabled before setting up the integration with Immuta.
Restrict the set of Databricks principals who have CAN MANAGE privileges on Databricks clusters where the Spark plugin is installed. This prevents editing environment variables or Spark configuration, editing cluster policies, or removing the Spark plugin from the cluster, all of which would cause the Spark plugin to stop working.
If Databricks Unity Catalog is enabled in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster. See the configure cluster policies section below for guidance.
If Databricks Unity Catalog is not enabled in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:
Navigate to the App Settings page and click Integration Settings.
Uncheck the Enable Unity Catalog checkbox.
Click Save.
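For reference, below is a minimal sketch of disabling Photon on an existing cluster with the Databricks Clusters API. The workspace URL, token, and cluster ID are placeholders, and the edit payload assumes a fixed-size cluster; adjust it if your cluster uses autoscaling or instance pools.

```python
# Minimal sketch: disable Photon on an existing cluster via the Databricks Clusters API.
# WORKSPACE_URL, TOKEN, and CLUSTER_ID are placeholders for your environment.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-token>"   # PAT or OAuth M2M access token with CAN MANAGE on the cluster
CLUSTER_ID = "<cluster-id>"

headers = {"Authorization": f"Bearer {TOKEN}"}

# Read the current cluster spec so the edit call can resend the required fields.
spec = requests.get(
    f"{WORKSPACE_URL}/api/2.1/clusters/get",
    headers=headers,
    params={"cluster_id": CLUSTER_ID},
).json()

# clusters/edit replaces the cluster spec, so resend the core attributes
# and set runtime_engine to STANDARD to turn Photon off.
edit_payload = {
    "cluster_id": CLUSTER_ID,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 1),   # adjust for autoscaling clusters
    "runtime_engine": "STANDARD",
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/clusters/edit",
    headers=headers,
    json=edit_payload,
)
resp.raise_for_status()
```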
Add the integration on the app settings page
Click the App Settings icon in Immuta.
Navigate to HDFS > System API Key and click Generate Key.
Click Save and then Confirm. If you do not save and confirm, the system API key will not be saved.
Scroll to the Integration Settings section.
Click + Add Native Integration and select Databricks Spark Integration from the dropdown menu.
Complete the Hostname field.
Enter a Unique ID for the integration. The unique ID is used to name cluster policies clearly, which is important when managing several Databricks Spark integrations. Because cluster policies are workspace-scoped and one workspace can host multiple integrations, this ID lets you distinguish between the different sets of cluster policies.
From the Immuta IAM dropdown menu, select the identity manager that should be used to map the current Spark user to their corresponding identity in Immuta. This should reflect the identity manager you use in Immuta (such as Entra ID or Okta).
Choose an Access Model. The Protected until made available by policy option disallows reading and writing tables not protected by Immuta, whereas the Available until protected by policy option allows it.
Select the Storage Access Type from the dropdown menu.
Opt to add any Additional Hadoop Configuration Files.
Click Add Native Integration, and then click Save and Confirm. This will restart the application and save your Databricks Spark integration. (It is normal for this restart to take some time.)
The Databricks Spark integration does nothing until cluster policies are configured. Even though your integration is saved, continue to the next section to configure cluster policies so the Spark plugin can manage authorization on the Databricks cluster.
Configure cluster policies
Click Configure Cluster Policies.
Select one or more cluster policies in the matrix. Clusters running Immuta with Databricks Runtime 14.3 can only use Python and SQL. You can make changes to a policy by clicking Additional Policy Changes and editing the environment variables in the text field, or by downloading the policy and editing it directly. See the Spark environment variables reference guide for information about each variable and its default value.
Select your Databricks Runtime.
Use one of the two installation types described below to apply the policies to your cluster:
Automatically push cluster policies: This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.
Select the Automatically Push Cluster Policies radio button.
Enter your Admin Token. This token must be for a user who has the required Databricks privilege. This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.
Click Apply Policies.
Manually push cluster policies: This option allows you to manually push the cluster policies and the init script to the configured Databricks workspace.
Select the Manually Push Cluster Policies radio button.
Click Download Init Script and set the Immuta plugin init script as a cluster-scoped init script in Databricks by following the Databricks documentation.
Click Download Policies, and then manually add this cluster policy to your Databricks workspace.
Ensure that the init_scripts.0.workspace.destination in the policy matches the file path to the init script you configured above. The Immuta cluster policy references Databricks Secrets for several of the sensitive fields; these secrets must be created manually if the cluster policy is not automatically pushed. Use the Databricks API or CLI to push the proper secrets (see the sketch after these steps).
Click Close, and then click Save and Confirm.
Apply the cluster policy generated by Immuta to the cluster with the Spark plugin installed by following the Databricks documentation.
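If you are on the manual-push path, the sketch below shows one way to create the Databricks secrets and register the downloaded policy using the Secrets and Cluster Policies APIs. The workspace URL, token, scope name, key name, policy name, and file path are placeholders; use the names your downloaded policy actually references.

```python
# Minimal sketch for the manual-push path: create a secret scope, store the
# sensitive values the Immuta cluster policy references, and register the
# downloaded policy via the Cluster Policies API. All names and paths below
# are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-token>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Create a secret scope and store the sensitive values referenced by the policy.
requests.post(
    f"{WORKSPACE_URL}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "immuta"},                        # placeholder scope name
).raise_for_status()

requests.post(
    f"{WORKSPACE_URL}/api/2.0/secrets/put",
    headers=headers,
    json={
        "scope": "immuta",
        "key": "immuta-system-api-key",              # placeholder key name
        "string_value": "<sensitive-value-from-immuta>",
    },
).raise_for_status()

# 2. Register the cluster policy downloaded from Immuta.
with open("immuta-cluster-policy.json") as f:        # path to the downloaded policy file
    definition = f.read()

requests.post(
    f"{WORKSPACE_URL}/api/2.0/policies/clusters/create",
    headers=headers,
    json={"name": "Immuta Spark cluster policy", "definition": definition},
).raise_for_status()
```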
Map users and grant them access to the cluster
Give users the Can Attach To permission on the cluster.
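As a reference, the following sketch grants a user the Can Attach To permission with the Databricks Permissions API; the workspace URL, token, cluster ID, and user name are placeholders.

```python
# Minimal sketch: grant a mapped user Can Attach To on the Immuta-enabled cluster
# via the Databricks Permissions API. All values below are placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-token>"
CLUSTER_ID = "<cluster-id>"

resp = requests.patch(
    f"{WORKSPACE_URL}/api/2.0/permissions/clusters/{CLUSTER_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "access_control_list": [
            {"user_name": "user@example.com", "permission_level": "CAN_ATTACH_TO"}
        ]
    },
)
resp.raise_for_status()
```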