# Configure a Databricks Spark Integration

## Permissions

* `APPLICATION_ADMIN` Immuta permission
* `CAN MANAGE` Databricks privilege on the cluster

## Requirements

* A Databricks workspace with the Premium tier, which includes cluster policies (required to configure the Spark integration)
* A cluster that uses one of these supported Databricks Runtimes:
  * 11.3 LTS
  * 14.3 LTS
* Supported languages
  * Python
  * R (not supported for Databricks Runtime 14.3 LTS)
  * Scala (not supported for Databricks Runtime 14.3 LTS)
  * SQL
* A Databricks cluster that is one of these supported compute types:
  * [All-purpose compute](https://docs.databricks.com/en/compute/index.html#types-of-compute)
  * [Job compute](https://docs.databricks.com/en/compute/index.html#types-of-compute)
* A cluster that uses custom access mode
* A Databricks workspace and cluster that can make HTTP calls directly to the Immuta web service. The Immuta web service must also be able to connect to and query the Databricks cluster and to call [Databricks workspace APIs](https://docs.databricks.com/api/workspace/introduction).

## Prerequisites

* Enable [OAuth M2M authentication](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) (recommended) or [personal access tokens](https://docs.databricks.com/administration-guide/access-control/tokens.html).
* [Disable Photon](https://docs.databricks.com/en/compute/photon.html#configure-photon-enablement) by setting `runtime_engine` to `STANDARD` using the [Clusters API](https://docs.databricks.com/api/workspace/clusters); a scripted sketch of this call appears after this list. Immuta does not support clusters with Photon enabled. Photon is enabled by default on compute running Databricks Runtime 9.1 LTS or newer, so it must be manually disabled before setting up the integration with Immuta.
* Restrict the set of Databricks principals who have `CAN MANAGE` [privileges on Databricks clusters](https://docs.databricks.com/en/compute/clusters-manage.html#compute-permissions) where the Spark plugin is installed. This prevents users from editing [environment variables or Spark configuration](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md), editing cluster policies, or removing the Spark plugin from the cluster, any of which would cause the Spark plugin to stop working.
* If **Databricks Unity Catalog is enabled** in a Databricks workspace, you must use an Immuta cluster policy when you set up the Databricks Spark integration to create an Immuta-enabled cluster. See the [configure cluster policies](#configure-cluster-policies) section below for guidance.
* If **Databricks Unity Catalog is not enabled** in your Databricks workspace, you must disable Unity Catalog in your Immuta tenant before proceeding with your configuration of Databricks Spark:
  1. Navigate to the **App Settings** page and click **Integration Settings**.
  2. Uncheck the **Enable Unity Catalog** checkbox.
  3. Click **Save**.
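
For reference, the Photon step might be scripted against the Clusters API as follows. This is a minimal sketch in Python with `requests`; the host and token environment variables and the cluster ID are illustrative assumptions, not values from this guide.

```python
# Minimal sketch: disable Photon on an existing cluster via the Clusters API.
# DATABRICKS_HOST / DATABRICKS_TOKEN and the cluster ID are hypothetical.
import os
import requests

host = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
cluster_id = "0123-456789-abcde123"   # hypothetical cluster ID

# Fetch the current spec so the edit call can resend the required fields.
spec = requests.get(
    f"{host}/api/2.1/clusters/get",
    headers=headers,
    params={"cluster_id": cluster_id},
).json()

# clusters/edit replaces the cluster spec, so resend the required fields;
# an autoscaling cluster would pass an "autoscale" block instead of num_workers.
edit = {
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "runtime_engine": "STANDARD",  # disables Photon
}
requests.post(f"{host}/api/2.1/clusters/edit", headers=headers, json=edit).raise_for_status()
```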

## Add the integration on the app settings page

1. Click the **App Settings** icon in Immuta.
2. Navigate to **HDFS** > **System API Key** and click **Generate Key**.
3. Click **Save** and then **Confirm**. If you do not save and confirm, the system API key will not be saved.
4. Scroll to the **Integration Settings** section.
5. Click **+ Add Native Integration** and select **Databricks Spark Integration** from the dropdown menu.
6. Complete the **Hostname** field.
7. Enter a **Unique ID** for the integration. Immuta uses this ID to name cluster policies unambiguously, which matters when you manage several Databricks Spark integrations: cluster policies are scoped to a workspace, but one workspace can host multiple integrations, so the ID distinguishes each integration's set of policies.
8. From the **Immuta IAM** dropdown menu, select the identity manager to use when mapping the current Spark user to their corresponding identity in Immuta. This should reflect the identity manager you use with Immuta (such as Entra ID or Okta).
9. Choose an **Access Model**. The **Protected until made available by policy** option [disallows reading and writing tables not protected by Immuta](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/customizing-the-integration.md#protected-and-unprotected-tables), whereas the **Available until protected by policy** option allows it.

{% hint style="warning" %}
**Behavior change**

If a table is registered in Immuta and does not have a subscription policy applied to it, that data will be visible to users, even if the **Protected until made available by policy** setting is enabled.

If you have enabled this setting, author an "Allow individually selected users" [global subscription policy](/SaaS/govern/secure-your-data/authoring-policies-in-secure/section-contents/how-to-guides/subscription-policy-tutorial.md) that applies to all data sources.
{% endhint %}

10. Select the **Storage Access Type** from the dropdown menu.
11. Opt to add any **Additional Hadoop Configuration Files**.
12. Click **Add Native Integration**, and then click **Save** and **Confirm**. This restarts the application and saves your Databricks Spark integration. (It is normal for this restart to take some time.)

The Databricks Spark integration does nothing until its cluster policies are configured. So even though your integration is now saved, continue to the next section and configure your cluster policies so that the Spark plugin can manage authorization on the Databricks cluster.

## Configure cluster policies

1. Click **Configure Cluster Policies**.
2. Select one or more cluster policies in the matrix. Clusters running Immuta with Databricks Runtime 14.3 can only use Python and SQL. You can modify a policy by clicking **Additional Policy Changes** and editing the environment variables in the text field, or by downloading the policy and editing it directly; a sketch of such a policy fragment appears after these steps. See the [Spark environment variables reference guide](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md) for information about each variable and its default value. Some common settings are linked below:
   1. [Audit all queries](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md#immuta_spark_audit_all_queries)
   2. [Scratch paths](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md#immuta_spark_databricks_scratch_paths)
   3. [User impersonation](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md#immuta_spark_databricks_allowed_impersonation_users) (you can also [prevent users from changing impersonation in a session](/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md#immuta_spark_databricks_single_impersonation_user))
3. Select your Databricks Runtime.
4. Use one of the two installation types described below to apply the policies to your cluster:
   * **Automatically push cluster policies:** This option allows you to automatically push the cluster policies to the configured Databricks workspace. This will overwrite any cluster policy templates previously applied to this workspace.
     1. Select the **Automatically Push Cluster Policies** radio button.
     2. Enter your **Admin Token**. This token must be for a user who has the [required Databricks privilege](#permissions). This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.
     3. Click **Apply Policies**.
   * **Manually push cluster policies:** This option allows you to manually push the cluster policies and the init script to the configured Databricks workspace.
     1. Select the **Manually Push Cluster Policies** radio button.
     2. Click **Download Init Script** and set the **Immuta plugin init script** as a cluster-scoped init script in Databricks by following the [Databricks documentation](https://docs.databricks.com/en/init-scripts/cluster-scoped.html).
     3. Click **Download Policies**, and then [manually add this cluster policy to your Databricks](https://docs.databricks.com/aws/en/admin/clusters/policies) workspace.
        1. Ensure that the `init_scripts.0.workspace.destination` in the policy matches the file path to the init script you configured above.
        2. The Immuta cluster policy references Databricks secrets for several of its sensitive fields. If the cluster policy is not pushed automatically, you must create these secrets yourself using the Databricks Secrets API or CLI; a sketch of creating them also appears after these steps.
5. Click **Close**, and then click **Save** and **Confirm**.
6. Apply the cluster policy generated by Immuta to the cluster with the Spark plugin installed by following the [Databricks documentation](https://docs.databricks.com/api/workspace/clusters/create).
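
To illustrate what the policy edits in step 2 can look like, here is a hypothetical fragment of a cluster policy definition, expressed in Python. The attribute paths follow the Databricks cluster policy format; the environment variable names are taken from the Spark environment variables reference guide linked above, and the values shown are assumptions, not defaults from this guide.

```python
# Hypothetical fragment of an Immuta cluster policy definition.
import json

policy_fragment = {
    # Audit every query run on the cluster.
    "spark_env_vars.IMMUTA_SPARK_AUDIT_ALL_QUERIES": {
        "type": "fixed",
        "value": "true",
    },
    # Pin the init script location; this must match the path where the
    # downloaded init script was installed (hypothetical path shown).
    "init_scripts.0.workspace.destination": {
        "type": "fixed",
        "value": "/immuta/immuta_cluster_init_script.sh",
    },
}
print(json.dumps(policy_fragment, indent=2))
```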
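
If you push the cluster policies manually, the secrets the policy references might be created as in the following sketch against the Databricks Secrets API. The scope and key names are hypothetical; use the names your downloaded cluster policy actually references.

```python
# Minimal sketch: create a secret scope and one secret via the Secrets API.
# DATABRICKS_HOST / DATABRICKS_TOKEN and all names below are hypothetical.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# Create the scope that will hold the Immuta secrets.
requests.post(
    f"{host}/api/2.0/secrets/scopes/create",
    headers=headers,
    json={"scope": "immuta"},
).raise_for_status()

# Store one sensitive value referenced by the cluster policy.
requests.post(
    f"{host}/api/2.0/secrets/put",
    headers=headers,
    json={
        "scope": "immuta",
        "key": "immuta-api-key",  # hypothetical key name
        "string_value": os.environ["IMMUTA_SYSTEM_API_KEY"],
    },
).raise_for_status()
```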

## Map users and grant them access to the cluster

1. [Map external user IDs from Databricks to Immuta](/SaaS/configuration/people/users-index/how-to-guides/external-user-mapping.md).
2. Give users the `Can Attach To` permission on the cluster; a scripted sketch follows below.
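
The permission grant can also be scripted with the Databricks Permissions API, as in this minimal sketch; the cluster ID and user name are hypothetical.

```python
# Minimal sketch: grant a user Can Attach To on a cluster via the
# Permissions API. The cluster ID and user name are hypothetical.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
cluster_id = "0123-456789-abcde123"

# PATCH adds or updates entries without replacing the cluster's full ACL.
requests.patch(
    f"{host}/api/2.0/permissions/clusters/{cluster_id}",
    headers=headers,
    json={
        "access_control_list": [
            {"user_name": "analyst@example.com", "permission_level": "CAN_ATTACH_TO"}
        ]
    },
).raise_for_status()
```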

