# Configuration

{% hint style="info" %}
This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.
{% endhint %}

## Prerequisites

* Databricks instance: Premium tier workspace **and** [Cluster access control enabled](https://docs.databricks.com/security/access-control/cluster-acl.html)
* Databricks instance has network-level access to the Immuta tenant
* Permissions and access to download files (requires outside Internet access) or transfer them to the host machine

**Recommended Databricks Workspace Configurations**:

* [Workspace access control enabled](https://docs.databricks.com/administration-guide/access-control/workspace-acl.html)
* [Personal access tokens enabled](https://docs.databricks.com/administration-guide/access-control/tokens.html)

*Note: Azure Databricks authenticates users with Microsoft Entra ID. Configure your Immuta tenant with an IAM that uses the same user ID as Microsoft Entra ID, because Immuta's Spark security plugin matches this user ID between the two systems. See the* [*Microsoft Entra ID page*](/2024.2/people/section-contents/how-to-guides/microsoft-entra-id.md) *for details.*

## Supported Databricks Runtime Versions

Use the table below to determine which version of Immuta supports your Databricks Runtime version:

| Databricks Runtime Version | Immuta Version     |
| -------------------------- | ------------------ |
| 11.3 LTS                   | 2023.1 and newer   |
| 10.4 LTS                   | 2022.2.x and newer |
| <p>7.3 LTS<br>9.1 LTS</p>  | 2021.5.x and newer |
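
As a minimal sketch, the table above can be transcribed into a lookup for use in automation such as a pre-deployment check. The names `MIN_IMMUTA_VERSION` and `min_immuta_version` are illustrative and are not part of any Immuta tooling.

```python
from typing import Optional

# Minimum Immuta release for each Databricks Runtime, copied from the table above.
MIN_IMMUTA_VERSION = {
    "11.3 LTS": "2023.1",
    "10.4 LTS": "2022.2.x",
    "9.1 LTS": "2021.5.x",
    "7.3 LTS": "2021.5.x",
}


def min_immuta_version(runtime: str) -> Optional[str]:
    """Return the earliest Immuta version listed for a runtime, or None if unlisted."""
    return MIN_IMMUTA_VERSION.get(runtime.strip())
```

A runtime that does not appear in the table returns `None`, which a caller can treat as unsupported.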

## Supported Databricks Cluster Configurations

The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.

| Example cluster | Databricks Runtime | Unity Catalog in Databricks | Databricks Spark integration       | Databricks Unity Catalog integration |
| --------------- | ------------------ | --------------------------- | ---------------------------------- | ------------------------------------ |
| Cluster 1       | 9.1                | Unavailable                 | :white\_check\_mark:               | Unavailable                          |
| Cluster 2       | 10.4               | Unavailable                 | :white\_check\_mark:               | Unavailable                          |
| Cluster 3       | 11.3               | :no\_entry:                 | :white\_check\_mark: / :no\_entry: | Unavailable                          |
| Cluster 4       | 11.3               | :white\_check\_mark:        | :no\_entry:                        | :no\_entry:                          |
| Cluster 5       | 11.3               | :white\_check\_mark:        | :white\_check\_mark:               | :white\_check\_mark:                 |

**Legend**:

* :white\_check\_mark: The feature or integration is enabled.
* :no\_entry: The feature or integration is disabled.

## Supported Access Mode and Languages

Immuta supports the Custom access mode.

* **Supported Languages**:
  * Python
  * SQL
  * *R (requires advanced configuration; work with your Immuta support professional to use R)*
  * *Scala (requires advanced configuration; work with your Immuta support professional to use Scala)*

## Databricks Installation Overview

{% hint style="info" %}
**Users who can read raw tables on-cluster**

* If a Databricks Admin is tied to an Immuta account, they will have the ability to read raw tables on-cluster.
* If a Databricks user is listed as an "ignored" user, they will have the ability to read raw tables on-cluster. Users can be added to the `immuta.spark.acl.whitelist` configuration to become ignored users.
{% endhint %}

The Immuta Databricks integration injects an Immuta plugin into the SparkSQL stack at cluster startup. The Immuta plugin creates an "immuta" database that is available for querying and intercepts all queries executed against it. For these queries, the plugin obtains policy determinations from the connected Immuta tenant and applies them before returning the results to the user.

The Databricks cluster init script provided by Immuta downloads the Immuta artifacts onto the target cluster and puts them in the appropriate locations on local disk for use by Spark. Once the init script runs, the Spark application running on the Databricks cluster will have the appropriate artifacts on its CLASSPATH to use Immuta for policy enforcement.

The cluster init script uses environment variables to

* Determine the location of the required artifacts for downloading.
* Authenticate with the service/storage containing the artifacts.

*Note: Each target system/storage layer (HTTPS, for example) can only have one set of environment variables, so the cluster init script assumes that any artifact retrieved from that system uses the same environment variables.*
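
The pattern described above can be sketched in Python. This is not the Immuta init script, which is provided by Immuta; it only illustrates reading one set of environment variables and reusing it for every artifact fetched from the same system. The variable names `EXAMPLE_ARTIFACT_URI`, `EXAMPLE_ARTIFACT_USER`, and `EXAMPLE_ARTIFACT_PASSWORD` are hypothetical, and HTTP basic authentication is an assumption chosen for the example.

```python
import base64
import os
import urllib.request


def build_artifact_request(env=os.environ) -> urllib.request.Request:
    """Build a request for an artifact using location and credentials from env.

    The EXAMPLE_* names are hypothetical placeholders, not Immuta's variables.
    """
    uri = env["EXAMPLE_ARTIFACT_URI"]            # where the artifact lives
    user = env.get("EXAMPLE_ARTIFACT_USER")      # optional credentials
    password = env.get("EXAMPLE_ARTIFACT_PASSWORD")

    request = urllib.request.Request(uri)
    if user and password:
        # One credential set per storage layer: every artifact retrieved
        # from this system would be requested with the same header.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.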

### Limitations

See the [Databricks Pre-Configuration Details page](/2024.2/data-and-integrations/databricks-spark/reference-guides/pre-configuration.md#limitation) for known limitations.

## Installation Methods

There are two installation options for Databricks. Click a link below to navigate to a tutorial for your chosen method:

* [Simplified Configuration](/2024.2/data-and-integrations/databricks-spark/how-to-guides/configuration/simplified.md): The steps to enable the integration with this method include
  1. Adding the integration on the App Settings page.
  2. Downloading or automatically pushing cluster policies to your Databricks workspace.
  3. Creating or restarting your cluster.
* [Manual Configuration](/2024.2/data-and-integrations/databricks-spark/how-to-guides/configuration/manual.md): The steps to enable the integration with this method include
  1. Downloading and configuring Immuta artifacts.
  2. Staging Immuta artifacts somewhere the cluster can read from during its startup procedures.
  3. Protecting Immuta environment variables with Databricks Secrets.
  4. Creating and configuring the cluster to start with the init script and load Immuta into its SparkSQL environment.
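
For step 3 of the manual method, Databricks lets a cluster environment variable reference a secret with the `{{secrets/<scope>/<key>}}` syntax rather than holding the value in plain text. The helper below is a small sketch that formats such a line; the variable, scope, and key names in the usage example are hypothetical, and the manual configuration tutorial remains the authority on which variables Immuta expects.

```python
def secret_env_var(name: str, scope: str, key: str) -> str:
    """Format a cluster environment variable line that references a Databricks secret."""
    # Databricks resolves the {{secrets/...}} reference when the cluster starts.
    return name + "={{secrets/" + scope + "/" + key + "}}"


# Hypothetical names, for illustration only.
line = secret_env_var("EXAMPLE_ARTIFACT_PASSWORD", "example-scope", "artifact-password")
print(line)
```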

## Debugging Immuta Installation Issues

For easier debugging of the Immuta Databricks installation, enable cluster init script logging. On the cluster page in Databricks for the target cluster, under **Advanced Options** -> **Logging**, change the **Destination** from `NONE` to `DBFS` and set the path to the desired output location. *Note: The unique cluster ID is appended to the provided path.*
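
If clusters are created through the Databricks Clusters API rather than the UI, the same logging setting corresponds to the `cluster_log_conf` field of the cluster specification. The sketch below builds that fragment and computes where the logs should land once the cluster ID is appended; the `init_scripts` subfolder reflects Databricks' log delivery layout as we understand it, and the destination path and cluster ID shown are made-up examples.

```python
def cluster_log_conf(destination: str) -> dict:
    """Return the log-delivery fragment of a cluster specification."""
    return {"cluster_log_conf": {"dbfs": {"destination": destination}}}


def init_script_log_dir(destination: str, cluster_id: str) -> str:
    """Return the directory expected to hold init script logs for one cluster."""
    # The cluster ID is appended to the configured destination path.
    return f"{destination.rstrip('/')}/{cluster_id}/init_scripts"


# Example values, for illustration only.
spec_fragment = cluster_log_conf("dbfs:/cluster-logs")
log_dir = init_script_log_dir("dbfs:/cluster-logs", "0123-456789-example")
```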

For debugging issues between the Immuta web service and Databricks, you can view the Spark UI on your target Databricks cluster. On the cluster page, click the **Spark UI** tab, which shows the Spark application UI for the cluster. If you encounter issues creating Databricks data sources in Immuta, you can also view the **JDBC/ODBC Server** portion of the Spark UI to see the result of queries that have been sent from Immuta to Databricks.

### Using the Validation and Debugging Notebook

The Validation and Debugging Notebook (`immuta-validation.ipynb`) is packaged with other Databricks release artifacts (for manual installations), or it can be downloaded from the App Settings page when configuring Databricks through the Immuta UI. This notebook is designed to be used by or under the guidance of an Immuta Support Professional.

1. Import the notebook into a Databricks workspace by navigating to **Home** in your Databricks instance.
2. Click the **arrow** next to your name and select **Import**.
3. Once you have executed commands in the notebook and populated it with debugging information, export the notebook and its contents by opening the **File** menu, selecting **Export**, and then selecting **DBC Archive**.

