# Spark Environment Variables

This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks administrators should place the desired configuration in the Spark environment variables.

## IMMUTA\_INIT\_ADDITIONAL\_CONF\_URI

If you add additional Hadoop configuration during the integration setup, this variable sets the path to that file.

The additional Hadoop configuration file is where sensitive settings for remote filesystems belong (for example, a secret key pair used to access S3).
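
For illustration, a minimal example assuming a hypothetical S3 location for the additional configuration file:

```bash
IMMUTA_INIT_ADDITIONAL_CONF_URI=s3://example-bucket/immuta/additional-hadoop-conf.xml
```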

## IMMUTA\_EPHEMERAL\_HOST\_OVERRIDE

**Default value**: `true`

When `true`, Immuta automatically overrides the httpPath of ephemeral data sources with the httpPath of the Databricks cluster running the user's Spark application. Set this to `false` to disable ephemeral overrides for Spark.

## IMMUTA\_EPHEMERAL\_HOST\_OVERRIDE\_HTTPPATH

Set this variable to a static httpPath to use for ephemeral overrides instead of relying on automatic detection of the Databricks httpPath.
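
For example, with a hypothetical static httpPath:

```bash
IMMUTA_EPHEMERAL_HOST_OVERRIDE_HTTPPATH=sql/protocolv1/o/0/0000-000000-example000
```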

## IMMUTA\_EPHEMERAL\_TABLE\_PATH\_CHECK\_ENABLED

**Default value**: `true`

When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. *Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.*

## IMMUTA\_INIT\_ALLOWED\_CALLING\_CLASSES\_URI

A URI that points to a valid calling class file, which is an Immuta artifact you download during the [Databricks Spark configuration](/latest/configuration/integrations/databricks/databricks-spark/how-to-guides/configuration.md) process.
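
For illustration, assuming the downloaded artifact has been uploaded to a hypothetical DBFS location:

```bash
IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI=dbfs:/immuta/allowedCallingClasses.json
```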

## IMMUTA\_SPARK\_ACL\_ALLOWLIST

This is a comma-separated list of Databricks users who can access any table or view in the cluster metastore without restriction.
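
For example, to allowlist two users:

```bash
IMMUTA_SPARK_ACL_ALLOWLIST=edixon@example.com,dakota@example.com
```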

## IMMUTA\_SPARK\_ACL\_PRIVILEGED\_TIMEOUT\_SECONDS

**Default value**: `3600`

The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is allowlisted in `IMMUTA_SPARK_ACL_ALLOWLIST`.

## IMMUTA\_SPARK\_AUDIT\_ALL\_QUERIES

**Default value**: `false`

Enables auditing of all queries run on a Databricks cluster, regardless of whether the queries touch Immuta-protected data.
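
For example, to audit every query on the cluster:

```bash
IMMUTA_SPARK_AUDIT_ALL_QUERIES=true
```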

## IMMUTA\_SPARK\_DATABRICKS\_ALLOW\_NON\_IMMUTA\_READS

**Default value**: `false`

Allows non-privileged users to `SELECT` from tables that are not protected by Immuta. See the [Customizing the integration guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/customizing-the-integration.md#protected-and-unprotected-tables) for details about this feature.

## IMMUTA\_SPARK\_DATABRICKS\_ALLOW\_NON\_IMMUTA\_WRITES

**Default value**: `false`

Allows non-privileged users to run DDL commands and data-modifying commands against tables or spaces that are not protected by Immuta. See the [Customizing the integration guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/customizing-the-integration.md#protected-and-unprotected-tables) for details about this feature.
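
For example, to allow both unprotected reads and unprotected writes on a cluster:

```bash
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS=true
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES=true
```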

## IMMUTA\_SPARK\_DATABRICKS\_ALLOWED\_IMPERSONATION\_USERS

This is a comma-separated list of Databricks users who are allowed to impersonate Immuta users:

```json
"spark_env_vars.IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS": {
  "type": "fixed",
  "value": "edixon@example.com,dakota@example.com"
}
```

## IMMUTA\_SPARK\_DATABRICKS\_DBFS\_MOUNT\_ENABLED

**Default value**: `false`

Exposes the DBFS FUSE mount located at `/dbfs`. Granular permissions are not possible, so all users will have read/write access to all objects therein. *Note: Raw, unfiltered source data should never be stored in DBFS.*

## IMMUTA\_SPARK\_DATABRICKS\_DISABLED\_UDFS

Blocks one or more Immuta [user-defined functions (UDFs)](/latest/configuration/integrations/databricks/databricks-spark/how-to-guides/project-udfs.md) from being used on an Immuta cluster. This should be a Java regular expression that matches the set of UDFs to block by name (excluding the `immuta` database). For example, to block all project UDFs, set this to `^.*_projects?$`. For a list of functions, see the [project UDFs page](/latest/governance/author-policies-for-data-access-control/projects-and-purpose-based-access-control/writing-to-projects/reference-guides/project-udfs.md#available-functions).
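
For example, to block all project UDFs using the regular expression above:

```bash
IMMUTA_SPARK_DATABRICKS_DISABLED_UDFS=^.*_projects?$
```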

## IMMUTA\_SPARK\_DATABRICKS\_JAR\_URI

**Default value**: `file:///databricks/jars/immuta-spark-hive.jar`

The location of `immuta-spark-hive.jar` on the Databricks filesystem. This should not need to change unless a custom initialization script places `immuta-spark-hive.jar` in a non-standard location.
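
For illustration, pointing at a hypothetical non-standard location:

```bash
IMMUTA_SPARK_DATABRICKS_JAR_URI=file:///databricks/jars/custom/immuta-spark-hive.jar
```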

## IMMUTA\_SPARK\_DATABRICKS\_LOCAL\_SCRATCH\_DIR\_ENABLED

**Default value**: `true`

Creates a world-readable/writable scratch directory on local disk to facilitate the use of `dbutils` and third-party libraries that may write to local disk. Its location is not configurable and is stored in the environment variable `IMMUTA_LOCAL_SCRATCH_DIR`. *Note: Sensitive data should not be stored at this location.*
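
As a sketch, a notebook `%sh` cell could write a temporary file (hypothetical file name) to the directory exposed through `IMMUTA_LOCAL_SCRATCH_DIR`:

```bash
# Write an intermediate file to the Immuta-managed local scratch directory.
echo "intermediate output" > "$IMMUTA_LOCAL_SCRATCH_DIR/tmp_output.txt"
```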

## IMMUTA\_SPARK\_DATABRICKS\_LOG\_LEVEL

**Default value**: `INFO`

The SLF4J log level to apply to Immuta's Spark plugins.
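
For example, to increase verbosity while troubleshooting:

```bash
IMMUTA_SPARK_DATABRICKS_LOG_LEVEL=DEBUG
```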

## IMMUTA\_SPARK\_DATABRICKS\_LOG\_STDOUT\_ENABLED

**Default value**: `false`

If `true`, logging output is written to stdout (the console) in addition to the `log4j-active.txt` file (the Databricks default).

## IMMUTA\_SPARK\_DATABRICKS\_SCRATCH\_DATABASE

This configuration is a comma-separated list of additional databases that will appear as scratch databases when a `SHOW DATABASES` query is run. This configuration improves performance by circumventing the Metastore (which would otherwise be queried for every database's metadata) when determining what to display for a `SHOW DATABASES` query; it does not affect access to the scratch databases. Instead, use [`IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS`](#immuta_spark_databricks_scratch_paths) to control read and write access to the underlying database paths.

Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
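
For example, pairing a hypothetical scratch database with its underlying path (see the next section):

```bash
IMMUTA_SPARK_DATABRICKS_SCRATCH_DATABASE=my_scratch_db
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=dbfs:/user/hive/warehouse/my_scratch_db.db
```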

## IMMUTA\_SPARK\_DATABRICKS\_SCRATCH\_PATHS

Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by including its location in this list (or `dbfs:/user/hive/warehouse/<db_name>.db` for a database in the default location).

To create a scratch path to a location or a database stored at that location, configure:

```bash
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir
```

To additionally create a scratch path to a database created in the default location, configure:

```bash
IMMUTA_SPARK_DATABRICKS_SCRATCH_PATHS=s3://path/to/the/dir,dbfs:/user/hive/warehouse/any_db_name.db
```

## IMMUTA\_SPARK\_DATABRICKS\_SCRATCH\_PATHS\_CREATE\_DB\_ENABLED

**Default value**: `false`

Enables non-privileged users to create or drop scratch databases.

## IMMUTA\_SPARK\_DATABRICKS\_SINGLE\_IMPERSONATION\_USER

**Default value**: `false`

When `true`, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. Set this when a BI tool or other service allows users to submit arbitrary SQL or issue `SET` commands.
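
For example, pairing this setting with a hypothetical BI service account that is allowed to impersonate Immuta users:

```bash
IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS=bi-service@example.com
IMMUTA_SPARK_DATABRICKS_SINGLE_IMPERSONATION_USER=true
```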

## IMMUTA\_SPARK\_DATABRICKS\_SUBMIT\_TAG\_JOB

**Default value**: `true`

Denotes whether to run the Spark job that "tags" a Databricks cluster as being associated with Immuta.

## IMMUTA\_SPARK\_DATABRICKS\_TRUSTED\_LIB\_URIS

A comma-separated list of [Databricks trusted library](/latest/configuration/integrations/databricks/databricks-spark/how-to-guides/installation.md) URIs.
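
For example, with a hypothetical trusted library stored in DBFS:

```bash
IMMUTA_SPARK_DATABRICKS_TRUSTED_LIB_URIS=dbfs:/immuta/libs/my-library.jar
```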

## IMMUTA\_SPARK\_NON\_IMMUTA\_TABLE\_CACHE\_SECONDS

**Default value**: `3600`

The number of seconds Immuta caches whether a table has been exposed as a data source in Immuta. This setting only applies when `IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES` or `IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS` is enabled.

## IMMUTA\_SPARK\_REQUIRE\_EQUALIZATION

**Default value**: `false`

Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via `spark.databricks.repl.allowedLanguages`.
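
A sketch of an equalized Scala-only cluster; note that `spark.databricks.repl.allowedLanguages` belongs in the cluster's Spark config rather than the environment variables:

```bash
# Spark environment variable:
IMMUTA_SPARK_REQUIRE_EQUALIZATION=true
# In the cluster's Spark config (shown as a comment for reference):
# spark.databricks.repl.allowedLanguages scala
```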

## IMMUTA\_SPARK\_RESOLVE\_RAW\_TABLES\_ENABLED

**Default value**: `true`

Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Administrators or allowlisted users can set `IMMUTA_SPARK_RESOLVE_RAW_TABLES_ENABLED` to `false` to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies will be applied to a table even for an administrative user if that admin is also an Immuta user.

## IMMUTA\_SPARK\_SESSION\_RESOLVE\_RAW\_TABLES\_ENABLED

**Default value**: `true`

Same as the [IMMUTA\_SPARK\_RESOLVE\_RAW\_TABLES\_ENABLED](#immuta_spark_resolve_raw_tables_enabled) variable, but this is a session property that allows users to toggle this functionality. If users run `set immuta.spark.session.resolve.raw.tables.enabled=false`, they will see raw data only (not Immuta data policy-enforced data). *Note: This property is not set in `immuta_conf.xml`.*

## IMMUTA\_SPARK\_SHOW\_IMMUTA\_DATABASE

**Default value**: `true`

This shows the `immuta` database in the configured Databricks cluster. When set to `false`, Immuta will no longer show this database when a `SHOW DATABASES` query is performed. However, queries can still be run against tables in the `immuta` database using the Immuta-qualified table name (e.g., `immuta.my_schema_my_table`) regardless of this setting.

## IMMUTA\_SPARK\_VERSION\_VALIDATE\_ENABLED

**Default value**: `true`

Immuta checks the versions of its artifacts to verify that they are compatible with each other. When set to `true`, if versions are incompatible, that information is logged to the Databricks driver logs and the cluster is not usable. If a configuration file or the jar artifacts have been patched with a new version (and the artifacts are known to be compatible), set this to `false` so that the versions are not flagged as incompatible, leaving the cluster usable.

## IMMUTA\_USER\_MAPPING\_IAMID

**Default value**: `bim`

Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to Immuta's internal IAM (`bim`) but should be updated to reflect an actual production IAM.
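
For example, assuming a hypothetical production IAM with ID `okta-prod` configured in Immuta:

```bash
IMMUTA_USER_MAPPING_IAMID=okta-prod
```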

