
# Security and Compliance

Immuta offers several features that secure your users and Databricks clusters, help you demonstrate compliance, and monitor for anomalies.

## Authentication

### Configuring the integration and registering data

Immuta supports the following authentication methods to configure the Databricks Spark integration and register data sources:

* **OAuth machine-to-machine (M2M)**: Immuta uses the [Client Credentials Flow](https://auth0.com/docs/get-started/authentication-and-authorization-flow/client-credentials-flow) to integrate with [Databricks OAuth machine-to-machine authentication](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html), which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token.\
  \
  See the [Databricks OAuth machine-to-machine (M2M) authentication page](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) for more details.
* **Personal access token (PAT)**: This token gives Immuta temporary permission to do one of two things: when configuring the integration, push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to that workspace; or register securables as Immuta data sources.
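
The token lifecycle described for OAuth M2M (exchange a client secret for a temporary token, reuse it, request a new one when it expires) can be sketched as follows. This is a minimal illustration, not Immuta's implementation. The `/oidc/v1/token` endpoint and `all-apis` scope are taken from the Databricks OAuth M2M documentation linked above. The names `fetch_token` and `TokenCache`, along with the 60-second refresh margin, are hypothetical.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional


def fetch_token(workspace_url: str, client_id: str, client_secret: str) -> Dict:
    """Request a temporary token using the OAuth client credentials grant."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    request = urllib.request.Request(
        f"{workspace_url}/oidc/v1/token",  # endpoint assumed from Databricks docs
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class TokenCache:
    """Reuse a token until shortly before it expires, then fetch a new one."""

    def __init__(
        self,
        fetch: Callable[[], Dict],
        clock: Callable[[], float] = time.monotonic,
        skew_seconds: float = 60.0,  # hypothetical safety margin
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._skew = skew_seconds
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            # Fall back to one hour if the response omits expires_in.
            self._expires_at = now + payload.get("expires_in", 3600) - self._skew
        return self._token
```

A caller would wrap the request in the cache, for example `TokenCache(lambda: fetch_token(url, client_id, client_secret))`, and call `get()` before each API request so that an expired token is replaced transparently.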

### User authentication

The built-in Immuta IAM can be used as a complete solution for authentication and fine-grained user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and fine-grained user entitlement instead.

Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.

See the [Identity managers](/latest/configuration/people/section-contents/reference-guides/identity-managers.md) guide for a list of supported providers and details.

See the [Setting up users guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/setting-up-users.md) for details and instructions on mapping Databricks user accounts to Immuta.

## Cluster security

### Data processing and encryption

See the [Data processing, encryption, and masking practices guide](/latest/configuration/application-settings/reference-guides/data-processing.md) for more information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.

### Protecting the Immuta configuration

Non-administrator users on an Immuta-enabled Databricks cluster must not be able to view or modify the Immuta configuration, because that access would create a loophole around Immuta policy enforcement. [Databricks secrets](https://docs.databricks.com/security/secrets/index.html#spark-conf-env-var) allow you to securely apply environment variables to Immuta-enabled clusters.

Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable.
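
As a rough illustration of referencing a secret path instead of a literal value, the sketch below builds the environment variable section of a cluster specification and checks that sensitive variables hold secret references. The `{{secrets/<scope>/<key>}}` reference syntax and the `spark_env_vars` field come from the Databricks documentation. The scope, key, variable name, and helper functions are hypothetical placeholders, not actual Immuta settings.

```python
import re
from typing import Dict, List

# Matches a Databricks secret reference such as {{secrets/my-scope/my-key}}.
SECRET_REF = re.compile(r"^\{\{secrets/[^/{}]+/[^/{}]+\}\}$")


def secret_ref(scope: str, key: str) -> str:
    """Build a secret path reference for use as an environment variable value."""
    return "{{secrets/" + scope + "/" + key + "}}"


def plaintext_env_vars(env_vars: Dict[str, str], sensitive: List[str]) -> List[str]:
    """Return the sensitive variable names whose values are not secret references."""
    return [
        name
        for name in sensitive
        if name in env_vars and not SECRET_REF.match(env_vars[name])
    ]


# Hypothetical cluster specification fragment; the variable name is a placeholder.
cluster_spec = {
    "spark_env_vars": {
        "EXAMPLE_IMMUTA_SETTING": secret_ref("example-scope", "example-key"),
    }
}

# An empty list means no sensitive variable is stored as a plaintext value.
exposed = plaintext_env_vars(cluster_spec["spark_env_vars"], ["EXAMPLE_IMMUTA_SETTING"])
```

A check like `plaintext_env_vars` could run before a cluster specification is submitted, so that a literal value is caught before it becomes visible in the cluster configuration.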

See the [Installation and compliance guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/installation-and-compliance.md#protecting-the-immuta-configuration) for details and instructions on using Databricks secrets.

### Scala cluster security

Isolation among users in Scala jobs on a Databricks cluster is limited. When data is broadcast, cached (spilled to disk), or otherwise saved to `SPARK_LOCAL_DIR`, it is impossible to tell which user's data each file or block contains. To address this vulnerability, Immuta suggests that you

* limit Scala clusters to Scala jobs only and
* require equalized projects, which will force all users to act under the same set of attributes, groups, and purposes with respect to their data access. This requirement guarantees that data being dropped into `SPARK_LOCAL_DIR` will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.

See the [Installation and compliance guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/installation-and-compliance.md#scala-clusters) for more details and configuration instructions.

## Auditing and compliance

Immuta provides auditing features and governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.

You can view the information in these audit logs on [dashboards](/latest/governance/detect-your-activity/detection/detect-concept.md) or [configure your Immuta deployment with audit](/latest/configuration/self-managed-deployment/configure/audit-best-practices.md) for long-term backup and processing with log data processors and tools. This capability fosters convenient integrations with log monitoring services and data pipelines.

### Databricks query audit

Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing.

To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.

Immuta audits queries that come from interactive notebooks, notebook jobs, and JDBC connections, but does not audit [Scala or R submit jobs](/latest/configuration/integrations/databricks/databricks-spark/how-to-guides/spark-submit.md). Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job *unless* [`IMMUTA_SPARK_AUDIT_ALL_QUERIES` is set to `true`](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/configuration.md#immuta_spark_audit_all_queries).

See the [Databricks Spark query audit logs](/latest/governance/detect-your-activity/audit/reference-guides/query-audit-logs/databricks.md) page for examples of saved queries and the resulting audit records. To exclude query text from audit events, see the [App settings](/latest/configuration/application-settings/how-to-guides/config-builder-guide.md#exclude-query-text-from-audit-records) page.

### Auditing all queries

Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.

See the [Installation and compliance guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/installation-and-compliance.md#audit) for details and instructions.

### Auditing queries run while impersonating another user

When a user runs a query while impersonating another user, the `extra.impersonationUser` field in the audit log payload is populated with the Databricks username of the impersonating user. The `userId` field contains the Immuta username of the user being impersonated:

```json
{
  "id": "query-a20e-493e-id-c1ada0a23a26",
  [...]
  "userId": "<immuta_username>",
  [...]
  "extra": {
    [...]
    "impersonationUser": "<databricks_username>"
  }
  [...]
}
```
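
When processing exported audit records, these two fields can be read together to identify impersonated queries. The following is a minimal sketch that assumes each record has already been parsed into a Python dictionary with the shape shown above. The function name `impersonation_details` and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def impersonation_details(record: Dict[str, Any]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (impersonating Databricks user, impersonated Immuta user), or None.

    None means the record carries no extra.impersonationUser value, so the
    query was not run under impersonation.
    """
    impersonator = (record.get("extra") or {}).get("impersonationUser")
    if not impersonator:
        return None
    return impersonator, record.get("userId")


# Sample record using placeholder values only.
sample = {
    "id": "example-audit-id",
    "userId": "immuta.user@example.com",
    "extra": {"impersonationUser": "databricks.user@example.com"},
}

details = impersonation_details(sample)
```

Here `details` holds the impersonating Databricks username first and the impersonated Immuta username second; a record without the `extra.impersonationUser` field yields `None`.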

See the [Setting up users guide](/latest/configuration/integrations/databricks/databricks-spark/reference-guides/databricks/setting-up-users.md#user-impersonation) for details about user impersonation.

### Governance reports

Immuta governance reports allow users with the `GOVERNANCE` Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.

See the [Governance report types](/latest/governance/detect-your-activity/audit/reference-guides/reports.md) page for a list of report types and guidance.

