Setting Up Users

When the Databricks Spark plugin is running on a Databricks cluster, every Databricks user running jobs or queries is either a privileged user or a non-privileged user:

Privileged users: Privileged users can effectively read from and write to any table or view in the cluster Metastore, or any file path accessible by the cluster, without restriction. Privileged users are either Databricks workspace admins or users specified in IMMUTA_SPARK_ACL_ALLOWLIST. Any user writing queries or jobs impersonating another user is a non-privileged user, even if they are impersonating a privileged user.
Privileged users have effective authority to read from and write to any securable in the cluster metastore or file path because, in almost all cases, Databricks clusters running with the Immuta Spark plugin installed have Hive metastore table access control disabled. However, if Hive metastore table access control is enabled on the cluster, privileged users have the authority that table access control grants them.
Non-privileged users: Non-privileged users are any users who are not privileged users, and all authorization for non-privileged users is determined by Immuta policies.
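The classification rules above can be sketched as a small decision function. This is only an illustration of the rules as described here, not the plugin's implementation: the QueryContext type, the is_privileged function, and the example usernames are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical description of who is running a query or job."""
    databricks_user: str
    # Set when the Databricks user is impersonating another user.
    impersonated_user: Optional[str] = None


def is_privileged(ctx: QueryContext,
                  workspace_admins: Set[str],
                  acl_allowlist: Set[str]) -> bool:
    """Return True if the user counts as privileged under the rules above."""
    # Impersonation always drops privileges, even when the impersonated
    # user is a privileged user.
    if ctx.impersonated_user is not None:
        return False
    # Otherwise: workspace admins and users named in
    # IMMUTA_SPARK_ACL_ALLOWLIST are privileged.
    return (ctx.databricks_user in workspace_admins
            or ctx.databricks_user in acl_allowlist)


admins = {"admin@example.com"}        # hypothetical workspace admins
allowlist = {"etl@example.com"}       # hypothetical allowlist entries

print(is_privileged(QueryContext("admin@example.com"), admins, allowlist))
print(is_privileged(QueryContext("admin@example.com", "etl@example.com"),
                    admins, allowlist))
```

The first call describes a workspace admin acting as themselves; the second describes the same admin impersonating an allowlisted user, which makes them non-privileged for that query.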

Once a user has been determined to be privileged or non-privileged for a given query or job, that result is cached for the number of seconds set in the IMMUTA_SPARK_ACL_PRIVILEGED_TIMEOUT_SECONDS environment variable. Setting that variable to 0 disables this caching entirely.
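To illustrate what a timeout-based cache with a "0 disables caching" rule looks like, here is a minimal sketch. It is not the plugin's code: the PrivilegeCache class, the per-user cache key, and the lookup callback are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PrivilegeCache:
    """Hypothetical time-limited cache of privileged/non-privileged results."""

    def __init__(self,
                 timeout_seconds: float,
                 lookup: Callable[[str], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds   # stands in for the env variable value
        self._lookup = lookup             # the real (uncached) privilege check
        self._clock = clock               # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_privileged(self, user: str) -> bool:
        # A timeout of 0 disables caching: always run the real check.
        if self._timeout <= 0:
            return self._lookup(user)
        now = self._clock()
        cached = self._entries.get(user)
        # Reuse the stored result while it is younger than the timeout.
        if cached is not None and now - cached[1] < self._timeout:
            return cached[0]
        result = self._lookup(user)
        self._entries[user] = (result, now)
        return result
```

With a non-zero timeout, repeated checks for the same user inside the window reuse the stored result, so a change to the user's status (for example, removal from the allowlist) is only picked up after the entry expires. With the timeout at 0, every check runs the lookup again.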

Mapping Databricks users to Immuta

Usernames in Databricks must match the usernames in the connected Immuta tenant. By default, the Immuta Spark plugin checks the Databricks username against the username within Immuta's internal IAM to determine access. However, you can integrate your existing IAM with Immuta and use that instead of the default internal IAM. Ideally, you should use the same identity manager for Immuta that you use for Databricks. See the Immuta support matrix page for a list of supported identity providers and protocols.

Within Immuta, multiple users can share the same username if they exist in different IAMs. In this case, the cluster can be configured to look up users from a specified IAM. To do this, set the value of the IMMUTA_USER_MAPPING_IAMID Spark environment variable to the targeted IAM ID configured within the Immuta tenant. The targeted IAM ID can be found on the App settings page. Each Databricks cluster can be mapped to only one IAM.
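As a rough illustration, the IAM mapping can be expressed as a cluster environment variable entry. The sketch below builds the spark_env_vars fragment of a Databricks cluster specification; the cluster_env_fragment helper and the IAM ID "okta-prod" are hypothetical, and the real ID must be copied from the App settings page.

```python
import json


def cluster_env_fragment(iam_id: str) -> dict:
    """Build a spark_env_vars fragment that pins the cluster to one IAM."""
    if not iam_id:
        raise ValueError("iam_id must be a non-empty string")
    # One cluster maps to exactly one IAM, so this is a single value,
    # not a list.
    return {"spark_env_vars": {"IMMUTA_USER_MAPPING_IAMID": iam_id}}


# "okta-prod" is a placeholder IAM ID used only for this example.
print(json.dumps(cluster_env_fragment("okta-prod"), indent=2))
```

The same key and value can also be entered directly in the cluster's environment variable settings; the helper only makes the shape of the setting explicit.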

User impersonation

Databricks user impersonation allows a Databricks user to impersonate an Immuta user. With this feature,

the Immuta user who is being impersonated does not have to have a Databricks account, but they must have an Immuta account.
the Databricks user who is impersonating an Immuta user does not have to be associated with Immuta. For example, this could be a service account.

When acting under impersonation, the Databricks user loses their privileged access, so they can only access the tables the Immuta user has access to and only perform DDL commands when that user is acting under an allowed circumstance (such as workspaces, scratch paths, or non-Immuta reads/writes).

Use the IMMUTA_SPARK_DATABRICKS_ALLOWED_IMPERSONATION_USERS Spark environment variable to enable user impersonation.

Scala clusters

Immuta discourages use of this feature with Scala clusters, as the proper security mechanisms were not built to account for user isolation limitations in Scala clusters. Instead, this feature was developed for the BI tool use case in which service accounts connecting to the Databricks cluster need to impersonate Immuta users so that policies can be enforced.
