Databricks Spark Application Configuration

Audience: System Administrators

Content Summary: This page outlines configuration options for Immuta-enabled Databricks clusters. Databricks Administrators should place desired configuration in the immuta_conf.xml file.
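
As a rough illustration of what such a file might contain, the sketch below renders a few of the properties documented on this page as XML. It assumes immuta_conf.xml uses the Hadoop-style configuration/property/name/value layout, which this page does not confirm. The render_immuta_conf helper and the sample values are hypothetical, so compare the generated structure with the immuta_conf.xml shipped with your deployment before relying on it.

```python
import xml.etree.ElementTree as ET


def render_immuta_conf(properties):
    """Render {name: value} pairs as a Hadoop-style <configuration> document.

    Assumption: immuta_conf.xml follows the <property><name/><value/></property>
    layout. This helper is illustrative and is not part of Immuta.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Write Python booleans as lowercase "true"/"false"; everything else as-is.
        if isinstance(value, bool):
            value = "true" if value else "false"
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Sample values only; the property names come from this page.
SAMPLE_PROPERTIES = {
    "immuta.spark.acl.enabled": True,
    "immuta.spark.acl.privileged.timeout.seconds": 3600,
    "immuta.user.mapping.iamid": "bim",
}

if __name__ == "__main__":
    print(render_immuta_conf(SAMPLE_PROPERTIES))
```

Generating the file from a dictionary keeps property names and values in one reviewable place. Hand-editing the XML works equally well if the layout assumption holds.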

Spark Application Configuration

  • immuta.spark.acl.enabled

    • Default: true

    • Description: Immuta Access Control List (ACL). Controls whether Databricks users are blocked from accessing non-Immuta tables. Ignored if Databricks Table ACLs are enabled (i.e., spark.databricks.acl.dfAclsEnabled=true).

  • immuta.spark.acl.whitelist

    • Description: A comma-separated list of Databricks usernames that may access raw tables when the Immuta ACL is in use.

  • immuta.spark.acl.privileged.timeout.seconds

    • Default: 3600

    • Description: The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is whitelisted in immuta.spark.acl.whitelist.

  • immuta.spark.acl.assume.not.privileged

    • Default: false

    • Description: Session property that overrides privileged user status when the Immuta ACL is in use. This should only be used in R scripts associated with spark-submit jobs.

  • immuta.spark.resolve.raw.tables.enabled

    • Default: true

    • Description: Enables use of the underlying database and table name in queries against a table-backed Immuta data source. Note that this property is not set in immuta_conf.xml. Administrators or whitelisted users can set immuta.spark.session.resolve.raw.tables.enabled to false to bypass resolving raw databases or tables as Immuta data sources. This is useful if an admin wants to read raw data but is also an Immuta user. By default, data policies are applied to a table even for an administrative user if that admin is also an Immuta user; however, if the admin runs set immuta.spark.session.resolve.raw.tables.enabled=false, they will see raw data only (without Immuta data policies enforced).

  • immuta.spark.session.resolve.raw.tables.enabled

    • Default: true

    • Description: Same as immuta.spark.resolve.raw.tables.enabled, but a session property that allows users to toggle this functionality. Ignored if immuta.spark.resolve.raw.tables.enabled=false.

  • immuta.spark.databricks.allowed.remote.schemes

    • Description: A comma-separated list of remote schemes that Databricks users are allowed to read from and write to directly. Does not apply to Hive table locations. This list will always be a subset of immuta.spark.remote.schemes.

  • immuta.spark.databricks.filesystem.blacklist

    • Default: hdfs

    • Description: A list of filesystem protocols that this instance of Immuta will not support for workspaces. This is useful in cases where a filesystem is available to a cluster but should not be used on that cluster.

  • immuta.spark.acl.workspace.enabled

    • Default: true

    • Description: Enables enforcement of workspace operations in Databricks.

  • immuta.spark.require.equalization

    • Default: false

    • Description: Requires that users act through a single, equalized project. A cluster should be equalized if users need to run Scala jobs on it, and it should be limited to Scala jobs only via spark.databricks.repl.allowedLanguages.

  • immuta.user.context.class

    • Default: com.immuta.spark.OSUserContext

    • Description: The class name of the UserContext that will be used to determine the current user in immuta-spark-hive. The default implementation gets the OS user running the JVM for the Spark application.

  • immuta.user.allow.hdfsUser.fallback

    • Default: false

    • Description: If true, the Immuta Spark plugins will attempt to use both the userid field in Immuta and the hdfsUser field to map an Immuta user to the current Spark user. This is currently relevant only in Databricks deployments; elsewhere, hdfsUser is always used.

  • immuta.user.mapping.iamid

    • Default: bim

    • Description: Denotes which IAM in Immuta should be used when mapping the current Spark user's username to a userid in Immuta. This defaults to bim but should be updated to reflect an actual production IAM.
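
Several of the descriptions above state relationships between properties. The sketch below models three of them in plain Python as a sanity check on a set of property values: the privileged-user rule (admin or whitelisted), the interaction between the cluster-level and session-level resolve.raw.tables properties, and the subset relationship between immuta.spark.databricks.allowed.remote.schemes and immuta.spark.remote.schemes. It only restates the documented behavior and is not Immuta's implementation. The function names, sample usernames, and sample scheme values are hypothetical.

```python
def parse_csv(value):
    """Split a comma-separated property value into a set of trimmed, non-empty entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def as_bool(value, default):
    """Interpret a "true"/"false" property string, falling back to the documented default."""
    if value is None:
        return default
    return value.strip().lower() == "true"


def is_privileged(username, is_admin, conf):
    """Documented rule: a privileged user is an admin or is listed in immuta.spark.acl.whitelist."""
    whitelist = parse_csv(conf.get("immuta.spark.acl.whitelist", ""))
    return is_admin or username in whitelist


def resolves_raw_tables(conf, session):
    """Raw tables resolve to Immuta data sources only if both properties are true.

    The session property is ignored when the cluster-level property is false.
    """
    cluster = as_bool(conf.get("immuta.spark.resolve.raw.tables.enabled"), True)
    per_session = as_bool(session.get("immuta.spark.session.resolve.raw.tables.enabled"), True)
    return cluster and per_session


def schemes_outside_remote(conf):
    """Return allowed.remote.schemes entries that are missing from immuta.spark.remote.schemes."""
    allowed = parse_csv(conf.get("immuta.spark.databricks.allowed.remote.schemes", ""))
    remote = parse_csv(conf.get("immuta.spark.remote.schemes", ""))
    return allowed - remote


# Hypothetical values for illustration only.
SAMPLE_CONF = {
    "immuta.spark.acl.whitelist": "alice@example.com, bob@example.com",
    "immuta.spark.databricks.allowed.remote.schemes": "s3a,wasbs",
    "immuta.spark.remote.schemes": "s3a",
}

if __name__ == "__main__":
    print(is_privileged("alice@example.com", False, SAMPLE_CONF))
    print(resolves_raw_tables(SAMPLE_CONF, {}))
    print(sorted(schemes_outside_remote(SAMPLE_CONF)))
```

A non-empty result from schemes_outside_remote would point at entries worth reviewing, since the description above states that the allowed list is always a subset of immuta.spark.remote.schemes.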