This page provides an overview of the Databricks integration. For installation instructions, see the Databricks Installation Introduction.
Databricks is a plugin integration with Immuta. This integration allows you to protect access to tables and manage row-, column-, and cell-level controls without enabling table ACLs or credential passthrough. Policies are applied to the plan that Spark builds for a user's query and enforced live on-cluster.
An Application Admin will configure Databricks with one of two options:
Simplified Databricks Configuration on the Immuta App Settings page, or
Manual Databricks Configuration, where Immuta artifacts must be downloaded and staged to your Databricks clusters
In both configuration options, the Immuta init script adds the Immuta plugin in Databricks: the Immuta Security Manager, wrappers, and the Immuta analysis hook plan rewrite. Once an administrator gives users Can Attach To entitlements on the cluster, they can query Immuta-registered data sources directly in their Databricks notebooks.
Simplified Databricks configuration additional entitlements
The credentials used to do the Simplified Databricks configuration with automatic cluster policy push must have the following entitlement:
Allow cluster creation
This will give Immuta temporary permission to push the cluster policies to the configured Databricks workspace and overwrite any cluster policy templates previously applied to the workspace.
Immuta best practices: Test user
Test the integration on an Immuta-enabled cluster with a user that is not a Databricks administrator.
You should register entire databases with Immuta and run Schema Monitoring jobs through the Python script provided during data source registration. Additionally, you should use a Databricks administrator account to register data sources with Immuta using the UI or API; however, you should not test Immuta policies using a Databricks administrator account, as they are able to bypass controls.
A Databricks administrator can control who has access to specific tables in Databricks through Immuta Subscription Policies or by manually adding users to the data source. Data users will only see the immuta database with no tables until they are granted access to those tables as Immuta data sources.
immuta Database
When a table is registered in Immuta as a data source, users can see that table in the native Databricks database and in the immuta database. This allows for an option to use a single database (immuta) for all tables.
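For example, once a user is subscribed to a data source, the same table can be queried through its native database or through the immuta database in a notebook. The names below are hypothetical; run SHOW TABLES IN immuta to see the actual data source names on your cluster.

# Query the table through its native Databricks database (hypothetical names)
spark.sql("SELECT * FROM sales.transactions LIMIT 10").show()

# Query the same data source through the shared immuta database
spark.sql("SELECT * FROM immuta.sales_transactions LIMIT 10").show()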
After data users have subscribed to data sources, administrators can apply fine-grained access controls, such as restricting rows or masking columns with advanced anonymization techniques, to manage what the users can see in each table. More details on the types of data policies can be found on the Data Policies page, including an overview of masking struct and array columns in Databricks.
Note: Immuta recommends building Global Policies rather than Local Policies, as they allow organizations to easily manage policies as a whole and capture system state in a more deterministic manner.
All access controls must go through SQL.
Note: With R, you must load the SparkR library in a cell before accessing the data.
Usernames in Immuta must match usernames in Databricks. It is best practice to use the same identity manager for Immuta that you use for Databricks (Immuta supports these identity manager protocols and providers); however, for Immuta SaaS users, it's easiest to simply ensure usernames match between systems.
An Immuta Application Administrator configures the Databricks integration and registers available cluster policies Immuta generates.
The Immuta init script adds the Immuta plugin in Databricks: the Immuta SecurityManager, wrappers, and the Immuta analysis hook plan rewrite.
A Data Owner registers Databricks tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.
Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
A Databricks user who is subscribed to the data source in Immuta queries the corresponding table directly in their notebook or workspace.
During Spark Analysis, Spark calls down to the Metastore to get table metadata.
Immuta intercepts the call to retrieve table metadata from the Metastore.
Immuta modifies the Logical Plan to enforce policies that apply to that user.
Immuta wraps the Physical Plan with specific Java classes to signal to the SecurityManager that it is a trusted node and is allowed to scan raw data.
The Physical Plan is applied and filters out and transforms raw data coming back to the user.
The user sees policy-enforced data.
Py4j security disabled: In addition to support for Python, SQL, and R, this configuration adds support for additional Python libraries and utilities by disabling Databricks-native Py4j security.
This configuration does not rely on Databricks-native Py4J security to secure the cluster, while process isolation is still enabled to secure filesystem and network access from within Python processes. On an Immuta-enabled cluster, once Py4J security is disabled, the Immuta SecurityManager is installed to prevent nefarious actions from Python in the JVM. Disabling Py4J security also allows for expanded Python library support, including many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs.
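For illustration, the following PySpark sketch uses two of the classes above along with dbutils.fs on a cluster where Py4J security is disabled. The table and column names are hypothetical, and the querying user is assumed to be subscribed to the data source in Immuta.

from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Read a policy-enforced Immuta data source (hypothetical table name)
df = spark.table("immuta.customer_churn")

# Index a categorical column and assemble a feature vector
indexed = StringIndexer(inputCol="plan_type", outputCol="plan_index").fit(df).transform(df)
features = VectorAssembler(inputCols=["plan_index", "monthly_usage"], outputCol="features").transform(indexed)

# Fit a simple model; "churned" is assumed to be a numeric 0/1 label column
model = LogisticRegression(featuresCol="features", labelCol="churned").fit(features)

# dbutils.fs is also available when Py4J security is disabled
dbutils.fs.ls("/tmp")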
By default, all actions in R will execute as the root user. Among other things, this permits access to the entire filesystem (including sensitive configuration data). And without iptable restrictions, a user may freely access the cluster’s cloud storage credentials. To properly support the use of the R language, Immuta’s initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user. This user has limited filesystem and network access. The Immuta SecurityManager is also installed to prevent users from bypassing policies and protects against the above vulnerabilities from within the JVM.
The SecurityManager will incur a small increase in performance overhead; average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
A homogeneous cluster is recommended for configurations where Py4J security is disabled. If all users have the same level of authorization, there would not be any data leakage, even if a nefarious action was taken.
For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.
Performance: This is the most performant policy configuration.
In this configuration, Immuta is able to rely on Databricks-native security controls, reducing overhead. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes. This Immuta cluster configuration relies on Py4J security being enabled.
Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs are not supported with Py4J security enabled. Users will also be unable to use the Databricks Connect client library. Additionally, only Python and SQL are available as supported languages.
For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.
Additional overhead: In relation to the Python & SQL cluster policy, this configuration trades some additional overhead for added support of the R language.
In this configuration, you are able to rely on the Databricks-native security controls. The key security control here is the enablement of process isolation. This prevents users from obtaining unintentional access to the queries of other users. In other words, masked and filtered data is consistently made accessible to users in accordance with their assigned attributes.
Like the Python & SQL configuration, Py4J security is enabled for the Python & SQL & R configuration. However, because R has been added, Immuta enables the SecurityManager, in addition to Py4J security, to provide more security guarantees. For example, by default all actions in R execute as the root user; among other things, this permits access to the entire filesystem (including sensitive configuration data), and, without iptable restrictions, a user may freely access the cluster's cloud storage credentials. To address these security issues, Immuta's initialization script wraps the R and Rscript binaries to launch each command as a temporary, non-privileged user with limited filesystem and network access and installs the Immuta SecurityManager, which prevents users from bypassing policies and protects against the above vulnerabilities from within the JVM.
Consequently, the cost of introducing R is that the SecurityManager incurs a small increase in performance overhead; however, average latency will vary depending on whether the cluster is homogeneous or heterogeneous. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
Many Python ML classes (such as LogisticRegression, StringIndexer, and DecisionTreeClassifier) and dbutils.fs are not supported with Py4J security enabled. Users will also be unable to use the Databricks Connect client library.
When users install third-party Java/Scala libraries, they will be denied access to sensitive resources by default. However, cluster administrators can specify which of the installed Databricks libraries should be trusted by Immuta.
For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.
Single-user clusters recommended
Like Databricks, Immuta recommends single-user clusters for sparklyr when user isolation is required. A single-user cluster can either be a job cluster or a cluster with credential passthrough enabled. Note: spark-submit jobs are not currently supported.
Two cluster types can be configured with sparklyr: Single-User Clusters (recommended) and Multi-User Clusters (discouraged).
Single-User Clusters: Credential Passthrough (required on Databricks) allows a single-user cluster to be created. This setting automatically configures the cluster to assume the role of the attached user when reading from storage. Because Immuta requires that raw data is readable by the cluster, the instance profile associated with the cluster should be used rather than a role assigned to the attached user.
Multi-User Clusters: Because Immuta cannot guarantee user isolation in a multi-user sparklyr cluster, it is not recommended to deploy a multi-user cluster. To force all users to act under the same set of attributes, groups, and purposes with respect to their data access and eliminate the risk of a data leak, all sparklyr multi-user clusters must be equalized either by convention (all users able to attach to the cluster have the same level of data access in Immuta) or by configuration (detailed below).
In addition to the configuration for an Immuta cluster with R, add this environment variable to the Environment Variables section of the cluster:
This configuration makes changes to the iptables rules on the cluster to allow the sparklyr client to connect to the required ports on the JVM used by the sparklyr backend service.
Install and load libraries into a notebook. Databricks includes the stable version of sparklyr, so library(sparklyr) in an R notebook is sufficient, but you may opt to install the latest version of sparklyr from CRAN. Additionally, loading library(DBI) will allow you to execute SQL queries.
Set up a sparklyr connection:
Pass the connection object to execute queries:
Add the following items to the Spark Config section of the cluster:
The trustedFileSystems setting is required to allow Immuta's wrapper FileSystem (used in conjunction with the ImmutaSecurityManager for data security purposes) to be used with credential passthrough. Additionally, the InstanceProfileCredentialsProvider must be configured to continue using the cluster's instance profile for data access, rather than a role associated with the attached user.
Avoid deploying multi-user clusters with sparklyr configuration
It is possible, but not recommended, to deploy a multi-user cluster sparklyr configuration. Immuta cannot guarantee user isolation in a multi-user sparklyr configuration.
The configurations in this section enable sparklyr, require project equalization, map sparklyr sessions to the correct Immuta user, and prevent users from accessing Immuta native workspaces.
Add the following environment variables to the Environment Variables section of your cluster configuration:
Add the following items to the Spark Config section:
Immuta’s integration with sparklyr does not currently support
spark-submit jobs,
UDFs, or
Databricks Runtimes 5, 6, or 7.
Scala clusters: This configuration is for Scala-only clusters.
Where Scala language support is needed, this configuration can be used in the Custom access mode.
According to Databricks’ cluster type support documentation, Scala clusters are intended for single users only. However, nothing inherently prevents a Scala cluster from being configured for multiple users. Even with the Immuta SecurityManager enabled, there are limitations to user isolation within a Scala job.
For a secure configuration, it is recommended that clusters intended for Scala workloads are limited to Scala jobs only and are made homogeneous through the use of project equalization or externally via convention/cluster ACLs. (In homogeneous clusters, all users are at the same level of groups/authorizations; this is enforced externally, rather than directly by Immuta.)
For full details on Databricks’ best practices in configuring clusters, please read their governance documentation.
This page describes the Databricks integration, configuration options, and features. See the Databricks integration page for a tutorial on enabling Databricks and these features through the App Settings page.
The table below outlines the integrations supported for various Databricks cluster configurations. For example, the only integration available to enforce policies on a cluster configured to run on Databricks Runtime 9.1 is the Databricks Spark integration.
Cluster 1: Databricks Runtime 9.1
Cluster 2: Databricks Runtime 10.4
Cluster 3: Databricks Runtime 11.3
Cluster 4: Databricks Runtime 11.3
Cluster 5: Databricks Runtime 11.3
Databricks instance: Premium tier workspace and Cluster access control enabled
Databricks instance has network level access to Immuta tenant
Access to Immuta archives
Permissions and access to download (outside Internet access) or transfer files to the host machine
Recommended Databricks Workspace Configurations:
Note: Azure Databricks authenticates users with Microsoft Entra ID. Be sure to configure your Immuta tenant with an IAM that uses the same user ID as does Microsoft Entra ID. Immuta's Spark security plugin will look to match this user ID between the two systems. See this Microsoft Entra ID page for details.
See this page for a list of Databricks Runtimes Immuta supports.
Immuta supports the Custom access mode.
Supported Languages:
Python
SQL
R (requires advanced configuration; work with your Immuta support professional to use R)
Scala (requires advanced configuration; work with your Immuta support professional to use Scala)
The Immuta Databricks integration supports the following Databricks features:
Change Data Feed: Databricks users can see the Databricks Change Data Feed on queried tables if they are allowed to read raw data and meet specific qualifications.
Databricks Libraries: Users can register their Databricks Libraries with Immuta as trusted libraries, allowing Databricks cluster administrators to avoid Immuta security manager errors when using third-party libraries.
External Metastores: Immuta supports the use of external metastores in local or remote mode.
Spark Direct File Reads: In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths.
Users can have additional write access in their integration using project workspaces. Users can integrate a single or multiple workspaces with a single Immuta tenant. For more details, see the Databricks Project Workspaces page.
The Immuta Databricks integration cannot ingest tags from Databricks, but you can connect any of these supported external catalogs to work with your integration.
Native impersonation allows users to natively query data as another Immuta user. To enable native user impersonation, see the User Impersonation page.
Audit limitations
Immuta will audit queries that come from interactive notebooks, notebook jobs, and JDBC connections, but will not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless immuta.spark.audit.all.queries is set to true; for more details about this configuration and auditing all queries in Databricks, see Limited Enforcement in Databricks.
Capturing the code or query that triggers the Spark plan makes audit records more useful in assessing what users are doing.
To audit the code or query that triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs. Examples of a saved cell/query and the resulting audit record are provided on the Databricks query audit logs page.
A user can configure multiple integrations of Databricks to a single Immuta tenant and use them dynamically or with workspaces.
Immuta does not support Databricks clusters with Photon acceleration enabled.
CDF shows the row-level changes between versions of a Delta table. The changes displayed include row data and metadata that indicates whether the row was inserted, deleted, or updated.
Immuta does not support applying policies to the changed data, and the CDF cannot be read for data source tables if the user does not have access to the raw data in Databricks. However, the CDF can be read if the querying user is allowed to read the raw data and one of the following statements is true:
the table is in the current workspace,
the table is in a scratch path,
non-Immuta reads are enabled AND the table does not intersect with a workspace under which the current user is not acting, or
non-Immuta reads are enabled AND the table is not part of an Immuta data source.
There are no configuration changes necessary to use this feature.
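For example, a user who meets one of the conditions above could read the change feed with the standard Databricks CDF read options; the table name and starting version below are hypothetical:

# Read the Delta change data feed for a table the user can read raw
changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("scratch.events"))
changes.show()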
Immuta does not support reading changes in .
/
It is most secure to leverage an equalized project when working in a Scala cluster; however, it is not required to limit Scala to equalized projects. This document outlines security recommendations for Scala clusters and discusses the security risks involved when equalized projects are not used.
Language support: R and Scala are both supported, but require advanced configuration; work with your Immuta support professional to use these languages.
There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta's SecurityManager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to distinguish which user's data is contained in each file/block. If you are concerned about this vulnerability, Immuta suggests that Scala clusters
be limited to Scala jobs only, and
use project equalization, which forces all users to act under the same set of attributes, groups, and purposes with respect to their data access.
When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the leaf level of the physical Spark plan, so a policy such as "Mask using hashing the column social_security_number for everyone" would be implemented as an expression on a project node right above the FileSourceScanExec/LeafExec node at the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently, from ending up in SPARK_LOCAL_DIR.
This policy implementation coupled with an equalized project guarantees that data being dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user will have access to the same data, if they attempt to manually access other users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, which would result in a data leak.
To require that Scala clusters be used in equalized projects and avoid the risk described above, change the immuta.spark.require.equalization value to true in your Immuta configuration file when you spin up Scala clusters:
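One way to supply this setting is as a Spark environment variable, using the property-to-environment-variable convention described in the cluster configuration section; this is an illustrative sketch, and the value can equivalently be set in the Immuta configuration file:

IMMUTA_SPARK_REQUIRE_EQUALIZATION=true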
Once this configuration is complete, users on the cluster will need to switch to an Immuta equalized project before running a job. (Remember that when working under an Immuta Project, only tables within that project can be seen.) Once the first job is run using that equalized project, all subsequent jobs, no matter the user, must also be run under that same equalized project. If you need to change a cluster's project, you must restart the cluster.
This page describes how the Security Manager is disabled for Databricks clusters that do not allow R or Scala code to be executed. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).
The Immuta Security Manager is an essential element of the Databricks deployment that ensures users can't perform unauthorized actions when using Scala and R, since those languages have features that allow users to circumvent policies without the Security Manager enabled. However, the Security Manager must inspect the call stack every time a permission check is triggered, which adds overhead to queries. To improve Immuta's query performance on Databricks, Immuta disables the Security Manager when Scala and R are not being used.
The cluster init script checks the cluster’s configuration and automatically removes the Security Manager configuration when
spark.databricks.repl.allowedlanguages is a subset of {python, sql}, and
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED is true.
When the cluster is configured this way, Immuta can rely on Databricks' process isolation and Py4J security to prevent user code from performing unauthorized actions.
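A cluster configuration that meets both conditions might look like the following sketch (the Spark property is shown with Databricks' camel-case spelling; the values are illustrative):

Spark config section:
spark.databricks.repl.allowedLanguages python,sql

Environment Variables section:
IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=true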
Note: Immuta still expects the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to be set and pointing at the Security Manager.
Beyond disabling the Security Manager, Immuta will skip several startup tasks that are required to secure the cluster when Scala and R are configured, and fewer permission checks will occur on the Driver and Executors in the Databricks cluster, reducing overhead and improving performance.
There are still cases that require the Security Manager; in those instances, Immuta creates a fallback Security Manager to check the code path, so the IMMUTA_INIT_ALLOWED_CALLING_CLASSES_URI environment variable must always point to a valid calling class file.
Databricks’ dbutils.fs is blocked by their Py4J security; therefore, it can’t be used to access scratch paths.
This page outlines configuration details for Immuta-enabled Databricks clusters. Databricks Administrators should place the desired configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended).
This page contains references to the term whitelist, which Immuta no longer uses. When the term is removed from the software, it will be removed from this page.
Environment variable overrides
Properties in the config file can be overridden during installation using environment variables. The variable names are the config names in all upper case, with _ in place of each period (.). For example, to set the value of immuta.base.url via an environment variable, you would set the following in the Environment Variables section of cluster configuration: IMMUTA_BASE_URL=https://immuta.mycompany.com
immuta.ephemeral.host.override
Default: true
Description: Set this to false if ephemeral overrides should not be enabled for Spark. When true, this will automatically override ephemeral data source httpPaths with the httpPath of the Databricks cluster running the user's Spark application.
immuta.ephemeral.host.override.httpPath
Description: This configuration item can be used if automatic detection of the Databricks httpPath should be disabled in favor of a static path to use for ephemeral overrides.
immuta.ephemeral.table.path.check.enabled
Default: true
Description: When querying Immuta data sources in Spark, the metadata from the Metastore is compared to the metadata for the target source in Immuta to validate that the source being queried exists and is queryable on the current cluster. This check typically validates that the target (database, table) pair exists in the Metastore and that the table’s underlying location matches what is in Immuta. This configuration can be used to disable location checking if that location is dynamic or changes over time. Note: This may lead to undefined behavior if the same table names exist in multiple workspaces but do not correspond to the same underlying data.
immuta.spark.acl.enabled
Default: true
Description: Immuta Access Control List (ACL). Controls whether Databricks users are blocked from accessing non-Immuta tables. Ignored if Databricks Table ACLs are enabled (i.e., spark.databricks.acl.dfAclsEnabled=true).
immuta.spark.acl.whitelist
Description: Comma-separated list of Databricks usernames who may access raw tables when the Immuta ACL is in use.
immuta.spark.acl.privileged.timeout.seconds
Default: 3600
Description: The number of seconds to cache privileged user status for the Immuta ACL. A privileged Databricks user is an admin or is whitelisted in immuta.spark.acl.whitelist.
immuta.spark.acl.assume.not.privileged
Default: false
Description: Session property that overrides privileged user status when the Immuta ACL is in use. This should only be used in R scripts associated with spark-submit jobs.
immuta.spark.audit.all.queries
Default: false
Description: Enables auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not.
immuta.spark.databricks.allow.non.immuta.reads
Default: false
Description: Allows non-privileged users to SELECT from tables that are not protected by Immuta. See Limited Enforcement in Databricks for details about this feature.
immuta.spark.databricks.allow.non.immuta.writes
Default: false
Description: Allows non-privileged users to run DDL commands and data-modifying commands against tables or spaces that are not protected by Immuta. See Limited Enforcement in Databricks for details about this feature.
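As a sketch, both limited-enforcement settings could be enabled together as Spark environment variables; the variable names below are derived from the property-to-environment-variable convention described above and should be verified against your deployment:

IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_READS=true
IMMUTA_SPARK_DATABRICKS_ALLOW_NON_IMMUTA_WRITES=true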
immuta.spark.databricks.allowed.impersonation.users
Description: This configuration is a comma-separated list of Databricks users who are allowed to impersonate Immuta users.
immuta.spark.databricks.dbfs.mount.enabled
Default: false
Description: Exposes the DBFS FUSE mount located at /dbfs. Granular permissions are not possible, so all users will have read/write access to all objects therein. Note: Raw, unfiltered source data should never be stored in DBFS.
immuta.spark.databricks.disabled.udfs
Description: Block one or more Immuta UDFs from being used on an Immuta cluster. This should be a Java regular expression that matches the set of UDFs to block by name (excluding the immuta database). For example, to block all project UDFs, you may configure this to be ^.*_projects?$. For a list of functions, see the .
immuta.spark.databricks.filesystem.blacklist
Default: hdfs
Description: A list of filesystem protocols that this instance of Immuta will not support for workspaces. This is useful in cases where a filesystem is available to a cluster but should not be used on that cluster.
immuta.spark.databricks.jar.uri
Default: file:///databricks/jars/immuta-spark-hive.jar
Description: The location of immuta-spark-hive.jar on the filesystem for Databricks. This should not need to change unless a custom initialization script that places immuta-spark-hive in a non-standard location is necessary.
immuta.spark.databricks.local.scratch.dir.enabled
Default: true
Description: Creates a world-readable/writable scratch directory on local disk to facilitate the use of dbutils and 3rd party libraries that may write to local disk. Its location is non-configurable and is stored in the environment variable IMMUTA_LOCAL_SCRATCH_DIR. Note: Sensitive data should not be stored at this location.
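For example, a notebook could write a temporary, non-sensitive file to this directory (a minimal Python sketch):

import os

scratch_dir = os.environ["IMMUTA_LOCAL_SCRATCH_DIR"]  # populated when the local scratch dir is enabled
with open(os.path.join(scratch_dir, "tmp_output.csv"), "w") as f:
    f.write("id,value\n1,42\n")  # non-sensitive data only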
immuta.spark.databricks.log.level
Default: INFO
Description: The SLF4J log level to apply to Immuta's Spark plugins.
immuta.spark.databricks.log.stdout.enabled
Default: false
Description: If true, writes logging output to stdout/the console as well as the log4j-active.txt file (default in Databricks).
immuta.spark.databricks.py4j.strict.enabled
Default: true
Description: Disable to allow the use of the dbutils API in Python. Note: This setting should only be disabled for customers who employ a homogeneous integration (i.e., all users have the same level of data access).
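For example, to disable strict Py4J security on a homogeneous cluster, the equivalent environment variable (shown elsewhere on this page) can be set in the cluster's Environment Variables section:

IMMUTA_SPARK_DATABRICKS_PY4J_STRICT_ENABLED=false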
immuta.spark.databricks.scratch.database
Description: This configuration is a comma-separated list of additional databases that will appear as scratch databases when running a SHOW DATABASE query. This configuration increases performance by circumventing the Metastore to get the metadata for all the databases to determine what to display for a SHOW DATABASE query; it won't affect access to the scratch databases. Instead, use immuta.spark.databricks.scratch.paths to control read and write access to the underlying database paths.
Additionally, this configuration will only display the scratch databases that are configured and will not validate that the configured databases exist in the Metastore. Therefore, it is up to the Databricks administrator to properly set this value and keep it current.
immuta.spark.databricks.scratch.paths
Description: Comma-separated list of remote paths that Databricks users are allowed to directly read/write. These paths amount to unprotected "scratch spaces." You can create a scratch database by configuring its specified location (or configure dbfs:/user/hive/warehouse/<db_name>.db for the default location).
To create a scratch path to a location or a database stored at that location, configure
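a value like the following (the storage path is hypothetical):

immuta.spark.databricks.scratch.paths=s3://your-bucket/scratch/path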
To create a scratch path to a database created using the default location,
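the path can point at the database's default warehouse directory (the database name is hypothetical):

immuta.spark.databricks.scratch.paths=dbfs:/user/hive/warehouse/your_scratch_db.db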
immuta.spark.databricks.scratch.paths.create.db.enabled
Default: false
Description: Enables non-privileged users to create or drop scratch databases.
immuta.spark.databricks.single.impersonation.user
Default: false
Description: When true, this configuration prevents users from changing their impersonation user once it has been set for a given Spark session. This configuration should be set when the BI tool or other service allows users to submit arbitrary SQL or issue SET commands.