Scala Cluster Security Details
Audience: System Administrators
Content Summary: Using an equalized project is the most secure way to work in a Scala cluster; however, Immuta does not require that Scala be limited to equalized projects. This document outlines security recommendations for Scala clusters and discusses the security risks involved when equalized projects are not used.
There are limitations to isolation among users in Scala jobs on a Databricks cluster, even when using Immuta's
SecurityManager. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is
impossible to distinguish which user's data is contained in each file/block. If you are concerned about this
vulnerability, Immuta suggests that Scala clusters
- be limited to Scala jobs only.
- use project equalization, which forces all users to act under the same set of attributes, groups, and purposes with respect to their data access.
Context for Security: Why Project Equalization is Recommended
When data is read in Spark using an Immuta policy-enforced plan, the masking and redaction of rows is performed at the
leaf level of the physical Spark plan. For example, a policy such as "Mask using hashing" applied to a column for
everyone would be implemented as an expression on a Project node directly above the FileSourceScanExec/LeafExec node at
the bottom of the plan. This process prevents raw data from being shuffled in a Spark application and, consequently,
from ending up in SPARK_LOCAL_DIR.
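To make the idea concrete, the following is a minimal, self-contained sketch of what a hashing-based masking expression computes per column value. The object and method names are illustrative, and the choice of SHA-256 is an assumption; Immuta's actual masking implementation may differ.

```scala
import java.security.MessageDigest

object HashMask {
  // Deterministically hash a column value so that only the hashed form,
  // never the raw value, flows up the plan to be shuffled or spilled.
  // SHA-256 is assumed here purely for illustration.
  def maskWithHashing(value: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    digest.digest(value.getBytes("UTF-8")).map("%02x".format(_)).mkString
  }
}
```

Because an expression like this sits directly above the leaf scan node, downstream operators (joins, aggregations, shuffles) only ever see the masked output, which is why nothing written to local disk contains the raw values.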
This policy implementation coupled with an equalized project guarantees that data
being dropped into
SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all
users on the cluster. Since each user will have access to the same data, if they attempt to manually access other
users' cached/spilled data, they will only see what they have access to via equalized permissions on the cluster. If
project equalization is not turned on, users could dig through that directory and find data from another
user with heightened access, which would result in a data leak.
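The risk described above exists because any code running on the node can walk the spill directory with ordinary file APIs. The sketch below is a generic recursive directory walk, not anything specific to Immuta or Databricks; the directory path is a parameter and would be SPARK_LOCAL_DIR in the scenario above.

```scala
import java.io.File

object SpillScan {
  // Recursively list every file under a directory, as any job on the
  // node could do against SPARK_LOCAL_DIR. Nothing here requires
  // elevated privileges beyond normal filesystem access.
  def listFilesRecursively(dir: File): Seq[File] = {
    val entries = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    entries.flatMap { f =>
      if (f.isDirectory) listFilesRecursively(f) else Seq(f)
    }
  }
}
```

With project equalization enabled, such a walk only ever surfaces data the reading user is already entitled to see, which is the guarantee the paragraph above describes.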
Configuration for Requiring Equalized Projects with Scala
To require that Scala clusters be used in equalized projects and avoid the risk described above, change the
immuta.spark.require.equalization value to
true in your Immuta configuration file when you spin up Scala clusters:
<property>
  <name>immuta.spark.require.equalization</name>
  <value>true</value>
</property>
Once this configuration is complete, users on the cluster will need to switch to an Immuta equalized project before running a job. (Remember that when working under an Immuta Project, only tables within that project can be seen.) Once the first job is run using that equalized project, all subsequent jobs, no matter the user, must also be run under that same equalized project. If you need to change a cluster's project, you must restart the cluster.