Security and Compliance
Immuta offers several features to secure your users and Databricks clusters, prove compliance, and monitor for anomalies.
Authentication
Configuring the integration and registering data
Immuta supports the following authentication methods to configure the Databricks Spark integration and register data sources:
OAuth machine-to-machine (M2M): Immuta uses the Client Credentials Flow to integrate with Databricks OAuth machine-to-machine authentication, which allows Immuta to authenticate with Databricks using a client secret. Once Databricks verifies the Immuta service principal’s identity using the client secret, Immuta is granted a temporary OAuth token to perform token-based authentication in subsequent requests. When that token expires (after one hour), Immuta requests a new temporary token. See the Databricks OAuth machine-to-machine (M2M) authentication page for more details; a minimal sketch of this token exchange appears after this list.
Personal access token (PAT): This token gives Immuta temporary permission to push cluster policies to the configured Databricks workspace (overwriting any cluster policy templates previously applied to the workspace) when configuring the integration, and to register securables as Immuta data sources.
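The snippet below is a minimal sketch of the M2M client credentials exchange described in the first item above, assuming the standard Databricks OAuth token endpoint; the workspace URL, client ID, and client secret are placeholders, and Immuta performs this exchange for you automatically once the integration is configured.

```python
import requests

# Illustrative only: Immuta handles this token exchange itself.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
CLIENT_ID = "<service-principal-client-id>"                      # placeholder
CLIENT_SECRET = "<client-secret>"                                # placeholder

response = requests.post(
    f"{WORKSPACE_URL}/oidc/v1/token",
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
response.raise_for_status()

# The returned token is short-lived (about one hour); when it expires,
# a new token is requested with the same client credentials.
access_token = response.json()["access_token"]
```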
User authentication
The built-in Immuta IAM can be used as a complete solution for authentication and fine-grained user entitlement. However, you can connect your existing identity management provider to Immuta to use that system for authentication and fine-grained user entitlement instead.
Each of the supported identity providers includes a specific set of configuration options that enable Immuta to communicate with the IAM system and map the users, permissions, groups, and attributes into Immuta.
See the Identity managers guide for a list of supported providers and details.
See the Setting up users guide for details and instructions on mapping Databricks user accounts to Immuta.
Cluster security
Data processing and encryption
See the Data processing and the Encryption and masking practices guides for more information about transmission of policy decision data, encryption of data in transit and at rest, and encryption key management.
Protecting the Immuta configuration
Non-administrator users on an Immuta-enabled Databricks cluster must not have access to view or modify the Immuta configuration, as such access creates a loophole around Immuta policy enforcement. Databricks secrets allow you to securely apply environment variables to Immuta-enabled clusters.
Databricks secrets can be used in the environment variables configuration section for a cluster by referencing the secret path instead of the actual value of the environment variable.
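For example, a sensitive environment variable can reference a secret rather than a literal value. The variable, scope, and key names below are placeholders; the {{secrets/<scope>/<key>}} reference syntax is Databricks'.

```
# Cluster configuration > Advanced options > Spark > Environment variables
# Placeholder names; the value is resolved from the referenced Databricks
# secret rather than being stored in plain text in the cluster configuration.
MY_IMMUTA_SETTING={{secrets/my-immuta-scope/my-immuta-key}}
```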
See the Installation and compliance guide for details and instructions on using Databricks secrets.
Scala cluster security
There are limitations to isolation among users in Scala jobs on a Databricks cluster. When data is broadcast, cached (spilled to disk), or otherwise saved to SPARK_LOCAL_DIR, it is impossible to distinguish which user’s data is contained in each file or block. To address this vulnerability, Immuta suggests that you (1) limit Scala clusters to Scala jobs only and (2) require equalized projects, which force all users to act under the same set of attributes, groups, and purposes with respect to their data access. This requirement guarantees that data dropped into SPARK_LOCAL_DIR will have policies enforced and that those policies will be homogeneous for all users on the cluster. Since each user has access to the same data, if they attempt to manually access other users' cached or spilled data, they will only see what they already have access to via the equalized permissions on the cluster. If project equalization is not turned on, users could dig through that directory and find data from another user with heightened access, resulting in a data leak.
See the Installation and compliance guide for more details and configuration instructions.
Auditing and compliance
Immuta provides auditing features and governance reports so that data owners and governors can monitor users' access to data and detect anomalies in behavior.
You can view the information in these audit logs on dashboards or export the full audit logs to S3 and ADLS for long-term backup and processing with log data processors and tools. This capability fosters convenient integrations with log monitoring services and data pipelines.
See the Audit documentation for details about these capabilities and how they work with the Databricks Spark integration.
Databricks query audit
Immuta captures the code or query that triggers the Spark plan in Databricks, making audit records more useful in assessing what users are doing.
To audit what triggers the Spark plan, Immuta hooks into Databricks where notebook cells and JDBC queries execute and saves the cell or query text. Then, Immuta pulls this information into the audits of the resulting Spark jobs.
Immuta audits queries that come from interactive notebooks, notebook jobs, and JDBC connections, but does not audit Scala or R submit jobs. Furthermore, Immuta only audits Spark jobs that are associated with Immuta tables. Consequently, Immuta will not audit a query in a notebook cell that does not trigger a Spark job, unless IMMUTA_SPARK_AUDIT_ALL_QUERIES is set to true.
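For illustration, the hypothetical pair of notebook cells below (the table name is made up) shows the difference between a cell that triggers a Spark job and one that does not.

```python
# Hypothetical notebook cells on an Immuta-enabled cluster.

# This cell only defines a lazy transformation; no Spark job runs, so no
# audit record is produced unless IMMUTA_SPARK_AUDIT_ALL_QUERIES is true.
filtered = spark.table("analytics.claims").where("state = 'OH'")

# This cell triggers a Spark job (an action), so the cell text is captured
# and attached to the audit record for the resulting Spark job.
filtered.count()
```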
See the Databricks Spark query audit logs page for examples of saved queries and the resulting audit records. To exclude query text from audit events, see the App settings page.
Auditing all queries
Immuta supports auditing all queries run on a Databricks cluster, regardless of whether the queries touch Immuta-protected data.
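Assuming the IMMUTA_SPARK_AUDIT_ALL_QUERIES environment variable referenced above, a cluster-level setting for this behavior might look like the following; see the guide below for where this belongs in your cluster configuration.

```
# Cluster environment variable (placement covered in the guide below):
IMMUTA_SPARK_AUDIT_ALL_QUERIES=true
```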
See the Installation and compliance guide for details and instructions.
Auditing queries run while impersonating another user
When a query is run by a user impersonating another user, the extra.impersonationUser field in the audit log payload is populated with the Databricks username of the impersonating user, and the userId field contains the Immuta username of the user being impersonated:
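A fragment of such an audit record might look like the following; the values are illustrative, with analyst@example.com as the Immuta username being impersonated and svc-impersonator@example.com as the Databricks username performing the impersonation.

```json
{
  "userId": "analyst@example.com",
  "extra": {
    "impersonationUser": "svc-impersonator@example.com"
  }
}
```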
See the Setting up users guide for details about user impersonation.
Governance reports
Immuta governance reports allow users with the GOVERNANCE Immuta permission to use a natural language builder to instantly create reports that delineate user activity across Immuta. These reports can be based on various entity types, including users, groups, projects, data sources, purposes, policy types, or connection types.
See the Governance report types page for a list of report types and guidance.