Limited Enforcement in Databricks Spark

Generally, Immuta prevents users from seeing data unless they are explicitly given access, which blocks access to raw sources in the underlying databases. However, in some native patterns (such as Snowflake), Immuta adds views to allow users access to Immuta sources but does not impede access to preexisting sources in the underlying database. Therefore, if a user had access in Snowflake to a table before Immuta was installed, they would still have access to that table after.

Unlike the example above, Databricks non-admin users will only see sources to which they are subscribed in Immuta, and this can present problems if organizations have a data lake full of non-sensitive data and Immuta removes access to all of it. The Limited Enforcement Scope feature addresses this challenge by allowing Immuta users to access any tables that are not protected by Immuta (i.e., not registered as a data source or a table in a native workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users cannot bypass Immuta controls.

This feature is composed of two configurations:

  • Allowing non-Immuta reads: Immuta users with regular (unprivileged) Databricks roles may SELECT from tables that are not registered in Immuta.

  • Allowing non-Immuta writes: Immuta users with regular (unprivileged) Databricks roles can run DDL commands and data-modifying commands against tables or spaces that are not registered in Immuta.

Additionally, Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data or not. To configure Immuta to do so, navigate to the Enable Auditing of All Queries in Databricks section.

Enable Non-Immuta Reads

Non-Immuta reads

  • This setting does not allow reading data directly with commands like spark.read.format("x"). Users are still required to read data and query tables using Spark SQL.

  • When non-Immuta reads are enabled, Immuta users will see all databases and tables when they run show databases and/or show tables. However, this does not mean they will be able to query all of them.

  1. Enable non-Immuta Reads by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.reads</name>
        <value>true</value>
    </property>
  2. Opt to adjust the cache duration by changing the default value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). (Immuta caches whether a table has been exposed as an Immuta source to improve performance. The default caching duration is 1 hour.)

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>

Enable Non-Immuta Writes

Non-Immuta writes

  • These non-protected tables/spaces have the same exposure as detailed in the read section, but with the distinction that users can write data directly to these paths.

  • With non-Immuta writes enabled, it will be possible for users on the cluster to mix any policy-enforced data they may have access to via any registered data sources in Immuta with non-Immuta data, and write the ensuing result to a non-Immuta write space where it would be visible to others. If this is not a desired possibility, the cluster should instead be configured to only use Immuta’s native workspaces.

  1. Enable non-Immuta Writes by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.writes</name>
        <value>true</value>
    </property>
  2. Opt to adjust the cache duration by changing the default value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). (Immuta caches whether a table has been exposed as an Immuta source to improve performance. The default caching duration is 1 hour.)

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>

Enable Auditing of All Queries in Databricks

Enable support for auditing all queries run on a Databricks cluster (regardless of whether users touch Immuta-protected data or not) by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>true</value>
</property>

Default Configuration Values

The controls and default values associated with non-Immuta reads, non-Immuta writes, and audit functionality are outlined below.

<property>
    <name>immuta.spark.databricks.allow.non.immuta.reads</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.databricks.allow.non.immuta.writes</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.non.immuta.table.cache.seconds</name>
    <value>3600</value>
</property>
<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>false</value>
</property>

Last updated

Copyright © 2014-2024 Immuta Inc. All rights reserved.