Limited Enforcement in Databricks Spark

By default, non-admin Databricks users see only the data sources to which they are subscribed in Immuta. This can be a problem for organizations whose data lake holds a large amount of non-sensitive data, because Immuta removes access to all of it. The Limited Enforcement Scope feature addresses this challenge by allowing Immuta users to access any table that is not protected by Immuta (that is, not registered as a data source or as a table in a project workspace). This is similar to how privileged users in Databricks operate, but non-privileged users still cannot bypass Immuta controls.

This feature is composed of two configurations:

  • Allowing non-Immuta reads: Immuta users with regular (unprivileged) Databricks roles may SELECT from tables that are not registered in Immuta.

  • Allowing non-Immuta writes: Immuta users with regular (unprivileged) Databricks roles can run DDL commands and data-modifying commands against tables or spaces that are not registered in Immuta.
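
The two settings can be read as a small decision table. The Python sketch below models that logic as it is described above. It is an illustration only, not Immuta's implementation: the names `Settings` and `route_query` and the returned labels are hypothetical, and treating the two flags as fully independent is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical holder for the two Limited Enforcement flags."""
    allow_non_immuta_reads: bool = False   # immuta.spark.databricks.allow.non.immuta.reads
    allow_non_immuta_writes: bool = False  # immuta.spark.databricks.allow.non.immuta.writes


def route_query(registered_in_immuta: bool, is_write: bool, settings: Settings) -> str:
    """Return how a non-privileged user's query would be treated (illustrative labels)."""
    if registered_in_immuta:
        # Tables protected by Immuta stay under Immuta subscriptions and policies.
        return "immuta-enforced"
    if is_write:
        # DDL and data-modifying commands against unregistered tables.
        return "allowed" if settings.allow_non_immuta_writes else "blocked"
    # SELECT against unregistered tables.
    return "allowed" if settings.allow_non_immuta_reads else "blocked"
```

For example, `route_query(False, False, Settings(allow_non_immuta_reads=True))` models a SELECT against an unregistered table with non-Immuta reads enabled.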

Additionally, Immuta supports auditing every query run on a Databricks cluster, whether or not the query touches Immuta-protected data. To configure this, see the Enable Auditing of All Queries in Databricks section.

Enable Non-Immuta Reads

  1. Enable non-Immuta Reads by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.reads</name>
        <value>true</value>
    </property>

  2. Optionally, adjust the cache duration by changing its value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). To improve performance, Immuta caches whether a table has been exposed as an Immuta source. The default cache duration is 3600 seconds (1 hour).

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
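
To make the effect of the cache duration concrete, here is a minimal sketch of a time-based cache for "is this table an Immuta source?" lookups. It assumes a simple expire-after-write policy; Immuta's actual cache may behave differently. The class `RegistrationCache`, its `lookup` callback, and the injectable `clock` are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class RegistrationCache:
    """Caches 'is this table an Immuta source?' answers for a fixed number of seconds."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup    # hypothetical call that asks Immuta about a table
        self._ttl = ttl_seconds  # mirrors immuta.spark.non.immuta.table.cache.seconds
        self._clock = clock      # injectable so the sketch can be tested without sleeping
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_registered(self, table: str) -> bool:
        now = self._clock()
        cached = self._entries.get(table)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # entry is still fresh: reuse the cached answer
        answer = self._lookup(table)  # entry missing or expired: ask again
        self._entries[table] = (answer, now)
        return answer
```

Under this model, a table that is registered in Immuta shortly after being cached as unregistered would keep its stale answer until the entry expires. A shorter duration narrows that window at the cost of more lookups.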

Enable Non-Immuta Writes

  1. Enable non-Immuta Writes by setting this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.writes</name>
        <value>true</value>
    </property>

  2. Optionally, adjust the cache duration by changing its value in the Spark environment variables (recommended) or immuta_conf.xml (not recommended). To improve performance, Immuta caches whether a table has been exposed as an Immuta source. The default cache duration is 3600 seconds (1 hour).

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>

Enable Auditing of All Queries in Databricks

To audit all queries run on a Databricks cluster, whether or not they touch Immuta-protected data, set this configuration in the Spark environment variables (recommended) or immuta_conf.xml (not recommended):

<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>true</value>
</property>
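
The sections above recommend Spark environment variables but show each setting in its XML property form. The sketch below derives environment variable lines from property names. It assumes the variable name is the property name in upper case with each `.` replaced by `_`; confirm that convention for your Immuta version before relying on it. The helper name `to_env_var` is hypothetical.

```python
def to_env_var(property_name: str, value: str) -> str:
    """Build a NAME=value line, ASSUMING upper case with '.' replaced by '_'."""
    return f"{property_name.upper().replace('.', '_')}={value}"


# Print one line per setting discussed above, ready to review before use.
for name, value in [
    ("immuta.spark.databricks.allow.non.immuta.reads", "true"),
    ("immuta.spark.databricks.allow.non.immuta.writes", "true"),
    ("immuta.spark.audit.all.queries", "true"),
]:
    print(to_env_var(name, value))
```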

Default Configuration Values

The controls and default values associated with non-Immuta reads, non-Immuta writes, and audit functionality are outlined below.

<property>
    <name>immuta.spark.databricks.allow.non.immuta.reads</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.databricks.allow.non.immuta.writes</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.non.immuta.table.cache.seconds</name>
    <value>3600</value>
</property>
<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>false</value>
</property>
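
As a way to reason about what a cluster ends up with, the following sketch parses the default properties listed above and overlays a set of overrides. It is a local illustration only and does not read or write any Immuta or Databricks configuration. The function names `parse_properties` and `effective_settings` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DEFAULTS_XML = """
<property><name>immuta.spark.databricks.allow.non.immuta.reads</name><value>false</value></property>
<property><name>immuta.spark.databricks.allow.non.immuta.writes</name><value>false</value></property>
<property><name>immuta.spark.non.immuta.table.cache.seconds</name><value>3600</value></property>
<property><name>immuta.spark.audit.all.queries</name><value>false</value></property>
"""


def parse_properties(fragment: str) -> Dict[str, str]:
    """Parse sibling <property> elements into a name -> value mapping."""
    # The fragment has several top-level elements, so wrap it in a synthetic root.
    root = ET.fromstring(f"<root>{fragment}</root>")
    return {
        prop.findtext("name", default="").strip(): prop.findtext("value", default="").strip()
        for prop in root.iter("property")
    }


def effective_settings(defaults: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Overlay overrides on defaults, rejecting names that are not in the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown properties: {sorted(unknown)}")
    return {**defaults, **overrides}
```

Calling `effective_settings(parse_properties(DEFAULTS_XML), {"immuta.spark.audit.all.queries": "true"})` returns the defaults with only the audit setting changed. Rejecting unknown names is a design choice of this sketch that helps catch typos in property names.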
