
Limited Enforcement in Databricks

Audience: System Administrators

Content Summary: This page details how to enable non-Immuta reads, non-Immuta writes, and auditing of all queries in Databricks.

Introduction

Generally, Immuta prevents users from seeing data unless they are explicitly given access, which blocks access to raw sources in the underlying databases. However, in some native patterns (such as Snowflake), Immuta adds views that give users access to Immuta sources but does not impede access to preexisting sources in the underlying database. Therefore, a user who had access to a table in Snowflake before Immuta was installed still has access to that table afterward.

Unlike in the example above, non-admin Databricks users only see sources to which they are subscribed in Immuta. This can present problems if an organization has a data lake full of non-sensitive data and Immuta removes access to all of it. The Limited Enforcement Scope feature addresses this challenge by allowing users to access any sources that are not protected by Immuta (i.e., not registered as a data source or a table in a native workspace). Although this is similar to how privileged users in Databricks operate, non-privileged users still cannot bypass Immuta controls.

This feature is composed of two configurations:

  • Allowing non-Immuta reads: Regular (unprivileged) Databricks users may SELECT from tables that are not protected in some way by Immuta.
  • Allowing non-Immuta writes: Regular (unprivileged) users can run DDL commands and data-modifying commands against tables or spaces that are not protected by Immuta.

Additionally, Immuta supports auditing all queries run on a Databricks cluster, regardless of whether users touch Immuta-protected data. To configure this, see the Enable Auditing of All Queries in Databricks section.

Enable Non-Immuta Reads

Non-Immuta Reads

  • This setting does not allow reading data directly with commands like spark.read.format("x"). Users are still required to read data and query tables using Spark SQL.

  • When non-Immuta reads are enabled, users will see all databases and tables when they run show databases and/or show tables. However, this does not mean they will be able to query all of them.

  1. Enable non-Immuta Reads by setting this configuration in immuta_conf.xml:

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.reads</name>
        <value>true</value>
    </property>
    
  2. Optionally, adjust the cache duration by changing the default value in immuta_conf.xml. (To improve performance, Immuta caches whether a table has been exposed as an Immuta source. The default cache duration is 1 hour, or 3600 seconds.)

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
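
As a minimal sketch, the two steps above could be combined as follows. The value of 600 seconds (10 minutes) is an illustrative assumption, not a recommended setting; a shorter duration would presumably make Immuta notice newly registered sources sooner, at the cost of more frequent lookups.

```xml
<!-- Step 1: allow unprivileged users to SELECT from tables not protected by Immuta -->
<property>
    <name>immuta.spark.databricks.allow.non.immuta.reads</name>
    <value>true</value>
</property>
<!-- Step 2 (optional): 600 is a hypothetical example value; the default is 3600 -->
<property>
    <name>immuta.spark.non.immuta.table.cache.seconds</name>
    <value>600</value>
</property>
```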
    

Enable Non-Immuta Writes

Non-Immuta Writes

  • These non-protected tables/spaces have the same exposure as described in the non-Immuta reads section, except that users can also write data directly to these paths.

  • With non-Immuta writes enabled, users on the cluster can combine policy-enforced data they can access through registered Immuta data sources with non-Immuta data and write the result to a non-Immuta write space, where it would be visible to others. If this is not acceptable, configure the cluster to use only Immuta’s native workspaces instead.

  1. Enable non-Immuta Writes by setting this configuration in immuta_conf.xml:

    <property>
        <name>immuta.spark.databricks.allow.non.immuta.writes</name>
        <value>true</value>
    </property>
    
  2. Optionally, adjust the cache duration by changing the default value in immuta_conf.xml. (To improve performance, Immuta caches whether a table has been exposed as an Immuta source. The default cache duration is 1 hour, or 3600 seconds.)

    <property>
        <name>immuta.spark.non.immuta.table.cache.seconds</name>
        <value>3600</value>
    </property>
    

Enable Auditing of All Queries in Databricks

Enable auditing of all queries run on a Databricks cluster (regardless of whether users touch Immuta-protected data) by setting this configuration in immuta_conf.xml:

<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>true</value>
</property>

Default Configuration Values

The controls associated with non-Immuta reads, non-Immuta writes, and audit functionality are in immuta_conf.xml. The excerpt below outlines these properties and their default values.

<property>
    <name>immuta.spark.databricks.allow.non.immuta.reads</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.databricks.allow.non.immuta.writes</name>
    <value>false</value>
</property>
<property>
    <name>immuta.spark.non.immuta.table.cache.seconds</name>
    <value>3600</value>
</property>
<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>false</value>
</property>
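
For comparison with the defaults above, the following sketch shows how the same properties might look with non-Immuta reads, non-Immuta writes, and auditing of all queries all enabled. It assumes the cache duration is left at its default; whether to enable all three together depends on the considerations described in the earlier sections.

```xml
<!-- Sketch only: all three limited enforcement options switched on -->
<property>
    <name>immuta.spark.databricks.allow.non.immuta.reads</name>
    <value>true</value>
</property>
<property>
    <name>immuta.spark.databricks.allow.non.immuta.writes</name>
    <value>true</value>
</property>
<!-- Cache duration left at the documented default of 1 hour -->
<property>
    <name>immuta.spark.non.immuta.table.cache.seconds</name>
    <value>3600</value>
</property>
<property>
    <name>immuta.spark.audit.all.queries</name>
    <value>true</value>
</property>
```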