Accessing Data
Last updated
Was this helpful?
Last updated
Was this helpful?
Once a Databricks securable is registered in Immut as a data source and you are subscribed to that data source, you must access that data through SQL:
See the sections below for more guidance on accessing data using , , and .
When using Delta Lake, the API does not go through the normal Spark execution path. This means that Immuta's Spark extensions do not provide protection for the API. To solve this issue and ensure that Immuta has control over what a user can access, the Delta Lake API is blocked.
Spark SQL can be used instead to give the same functionality with all of Immuta's data protections. See the for a list of corresponding Spark SQL calls to use.
In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.
When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.
Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a where
predicate). Expand the blocks below to view examples of reading data using these methods.
Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.
If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.
In Databricks, multiple input paths are supported as long as they belong to the same data source.
CSV-backed tables are not currently supported.
Loading a delta
partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a where
predicate:
User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the .