# Accessing Data

Once a Databricks securable is registered in Immuta as a data source and you are subscribed to that data source, you must access that data through SQL:

{% tabs %}
{% tab title="Python" %}
```python
df = spark.sql("select * from immuta.table")
```
{% endtab %}

{% tab title="Scala" %}
```scala
import org.apache.spark.sql.SparkSession

// getOrCreate() returns the active session if one already exists,
// as it does in a Databricks notebook.
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

val sqlDF = spark.sql("SELECT * FROM immuta.table")
```
{% endtab %}

{% tab title="SQL" %}
```sql
%sql
select * from immuta.table
```
{% endtab %}

{% tab title="R" %}
```r
library(SparkR)
df <- SparkR::sql("SELECT * from immuta.table")
```

*With R, you must load the SparkR library in a cell before accessing the data.*
{% endtab %}
{% endtabs %}

See the sections below for more guidance on accessing data using [Delta Lake](#delta-lake), [direct file reads in Spark for file paths](#spark-direct-file-reads), and [user impersonation](#user-impersonation).

## Delta Lake

The Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions cannot protect calls made through it. To ensure that Immuta controls what a user can access, the Delta Lake API is blocked. Use Spark SQL instead; it provides the same functionality with all of Immuta's data protections.
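As a hedged illustration of that substitution, the sketch below maps a few `DeltaTable` method names to Delta Lake SQL statements that can be passed to `spark.sql`. The `DELTA_SQL_TEMPLATES` dictionary and the `delta_sql` helper are hypothetical names introduced here, and the three entries are assumed equivalents, not Immuta's authoritative list.

```python
# Hypothetical mapping from DeltaTable method names to Delta Lake SQL templates.
# These entries are assumptions for illustration only.
DELTA_SQL_TEMPLATES = {
    "history": "DESCRIBE HISTORY {table}",
    "delete": "DELETE FROM {table} WHERE {condition}",
    "vacuum": "VACUUM {table}",
}


def delta_sql(operation: str, table: str, **params: str) -> str:
    """Build the SQL text for a blocked Delta API operation (hypothetical helper)."""
    try:
        template = DELTA_SQL_TEMPLATES[operation]
    except KeyError:
        raise ValueError(f"No SQL template for Delta operation {operation!r}") from None
    # Parameters are interpolated verbatim, so pass only trusted strings.
    return template.format(table=table, **params)


# In a Databricks notebook, where `spark` is predefined:
# history_df = spark.sql(delta_sql("history", "immuta.table"))
```

Building the statement as a plain string keeps the query on the Spark SQL path, which is the path Immuta's Spark extensions protect.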
See the [Delta API reference guide](/2025.1/configuration/integrations/databricks/databricks-spark/reference-guides/accessing-data/delta-lake-api.md) for a list of corresponding Spark SQL calls to use.

## Spark direct file reads

In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths, or who have existing workflows built around file paths, can continue to use those workflows without rewriting their queries for Immuta.

When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If a matching data source is found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.

Users can read data from individual parquet files in a sub-directory and partitioned data from a sub-directory (or by using a `where` predicate). The examples below show each of these methods.

### Read data from an individual parquet file

To read from an individual file, load a partition file from a sub-directory:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01/my_file.parquet")
```

### Read partitioned data from a sub-directory

To read partitioned data from a sub-directory, load a parquet partition from a sub-directory:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")
```

Alternatively, load a parquet partition using a `where` predicate:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")
```
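Both forms above address the same partition, and the `where` form is the one required for `delta` tables (see Limitations). For workflows that already hold partition sub-directory paths, the following minimal sketch derives the table root and the predicate from such a path. `split_partition_path` and `to_where` are hypothetical helpers, not part of Spark or Immuta, and they assume Hive-style `column=value` directory names.

```python
from typing import List, Tuple


def split_partition_path(path: str) -> Tuple[str, List[str]]:
    """Split a partition directory path into (table_root, predicates).

    Hypothetical helper: trailing `column=value` segments become predicates.
    A path that ends in a file name is returned unchanged with no predicates.
    """
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No `scheme://` prefix; treat the whole string as the path portion.
        rest = path
    segments = rest.rstrip("/").split("/")
    predicates: List[str] = []
    # Peel `column=value` segments off the end, preserving their order.
    while segments and "=" in segments[-1]:
        predicates.insert(0, segments.pop())
    root = "/".join(segments)
    return (f"{scheme}://{root}" if sep else root), predicates


def to_where(predicates: List[str]) -> str:
    """Join predicates into one expression; values are used verbatim (no quoting)."""
    return " AND ".join(predicates)


# In a Databricks notebook, where `spark` is predefined:
# root, preds = split_partition_path(
#     "s3://my_bucket/path/to/my_delta_table/partition_column=01")
# df = spark.read.format("delta").load(root).where(to_where(preds))
```

The helper does not quote values, so string-typed partition columns would need their values wrapped in quotes before the predicate is passed to `where`.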

### Limitations

* Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.
* If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.
* In Databricks, multiple input paths are supported as long as they belong to the same data source.
* CSV-backed tables are not currently supported.
* Loading a `delta` partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a `where` predicate:

  ```python
  # Not recommended by Spark and not supported in Immuta.
  spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table/partition_column=01")

  # Recommended by Spark and supported in Immuta.
  spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")
  ```

## User impersonation

User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the [Impersonate a user page](/2025.1/governance/data-consumers/user-impersonation.md#databricks-spark).