# Accessing Data

Once a Databricks securable is registered in Immuta as a data source and you are subscribed to that data source, you must access that data through SQL:

{% tabs %}
{% tab title="Python" %}

```python
df = spark.sql("select * from immuta.table")
```

{% endtab %}

{% tab title="Scala" %}

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
val sqlDF = spark.sql("SELECT * FROM immuta.table")
```

{% endtab %}

{% tab title="SQL" %}

```sql
%sql
select * from immuta.table
```

{% endtab %}

{% tab title="R" %}

```r
library(SparkR)
df <- SparkR::sql("SELECT * from immuta.table")
```

*With R, you must load the SparkR library in a cell before accessing the data.*
{% endtab %}
{% endtabs %}

See the sections below for more guidance on accessing data using [Delta Lake](#delta-lake), [direct file reads in Spark for file paths](#spark-direct-file-reads), and [user impersonation](#user-impersonation).

## Delta Lake

The Delta Lake API does not go through the normal Spark execution path, so Immuta's Spark extensions cannot protect calls made through it. To ensure that Immuta controls what a user can access, the Delta Lake API is blocked.

Use Spark SQL instead; it provides the same functionality with all of Immuta's data protections. See the [Delta API reference guide](https://documentation.immuta.com/SaaS/configuration/integrations/databricks/databricks-spark/reference-guides/accessing-data/delta-lake-api) for a list of corresponding Spark SQL calls to use.
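
As a rough illustration of that substitution, the sketch below builds Spark SQL statements that could stand in for two Delta Lake API calls, `DeltaTable.history()` and `DeltaTable.vacuum()`. The helper names `history_sql` and `vacuum_sql` are hypothetical, and `immuta.table` is the placeholder table name used earlier on this page. Confirm the supported equivalents against the reference guide before relying on them.

```python
from typing import Optional


def history_sql(table: str, limit: Optional[int] = None) -> str:
    """Build the SQL counterpart of DeltaTable.history(limit). Hypothetical helper."""
    stmt = f"DESCRIBE HISTORY {table}"
    if limit is not None:
        stmt += f" LIMIT {limit}"
    return stmt


def vacuum_sql(table: str, retain_hours: Optional[int] = None) -> str:
    """Build the SQL counterpart of DeltaTable.vacuum(retentionHours). Hypothetical helper."""
    stmt = f"VACUUM {table}"
    if retain_hours is not None:
        stmt += f" RETAIN {retain_hours} HOURS"
    return stmt


# Print the statements; in a notebook they would be passed to spark.sql(...).
print(history_sql("immuta.table", limit=5))
print(vacuum_sql("immuta.table", retain_hours=168))
```

In a Databricks notebook, where `spark` is already defined, the resulting string would be passed to `spark.sql(...)`, for example `spark.sql(history_sql("immuta.table", limit=5))`.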

## Spark direct file reads

In addition to supporting direct file reads through workspace and scratch paths, Immuta allows direct file reads in Spark for file paths. As a result, users who prefer to interact with their data using file paths or who have existing workflows revolving around file paths can continue to use these workflows without rewriting those queries for Immuta.

When reading from a path in Spark, the Immuta Databricks Spark plugin queries the Immuta Web Service to find Databricks data sources for the current user that are backed by data from the specified path. If found, the query plan maps to the Immuta data source and follows existing code paths for policy enforcement.
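
The lookup described above can be pictured with a toy model. This is a conceptual sketch only, not Immuta's implementation: the `DATA_SOURCES` registry, the `resolve_data_source` function, and the prefix-matching rule are all assumptions made for illustration. The registry stands in for the data sources the current user is subscribed to.

```python
from typing import Dict, Optional

# Hypothetical registry: data source name -> backing storage path.
DATA_SOURCES: Dict[str, str] = {
    "immuta.my_parquet_table": "s3://my_bucket/path/to/my_parquet_table",
}


def resolve_data_source(read_path: str, sources: Dict[str, str]) -> Optional[str]:
    """Return the first data source whose backing path contains read_path, else None."""
    normalized = read_path.rstrip("/")
    for name, base in sources.items():
        base = base.rstrip("/")
        # Match the table root itself or any file or sub-directory beneath it.
        if normalized == base or normalized.startswith(base + "/"):
            return name
    return None


print(resolve_data_source(
    "s3://my_bucket/path/to/my_parquet_table/partition_column=01", DATA_SOURCES
))
```

Returning the first match mirrors the idea that a read is tied to a single data source; what actually happens when no data source matches is not modeled here.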

Users can read data from an individual parquet file in a sub-directory, or read partitioned data either from a sub-directory or by using a `where` predicate. Expand the blocks below to view examples of reading data using these methods.

<details>

<summary>Read data from an individual parquet file</summary>

To read from an individual file, load a partition file from a sub-directory:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01/my_file.parquet")
```

</details>

<details>

<summary>Read partitioned data from a sub-directory</summary>

To read partitioned data from a sub-directory, load a parquet partition from a sub-directory:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table/partition_column=01")
```

Alternatively, load a parquet partition using a `where` predicate:

```python
spark.read.format("parquet").load("s3://my_bucket/path/to/my_parquet_table").where("partition_column=01")
```

</details>

### Limitations

* Direct file reads for Immuta data sources only apply to data sources created from tables, not data sources created from views or queries.
* If more than one data source has been created for a path, Immuta will use the first valid data source it finds. It is therefore not recommended to use this integration when more than one data source has been created for a path.
* In Databricks, multiple input paths are supported as long as they belong to the same data source.
* CSV-backed tables are not currently supported.
* Loading a `delta` partition from a sub-directory is not recommended by Spark and is not supported in Immuta. Instead, use a `where` predicate:

  ```python
  # Not recommended by Spark and not supported in Immuta.
  spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table/partition_column=01")

  # Recommended by Spark and supported in Immuta.
  spark.read.format("delta").load("s3://my_bucket/path/to/my_delta_table").where("partition_column=01")
  ```
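
For workflows that already hold partition sub-directory paths, one possible way to follow the `where` recommendation is to split such a path into the table root and a predicate. The sketch below is only an illustration: `split_partition_path` is a hypothetical helper, it assumes Hive-style `column=value` directory names, and it copies values verbatim from the path, so any quoting needed for string-typed partition columns is left to the caller.

```python
from typing import Tuple


def split_partition_path(path: str) -> Tuple[str, str]:
    """Split '<table_root>/<col>=<val>/...' into (table_root, where_predicate)."""
    parts = path.rstrip("/").split("/")
    predicates = []
    # Peel trailing 'column=value' segments off the end of the path.
    while parts and "=" in parts[-1]:
        predicates.append(parts.pop())
    predicates.reverse()  # restore outermost-to-innermost partition order
    return "/".join(parts), " AND ".join(predicates)


table_path, predicate = split_partition_path(
    "s3://my_bucket/path/to/my_delta_table/partition_column=01"
)
print(table_path)
print(predicate)
```

The two values could then be used as `spark.read.format("delta").load(table_path).where(predicate)`. When the path contains no partition segments the predicate is an empty string, and the `where` call should be skipped.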

## User impersonation

User impersonation allows Databricks users to query data as another Immuta user. To impersonate another user, see the [Impersonate a user page](https://documentation.immuta.com/SaaS/govern/secure-your-data/data-consumers/user-impersonation#databricks-spark).
