Databricks

Audience: Data Users

Content Summary: This page explains how to query Immuta-protected data within the Databricks integration using Python, SQL, SparkR, or Scala.

Prerequisites:

Query Data with Python

  1. Create a new workspace.

  2. Query the Immuta-protected data by referencing it in the form database.table_name:

    • Database: The database that houses the backing tables of your Immuta data sources.

    • Table Name: The name of the table backing your Immuta data sources.

  3. Run your query. It should look something like this:

    df = spark.sql('select * from database.table_name')
    df.show()
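
The database.table_name reference used above can also be assembled from its two parts before the query runs. The following is a minimal sketch, not part of Immuta or Databricks: the helper names qualified_name and build_query are hypothetical, and the check that each part is a plain identifier (letters, digits, underscores) is an assumption that is stricter than what Spark itself accepts.

```python
import re

# Assumption: only plain identifiers are allowed. Spark accepts more
# (for example, backtick-quoted names), so this check is deliberately strict.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_name(database: str, table_name: str) -> str:
    """Return 'database.table_name' after checking both parts (hypothetical helper)."""
    for part in (database, table_name):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"not a simple identifier: {part!r}")
    return f"{database}.{table_name}"


def build_query(database: str, table_name: str) -> str:
    """Build the select statement from step 3 (hypothetical helper)."""
    return f"select * from {qualified_name(database, table_name)}"


query = build_query("database", "table_name")
print(query)

# In a Databricks notebook, where `spark` is already defined, the string
# can then be passed to spark.sql as in step 3:
# df = spark.sql(query)
# df.show()
```

Replace "database" and "table_name" with the database and backing table of your own Immuta data source.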

Query Data with SQL

  1. Create a new workspace.

  2. Query the Immuta-protected data by referencing it in the form database.table_name:

    • Database: The database that houses the backing tables of your Immuta data sources.

    • Table Name: The name of the table backing your Immuta data sources.

  3. Run your query. It should look something like this:

    select * from database.table_name;

Query Data with SparkR

Establish the User's Identity

  1. Create a new workspace.

  2. Run:

    library(SparkR)

Run a Query

  1. In the same workspace, but in a different cell, query the Immuta-protected data by referencing it in the form database.table_name:

    • Database: The database that houses the backing tables of your Immuta data sources.

    • Table Name: The name of the table backing your Immuta data sources.

  2. Run your query. It should look something like this:

    df <- SparkR::sql("select * from database.table_name")
    SparkR::head(df)

Query Data with Scala

  1. Query the Immuta-protected data by referencing it in the form database.table_name:

    • Database: The database that houses the backing tables of your Immuta data sources.

    • Table Name: The name of the table backing your Immuta data sources.

  2. Run your query. It should look something like this:

    val sqlDF = spark.sql("select * from database.table_name")
    sqlDF.show()

Copyright © 2014-2024 Immuta Inc. All rights reserved.