Skip to content

You are viewing documentation for Immuta version 2023.1.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Query Your Data Guide

Prerequisite: Before using this walkthrough, please ensure that you’ve first done the Schema Monitoring and Automatic Sensitive Data Discovery walkthrough.

Querying Databricks with Notebooks

Prerequisites: Ensure you are using an Immuta-enabled cluster, as described in Part 3 of the POV Data Setup.

You must consider the Databricks user you are using. If you are a Databricks Admin, certain scenarios may allow you to bypass Immuta security controls. In Part 3 of the POV Data Setup you may have created a non-admin user, and it’s best to use that user in lieu of (or in combination with) your admin user, especially when querying your data in Databricks. When using the admin user, here are scenarios to consider, but the workflow that sent you to this page will describe if you should use the admin user in your queries or not:

About policy enforcement

Immuta enforcement occurs by rewriting the SQL query under the covers prior to the Databricks catalyst planning. This means all data interactions in Databricks must occur through a metastore SQL query - no direct file access. If you attempt direct file access to your cloud storage (S3, ADLS, Google Storage) it will be blocked. The below notebook will demonstrate querying data into a DataFrame in each language using this pattern. Note it is possible to configure direct file access using Immuta, but that is limited to file-level controls only and is not covered in this POV Guide. Please work with your Immuta expert for more information if you want direct file reads.

We highly recommend abstracting structured data with a metastore if you are not already doing so.

Run a query:

  1. Download this Notebook* to run queries.

  2. Consider using an incognito/private window to use two different users (your admin user and the non-admin user you potentially created) to run queries, importing that notebook for both.

*That notebook contains generic queries; some of the walkthroughs may instruct you to run different queries.

Querying Databricks SQL

Prerequisite: Ensure you have configured the Databricks SQL integration with Immuta as described in Part 3 of the POV Data Setup.

About policy enforcement

Immuta enforces policy by creating Databricks SQL views that contain all the policy logic.

  • When moving to production, for everyone but people that need to manage the raw tables, you should REVOKE access from the raw tables and instead only let users query these Immuta-managed views.
  • Those views are created in the database configured in Part 3 of the POV Data Setup
    • Table name of the raw table is maintained (as the view name) with the database appended to the front to avoid collisions.
    • Immuta handles GRANTing access to those views.

Run a query:

  1. Make sure you select SQL from the upper left menu in Databricks.
  2. Click Create → Query.
  3. Select the database name used when configuring the integration in Part 3 of the POV Data Setup.
  4. Run the following queries in the worksheet* (you must include the database):
    • select * from [database name from above].immuta_pov_immuta_fake_credit_card_transactions limit 100;
    • select * from [database name from above].immuta_pov_immuta_fake_hr_data limit 100;
  5. Consider using an incognito/private window to use two different users (your admin user and the non-admin user you potentially created) to run queries.

*These are generic queries, some of the walkthroughs may instruct you to run different queries.

Querying Snowflake

Prerequisite: Ensure you have configured the Snowflake integration with Immuta as described in Part 3 of the POV Data Setup.

About policy enforcement

Immuta enforces policy by creating Snowflake views that contain all the policy logic. Those views are available to the PUBLIC role in Snowflake, which is the foundation of every role. The outcome of this means:

  • When moving to production, for everyone but people that need to manage the raw tables, you should REVOKE access from the raw tables and instead only let users query these Immuta-managed views.
  • Since PUBLIC is the foundation of every role, you no longer need to manage Snowflake roles for your end users, roles can be limited to defining warehouse access.
    • You do not need to explicitly GRANT access to those views, that is handled by Immuta by GRANTing to the PUBLIC role (and further protecting using subscription policies you will configure in a future walkthrough).
  • Those views are created in the database configured in Part 3 of the POV Data Setup.
    • Schema name of the raw table is maintained.
    • Table name of the raw table is maintained (as the view name).

Run a query:

  1. Create a new worksheet.
  2. You must select the appropriate role/database/schema to query the views:
    1. Role: any or just PUBLIC.
    2. Warehouse: any.
    3. Database: the database name used when configuring the integration in Part 3 of the POV Data Setup (default is IMMUTA).
    4. Schema: POV_DATA (or whatever you named the schema in Part 2 of the POV Data Setup).
  3. Run the following queries in the worksheet*:
    1. select * from immuta_fake_credit_card_transactions limit 100;
    2. select * from immuta_fake_hr_data limit 100;
  4. Consider using an incognito/private window to use two different users (your admin user and the non-admin user you potentially created) to run queries.

*These are generic queries, some of the walkthroughs may instruct you to run different queries.

Querying Synapse

Prerequisite: Ensure you have configured the Synapse integration with Immuta as described in Part 3 of the POV Data Setup.

About policy enforcement

Immuta enforces policy by creating Synapse views that contain all the policy logic within a schema (contained in your dedicated SQL pool) that is created in Part 3 of the POV Data Setup.

  • When moving to production, for everyone but people that need to manage the raw tables, you should REVOKE access from the raw tables and instead only let users query these Immuta-managed views. In other words, only give them access to the schema created in Part 3 of the POV Data Setup.
  • Those views are created in the schema configured in Part 3 of the POV Data Setup.
    • Table name of the raw table is maintained with the database name appended to the front to avoid conflicts.
    • Immuta handles GRANTing access to those views.

Run a query:

  1. From Synapse Studio click on the Data menu on the left.
  2. Click on the Workspace tab.
  3. Expand databases and you should see the dedicated pool you created: immuta_pov.
  4. Expand immuta_pov and you should see the schema you created in Part 3 of the POV Data Setup.
  5. Select that schema.
  6. Select New SQL script and then Empty script.
  7. Run the following queries* (note that Synapse does not support LIMIT and the SQL is case sensitive:
    1. SELECT TOP 100 * FROM [schema].immuta_pov_immuta_fake_credit_card_transactions;
    2. SELECT TOP 100 * FROM [schema].immuta_pov_immuta_fake_hr_data;

*These are generic queries, some of the walkthroughs may instruct you to run different queries.

Querying Starburst (Trino)

Prerequisite: Ensure you have configured the Starburst (Trino) integration with Immuta as described in Part 3 of the POV Data Setup.

About policy enforcement

Immuta enforces policy by creating views that contain all the policy logic. Those views are created within a catalog that is created in Part 3 of the POV Data Setup.

The outcome of this means:

  • When moving to production, for everyone but people that need to manage the raw tables, you should REVOKE access from the raw tables and instead only let users query these Immuta-managed views.
  • You do not need to explicitly GRANT access to those views, that is handled by Immuta.
  • Those views are created in the catalog configured in Part 3 of the POV Data Setup.
    • Schema name of the raw table is maintained.
    • Table name of the raw table is maintained (as the view name).

Run a query:

  1. Since Starburst (Trino) does not have a user interface, please use your tool of choice to connect.
  2. Once connected, you should see the catalog that was created in Part 3 of the POV Data Setup.
  3. Expand that and you should see the pov_data schema.
  4. Expand that and you should see the secure views Immuta created, you can run these queries*:
    1. select * from [catalog].pov_data.immuta_fake_credit_card_transactions limit 100;
    2. select * from [catalog].pov_data.immuta_fake_hr_data limit 100;

*These are generic queries, some of the walkthroughs may instruct you to run different queries.

Querying Redshift

Prerequisites:

  1. Ensure you have configured the Redshift integration with Immuta as described in Part 3 of the POV Data Setup.
  2. You must use an RA3 instance type; this is because Immuta requires cross-database views and those are only supported in Redshift RA3 instance types.
    • If you get an error like this when querying the Immuta-generated views, you are NOT on an RA3 instance type:

      SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Could not find parent table for alias
      `immuta_pov.pov_data.immuta_fake_credit_card_transactions`.;
      

About policy enforcement

Immuta enforces policy by creating views that contain all the policy logic. Those views are created within a database that is created in Part 3 of the POV Data Setup.

The outcome of this means:

  • When moving to production, for everyone but people that need to manage the raw tables, you should REVOKE access from the raw tables and instead only let users query these Immuta-managed views.
  • You do not need to explicitly GRANT access to those views, that is handled by Immuta.
  • Those views are created in the database configured in Part 3 of the POV Data Setup.
    • Schema name of the raw table is maintained
    • Table name of the raw table is maintained (as the view name)

Run a query:

  1. Since Redshift does not have a user interface, please use your tool of choice to connect.
  2. Once connected, you should see the database that was created in Part 3 of the POV Data Setup.
  3. Expand that and you should see the pov_data schema.
  4. Expand that and you should see the secure views Immuta created, you can run these queries*:
    1. select * from [database].pov_data.immuta_fake_credit_card_transactions limit 100;
    2. select * from [database].pov_data.immuta_fake_hr_data limit 100;

*These are generic queries, some of the walkthroughs may instruct you to run different queries.