Skip to content

Immuta Best Practices (SQL)

Audience: Data Users

Content Summary: This page outlines some best practices for maximum performance from the Immuta Query Engine, which is an abstraction of your underlying database tables. This virtual layer is powerful as a single point to access all data across your organization and apply data policies consistently.

Overview

Immuta has the ability to push down many different queries, operations, and functions depending on the native database. Below are common best practices to consider when using the Immuta Query Engine in order to take advantage of the pushdown optimization and to avoid problematic queries that add latency to the response.

  1. Limit result sets.
  2. Push work down to the native database.
  3. Use unions and window functions sparingly.
  4. Avoid joining across servers or technologies.
  5. Keep data types consistent.

Limit result sets

Unless there’s a good reason to pull back a large result set, try to limit the response. The simplest way to accomplish this is through a WHERE clause that limits the response size or a LIMIT. It is best to use WHERE clauses when possible, since whether or not a LIMIT is pushed down depends on the query structure. Many business intelligence tools append LIMIT statements by default to avoid this issue. Treat Immuta similarly.

Push work down to the native database

You are limited to PostgreSQL syntax when querying through the Immuta Query Engine. For a function to be pushed down, an equivalent function must exist in the native database. Immuta maintains a "Rosetta Stone" of database functions that map PostgreSQL syntax to their analogs in all of the data sources that Immuta supports.

If there is specific database syntax that is not pushed down or does not exist in PostgreSQL, you can create an Immuta data source using the SQL statement option (rather than the SQL table option) and include the unsupported native syntax as the SQL statement for the data source. When you query the data source that was created with a SQL statement you will still query the Immuta layer with PostgreSQL syntax, but the underlying native syntax will also be executed as a subquery on the native database (similar to a view), effectively pushing the syntax down.

Use unions and window functions sparingly

Statements using UNION and most window functions are not currently pushed down, so they will run in the Immuta layer if you are not aggregating or setting a LIMIT on the results. To aggregate or LIMIT results of the UNION, the data must flow back to Immuta. One solution to this problem is to create a view that represents the UNION, expose that view in Immuta, and query that view.

Avoid joining across servers or technologies

Immuta will push down joins between data sources that share the same source database as long as the data sources are created with the same connection details, which include hostname, port, database name, username, and schema.

If a JOIN statement contains data sources that use Kerberos authentication, then the same Immuta user must have been used to create each data source. If an Immuta user has the IMPERSONATE_HDFS_USER permission, they may set their principal to match any data sources that they want to be pushed down. Simply put, at the time of data source creation, all data source details must be the same in order to be pushed down.

Keep data types consistent

Casting, especially when done using CASE statements, can be an expensive operation. Therefore, it's best to have a schema set in the native database, and then allow Immuta to replicate that schema. That is, if the source data are TEXT, make sure that Immuta data type is TEXT. Inconsistencies, especially for DATE data types, can cause performance issues with tools that automatically generate queries.