Skip to content

Redshift Access Pattern

Audience: System Administrators

Content Summary: This page describes the Native Redshift access pattern, through which Immuta applies policies directly in Redshift.

For installation instructions, see the installation guide.

Overview

With the Native Redshift access pattern, Immuta applies policies directly in Redshift. This allows data analysts to query their data natively with Redshift views instead of going through the Immuta Query Engine and have per-user policies dynamically applied at query time. Users can configure multiple native Redshift integrations in a single instance of Immuta.

Data Flow

  1. An Immuta Application Administrator configures the Redshift integration and registers Redshift warehouse and databases with Immuta.
  2. Immuta creates a database inside the configured Redshift ecosystem that contains Immuta policy definitions and user entitlements.
  3. A Data Owner registers Redshift tables in Immuta as data sources. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.
  4. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.
  5. The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.
  6. A Redshift user who is subscribed to the data source in Immuta queries the corresponding table directly in Redshift through the immuta database and sees policy-enforced data.

Redshift Integration

Syncing and Policy Enforcement

The Redshift access pattern uses webhooks to keep views up-to-date with Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the dynamic view.

SQL statements are used to create all views, including a join to the secure view: immuta_system.user_profile. This secure view is a select from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with a constraint immuta__userid = current_user() to ensure it only contains the profile row for the current user. The immuta_system.user_profile view is readable by all users, but will only display the data that corresponds to the user executing the query.

Note: The immuta_system.profile table is updated through webhooks when a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.

Configuring

The native integration will create views from the tables within the database specified when configured. Then, the user can choose the name for the schema where all the Immuta generated views will reside. Immuta will also create the schemas immuta_system, immuta_functions, and immuta_procedures to contain the tables, views, UDFs, and stored procedures that support the native integration.

Limitations

  • The host of the data source must match the host of the native connection for the native view to be created.
  • Redshift with Immuta must be an RA3 instance type to support the necessary cross database joins.
  • When using multiple Redshifts, a user will have to have the same user account across all hosts.

Python UDF Specific Limitations

For most policy types in Redshift, Immuta uses SQL clauses to implement enforcement logic; however Immuta uses Python UDFs in the Redshift integration to implement the following masking policies:

  • Masking using a regular expression
  • Reversible masking
  • Format-preserving masking
  • Randomized response

The number of Python UDFs that can run concurrently per Redshift cluster is limited to one-fourth of the total concurrency level for the cluster. For example, if the Redshift cluster is configured with a concurrency of 15, a maximum of three Python UDFs can run concurrently. After the limit is reached, Python UDFs are queued for execution within workload management queues.

The SVL_QUERY_QUEUE_INFO view in Redshift, which is visible to a Redshift superuser, summarizes details for queries that spent time in a workload management (WLM) query queue. Queries must be completed in order to appear as results in the SVL_QUERY_QUEUE_INFO view.

If you find that queries on Immuta-built views are spending time in the workload management (WLM) query queue, you should either edit your Redshift cluster configuration to increase concurrency, or use fewer of the masking policies which leverage Python UDFs. For more information on increasing concurrency, see the Redshift docs on implementing workload management.