Redshift Integration

This page provides an overview of the Redshift integration in Immuta. For a tutorial detailing how to enable this integration, see the installation guide.

Overview

Redshift is a policy push integration that allows Immuta to apply policies directly in Redshift. This allows data analysts to query Redshift views directly instead of going through a proxy and have per-user policies dynamically applied at query time.

Architecture

The Redshift integration creates views from the tables in the database specified during configuration. The user chooses the name of the schema where all Immuta-generated views reside. Immuta also creates the schemas immuta_system, immuta_functions, and immuta_procedures to contain the tables, views, UDFs, and stored procedures that support the integration. Immuta then creates a system account and grants it the following privileges:

  • ALL PRIVILEGES ON DATABASE IMMUTA_DB

  • ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE IMMUTA_DB

  • USAGE ON FUTURE PROCEDURES IN SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

Additionally, the PUBLIC role is granted the following privileges:

  • USAGE ON DATABASE IMMUTA_DB

  • TEMP ON DATABASE IMMUTA_DB

  • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_PROCEDURES

Integration type

Immuta supports the Redshift integration as both multi-database and single-database integrations. In either integration type, Immuta supports a single integration with secure views in a single database per cluster.

Multi-database integration

If using a multi-database integration, you must use a Redshift cluster with an RA3 node because Immuta requires cross-database views.

Single-database integration

If using a single-database integration, all Redshift cluster types are supported. However, because cross-database queries are not supported in any types other than RA3, Immuta's views must exist in the same database as the raw tables. Consequently, the steps for configuring the integration for Redshift clusters with external tables differ slightly from those that don't have external tables. Allow Immuta to create secure views of your external tables through one of these methods:

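  • Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.

  • Configure the integration by creating a new immuta database and re-create all of your external tables in that database.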
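
The cluster requirement for the two integration types can be written as a small check. This is a minimal sketch that encodes only the rule stated above; the function name, the integration type strings, and the node type strings (for example ra3.4xlarge or dc2.large) are illustrative assumptions and not part of Immuta.

```python
def integration_supported(integration_type: str, node_type: str) -> bool:
    """Return True if a cluster node type can host the integration type.

    Rule from the text: a multi-database integration needs an RA3 node
    (it relies on cross-database views); a single-database integration
    works on every cluster type.
    """
    is_ra3 = node_type.lower().startswith("ra3.")
    if integration_type == "multi-database":
        return is_ra3
    if integration_type == "single-database":
        return True
    raise ValueError(f"unknown integration type: {integration_type!r}")


if __name__ == "__main__":
    for node in ("ra3.4xlarge", "dc2.large"):
        print(node, integration_supported("multi-database", node))
```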

Policy enforcement

Immuta creates every view with SQL statements that include a join to the secure view immuta_system.user_profile. This secure view selects from the immuta_system.profile table (which contains all Immuta users and their current groups, attributes, projects, and a list of valid tables they have access to) with the constraint immuta__userid = current_user(), so it contains only the profile row for the current user. The immuta_system.user_profile view is readable by all users, but displays only the data that corresponds to the user executing the query.
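
The idea of a per-user profile view joined into every data view can be modeled locally. The sketch below uses Python's built-in sqlite3 module, not Redshift, and it is not Immuta's actual view definition: the one-row session table stands in for current_user(), and the raw_orders table, its required_group column, and the user_groups column are hypothetical names chosen for illustration.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create a toy profile table, a per-user profile view, and a data view."""
    conn.executescript(
        """
        -- Stand-in for current_user(): a one-row table naming the caller.
        CREATE TABLE session (username TEXT NOT NULL);
        INSERT INTO session VALUES ('nobody');

        -- Simplified profile table: one row per user.
        CREATE TABLE profile (immuta__userid TEXT PRIMARY KEY, user_groups TEXT);
        INSERT INTO profile VALUES ('alice', 'analysts'), ('bob', 'finance');

        -- Simplified user_profile view: only the row of the current user.
        CREATE VIEW user_profile AS
            SELECT * FROM profile
            WHERE immuta__userid = (SELECT username FROM session);

        -- Hypothetical raw table and a data view joined to user_profile.
        CREATE TABLE raw_orders (id INTEGER, region TEXT, required_group TEXT);
        INSERT INTO raw_orders VALUES (1, 'us', 'analysts'), (2, 'eu', 'finance');

        CREATE VIEW orders AS
            SELECT o.id, o.region
            FROM raw_orders AS o
            JOIN user_profile AS p ON p.user_groups = o.required_group;
        """
    )


def query_as(conn: sqlite3.Connection, username: str, sql: str) -> list:
    """Run a query after switching the simulated current user."""
    conn.execute("UPDATE session SET username = ?", (username,))
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_demo(connection)
    print(query_as(connection, "alice", "SELECT id, region FROM orders"))
    print(query_as(connection, "bob", "SELECT id, region FROM orders"))
```

Because the data view joins to a profile view that exposes at most one row, the same query returns different rows for each simulated user, and a user with no profile row sees nothing.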

The Redshift integration uses webhooks to keep views up-to-date with Immuta data sources. When a data source or policy is created, updated, or disabled, a webhook will be called that will create, modify, or delete the dynamic view. The immuta_system.profile table is updated through webhooks when a user's groups or attributes change, they switch projects, they acknowledge a purpose, or when their data source access is approved or revoked. The profile table can only be read and updated by the Immuta system account.
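
The webhook behavior described above amounts to routing each event either to a view change or to a profile table update. The following sketch shows that routing under stated assumptions: the event names are hypothetical, and the one-to-one pairing of created, updated, and disabled with create, modify, and delete is a reading of the paragraph, not a documented Immuta API.

```python
from typing import Tuple

# Events on a data source or policy, and the assumed action on the dynamic view.
VIEW_EVENTS = {
    "created": "create",
    "updated": "modify",
    "disabled": "delete",
}

# User-level events that update immuta_system.profile (hypothetical names).
PROFILE_EVENTS = frozenset(
    {
        "groups_changed",
        "attributes_changed",
        "project_switched",
        "purpose_acknowledged",
        "access_approved",
        "access_revoked",
    }
)


def route_event(event: str) -> Tuple[str, str]:
    """Return (target, action) for a webhook event name."""
    if event in VIEW_EVENTS:
        return ("view", VIEW_EVENTS[event])
    if event in PROFILE_EVENTS:
        return ("immuta_system.profile", "update")
    raise ValueError(f"unhandled event: {event!r}")


if __name__ == "__main__":
    for name in ("created", "project_switched"):
        print(name, route_event(name))
```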

Integration health status

The status of the integration is visible on the integrations tab of the Immuta application settings page. If errors occur in the integration, a banner will appear in the Immuta UI with guidance for remediating the error.

The definitions for each status and the state of configured data platform integrations are available in the response schema of the integrations API. However, the UI consolidates these error statuses and provides detail in the error messages.

Data flow

  1. An Immuta Application Administrator configures the Redshift integration and registers Redshift warehouse and databases with Immuta.

  2. Immuta creates a database inside the configured Redshift ecosystem that contains Immuta policy definitions and user entitlements.

  3. A Data Owner registers Redshift tables in Immuta as data sources.

  4. A Data Owner, Data Governor, or Administrator creates or changes a policy or user in Immuta.

  5. Data source metadata, tags, user metadata, and policy definitions are stored in Immuta's Metadata Database.

  6. The Immuta Web Service calls a stored procedure that modifies the user entitlements or policies.

  7. A Redshift user who is subscribed to the data source in Immuta queries the corresponding table directly in Redshift through the immuta database and sees policy-enforced data.

Redshift Spectrum

Redshift Spectrum (Redshift external tables) allows Redshift users to query external data directly from files on Amazon S3. Because cross-database queries are not supported in Redshift Spectrum, Immuta's views must exist in the same database as the raw tables. Consequently, the steps for configuring the integration for Redshift clusters with external tables differ slightly from those that don't have external tables. Allow Immuta to create secure views of your external tables through one of these methods:

  • Configure the integration with an existing database that contains the external tables: Instead of creating an immuta database that manages all schemas and views created when Redshift data is registered in Immuta, the integration adds the Immuta-managed schemas and views to an existing database in Redshift.

  • Configure the integration by creating a new immuta database and re-create all of your external tables in that database.

Once the integration is configured, Data Owners must register Redshift Spectrum data sources using the Immuta CLI or V2 API.

In addition to the privileges listed in the Architecture section, the Immuta system account is granted:

  • USAGE ON LANGUAGE PLPYTHONU

The PUBLIC role is also granted:

  • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

  • USAGE ON FUTURE FUNCTIONS IN SCHEMA IMMUTA_DB.IMMUTA_FUNCTIONS

  • USAGE ON SCHEMA IMMUTA_DB.IMMUTA_SYSTEM

  • SELECT ON TABLES TO public

    Redshift Pre-Configuration Details

    This page describes the Redshift integration, configuration options, and features. For a tutorial to enable this integration, see the installation guide.

    Feature Availability

    Project Workspaces: ❌
    Tag Ingestion: ❌
    User Impersonation: ✅
    Query Audit: ❌
    Multiple Integrations: ✅

    Prerequisite

    For automated installations, the credentials provided must belong to a superuser or to a user who can create databases and users and modify grants.

    Supported Features

    • Redshift datashares

    • Redshift Serverless

    • Redshift Spectrum: For configuration and data source registration instructions, see the configuration page.

    Authentication Methods

    The Redshift integration supports the following authentication methods to configure the integration and create data sources:

    • Username and Password: Users can authenticate with their Redshift username and password.

    • AWS Access Key: Users can authenticate with an AWS access key.

    Tag Ingestion

    Immuta cannot ingest tags from Redshift, but you can connect any of these supported external catalogs to work with your integration.

    User Impersonation


    Required Redshift privileges

    Setup User:

    • OWNERSHIP ON GROUP IMMUTA_IMPERSONATOR_ROLE

    Impersonation allows users to query data as another Immuta user in Redshift. To enable user impersonation, see the User Impersonation page.

    Multiple Integrations

    Users can enable multiple Redshift integrations with a single Immuta tenant.

    Redshift Limitations

    • The host of the data source must match the host of the connection for the view to be created.

    • When using multiple Redshift integrations, a user has to have the same user account across all hosts.

    • Case sensitivity of database, table, and column identifiers is not supported. The enable_case_sensitive_identifier parameter must be set to false (default setting) for your Redshift cluster to configure the integration and register data sources.

    Python UDF Specific Limitations

    For most policy types in Redshift, Immuta uses SQL clauses to implement enforcement logic; however, Immuta uses Python UDFs in the Redshift integration to implement the following masking policies:

    • Masking using a regular expression

    • Reversible masking

    • Format-preserving masking

    • Randomized response

    The number of Python UDFs that can run concurrently per Redshift cluster is limited to one-fourth of the total concurrency level for the cluster. For example, if the Redshift cluster is configured with a concurrency of 15, a maximum of three Python UDFs can run concurrently. After the limit is reached, Python UDFs are queued for execution within workload management queues.
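
The one-fourth rule can be checked with a short calculation. This sketch assumes the limit is rounded down, which matches the example above (a concurrency of 15 gives three UDFs); the function name is illustrative.

```python
def max_concurrent_python_udfs(total_concurrency: int) -> int:
    """Return the number of Python UDFs that can run at once.

    Assumes the limit is one-fourth of the cluster's total concurrency
    level, rounded down to a whole number.
    """
    if total_concurrency < 0:
        raise ValueError("total_concurrency must not be negative")
    return total_concurrency // 4


if __name__ == "__main__":
    print(max_concurrent_python_udfs(15))  # 3, as in the example above
```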

    The SVL_QUERY_QUEUE_INFO view in Redshift, which is visible to a Redshift superuser, summarizes details for queries that spent time in a workload management (WLM) query queue. Queries must be completed in order to appear as results in the SVL_QUERY_QUEUE_INFO view.

    If you find that queries on Immuta-built views are spending time in the workload management (WLM) query queue, either edit your Redshift cluster configuration to increase concurrency or use fewer of the masking policies that rely on Python UDFs. For more information on increasing concurrency, see the Redshift docs on implementing workload management.
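
One way to spot queued queries is to read SVL_QUERY_QUEUE_INFO as a superuser and flag the entries with a long queue time. The sketch below is an illustration, not an Immuta tool: the column names query, queue_elapsed, and exec_elapsed are assumed from the Redshift system view and should be verified against your cluster, and the helper only post-processes rows that you have already fetched with the client of your choice.

```python
from typing import Iterable, List, Tuple

# Query text to run in Redshift as a superuser. Column names are assumptions
# based on the SVL_QUERY_QUEUE_INFO system view; verify them on your cluster.
QUEUE_INFO_SQL = """
SELECT query, queue_elapsed, exec_elapsed
FROM svl_query_queue_info
WHERE queue_elapsed > 0
ORDER BY queue_elapsed DESC;
"""


def queued_longer_than(
    rows: Iterable[Tuple[int, int, int]], threshold_seconds: int
) -> List[int]:
    """Return the query IDs whose queue time exceeds the threshold.

    Each row is (query, queue_elapsed, exec_elapsed), in input order.
    """
    return [
        query_id
        for query_id, queue_elapsed, _exec_elapsed in rows
        if queue_elapsed > threshold_seconds
    ]
```

If many of the flagged queries target Immuta-built views that use Python UDF masking, that points to the concurrency limit described above.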

    Additional Redshift privileges required for user impersonation:

    Setup User:

    • CREATE GROUP

    Immuta System Account:

    • GRANT EXECUTE ON PROCEDURE grant_impersonation

    • GRANT EXECUTE ON PROCEDURE revoke_impersonation

