Warehouse Sizing Recommendations

The warehouse you select when configuring the Snowflake integration uses compute resources to set up the integration, register data sources, orchestrate policies, and run jobs like sensitive data discovery. Snowflake credit charges are based on the size of and amount of time the warehouse is active, not the number of queries run.

This document prescribes how and when to adjust the size and scale of clusters for your warehouse to manage workloads so that you can use Snowflake compute resources the most cost effectively.

In general, increase the size of and number of clusters for the warehouse to handle heavy workloads and multiple queries. Workloads are typically lighter after data sources are onboarded and policies are established in Immuta, so compute resources can be reduced after those workloads complete.

Integration and data source registration warehouse use

The Snowflake integration uses warehouse compute resources to sync policies created in Immuta to the Snowflake objects registered as data sources and, if enabled, to run sensitive data discovery and schema monitoring. Follow the guidelines below to adjust the warehouse size and scale according to your needs.

  • Increase the size of and number of clusters for the warehouse during large policy syncs, updates, and changes.

  • Enable auto-suspend and auto-resume to optimize resource use in Snowflake. In the Snowflake UI, the lowest auto suspend time setting is 5 minutes. However, through SQL query, you can set auto_suspend to 61 seconds (since the minimum uptime for a warehouse is 60 seconds). For example,

    ALTER WAREHOUSE "WH_NAME" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 61 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD' COMMENT = '';
  • Sensitive data discovery uses compute resources for each table registered if it is enabled. Consider disabling sensitive data discovery when registering data sources if you have an external catalog available or a tagging strategy in place.

  • Register data before creating global policies. By default, Immuta does not apply a subscription policy on registered data (unless an existing global policy applies to it), which allows Immuta to only pull metadata instead of also applying policies when data sources are created. Registering data before policies are created reduces the workload and the Snowflake compute resources needed.

  • Begin onboarding with a small dataset of tables, and then review and monitor query performance in the Snowflake Query Monitor. Adjust the virtual warehouse accordingly to handle heavier loads.

  • Schema monitoring uses the compute warehouse that was employed during the initial ingestion. If you expect a low number of new tables or minimal changes to the table structure, consider scaling down the warehouse size.

  • Resize the warehouse after after data sources are registered and policies are established. For example,

    ALTER WAREHOUSE "INTEGRATION_WH" SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 120 AUTO_RESUME = TRUE MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 2 SCALING_POLICY = 'STANDARD'; 

For more details and guidance about warehouse sizing, see the Snowflake Warehouse Considerations documentation.

Identifying bulk jobs and heavy workloads

Even after your integration is configured, data sources are registered, and policies are established, changes to those data sources or policies may initiate heavy workloads. Follow the guidelines below to adjust your warehouse size and scale according to your needs.

  • Review your Snowflake query history to identify query performance and bottlenecks.

  • Check how many credits queries have consumed:

    SELECT h.* FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY" h
    INNER JOIN "SNOWFLAKE"."ACCOUNT_USAGE"."SESSIONS" s
    ON s.session_id = h.session_id
    WHERE GET(parse_json(s.client_environment), 'APPLICATION') = 'IMMUTA' limit 25;
  • After reviewing query performance and cost, implement strategies above to adjust your warehouse.

Last updated