Skip to content

You are viewing documentation for Immuta version 2023.1.

For the latest version, view our documentation for Immuta SaaS or the latest self-hosted version.

Determine Long-Running Background Jobs

Audience: Data Users

Content Summary: This page shows users how to identify background jobs that take too long to run.

Query Overview

Below is a SQL query, which will help determine background jobs that take too long to run.

Note: This query unions the job and archive table. Adjust the top interval appropriately for your needs.

Instructions

Run the following SQL query to determine background jobs that take too long to run.

WITH
--FOR INTERVAL USAGE ===> NOW() - INTERVAL '4 hours' AS start_dt
--FOR SPECIFIC DATETIME USAGE ===> TO_TIMESTAMP('2021-11-23 12:05:00', 'YYYY-MM-DD hh24:mi:ss')::timestamp without time zone at time zone 'EDT' AS start_dt
vars AS (
    SELECT
    NOW() AS end_dt,
    NOW() - INTERVAL '7 days' AS start_dt
),
all_jobs_raw AS (
    SELECT name, state, startedon, createdon, completedon, data FROM pgboss.job
    UNION SELECT name, state, startedon, createdon, completedon, data FROM pgboss.archive -- COMMENT OUT THIS TABLE IF LESS THAN 12 HOURS AND RUNNING SLOW
),
all_jobs AS (
    SELECT
        CASE
            WHEN name = 'automaticSubscription' AND data->>'profileId' IS NULL THEN 'autoSub - ByDataSource'
            WHEN name = 'automaticSubscription' AND data->>'profileId' IS NOT NULL THEN 'autoSub - ByUser'
            WHEN name ILIKE 'nativeSqlProfileRefresh_Snowflake%' THEN 'nativeSqlProfileRefresh_Snowflake%'
            WHEN name ILIKE 'nativeSqlProfileRefresh_Redshift%' THEN 'nativeSqlProfileRefresh_Redshift%'
            ELSE name
        END AS name, state, startedon, completedon
    FROM all_jobs_raw INNER JOIN vars ON TRUE
    WHERE all_jobs_raw.createdon > vars.start_dt and all_jobs_raw.createdon < vars.end_dt
),
batches AS (
    SELECT DISTINCT name, COUNT(*) AS job_count, AVG(completedon - startedon) AS average_time
    FROM all_jobs GROUP BY name)
SELECT *, job_count * average_time AS total_time, end_dt - start_dt AS all_time
FROM batches
JOIN vars ON TRUE
ORDER BY total_time DESC;