Content Summary: This page shows how to determine when Immuta background jobs will complete.
Query Overview
Below is a SQL query, which will help determine when some background jobs will be complete.
Note: This query unions the job and archive table. Adjust the top interval appropriately for your needs. In addition, collections of jobs aren’t typically differentiated. For example, if you hit LDAP three times, that is one collection of jobs, not three.
Caveats
Avg Time: The value for the average time will have a standard deviation applied to it.
Duration: How long it has been running for. If this reads towards 12 hours, then this value is longer than the reported value.
Until Complete: How long it will take to complete the remaining queue. This is assuming the job is the only active job to utilize all of the web pods.
When EDT: When the jobs will be completed. This is assuming the job is the only active job to utilize all of the web pods.
Instructions
Run the following SQL query to investigate Immuta's background jobs.
WITH--FILTER by an INTERVAL or a START timestampvars AS (SELECTNOW() AS now_dt, NOW() - INTERVAL '4 hours'ASstart, NOW()),--vars AS (SELECT NOW() AS now_dt, TO_TIMESTAMP('2021-11-23 12:05:00', 'YYYY-MM-DD hh24:mi:ss')::timestamp without time zone at time zone 'EDT' AS start, NOW()),all_jobs_raw AS (SELECTname, state, startedon, createdon, completedon, dataFROM pgboss.job--UNION SELECT name, state, startedon, createdon, completedon, data FROM pgboss.archive),all_jobs AS (SELECTCASEWHENname='automaticSubscription'ANDdata->>'profileId'ISNULLTHEN'autoSub - ByDataSource'WHENname='automaticSubscription'ANDdata->>'profileId'IS NOT NULLTHEN'autoSub - ByUser'WHENname ILIKE 'nativeSqlProfileRefresh_Snowflake%'THEN'nativeSqlProfileRefresh_Snowflake%'WHENname ILIKE 'nativeSqlProfileRefresh_Redshift%'THEN'nativeSqlProfileRefresh_Redshift%'WHENname ILIKE 'nativeSqlDataSourceSync_Snowflake%'THEN'nativeSqlDataSourceSync_Snowflake%'WHENname ILIKE 'nativeSqlDataSourceSync_Redshift%'THEN'nativeSqlDataSourceSync_Redshift%'ELSEnameENDASname, state, startedon, completedonFROM all_jobs_raw INNER JOIN vars ON TRUEWHERE all_jobs_raw.createdon > vars.startANDnameNOTLIKE'\_\_%'ANDnameNOT ILIKE 'cleanUp%'ANDnameNOT ILIKE 'initiateScheduled%'ANDnameNOTIN ('bootstrapImmutaQueryEngine', 'checkForRollOver', 'checkInWithLicenseServer', 'clearMaxTTLTokens','columnEvolutionCheck', 'createPolicySearchRecords','dataSourceTest', 'entitlements_cache_job_root', 'entitlements_cache_job_worker','executeCustomerMetricsQueries', 'expirePolicyAdjustments','expirePolicyCertifications', 'loadIamGroups', 'nativeSqlMigration_Redshift', 'recomputeDslFromJsonPolicies','removeIamAuths', 'scheduleDbtUpdates','scheduleSchemaEvolutionChecks', 'schemaProjectsQueryableMigration', 'visibilitySchemaUpdate') -- ignore list--AND (name ILIKE '%redshift%' OR name ILIKE '%bulk%') -- adjust to filter down to specific jobs),distinct_jobs AS ( SELECT DISTINCTnameFROM all_jobs),completed_jobs AS (SELECTname,COUNT(*) AS total,AVG(completedon - startedon) AS average_time,MIN(completedon) AS min_completedon,MAX(completedon) AS max_completedonFROM all_jobsWHEREstate='completed'GROUP BYname),active_jobs AS ( SELECTname, count(*) AS total FROM all_jobs WHEREstate='active'GROUP BYname ),queued_jobs AS ( SELECTname, count(*) AS total FROM all_jobs WHEREstate='created'GROUP BYname ),expired_jobs AS ( SELECTname, count(*) AS total FROM all_jobs WHEREstate='expired'GROUP BYname ),failed_jobs AS ( SELECTname, count(*) AS total FROM all_jobs WHEREstate='failed'GROUP BYname )SELECT distinct_jobs.name as"Job Name",--vars.start AT TIME ZONE 'EDT' AS "Filter EDT", completed_jobs.min_completedon ATTIMEZONE'EDT'AS"Earliest Job EDT",--completed_jobs.max_completedon AT TIME ZONE 'EDT' AS "Latest Job EDT", TO_CHAR(completed_jobs.max_completedon - completed_jobs.min_completedon, 'HH24:MI:SS.MS' ) AS"Duration", TO_CHAR(completed_jobs.total, 'FM9,999,999,999') AS"# Completed", TO_CHAR(completed_jobs.average_time, 'MI:SS.MS') AS"Avg Time", TO_CHAR(active_jobs.total, 'FM9,999,999,999') AS"# Active", TO_CHAR(queued_jobs.total, 'FM9,999,999,999') AS"# Queued", TO_CHAR(expired_jobs.total, 'FM9,999,999,999') AS"# Expired", TO_CHAR(failed_jobs.total, 'FM9,999,999,999') AS"# Failed", ((queued_jobs.total / active_jobs.total) * completed_jobs.average_time) AS"Until Complete"--(vars.now_dt + ((queued_jobs.total / active_jobs.total) * completed_jobs.average_time)) AT TIME ZONE 'EDT' AS "When EDT"FROM distinct_jobsINNER JOIN vars ON TRUELEFT OUTER JOIN completed_jobs ON distinct_jobs.name = completed_jobs.nameLEFT OUTER JOIN active_jobs ON distinct_jobs.name = active_jobs.nameLEFT OUTER JOIN queued_jobs ON distinct_jobs.name = queued_jobs.nameLEFT OUTER JOIN expired_jobs ON distinct_jobs.name = expired_jobs.nameLEFT OUTER JOIN failed_jobs ON distinct_jobs.name = failed_jobs.nameORDER BY queued_jobs.total DESCNULLSLAST;
Demo Video
For further instruction on how to monitor background jobs, refer to this video.