Data Source Health Checks Reference Guide

When an Immuta data source is created, background jobs use the connection information provided to compute health checks. Which checks run depends on the type of data source and how it was configured. The data source health checks include the following:

blob crawl status: indicates whether the blob was successfully crawled.
column detection status: indicates whether the job that detects columns added to or removed from the remote table registered as an Immuta data source ran successfully.
external catalog link status: indicates whether or not the external catalog was successfully linked to the data source.
fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated. Fingerprints are only available for Snowflake data sources.
framework classification status: indicates whether classification was successfully run on the data source to determine the sensitivity of the data source.
global policy applied status: indicates whether global policies were successfully applied to the data source.
high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
SQL view creation status (for Amazon Redshift Spectrum, Azure Synapse Analytics, or Google BigQuery data sources): indicates whether views were properly created for tables registered in Immuta.
row count status: indicates whether the number of rows in the data source was successfully calculated.
schema detection status: indicates whether the job that detects remote tables added to or removed from the schema ran successfully.
sensitive data discovery status: indicates whether identification was successfully run on the data source.
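Three of the checks listed above are tied to specific platforms. As a rough illustration only, the sketch below records those stated restrictions in a lookup table. The names `PLATFORM_RESTRICTED_CHECKS` and `restricted_checks_for` are hypothetical and are not part of any Immuta API. The sketch says nothing about when the remaining checks apply, because the list does not state that.

```python
# Platform restrictions copied from the list above.
# All identifiers here are illustrative assumptions, not Immuta names.
PLATFORM_RESTRICTED_CHECKS = {
    "fingerprint generation": {"Snowflake"},
    "SQL sync": {"Snowflake"},
    "SQL view creation": {
        "Amazon Redshift Spectrum",
        "Azure Synapse Analytics",
        "Google BigQuery",
    },
}


def restricted_checks_for(platform: str) -> list[str]:
    """Return the platform-restricted checks that the list ties to this platform."""
    return sorted(
        check
        for check, platforms in PLATFORM_RESTRICTED_CHECKS.items()
        if platform in platforms
    )


snowflake_only = restricted_checks_for("Snowflake")
```

An empty result for a platform means only that none of the three restricted checks is tied to it in the list above.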

After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.
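To make the four outcomes concrete, here is a minimal sketch of how a script that tracks these results might model them. It is an assumption for illustration: `HealthStatus`, `group_by_status`, and the sample check names are hypothetical and do not reflect how Immuta stores or returns health data.

```python
from collections import defaultdict
from enum import Enum


class HealthStatus(Enum):
    """The four outcomes a health check can report, per the text above."""
    PASSED = "passed"
    SKIPPED = "skipped"
    UNKNOWN = "unknown"
    FAILED = "failed"


def group_by_status(results: dict[str, HealthStatus]) -> dict[HealthStatus, list[str]]:
    """Group check names by the status each one reported."""
    grouped: dict[HealthStatus, list[str]] = defaultdict(list)
    for check_name, status in results.items():
        grouped[status].append(check_name)
    return dict(grouped)


# Hypothetical results for a single data source.
example_results = {
    "row count": HealthStatus.FAILED,
    "column detection": HealthStatus.PASSED,
    "fingerprint generation": HealthStatus.SKIPPED,
}

failed_checks = group_by_status(example_results).get(HealthStatus.FAILED, [])
```

The sketch deliberately stops at grouping. How the individual statuses roll up into an overall data source health state is not described here, so it is left out.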

These background jobs can be disabled during data source creation by adding a specific tag that prevents automatic table statistics. A system administrator can set this prevent statistics tag on the app settings page. However, with automatic table statistics disabled, the following policies will be unavailable until the data source owner manually generates the fingerprint (available only for Snowflake data sources):

Masking with format preserving masking
Masking using randomized response

Unhealthy Databricks data sources

Databricks data sources may be reported as unhealthy because their row count queries can fail when they run against a cluster that has the Databricks query watchdog enabled.

Limitations

Health checks are not run on data sources with more than 1600 columns, but these data sources still appear as healthy. For these data sources, health checks cannot be run automatically or manually.
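Because a data source above this limit still appears as healthy, it can be useful to flag such sources separately. The sketch below assumes you already have column counts from your own inventory. The function names and the sample data are hypothetical, and only the 1600-column threshold comes from the text above.

```python
# Threshold taken from the limitation described above.
MAX_COLUMNS_FOR_HEALTH_CHECKS = 1600


def health_checks_can_run(column_count: int) -> bool:
    """Return True when the column count is within the stated limit."""
    return column_count <= MAX_COLUMNS_FOR_HEALTH_CHECKS


def sources_without_health_checks(column_counts: dict[str, int]) -> list[str]:
    """List data sources whose healthy status is not backed by health checks."""
    return [
        name
        for name, count in column_counts.items()
        if not health_checks_can_run(count)
    ]


# Hypothetical inventory: data source name -> number of columns.
inventory = {"orders": 42, "wide_events": 2000}
flagged = sources_without_health_checks(inventory)
```

A data source with exactly 1600 columns is treated as within the limit, since the limitation applies to data sources with over 1600 columns.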


