Data Source Health Checks

When an Immuta data source is created, background jobs use the connection information provided to run health checks. Which checks run depends on the type of data source and how it was configured. These data source health checks include the following:

  • blob crawl status: indicates whether the blob was successfully crawled.
  • column detection status: indicates whether the job that detects columns added to or removed from the remote table registered as an Immuta data source ran successfully.
  • external catalog link status: indicates whether or not the external catalog was successfully linked to the data source.
  • fingerprint generation status: indicates whether or not the data source fingerprint was successfully generated.
  • framework classification status: indicates whether classification was successfully run on the data source to determine the sensitivity of the data source.
  • global policy applied status: indicates whether global policies were successfully applied to the data source.
  • high cardinality calculation status: indicates whether the data source's high cardinality column was successfully calculated.
  • native SQL sync status (for Snowflake data sources): indicates whether Snowflake governance policies have been successfully synced.
  • native SQL view creation status (for Snowflake and Redshift data sources): indicates whether native views were properly created for Redshift and Snowflake tables registered in Immuta.
  • row count status: indicates whether the number of rows in the data source was successfully calculated.
  • schema detection status: indicates whether the job that detects remote tables added to or removed from the schema ran successfully.
  • sensitive data discovery status: indicates whether sensitive data discovery was successfully run on the data source.

After these jobs complete, the health status for each is updated to indicate whether the status check passed, was skipped, is unknown, or failed.
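
The four outcomes above can be modeled as a small enumeration when tracking check results in a script. This is a minimal sketch, not Immuta's API: `HealthStatus`, `overall_status`, and the check names used as dictionary keys are hypothetical names chosen for illustration, and the roll-up rule (a failure outranks unknown, which outranks passed, with skipped checks ignored) is an assumption.

```python
from enum import Enum


class HealthStatus(Enum):
    """The four outcomes a health check can report (names are illustrative)."""
    PASSED = "passed"
    SKIPPED = "skipped"
    UNKNOWN = "unknown"
    FAILED = "failed"


def overall_status(checks: dict[str, HealthStatus]) -> HealthStatus:
    """Collapse per-check results into one status.

    Assumed rule: any failure wins, then any unknown; skipped checks are ignored.
    """
    statuses = set(checks.values())
    if HealthStatus.FAILED in statuses:
        return HealthStatus.FAILED
    if HealthStatus.UNKNOWN in statuses:
        return HealthStatus.UNKNOWN
    if HealthStatus.PASSED in statuses:
        return HealthStatus.PASSED
    return HealthStatus.SKIPPED  # every check was skipped, or there were none


# Hypothetical results for one data source.
example = {
    "row count": HealthStatus.PASSED,
    "fingerprint generation": HealthStatus.SKIPPED,
    "schema detection": HealthStatus.FAILED,
}
print(overall_status(example).value)
```

Keeping the per-check results separate from the roll-up makes it easy to see which individual job caused a data source to be reported as unhealthy.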

These background jobs can be disabled during data source creation by adding a specific tag that prevents automatic table statistics. A system administrator can set this prevent statistics tag on the app settings page. However, with automatic table statistics disabled, the following policies are unavailable until the data source owner manually generates the fingerprint:

  • Masking with format preserving masking
  • Masking with k-anonymization
  • Masking using randomized response

Unhealthy Databricks data sources

Databricks data sources may be reported as unhealthy because their row count queries can fail when they run against a cluster that has the Databricks query watchdog enabled.
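
One way to check whether the query watchdog is active is to inspect the cluster's Spark configuration. The sketch below assumes the setting is exposed under the Databricks key `spark.databricks.queryWatchdog.enabled` and that a missing key means the watchdog is off. The helper `watchdog_enabled` is a hypothetical name, and it takes a plain mapping so it can be tried without a cluster.

```python
from typing import Mapping

# Assumed Databricks Spark configuration key for Query Watchdog.
WATCHDOG_KEY = "spark.databricks.queryWatchdog.enabled"


def watchdog_enabled(conf: Mapping[str, str]) -> bool:
    """Return True if the (assumed) Query Watchdog flag is set to true."""
    # Assumption: an absent key is treated as "watchdog disabled".
    return str(conf.get(WATCHDOG_KEY, "false")).strip().lower() == "true"


# Hypothetical cluster configuration, standing in for values read from Spark.
cluster_conf = {WATCHDOG_KEY: "true"}
if watchdog_enabled(cluster_conf):
    print("Query Watchdog is on; row count queries may fail.")
```

In a Databricks notebook, the same value could be read from the live session with `spark.conf.get(WATCHDOG_KEY, "false")` rather than from a dictionary.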

Limitations

Health checks are not run on data sources with more than 1,600 columns, although these data sources still appear as healthy. For these data sources, the health check cannot be run automatically or manually.
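
Because such data sources still appear as healthy, it can help to flag them separately. The following is a minimal sketch under stated assumptions: the column counts come from a hypothetical inventory dictionary rather than from Immuta, and `health_checks_run` is an illustrative helper name, not part of any Immuta API.

```python
MAX_HEALTH_CHECK_COLUMNS = 1600  # limit stated in this section


def health_checks_run(column_count: int) -> bool:
    """Return False when a data source has too many columns for health checks."""
    return column_count <= MAX_HEALTH_CHECK_COLUMNS


# Hypothetical inventory: data source name -> number of columns.
inventory = {"orders": 42, "wide_events": 2048, "edge_case": 1600}

# Data sources whose "healthy" status is not backed by an actual check.
unchecked = sorted(
    name for name, cols in inventory.items() if not health_checks_run(cols)
)
print(unchecked)
```

A data source with exactly 1,600 columns is still checked here, since the limitation applies only above that number.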