Freshness and Volume (Pipeline Reliability)

Freshness and Volume are Bigeye's pipeline reliability metrics. They give you confidence that your data is arriving on time and in the expected amount. Bigeye offers two ways to measure Freshness and Volume: metadata-based and column-based.

Metadata-based Freshness and Volume

Metadata-based Freshness and Volume are available only for Snowflake, BigQuery, and Redshift. Bigeye takes these measurements by reading from the data source's system metadata tables.

For Freshness, Bigeye records the timestamp of every load it sees. On the first run of a Freshness metric, Bigeye looks back over the last 28 days to collect all loads. On subsequent runs, it looks back over the last 2 days. If you use Autothresholds, Bigeye learns the typical delay between loads and alerts you if no new load arrives within that time.
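To make the idea of a "typical delay between loads" concrete, here is a minimal sketch. It is not Bigeye's Autothresholds algorithm, which this page does not describe; the function names, the use of the median gap, and the `tolerance` multiplier are all assumptions chosen for illustration.

```python
from datetime import datetime
from statistics import median


def load_gaps_hours(load_times):
    """Return the gaps, in hours, between consecutive load timestamps."""
    ordered = sorted(load_times)
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]


def is_stale(load_times, now, tolerance=1.5):
    """Flag the table when the time since the last load exceeds the
    typical (median) gap times a tolerance factor.

    The median and the tolerance factor are illustrative assumptions,
    not Bigeye's actual thresholding logic.
    """
    gaps = load_gaps_hours(load_times)
    if not gaps:
        return False  # not enough history to learn a typical delay
    hours_since_last = (now - max(load_times)).total_seconds() / 3600
    return hours_since_last > median(gaps) * tolerance


# Hypothetical load history: one load every 6 hours.
loads = [datetime(2024, 1, 1, hour) for hour in (0, 6, 12, 18)]
print(is_stale(loads, now=datetime(2024, 1, 2, 0)))  # 6 hours since last load
print(is_stale(loads, now=datetime(2024, 1, 2, 6)))  # 12 hours since last load
```

With loads every 6 hours and a tolerance of 1.5, the sketch treats anything beyond 9 hours without a load as late.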

For Volume, Bigeye similarly looks back over your data and aggregates the number of rows loaded into hourly buckets. The lookback is 28 days for the first run and 2 days for subsequent runs. If you use Autothresholds, Bigeye learns the loading pattern for your data and alerts you if loads don't fit that pattern.
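The hourly aggregation can be pictured with a short sketch. The input shape (pairs of load time and rows loaded) and the function name are hypothetical; how Bigeye reads these values from each warehouse's metadata is not covered here.

```python
from collections import Counter
from datetime import datetime


def hourly_row_counts(loads):
    """Sum rows loaded per hour.

    `loads` is an iterable of (load_time, rows_loaded) pairs, a
    hypothetical shape used only for this illustration.
    """
    buckets = Counter()
    for load_time, rows_loaded in loads:
        # Truncate the timestamp to the start of its hour.
        hour = load_time.replace(minute=0, second=0, microsecond=0)
        buckets[hour] += rows_loaded
    return dict(buckets)


loads = [
    (datetime(2024, 1, 1, 1, 10), 100),
    (datetime(2024, 1, 1, 1, 45), 50),
    (datetime(2024, 1, 1, 2, 5), 30),
]
for hour, rows in sorted(hourly_row_counts(loads).items()):
    print(hour, rows)
```

The two loads in the 01:00 hour land in the same bucket, so the pattern learned over time is a series of hourly totals rather than individual loads.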

Column-based Freshness and Volume

For data source types that don't have system metadata tables, you can use column-based Freshness and Volume measurements. In Bigeye, these are called Hours Since Latest Value (HSLV) and Row Count. Note that these metrics are also available for Snowflake, BigQuery, and Redshift, alongside metadata-based Freshness and Volume.

To use HSLV, you must have a Row Creation Time (RCT) set on your table or on the metric. You can set the RCT while deploying the metric. When the metric is evaluated, you get a measurement of the age of the latest data point in the table. For example, suppose a table Table_A has an RCT of insert_time, and the most recent timestamp in insert_time is 01:30 AM. If the metric runs at 5 AM, the HSLV value for that run is 3 hours and 30 minutes. If no new data comes in and the metric runs again at 11 AM, the HSLV value for that run is 9 hours and 30 minutes.
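The arithmetic in the Table_A example can be reproduced with a few lines of Python. This is only a sketch of the calculation; the function name and the sample dates are assumptions, and Bigeye computes the value with a query against your data source rather than in Python.

```python
from datetime import datetime


def hours_since_latest_value(row_creation_times, run_time):
    """Return the age, in hours, of the newest RCT value at run_time."""
    latest = max(row_creation_times)
    return (run_time - latest).total_seconds() / 3600


# Hypothetical insert_time values for Table_A; the newest is 01:30 AM.
insert_times = [
    datetime(2024, 1, 1, 0, 15),
    datetime(2024, 1, 1, 1, 30),
]

print(hours_since_latest_value(insert_times, datetime(2024, 1, 1, 5, 0)))   # 3.5
print(hours_since_latest_value(insert_times, datetime(2024, 1, 1, 11, 0)))  # 9.5
```

The value keeps growing between runs until a row with a newer insert_time arrives.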

Row Count can use either the Data Time or the Data Time Window window type. With Data Time, it looks back over the configurable lookback window (default 2 days) and counts the rows whose RCT falls in that window. With Data Time Window, it does the same lookback but buckets the results into configurable windows (days or hours; the default is days).
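The difference between the two window types can be sketched as follows. The function names are hypothetical, and treating both ends of the lookback window as inclusive is an assumption; the defaults (2-day lookback, daily buckets) follow the description above.

```python
from collections import Counter
from datetime import datetime, timedelta


def row_count_data_time(rcts, now, lookback=timedelta(days=2)):
    """Data Time: one count of rows whose RCT falls in the lookback window."""
    start = now - lookback
    # Inclusive boundaries are an assumption made for this sketch.
    return sum(1 for t in rcts if start <= t <= now)


def row_count_data_time_window(rcts, now, lookback=timedelta(days=2), bucket="day"):
    """Data Time Window: same lookback, with counts grouped per day or hour."""
    start = now - lookback
    counts = Counter()
    for t in rcts:
        if start <= t <= now:
            if bucket == "day":
                key = t.replace(hour=0, minute=0, second=0, microsecond=0)
            else:  # any other value is treated as hourly buckets
                key = t.replace(minute=0, second=0, microsecond=0)
            counts[key] += 1
    return dict(counts)


now = datetime(2024, 1, 3, 12, 0)
rcts = [
    datetime(2024, 1, 1, 11, 0),   # older than the 2-day lookback
    datetime(2024, 1, 1, 13, 0),
    datetime(2024, 1, 2, 9, 0),
    datetime(2024, 1, 2, 10, 30),
    datetime(2024, 1, 3, 8, 0),
]
print(row_count_data_time(rcts, now))
print(row_count_data_time_window(rcts, now))
```

Data Time yields a single number per run, while Data Time Window yields one number per bucket, which makes it easier to see which day or hour a drop in volume came from.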