Freshness and Volume (Pipeline Reliability)

Freshness and Volume are Bigeye's pipeline reliability metrics. They give you confidence that your data is arriving on time and in the expected amount. Bigeye offers several types of Freshness and Volume metrics.

Freshness and Volume

Freshness and Volume are available only for database tables (i.e. not views) in Snowflake, BigQuery, Databricks, and Redshift.

🚧
Snowflake Enterprise Edition Dependency
Metadata-based Freshness and Volume are available only on Snowflake Enterprise Edition, not on Snowflake Standard Edition. If you are on Snowflake Standard Edition, you must use the column-based Row Count and Hours Since Latest Value metrics for volume and freshness.

Bigeye measures the freshness of your tables by reading the timestamps of loads into your tables from system metadata tables.

For Freshness, Bigeye records the timestamps of all of the loads it sees and stores them at hourly granularity. On the first run of a Freshness metric, Bigeye looks back over the last 28 days to collect all of the loads. On subsequent runs, Bigeye looks back over the last 2 days. If you use Autothresholds, Bigeye learns the typical delay between loads and alerts you if there hasn't been a new load within that time.
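The idea of hourly load timestamps and a learned delay can be illustrated with a minimal sketch. This is not Bigeye's Autothresholds algorithm, which the text above does not specify; the sketch assumes a deliberately crude rule (the largest gap seen so far), and the function names `hourly_load_times`, `typical_delay`, and `is_stale` are hypothetical.

```python
from datetime import datetime, timedelta

def hourly_load_times(load_timestamps):
    """Truncate each load timestamp to the hour and drop duplicates."""
    return sorted({ts.replace(minute=0, second=0, microsecond=0)
                   for ts in load_timestamps})

def typical_delay(load_hours):
    """Largest gap between consecutive loads.

    Assumption: a crude stand-in for a learned threshold, not Bigeye's model.
    """
    gaps = [later - earlier for earlier, later in zip(load_hours, load_hours[1:])]
    return max(gaps)

def is_stale(load_hours, now, threshold):
    """True if the time since the latest load exceeds the threshold."""
    return now - load_hours[-1] > threshold

# Hypothetical load timestamps, as they might be read from system metadata.
loads = [
    datetime(2024, 1, 1, 0, 5),
    datetime(2024, 1, 1, 6, 10),
    datetime(2024, 1, 1, 6, 40),   # falls in the same hourly bucket as 06:10
    datetime(2024, 1, 1, 12, 2),
]
hours = hourly_load_times(loads)
threshold = typical_delay(hours)
```

With this sample data the loads collapse to three hourly buckets six hours apart, so a check at 15:00 would not flag the table, while a check at 19:00 would.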

For Volume, Bigeye similarly looks back over your data and aggregates the number of rows loaded into hourly buckets. As with Freshness, the first run looks back over the last 28 days and subsequent runs look back over the last 2 days. If you use Autothresholds, Bigeye learns the loading pattern for your data and alerts you if loads don't fit that pattern.
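As a rough illustration of the hourly aggregation described above, the following sketch sums rows per hourly bucket. It assumes load events are available as `(timestamp, row_count)` pairs; that shape and the name `rows_per_hour` are hypothetical and do not reflect how Bigeye reads system metadata.

```python
from collections import defaultdict
from datetime import datetime

def rows_per_hour(load_events):
    """Sum the rows loaded in each hourly bucket.

    load_events: iterable of (loaded_at, row_count) pairs (assumed shape).
    """
    totals = defaultdict(int)
    for loaded_at, row_count in load_events:
        hour = loaded_at.replace(minute=0, second=0, microsecond=0)
        totals[hour] += row_count
    return dict(totals)

# Hypothetical load events: two loads in the 00:00 hour, one in the 06:00 hour.
events = [
    (datetime(2024, 1, 1, 0, 5), 1000),
    (datetime(2024, 1, 1, 0, 45), 500),
    (datetime(2024, 1, 1, 6, 10), 1200),
]
hourly = rows_per_hour(events)
```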

Only one Freshness metric and one Volume metric can be deployed per table.

Freshness (data) and Volume (data)

Freshness (data) and Volume (data) are very similar to Freshness and Volume, but are used when system metadata tables are not available. Specifically, they are available for all tables in sources other than Snowflake, BigQuery, Databricks, and Redshift, and for views in Snowflake, BigQuery, Databricks, and Redshift. These metrics require a Row Creation Time (RCT) as a reference timestamp, and Bigeye uses the RCT values to infer the timestamps of loads into your tables.

For Freshness (data) and Volume (data) metrics, Bigeye buckets your data into hourly windows using the configured RCT. If an hourly window contains at least one record whose timestamp falls in that window, a load is inferred to have happened in that hour. Beyond that, Freshness (data) and Volume (data) behave like Freshness and Volume, including the initial 28-day backfill of data.
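The inference from RCT values can be sketched as follows. This is an illustration of the bucketing rule only, assuming the RCT values have already been fetched as Python datetimes; the function name `infer_loads` is hypothetical.

```python
from collections import Counter
from datetime import datetime

def infer_loads(rct_values):
    """Bucket row timestamps into hourly windows.

    Returns a sorted list of (hour, row_count) pairs. Only hours that
    contain at least one row appear, and each is treated as an inferred load.
    """
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0)
                     for ts in rct_values)
    return sorted(counts.items())

# Hypothetical RCT values: two rows in the 01:00 hour, none in 02:00, one in 03:00.
rct_values = [
    datetime(2024, 1, 1, 1, 10),
    datetime(2024, 1, 1, 1, 50),
    datetime(2024, 1, 1, 3, 5),
]
inferred = infer_loads(rct_values)
```

Here the 02:00 hour has no rows, so no load is inferred for it; the inferred load hours feed Freshness (data) and the per-hour counts feed Volume (data).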

Only one Freshness (data) metric and one Volume (data) metric can be deployed per table.

Hours Since Latest Value and Row Count

Hours Since Latest Value (HSLV) and Row Count are another data-based way of calculating freshness and volume. They can be used on any table, in any source, with the caveat that HSLV requires a timestamp.

To use HSLV, you must have a Row Creation Time (RCT) set on your table or on the metric. You can set it while deploying the metric. When the metric is evaluated, you get a measurement of the age of the latest data point in the table. For example, assume a table Table_A has an RCT of insert_time and the most recent timestamp in insert_time is 01:30 AM. If the metric runs at 5 AM, the HSLV metric value for that run will be 3 hours and 30 minutes. If no new data comes in and the metric runs again at 11 AM, the HSLV metric value for that run will be 9 hours and 30 minutes.
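The arithmetic in the Table_A example can be checked with a short sketch. It assumes the insert_time values are available as Python datetimes; the function name `hours_since_latest_value` is hypothetical and the date is arbitrary.

```python
from datetime import datetime

def hours_since_latest_value(rct_values, run_time):
    """Age, in hours, of the most recent RCT value at the time of the run."""
    latest = max(rct_values)
    return (run_time - latest).total_seconds() / 3600

# insert_time values for Table_A; the most recent is 01:30 AM.
insert_time = [
    datetime(2024, 1, 1, 0, 15),
    datetime(2024, 1, 1, 1, 30),
]

hslv_at_5am = hours_since_latest_value(insert_time, datetime(2024, 1, 1, 5, 0))
hslv_at_11am = hours_since_latest_value(insert_time, datetime(2024, 1, 1, 11, 0))
```

The two values come out to 3.5 and 9.5 hours, matching the 3 hours 30 minutes and 9 hours 30 minutes in the example.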

Row Count can use either the Data Time or the Data Time Window window type. With the Data Time window type, it looks back over the configurable lookback window (default 2 days) and counts the number of rows with an RCT in that window. With the Data Time Window window type, it does the same lookback but buckets the results into configurable windows (days or hours; the default is days).
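The difference between the two window types can be sketched as below. This is an illustration, not Bigeye's implementation: the function names and the `bucket` parameter are hypothetical, and the boundary handling (start inclusive, run time exclusive) is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

def row_count_data_time(rct_values, run_time, lookback=timedelta(days=2)):
    """Data Time: one count of rows whose RCT falls in the lookback window."""
    start = run_time - lookback
    return sum(1 for ts in rct_values if start <= ts < run_time)

def row_count_data_time_window(rct_values, run_time,
                               lookback=timedelta(days=2), bucket="day"):
    """Data Time Window: the same lookback, split into day or hour buckets."""
    start = run_time - lookback

    def floor(ts):
        if bucket == "hour":
            return ts.replace(minute=0, second=0, microsecond=0)
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)

    return dict(Counter(floor(ts) for ts in rct_values if start <= ts < run_time))

# Hypothetical RCT values; the first one is older than the 2-day lookback.
run_time = datetime(2024, 1, 3, 0, 0)
rct_values = [
    datetime(2023, 12, 31, 23, 0),
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 20, 0),
    datetime(2024, 1, 2, 9, 0),
]
total = row_count_data_time(rct_values, run_time)
per_day = row_count_data_time_window(rct_values, run_time)
```

With this data, Data Time yields a single value of 3, while Data Time Window yields one value per day: 2 rows for January 1 and 1 row for January 2.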

You can deploy multiple HSLV and Row Count metrics on a single table.