Bigeye uses thresholds to alert data teams to anomalies that need attention. While Bigeye users can set thresholds manually using methods such as standard deviation or constants, Autothresholds uses a proprietary machine-learning engine to automatically generate thresholds for every data attribute you track, giving you meaningful, actionable alerts with zero manual effort.
Autothresholds learn from your historical data, factor in seasonality and trend, and adapt to natural changes in your data over time.
Autothresholds are periodically calculated using the following process:
- Analyze the underlying structure of the series with some preliminary statistical tests.
- Perform a blind prediction test with various techniques and select the most accurate.
- Analyze past data to develop a model for the uncertainty of future values.
- Integrate forecasts, information about the underlying structure, and uncertainty to calculate boundaries for the limits of “expected” behavior.
- Adjust the expected range for user settings, such as sensitivity and bounds.
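The steps above can be sketched roughly in code. This is a simplified illustration, not Bigeye's actual engine: the candidate models, holdout split, and sensitivity multiplier are all assumptions made for the example.

```python
import statistics

def fit_autothresholds(history, sensitivity_multiplier=3.0):
    """Toy sketch of the Autothresholds process: pick the candidate
    model that best predicts a held-out slice, then derive bounds
    from the uncertainty of its residuals. (Illustrative only.)"""
    # Blind prediction test: hold out the most recent points and
    # select the candidate model with the lowest holdout error.
    train, holdout = history[:-7], history[-7:]
    candidates = {
        "last_value": lambda xs: xs[-1],          # naive forecast
        "mean": lambda xs: statistics.fmean(xs),  # level-only model
    }
    def holdout_error(model):
        return sum(abs(model(train) - actual) for actual in holdout)
    best = min(candidates.values(), key=holdout_error)

    # Model the uncertainty of future values from past residuals.
    residuals = [x - best(train) for x in holdout]
    spread = statistics.pstdev(residuals) or 1e-9

    # Combine forecast and uncertainty, scaled by a sensitivity
    # multiplier, into the limits of "expected" behavior.
    forecast = best(history)
    return (forecast - sensitivity_multiplier * spread,
            forecast + sensitivity_multiplier * spread)

low, high = fit_autothresholds(
    [100, 102, 98, 101, 99, 103, 100, 97, 102, 101, 99, 100, 98, 102])
```

Values inside `(low, high)` are considered expected; values outside the range would raise an alert.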
Set the training history and training frequency for Autothresholds to get the desired alerts.
Set the number of days of history that Bigeye's Autothresholds use to generate thresholds. To configure the training window, navigate to Advanced Settings > Autothresholds > Autothresholds training history and enter the number of days. The default training window is 21 days, meaning all metric values observed in the past 21 days are used in model training.
A larger Autothresholds training history includes more data when determining trends. If this value exceeds your autometrics.backfill.days setting, the metric needs additional time to generate thresholds.
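As a rough illustration of how a training window limits which observations feed the model (using the 21-day default described above; the helper and its field layout are hypothetical, not a Bigeye API):

```python
from datetime import datetime, timedelta

def training_slice(observations, training_history_days=21, now=None):
    """Keep only the (timestamp, value) observations that fall inside
    the training window. Hypothetical helper for illustration."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=training_history_days)
    return [(ts, value) for ts, value in observations if ts >= cutoff]

now = datetime(2024, 1, 31)
obs = [(datetime(2024, 1, 1), 10.0),   # 30 days old: outside the window
       (datetime(2024, 1, 20), 11.0),  # 11 days old: inside the window
       (datetime(2024, 1, 30), 12.0)]  # 1 day old: inside the window
recent = training_slice(obs, training_history_days=21, now=now)
```

Only the two observations inside the window would contribute to model training.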
Note that Bigeye can automatically backfill metric history if there is a row creation time set on the table. This enables Autothresholds immediately without waiting to build a training period. If there is no suitable row creation time for the table or the backfill is unavailable, the application generates Autothresholds after the metric has a sufficient amount of training data.
Autothresholds are periodically retrained to reflect the latest changes in your data: new observations are added to the training history, and models are refreshed on a schedule to adapt to those changes and maximize accuracy.
To configure the autothresholds training frequency, navigate to Advanced Settings > Autothresholds > Autothresholds training frequency and set the cadence. Under default settings, Autothresholds pull in new data and retune models every 24 hours.
Autothreshold sensitivity can be adjusted after metric creation if the metric is alerting too often or not often enough. Wider settings (wide or extra wide) increase the bounds and produce fewer alerts; narrower settings decrease the bounds and alert more frequently.
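One way to picture the effect of sensitivity is as a multiplier on the width of the expected range. A minimal sketch, where the specific multiplier values are illustrative assumptions rather than Bigeye's actual internals:

```python
# Hypothetical width multipliers per sensitivity level; wider settings
# produce a larger expected range and therefore fewer alerts.
SENSITIVITY_WIDTH = {
    "narrow": 0.5,
    "normal": 1.0,
    "wide": 1.5,
    "extra wide": 2.0,
}

def expected_range(forecast, spread, sensitivity="normal"):
    """Scale the base uncertainty band by the sensitivity multiplier."""
    width = SENSITIVITY_WIDTH[sensitivity] * spread
    return forecast - width, forecast + width

narrow_low, narrow_high = expected_range(100.0, 10.0, "narrow")
wide_low, wide_high = expected_range(100.0, 10.0, "extra wide")
```

The same forecast and uncertainty yield a much wider expected range (and fewer alerts) under the extra wide setting than under the narrow one.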
To adjust Autothreshold sensitivity, select the relevant metrics under Catalog, and then select Edit from the Action dropdown. In the Edit metric modal, click Thresholds and set the desired sensitivity.
Sometimes you only want to track the upper or lower bound of a metric. For example, freshness metrics are upper-bound only by default, meaning they alert when data is late but not when it is delivered earlier than usual. Similarly, for some columns you may not care if the percentage of NULL values drops, but you do want to be notified if it increases. You can adjust the Autothreshold bound setting to upper and lower, upper only, or lower only when editing a metric.
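The bound setting determines which side of the expected range can trigger an alert. A minimal sketch (the mode names mirror the options above; the function itself is hypothetical):

```python
def breaches(value, low, high, bounds="upper and lower"):
    """Return True when a value should alert under the given bound mode."""
    if bounds == "upper only":   # e.g. freshness: alert only on late data
        return value > high
    if bounds == "lower only":
        return value < low
    return value < low or value > high  # "upper and lower" (default)

# A NULL-percentage metric where only increases matter:
early_but_fine = breaches(0.01, 0.02, 0.10, bounds="upper only")  # below low: no alert
too_many_nulls = breaches(0.15, 0.02, 0.10, bounds="upper only")  # above high: alerts
```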
Autothresholds detect and fit against seasonal patterns during the preliminary analysis of metric history and during blind-prediction model selection, integrating industry-leading and proprietary statistical and ML techniques.
Autothresholds follow classical statistical forecasting, which requires three or more cycles of a pattern to make strong inferences about seasonal patterns. For example, if you want to model the difference between Friday and Saturday web traffic, at least three weeks' worth of data must be within the training history.
This is partly why, by default, Autothresholds include three weeks of data in training history.
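The three-cycle rule reduces to simple arithmetic: modeling a weekly pattern needs at least 3 × 7 = 21 days of history, which matches the default window. A hypothetical sketch of that check:

```python
def enough_history_for_seasonality(training_history_days,
                                   cycle_length_days=7,
                                   min_cycles=3):
    """Classical forecasting needs roughly three full cycles to make
    strong inferences about a seasonal pattern; check that the
    training window covers them. (Illustrative helper.)"""
    return training_history_days >= min_cycles * cycle_length_days

weekly_ok = enough_history_for_seasonality(21)                        # 21 >= 3 * 7
monthly_ok = enough_history_for_seasonality(21, cycle_length_days=30) # 21 < 3 * 30
```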
Users can provide feedback on Autothreshold predictions when resolving issues in Bigeye. These annotations affect Autothreshold sensitivity and training data so that Autothresholds better match business expectations over time.
If an issue is a false positive or reflects behavior considered normal, the user can close it as a “bad alert.” In that case, Autothresholds adapt to treat those points as expected behavior: the previously alerting points are included in the training history, and the sensitivity is adjusted up one level.
If the issue accurately reflects a significant change in the underlying data, the user can indicate that by closing the issue as a “good alert.” They can specify whether it is the new normal moving forward by selecting “adapt thresholds,” in which case the previously alerting points are included in the training history. Alternatively, users can indicate they expect data to return to previous values by selecting “do not adapt thresholds,” in which case the alerting points are excluded from training history.
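The feedback rules above can be sketched as follows. This is a simplified illustration; the data structures and the ordering of sensitivity levels are assumptions, not Bigeye's implementation.

```python
SENSITIVITY_LEVELS = ["narrow", "normal", "wide", "extra wide"]  # assumed ordering

def apply_feedback(training, alerting_points, sensitivity,
                   verdict, adapt_thresholds=True):
    """Fold issue feedback back into the model inputs (illustrative)."""
    if verdict == "bad alert":
        # False positive: treat the points as expected behavior and
        # move sensitivity up one level toward wider bounds.
        training = training + alerting_points
        idx = SENSITIVITY_LEVELS.index(sensitivity)
        sensitivity = SENSITIVITY_LEVELS[min(idx + 1, len(SENSITIVITY_LEVELS) - 1)]
    elif verdict == "good alert" and adapt_thresholds:
        # A real change that is the new normal: learn from the points.
        training = training + alerting_points
    # "good alert" without adapt: data is expected to return to previous
    # values, so the alerting points stay out of the training history.
    return training, sensitivity

train, sens = apply_feedback([1.0, 1.1], [5.0], "normal", "bad alert")
```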