As a data engineer or data reliability engineer, you must quickly decide how to handle data quality issues that Bigeye detects. An issue is generated when a metric starts alerting. Changing the issue's status lets you focus on the most urgent problems, avoid duplicating your teammates' efforts, and report progress on resolving the problems to your stakeholders.
Bigeye Issues have four statuses, each with its own purpose:
- Triage: A new issue is created in this state by default when a metric detects a value outside the threshold. The status of issues in the triage state remains unchanged unless you act upon them.
- Acknowledged (ack'd): Setting an issue to this state implies that you have looked at the issue and that an active investigation is underway. The status of issues in the acknowledged state remains unchanged unless you act upon them.
- Monitoring: Setting an issue to this state means that Bigeye autocloses the issue once the metric returns to a healthy state in a future run.
- Closed: The issue is resolved because the metric has returned to a healthy value.
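The four statuses above can be read as a small state model. The following is a minimal sketch of that reading, not Bigeye's implementation: the names `IssueStatus` and `status_after_run` are hypothetical, and it assumes that a single healthy run is enough to autoclose an issue in the Monitoring state.

```python
from enum import Enum


class IssueStatus(Enum):
    TRIAGE = "triage"
    ACKNOWLEDGED = "acknowledged"
    MONITORING = "monitoring"
    CLOSED = "closed"


def status_after_run(status: IssueStatus, metric_healthy: bool) -> IssueStatus:
    """Return the issue status after a metric run (illustrative only)."""
    # Assumption: one healthy run is enough to autoclose a monitored issue.
    if status is IssueStatus.MONITORING and metric_healthy:
        return IssueStatus.CLOSED
    # Triage and Acknowledged issues stay as they are until a user acts.
    return status
```

In this sketch, only the Monitoring state reacts to metric runs; every other transition would be a manual status change.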
You can view your issues by selecting the Issues tab at the top of the Bigeye homepage. The default view groups issues by table, as shown in the image below. The badges show the number of open issues and their status.
Click the table name to drill into the issues on that table. This view shows the issues on the table, along with lineage that highlights issues on upstream and downstream tables.
Click View by to toggle to a list of individual issues. This view lists the issues and presents the current Open and Closed issue counts.
When you click an issue, you can see a time series visualization of the underlying alerting metric and a timeline of the issue.
Giving feedback to the Autothresholds model
Once you have reviewed the issue, you can change its status.
Moving an issue into the Acknowledged state mutes the issue for 24 hours and indicates to others on your team that you are looking into the root cause.
If the issue's metric has Autothresholds, you must give feedback when moving an issue to Closed or Monitoring states. This helps Autothresholds adapt more precisely to your data. The available options are:
- Maintain a threshold: The alerting data point is an anomaly but must return to the previous pattern and within the previous thresholds.
- Adapt threshold: The alerting data point is outside the previous thresholds but should not alert moving forward. This may be because Autothresholds overfit and are too sensitive, or because a change in the underlying data has created a new normal.
Feedback is required to close an issue that is based on Autometrics.
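The feedback rule above can be sketched as a simple check. This is an illustrative sketch only: `check_status_change`, the status strings, and the option keys `"maintain"` and `"adapt"` are hypothetical names, not part of any Bigeye API.

```python
from typing import Optional

# Hypothetical keys for the two feedback options described above.
FEEDBACK_OPTIONS = {"maintain", "adapt"}
# Statuses that require feedback when the metric uses Autothresholds.
STATUSES_NEEDING_FEEDBACK = {"closed", "monitoring"}


def check_status_change(new_status: str, has_autothresholds: bool,
                        feedback: Optional[str] = None) -> None:
    """Raise ValueError if a required feedback choice is missing."""
    needs_feedback = has_autothresholds and new_status in STATUSES_NEEDING_FEEDBACK
    if needs_feedback and feedback not in FEEDBACK_OPTIONS:
        raise ValueError(
            f"Moving an issue to '{new_status}' requires feedback: "
            f"one of {sorted(FEEDBACK_OPTIONS)}"
        )
```

Under these assumptions, acknowledging an issue never needs feedback, while moving an Autothresholds issue to Closed or Monitoring without a choice is rejected.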
Interacting with the issue lists
To help you address your most important data problems first, Issues now have a priority score of 1-100, where 100 is the highest possible priority. The Issues view is sorted by priority by default, though you can sort by created date instead if you prefer. Priorities are broken into three categories: high, medium, and low.
You can see an Issue’s priority score by hovering your mouse over the priority icon.
Priority categorization (high/med/low) is based on alerts across all Bigeye workspaces, so if you have more 'high' priority issues, those alerts are among the most anomalous of all workspaces. If you have more 'low' issues, then give yourself a pat on the back because your data is among the most stable data of all users!
Priority scores (1-100) are currently based on an alert’s severity. The severity is a measure of how far away the metric’s actual value is from the expected (predicted) value. The more anomalous the metric value, the higher the severity and, thus, the priority. Because severity values can range from 0 to very large numbers, we normalize the severity to a 1-100 Priority score for simplicity. In the future, priority scores will also consider other factors (like how popular a table is, whether it is in an SLA, etc), and we'd love to hear your feedback on what you'd like to see included.
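Bigeye does not publish the normalization formula, so the following is only a sketch of one way an unbounded severity could be squashed into a 1-100 score. The function `priority_score`, the saturating formula, and the `half_point` parameter are all assumptions made for illustration.

```python
def priority_score(severity: float, half_point: float = 10.0) -> int:
    """Map a severity in [0, inf) to an integer score in [1, 100].

    Hypothetical formula: severity / (severity + half_point) grows from 0
    toward 1 as severity increases, so larger severities always score at
    least as high as smaller ones.
    """
    if severity < 0:
        raise ValueError("severity must be non-negative")
    squashed = severity / (severity + half_point)  # value in [0, 1)
    return round(1 + 99 * squashed)
```

Any monotonic, bounded mapping would serve the same purpose; the point is that ordering by score preserves ordering by severity.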
When an issue has more than one alert, and those alerts have varying severity values, the Priority is based on the highest severity score of all the alerts.
The Priority of a table is equal to the highest issue priority on the table.
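The two roll-up rules above amount to taking a maximum at each level. A minimal sketch, assuming each alert already carries a 1-100 score; the function names and sample data are hypothetical:

```python
from typing import Dict, List


def issue_priority(alert_scores: List[int]) -> int:
    """An issue takes the highest score among its alerts."""
    return max(alert_scores)


def table_priority(issues: Dict[str, List[int]]) -> int:
    """A table takes the highest priority among its issues."""
    return max(issue_priority(scores) for scores in issues.values())


# Made-up example: three issues on one table, each with its alert scores.
issues_on_table = {
    "issue_a": [12, 40],
    "issue_b": [77],
    "issue_c": [5, 9, 63],
}
print(table_priority(issues_on_table))  # 77
```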