Getting Started

In under 5 minutes, start monitoring your most important data in Bigeye.

It’s easy to get started with Bigeye. Take control of your data pipelines today.

Overview

  1. Connect to your source(s) quickly and securely
  2. Deploy Autometrics for instant coverage of your most important data assets
  3. Create Collections to organize and optimize your alerts
  4. Triage Issues to resolve data problems before they impact your business
  5. Explore advanced features to ensure even your business-specific data requirements are monitored

1. Connect your source(s)

You can find our supported sources and instructions for connecting each type under Data Sources. In general, we recommend creating a new service account and provisioning it with only the permissions required for the data you wish to monitor with Bigeye. To add a new source, go to Catalog > Add source. Note that only admin users can add new sources.
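
For example, if your source is Snowflake, provisioning a least-privilege service account might look roughly like the sketch below. The role, user, warehouse, and database names are placeholders, and the exact privilege list Bigeye needs is documented per source under Data Sources, so treat this as a starting point rather than the definitive grant set.

    # Rough sketch: a read-only service account for Bigeye on Snowflake.
    # Names are placeholders; confirm the required privileges for your
    # source type on the Data Sources page before running anything.
    import snowflake.connector

    admin_conn = snowflake.connector.connect(
        account="my_account",            # placeholder account identifier
        user="admin_user",               # placeholder admin user
        password="...",                  # use a secrets manager in practice
        role="SECURITYADMIN",            # a role that can create users/roles and manage grants
    )

    GRANTS = [
        "CREATE ROLE IF NOT EXISTS BIGEYE_ROLE",
        "CREATE USER IF NOT EXISTS BIGEYE_SVC PASSWORD = '***CHANGE_ME***' DEFAULT_ROLE = BIGEYE_ROLE",
        "GRANT ROLE BIGEYE_ROLE TO USER BIGEYE_SVC",
        # Read-only access scoped to the data you want monitored
        "GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE BIGEYE_ROLE",
        "GRANT USAGE ON DATABASE ANALYTICS TO ROLE BIGEYE_ROLE",
        "GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS TO ROLE BIGEYE_ROLE",
        "GRANT SELECT ON ALL TABLES IN DATABASE ANALYTICS TO ROLE BIGEYE_ROLE",
    ]

    cur = admin_conn.cursor()
    for stmt in GRANTS:
        cur.execute(stmt)
    cur.close()
    admin_conn.close()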

For some sources (Snowflake, BigQuery, Redshift), you will have the option to deploy Metadata Metrics right away. Metadata Metrics monitor every table for “time since last load” and “number of rows inserted”, giving you broad coverage for the most common data quality problems.
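
Conceptually, these two checks resemble the freshness and volume checks you might otherwise hand-roll. The sketch below is illustration only: Bigeye computes these from warehouse metadata automatically, and the fixed 24-hour cutoff shown here is just an example, since Bigeye uses Autothresholds rather than static limits.

    # Illustration of what the two Metadata Metrics track. Bigeye collects
    # these from warehouse metadata automatically; you do not need to run
    # checks like this yourself.
    from datetime import datetime, timedelta, timezone

    def hours_since_last_load(last_loaded_at: datetime) -> float:
        """Time since last load, in hours."""
        return (datetime.now(timezone.utc) - last_loaded_at).total_seconds() / 3600

    def looks_unhealthy(last_loaded_at: datetime, rows_inserted: int,
                        max_hours: float = 24, min_rows: int = 1) -> bool:
        """Flag a table whose freshness or inserted-row volume looks off."""
        return hours_since_last_load(last_loaded_at) > max_hours or rows_inserted < min_rows

    # Example: a table last loaded two days ago with no new rows would be flagged.
    print(looks_unhealthy(datetime.now(timezone.utc) - timedelta(days=2), rows_inserted=0))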

Once your source is connected, Bigeye will automatically profile your data to recommend Autometrics: suggested metrics that help keep your data consistently accurate and reliable. To learn more about the heuristics used to recommend Autometrics, see Available Metrics.

2. Deploy Autometrics

Autometrics are suggested metrics that monitor for anomalies in column-level data, and they are automatically generated for every new dataset added to Bigeye. You can find them in the Catalog when viewing a source, schema, table, or column page: simply navigate to the Autometrics tab. Search or filter to find the metrics relevant to your monitoring goals, select them, and click “enable”. By default, Bigeye metrics use Autothresholds to detect anomalies in your data and send alerts, so there is no need to manually define thresholds. You can also configure a metric by setting a row creation time; once a table's row creation time is set, Bigeye backfills metric history and enables Autothresholds without a training period.
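
If you would rather script this step than click through the catalog, Bigeye also exposes an API. The sketch below only shows the general shape of such automation: the endpoint path, payload fields, and auth header are placeholders rather than Bigeye's actual API contract, so consult Bigeye's API documentation for the real interface.

    # Hypothetical sketch: enabling suggested metrics by API instead of the UI.
    # The URL path, payload fields, and auth scheme are placeholders, not
    # Bigeye's actual API; see the API documentation for the real routes.
    import requests

    BIGEYE_URL = "https://app.bigeye.com"              # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <API_KEY>"}    # placeholder auth scheme

    def enable_suggested_metrics(table_id: int, metric_names: list[str]) -> None:
        """Enable a set of suggested Autometrics on one table."""
        payload = {
            "tableId": table_id,
            "metrics": metric_names,       # e.g. ["NULL_PERCENT", "FRESHNESS"]
            "useAutothresholds": True,     # keep anomaly detection automatic
        }
        resp = requests.post(f"{BIGEYE_URL}/api/metrics/enable",   # placeholder path
                             json=payload, headers=HEADERS, timeout=30)
        resp.raise_for_status()

    enable_suggested_metrics(table_id=12345, metric_names=["NULL_PERCENT"])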

3. Create collections & configure notifications

Collections group relevant metrics together to consolidate statuses, notifications, and issues for data owners and stakeholders. For example, you may wish to group all metrics on ingest tables together in an ingest collection, all metrics for a specific pipeline in a pipeline collection, or all metrics on a dashboard reporting table in a dashboard collection.

To add metrics to a Collection, select “metrics” in the catalog and click “action” > “Add to Collection”. From there, you can create a new Collection or choose an existing one.

On the Collections page, you can view the status of all metrics in the collection and triage related issues. We recommend setting up notifications on each collection, via either Slack or email.
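
Collection setup can also be scripted against the API. As with the metric example above, the sketch below is hypothetical: the endpoint, field names, and auth header are placeholders rather than Bigeye's actual API, and the UI flow remains the documented path.

    # Hypothetical sketch: creating a Collection with a Slack notification
    # channel. Paths and fields are placeholders, not Bigeye's actual API.
    import requests

    BIGEYE_URL = "https://app.bigeye.com"              # placeholder workspace URL
    HEADERS = {"Authorization": "Bearer <API_KEY>"}    # placeholder auth scheme

    def create_collection(name: str, metric_ids: list[int], slack_channel: str) -> int:
        """Create a collection, add metrics, and route its alerts to Slack."""
        resp = requests.post(
            f"{BIGEYE_URL}/api/collections",           # placeholder path
            json={
                "name": name,
                "metricIds": metric_ids,
                "notificationChannels": [{"type": "SLACK", "channel": slack_channel}],
            },
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["id"]

    collection_id = create_collection("Ingest tables", metric_ids=[111, 222],
                                      slack_channel="#data-alerts")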

4. Triage issues

When Bigeye metrics detect values outside of the expected threshold, an Issue is created. Issues help your team triage and resolve data quality problems.

If you have notifications configured on a metric, or on any collection that metric belongs to, you will receive a notification with a link to the Issue when it is created. Alternatively, you can triage all issues from the Collections page, any catalog page, or the Issues tab in the Bigeye app, depending on what data is relevant to you.

You can acknowledge, mute, or close issues to track progress toward resolution. Acknowledging an issue automatically mutes it for 24 hours.

5. Advanced features

Once you’ve set up Bigeye metrics and collections on your priority datasets, you can explore Bigeye’s advanced features to ensure even business-specific data requirements are monitored.

Take advantage of virtual tables and template metrics to monitor business rules. Set up deltas to monitor data migrations between sources or changes between dataset versions.
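
For example, a rule like “every order total must equal the sum of its line items” can be expressed as a query, saved as a virtual table, and then monitored with a simple row-count metric that should stay at zero. The sketch below assumes hypothetical orders and order_line_items tables; adapt the names and logic to your own schema.

    # Rough sketch of SQL you might save as a Bigeye virtual table to monitor
    # a business rule. Table and column names are assumptions about your
    # schema, not part of Bigeye itself; once the virtual table exists, enable
    # a row-count metric on it so any violating order raises an alert.
    ORDER_TOTAL_MISMATCH_SQL = """
        SELECT o.order_id,
               o.order_total,
               SUM(li.amount) AS line_item_total
        FROM orders o
        JOIN order_line_items li ON li.order_id = o.order_id
        GROUP BY o.order_id, o.order_total
        HAVING o.order_total <> SUM(li.amount)
    """

    if __name__ == "__main__":
        # Save this query as a virtual table in Bigeye (or deploy it through
        # your usual automation), then monitor its row count.
        print(ORDER_TOTAL_MISMATCH_SQL)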