Bigeye metrics can be run as part of your Airflow DAG by adding an extra task that runs all of a table's checks after the table has been loaded. If you want to include your Bigeye metrics as part of your ETL definition, you can also use the operators to create or update metric definitions.

Getting started

You can use our plugin by adding the following line to your requirements.txt:

bigeye-airflow

In order to communicate with Bigeye, you will need to define an HTTP connection in Airflow; the login and password are the same ones you use to log into the Bigeye UI.

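The connection can be created in the Airflow UI under Admin > Connections with the connection type set to HTTP. If you prefer to define it in code, one option is to export it as a connection URI through an environment variable. A minimal sketch, with placeholder host and credentials (substitute your own):

from airflow.models.connection import Connection

# Placeholder values; use your Bigeye host, UI login, and password.
bigeye_conn = Connection(
    conn_id="bigeye_connection",
    conn_type="http",
    host="https://app.bigeye.com",
    login="you@example.com",
    password="your-bigeye-password",
)

# Export the printed URI as AIRFLOW_CONN_BIGEYE_CONNECTION and Airflow will
# resolve the connection from the environment at runtime.
print(bigeye_conn.get_uri())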

All operations run against a specific warehouse, so you will need to provide its warehouse ID. The easiest way to find the warehouse ID is to log into the Bigeye UI and navigate to the warehouse of interest; the examples below use a warehouse ID of 414.

Run metrics on a table

You can use the RunMetricsOperator to run all the metrics defined on a table. The operator raises a ValueError if the run fails, and its logs show how many metrics failed and which ones.

To run all the metrics on the users table in the analytics schema of the warehouse above, using the bigeye_connection connection we defined, build the operator as follows:

from bigeye_airflow.operators.run_metrics_operator import RunMetricsOperator
....
run_metrics = RunMetricsOperator(
    task_id='run_bigeye_metrics',
    connection_id='bigeye_connection',  # the Airflow HTTP connection defined above
    warehouse_id=414,                   # warehouse ID from the Bigeye UI
    schema_name="analytics",
    table_name="users",
    dag=dag
)
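
To run the checks as soon as the table lands, set the metrics task downstream of whatever task loads it. A minimal sketch, assuming a hypothetical load_users task that populates analytics.users:

load_users >> run_metrics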

Send dbt Core job run information

You can use the BigeyeDbtCoreOperator to send information to Bigeye about the results of dbt Core job runs. Add the following to your DAG. The URL parameters are optional; when the DAG runs, any URLs provided will be added to the catalog pages for the tables affected by the dbt Core job.

from bigeye_airflow.operators.bigeye_dbt_operator import BigeyeDbtCoreOperator
....
dbt_sync = BigeyeDbtCoreOperator(
    task_id="dbt_core_sync",
    connection_id="bigeye_connection",  # the Airflow HTTP connection defined above
    workspace_id="<bigeye_workspace_id>",
    target_path="path_to/target",  # the dbt target directory containing run artifacts
    job_run_url="<url_for_your_job_run>",
    job_url="<url_for_your_job>",
    project_url="<url_for_your_project>",
    dag=dag
)
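
As with the metrics operator, the sync would typically run right after the dbt job itself. A minimal sketch, assuming dbt is invoked through a BashOperator (the dbt_run task and its command are placeholders, not part of the plugin):

from airflow.operators.bash import BashOperator
....
dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command="dbt run --project-dir /path/to/project",
    dag=dag
)
dbt_run >> dbt_sync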