Data Catalogs

Data consumers benefit from data health insights. However, not all teams spend time on the Bigeye platform or receive Bigeye notifications, so they might lack critical information about the health of their data before making a decision.

The Bigeye Data Health CLI allows teams to extend pipeline health and data quality insights to popular data catalog solutions such as Alation, Atlan, and data.world. By integrating data health insights from Bigeye, data teams help consumers know where the data is and how it can be used.

Implementation

Bigeye's solution is a container-based CLI that can be hosted anywhere Docker or Kubernetes runs.

See options for implementation

Supported Catalogs

Explore data health insights with the following supported catalog solutions:

Implementation (deprecated)

Bigeye’s solution is an agent-based batch solution. It is deployed as a Docker container and can run on common ephemeral compute services such as AWS Lambda. The solution can be hosted by either Bigeye or the customer, which offers flexibility to comply with security standards.

Setup

Hosting Strategies

Bigeye Hosts

  1. Bigeye requires access to the customer’s catalog instance, over either the public internet or a VPC connection through AWS PrivateLink.
  2. The customer provides a credential (described below) by either:
    1. Granting cross-account access to a secret in the customer’s own AWS Secrets Manager, or
    2. Sending the credential details to Bigeye, which stores the secret in the Bigeye AWS account.

Customer Hosts

  1. The customer does not need a security integration with Bigeye.
  2. Bigeye provides a Terraform script for the customer to run against its own AWS account.
  3. The customer creates the credential and stores it as a secret in its own AWS Secrets Manager. The related steps are items 1 and 2 of the Process section below.

Scheduling

The Bigeye Data Health Agent uses a batch process, and each customer can decide their own batch schedule. The standard scheduling period is every 60 minutes.
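As an illustration of a per-customer schedule, the sketch below builds a scheduled-rule definition with an EventBridge rate expression and an event payload that carries the secret name. The rule name pattern, the `secret_name` payload key, and the function names are assumptions made for this example, not Bigeye's actual configuration.

```python
import json


def schedule_expression(minutes: int) -> str:
    """Build an EventBridge rate expression, e.g. 'rate(60 minutes)'."""
    if minutes < 1:
        raise ValueError("minutes must be a positive integer")
    # EventBridge expects a singular unit for 1 and a plural unit otherwise.
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def build_rule(customer: str, secret_name: str, minutes: int = 60) -> dict:
    """Describe a scheduled rule for one customer (hypothetical layout)."""
    return {
        # Hypothetical naming convention for the per-customer rule.
        "Name": f"bigeye-data-health-{customer}",
        "ScheduleExpression": schedule_expression(minutes),
        # Assumed payload shape: the event only names the secret to read.
        "Input": json.dumps({"secret_name": secret_name}),
    }


rule = build_rule("acme", "acme/catalog-credential")
print(rule["ScheduleExpression"])
```

A customer that prefers a different cadence would pass another value for `minutes`; the default mirrors the standard 60-minute period.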

Process

  1. The Bigeye Data Health Agent is triggered by CloudWatch through an EventBridge trigger that is unique to the customer. The event contains the name of the AWS secret that stores the credential for the customer’s external catalog.
  2. The agent queries AWS Secrets Manager for the required credentials using the secret name provided in the event.
  3. The agent pulls the source details from Bigeye.
  4. The agent pulls the data health details from Bigeye.
  5. The agent pulls catalog asset details from the customer’s catalog.
  6. The agent matches assets between Bigeye and the customer’s catalog.
  7. The agent pushes data health details to the matched assets in the customer’s catalog.
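The steps above can be sketched as a single batch function. This is a minimal illustration rather than the agent's actual code: the `Asset` type, the case-insensitive match on source, schema, and table, the `secret_name` event key, and the injected callables (`get_secret`, `fetch_health`, `fetch_catalog_assets`, `push_health`) are all hypothetical stand-ins for the Secrets Manager, Bigeye, and catalog calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Asset:
    """A table identified by source, schema, and table name (assumed shape)."""
    source: str
    schema: str
    table: str

    def key(self) -> Tuple[str, str, str]:
        # Assumption: assets match when names are equal ignoring case.
        return (self.source.lower(), self.schema.lower(), self.table.lower())


def match_assets(
    health_by_asset: Dict[Asset, str], catalog_assets: Iterable[Asset]
) -> List[Tuple[Asset, str]]:
    """Step 6: pair each catalog asset with the health status of its match."""
    by_key = {asset.key(): status for asset, status in health_by_asset.items()}
    return [(c, by_key[c.key()]) for c in catalog_assets if c.key() in by_key]


def run_batch(
    event: dict,
    get_secret: Callable[[str], dict],
    fetch_health: Callable[[], Dict[Asset, str]],
    fetch_catalog_assets: Callable[[dict], Iterable[Asset]],
    push_health: Callable[[dict, Asset, str], None],
) -> int:
    """Run one batch and return the number of catalog assets updated."""
    credential = get_secret(event["secret_name"])       # steps 1-2
    health = fetch_health()                             # steps 3-4
    catalog_assets = fetch_catalog_assets(credential)   # step 5
    matched = match_assets(health, catalog_assets)      # step 6
    for asset, status in matched:                       # step 7
        push_health(credential, asset, status)
    return len(matched)
```

Passing the external calls in as functions keeps the sketch runnable without AWS or catalog access; in a real deployment they would wrap the corresponding service clients.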