Alation Data Health

Bigeye integrates with Alation via the Bigeye Data Health Agent.

📘

BETA

The Bigeye Data Health Agent is a beta feature. Reach out for support.

How it works

In Alation, Bigeye raises a deprecation at the table level to alert users to a potential data health concern for that table. The deprecation appears in red. In addition, each column on that table shows a health deprecation count in the Health column on the Overview page.

The Data Health Agent also integrates with Alation Lineage. The Warning Panel collects warnings that have propagated from other tables; these warnings appear in yellow. In addition, deprecations on table and column assets appear in the Lineage tab, where they can warn consumers of upstream causes and downstream impacts.

Finally, the Health tab lists the Bigeye metrics (rules), each with its associated column name (object name), status, current observed value, description, and last-updated timestamp. Users can click the Rule link to open the metric in Bigeye, where they can see more detail about an alert and review the issues raised for it. They can also click the object name to open the Alation Column page and see the status of that column.

Implementation

Bigeye’s solution is agent-based and runs in batches. It is deployed as a Docker container and can run on common ephemeral compute services such as AWS Lambda. Either Bigeye or the customer can host it, which gives more security-minded customers additional flexibility.

📘

API Requirements

Required Alation API Version: greater than 2022.3

Setup

Hosting Strategies

Bigeye Hosts

  1. Requires access to the customer’s Alation instance, over either the public internet or AWS PrivateLink.
  2. The customer provides a credential (described below) by either:
    1. granting cross-account access to the customer’s own AWS Secrets Manager secret, or
    2. giving the credential details to Bigeye, which stores the secret in the Bigeye AWS account.

Customer Hosts

  1. Does not require customer security integration with Bigeye
  2. Bigeye provides a Terraform script for the Customer to run against its own AWS account
  3. Customer creates a credential (described below) and stores that secret in their own AWS Secrets Manager

Customer Credential

Each customer provides authentication details for both Bigeye and their external catalog. For Alation, use the following credential format.

{
    "alation_user_id": 1234,
    "alation_refresh_token": "1234sometoken1234",
    "alation_base_url": "https://myorg.alation.com",
    "bigeye_username": "[email protected]",
    "bigeye_password": "myserviceaccountpasswordisfake",
    "bigeye_base_url": "https://app.bigeye.com"
}
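
Before storing the secret, it can help to check that the JSON has the expected shape. The following is a minimal sketch, not part of the agent: the field names and types are taken from the sample above, while the `validate_credential` helper and the rule that both base URLs use `https` are assumptions made for illustration.

```python
import json
from urllib.parse import urlparse

# Field names and types inferred from the sample credential above.
REQUIRED_FIELDS = {
    "alation_user_id": int,
    "alation_refresh_token": str,
    "alation_base_url": str,
    "bigeye_username": str,
    "bigeye_password": str,
    "bigeye_base_url": str,
}


def validate_credential(raw: str) -> dict:
    """Parse a credential JSON string; raise ValueError listing any problems."""
    cred = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(cred, dict):
        raise ValueError("credential must be a JSON object")
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in cred:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(cred[name], bool) or not isinstance(cred[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Assumption: both base URLs are served over https.
    for name in ("alation_base_url", "bigeye_base_url"):
        value = cred.get(name)
        if isinstance(value, str) and urlparse(value).scheme != "https":
            problems.append(f"{name} should start with https://")
    if problems:
        raise ValueError("; ".join(problems))
    return cred
```

A common mistake this catches is quoting the Alation user ID, which the sample shows as a bare number.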

Sourcing Credential Details

  1. Basic auth credential for Bigeye.
    • Bigeye URL
    • Bigeye Service Account User ID
      • Bigeye user IDs are email addresses
    • Bigeye Service Account Password
      • For Bigeye SSO:
        1. Go to https://<customer base URL>/request-password-reset?email=<email>
        2. Set password
  2. API credential for external catalog.
    • Alation
      • Alation Base URL
        • https://CUSTOMER.alationcatalog.com for cloud SaaS, or the customer’s vanity URL.
      • Alation User ID
        • Find this in the URL for the User Profile.
          • In https://<customer>.alationcatalog.com/user/1234/, the user ID is 1234.
      • Alation Refresh Token
        • Create a refresh token under the Account Settings :: Authorization menu. Make sure you create a refresh token and not an API token.
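
Two of the details above are derived from URLs, so they are easy to get wrong by hand. The sketch below shows one way to build the password-reset link and to read the user ID out of an Alation profile URL. Both helper names are hypothetical, and the example email address is a placeholder.

```python
import re
from urllib.parse import urlencode, urlparse


def password_reset_url(bigeye_base_url: str, email: str) -> str:
    """Build the password-reset link, URL-encoding the email address."""
    query = urlencode({"email": email})
    return f"{bigeye_base_url.rstrip('/')}/request-password-reset?{query}"


def alation_user_id(profile_url: str) -> int:
    """Extract the numeric ID from a profile URL such as .../user/1234/."""
    match = re.search(r"/user/(\d+)/?$", urlparse(profile_url).path)
    if match is None:
        raise ValueError(f"no user ID found in {profile_url!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(password_reset_url("https://app.bigeye.com", "svc+alation@example.com"))
    print(alation_user_id("https://myorg.alationcatalog.com/user/1234/"))
```

Encoding the email matters when the service account address contains characters such as `+`, which would otherwise be read as a space in the query string.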

Scheduling

The Bigeye Data Health Agent runs as a batch process, and each customer can choose their own batch schedule. The standard schedule is every 15 minutes.
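
If the schedule is defined as an Amazon EventBridge rate expression, the interval has to follow the `rate(value unit)` syntax, with a singular unit when the value is 1. The helper below is a small illustrative sketch; the function name is hypothetical, and the way the agent's trigger is actually configured may differ.

```python
def rate_expression(minutes: int = 15) -> str:
    """Return an EventBridge rate expression for a schedule in minutes."""
    if minutes < 1:
        raise ValueError("minutes must be a positive integer")
    # EventBridge expects the singular unit for a value of 1.
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


if __name__ == "__main__":
    print(rate_expression())    # the standard 15-minute schedule
    print(rate_expression(60))  # an hourly batch, expressed in minutes
```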

Process

  1. The Bigeye Data Health Agent is triggered by CloudWatch through an EventBridge trigger that is unique to the customer. The event contains the name of the AWS secret that stores the credential for the customer’s external catalog.
  2. The agent queries AWS Secrets Manager for the required credentials, using the secret name provided in the event.
  3. The agent pulls the source details from Bigeye.
  4. The agent pulls the data health details from Bigeye.
  5. The agent pulls catalog asset details from the customer’s catalog.
  6. The agent matches assets between Bigeye and the customer’s catalog.
  7. Agent pushes data health details to matched assets in the customer’s catalog.
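
The steps above can be sketched as a single batch function. This is an illustration of the flow only, not the agent's actual code: the event key `secret_name`, the `HealthDetail` type, the injected callables, and the rule of matching tables by case-insensitive fully qualified name are all assumptions. The Secrets Manager, Bigeye, and Alation calls are passed in as functions so the sketch stays self-contained.

```python
from typing import Callable, Dict, List, NamedTuple


class HealthDetail(NamedTuple):
    table: str   # fully qualified table name as reported by Bigeye
    metric: str
    status: str


def normalize(name: str) -> str:
    """Assumed matching rule: compare names case-insensitively."""
    return name.strip().lower()


def match_assets(
    health: List[HealthDetail], catalog_ids: Dict[str, int]
) -> Dict[int, List[HealthDetail]]:
    """Step 6: group health details by the catalog asset they match."""
    lookup = {normalize(name): asset_id for name, asset_id in catalog_ids.items()}
    matched: Dict[int, List[HealthDetail]] = {}
    for detail in health:
        asset_id = lookup.get(normalize(detail.table))
        if asset_id is not None:  # unmatched tables are skipped
            matched.setdefault(asset_id, []).append(detail)
    return matched


def run_batch(
    event: dict,
    get_secret: Callable[[str], dict],
    pull_health: Callable[[dict], List[HealthDetail]],
    pull_catalog: Callable[[dict], Dict[str, int]],
    push: Callable[[dict, int, List[HealthDetail]], None],
) -> int:
    """Run one batch and return the number of catalog assets updated."""
    cred = get_secret(event["secret_name"])  # step 2 (key name is hypothetical)
    health = pull_health(cred)               # steps 3 and 4
    catalog = pull_catalog(cred)             # step 5
    matched = match_assets(health, catalog)  # step 6
    for asset_id, details in matched.items():
        push(cred, asset_id, details)        # step 7
    return len(matched)
```

Passing the external calls in as parameters keeps the matching logic testable without network access; in a real deployment they would wrap the relevant AWS, Bigeye, and Alation clients.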