Integration Implementation

The integration can be deployed as a standard Docker container using docker compose run, or on Kubernetes. Bigeye provides instructions for both options below.

Requirements

Pre-Install Checklist

You will need the following information to install the Integration.

  1. Set up a VM / host

    • Minimum hardware: 1 CPU, 2 GB memory. (If the Bigeye agent and the data health integration will run on the same host, the minimum is 4 CPUs, 16 GB memory.) Example instance types:
      i. AWS t3.small
      ii. GCP e2-small
      iii. Azure B1ms
    • Network access for agent subnets
      i. API access to Bigeye
      ii. API access to destination catalog
      iii. Access to pull the Datahealth Integration image from docker.io
  2. Bigeye credentials

    • You will need a Bigeye account
    • To fully configure the integration, the user must be an Admin in Bigeye
    • To run the integration, the user only needs to have edit permissions
  3. Docker access token

    • Bigeye will provide this
  4. OPTIONAL: Work with your security team to verify whether a custom certificate authority is required to access the destination catalog.

    • Custom SSL CA cert if a custom SSL cert is installed on the destination

Infrastructure Requirements

🚧

Other Docker platforms, such as AWS ECS, and orchestration systems, such as Nomad, will work, but their installation is not supported by Bigeye.

Host with docker

  1. Install docker if it is not already installed [CentOS][RHEL][Debian][Ubuntu]
    1. NOTE for RHEL: the official Docker instructions don't work as written; using Docker's CentOS yum repository instead is a reliable workaround.
    2. sudo yum install -y yum-utils
      sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
      sudo yum install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
      # Start the daemon
      sudo systemctl start docker
      # Test it
      sudo docker run hello-world
      
  2. You can verify that Docker is running by executing docker info and getting a non-error response (a quick check is shown after this list)
  3. docker compose should already work if you've installed docker with the above instructions. If not, install docker compose (Instructions)
    1. Run docker compose version to test if it is installed correctly. The version should be 2.0 or higher.
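As a quick sanity check, the two commands below should both succeed without errors on a correctly configured host (this assumes Docker and the Compose plugin were installed per the steps above):

# Both commands should return without errors
docker info > /dev/null && echo "docker daemon OK"
docker compose version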

Host with Kubernetes

  1. Kubernetes is available on different cloud platforms; e.g. AWS EKS, Azure AKS, GCP GKE. It's also available through Rancher, Red Hat OpenShift, etc.
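If you already have a cluster, a quick way to confirm that kubectl can reach it (this assumes kubectl is installed and configured for your cluster):

# List cluster nodes; an error here means kubectl cannot reach the cluster
kubectl get nodes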

Configuration

Installs with custom SSL certs (optional)

If your catalog uses a custom SSL certificate, place the SSL root CA in the configuration directory created below. The file must be named private_ca.pem.

mkdir bigeye_datahealth_config
cp my_root_ca.pem bigeye_datahealth_config/private_ca.pem
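Optionally, you can sanity-check that the file is a valid PEM certificate with openssl:

# Print the certificate subject and validity window
openssl x509 -in bigeye_datahealth_config/private_ca.pem -noout -subject -dates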

Create docker-compose.yaml

Download datahealth-integration-docker-compose.yaml as docker-compose.yaml and pull the latest image.

docker logout
docker login --username bigeyedata --password <docker access token>
wget https://bigeye-public-web.s3.amazonaws.com/datahealth-integration-docker-compose.yaml --output-document=docker-compose.yaml
docker compose pull
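You can confirm the pull succeeded by listing the image:

# The integration image should appear in the local image list
docker image ls bigeyedata/datahealth-integration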

The data health CLI comes with a configure command to create the necessary file for the destination catalog.

# If not created already
mkdir bigeye_datahealth_config

# Run the configure command
docker compose run --rm datahealth-integration configure

 ────────────────────────────────────────────────────────────────────── Bigeye Integration Setup ─────────────────────────────────────────────────────────────────────── 
  Steps to configure the integration between Bigeye and a destination catalog                                                                                            
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
Please choose an option from the list below
1) ALATION
2) ATLAN
3) DATADOTWORLD
(0 to exit) [1/2/3/0] (1):

You may also create the file yourself. See docs.bigeye.com for details on how to format the datahealth.json file.
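To confirm the credential file is in place, list the configuration directory. The example below assumes configure wrote datahealth.json into bigeye_datahealth_config/, the file name used by the run-integration example later in this guide:

ls bigeye_datahealth_config/datahealth.json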

Running the CLI

All commands other than configure REQUIRE the credential file referenced above. There are two options for making this file available when a command is invoked:

  1. Mount the file(s) to the container or pod
  2. Store the contents of the file in AWS Secrets Manager

Docker

The CLI can be run as a container using docker compose run. Use the --help flag to see the available commands, or pass it to a specific command to see that command's arguments. NOTE: The commands below demonstrate using files mounted to the container.

# View available commands
docker compose run --rm datahealth-integration --help

╭─ All catalogs ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ configure         Configure an integration by creating the credential file required to run the different features available.                                                                                                      │
│ run-integration   Run the main integration between Bigeye and a destination data catalog. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager                                 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Alation ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ sync-deltas       Run the integration to create a Bigeye Delta Summary for tables in Alation. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager.                            │
│ sync-tags         Run the integration to show Alation tags as Bigeye table attributes. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager.                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Alation and Atlan ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ sync-entities     Run the integration to create links from Bigeye table attributes to a destination data catalog. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager.        │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

# View help for specific command
docker compose run --rm datahealth-integration run-integration --help

╭─ Mounted file configuration ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --file-name        TEXT  The name of the mounted file containing the configuration for an integration. [default: None]                                                                                                               │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ AWS configuration ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --aws-secret-name        TEXT  The name in AWS Secrets Manager containing the configuration for an integration [default: None]                                                                                                    │
│ --aws-region             TEXT  The region in AWS to access Secrets Manager [default: None]                                                                                                                                        │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
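For example, to run the integration with a configuration stored in AWS Secrets Manager (the secret name and region below are placeholders, and the container must have AWS credentials; see the Appendix):

# Run using a configuration pulled from AWS Secrets Manager
docker compose run --rm datahealth-integration run-integration --aws-secret-name <your secret name> --aws-region <your region>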

Adding a scheduled sync

To schedule a sync every hour, you can add the docker compose run command to cron. The line below runs the integration at the top of every hour; use crontab -e to edit cron jobs and add it to the crontab file.

0 * * * * docker compose run --rm datahealth-integration run-integration --file-name datahealth.json >> /tmp/datahealth_integration.log 2>&1

Kubernetes

The CLI can be run on Kubernetes using a CronJob.

  1. Download bigeye-datahealth-cronjob-file-mount.yaml
  2. Download bigeye-datahealth-cronjob-aws.yaml

Configure Kubernetes

# Download yaml files
wget https://bigeye-public-web.s3.amazonaws.com/bigeye-datahealth-cronjob-aws.yaml
wget https://bigeye-public-web.s3.amazonaws.com/bigeye-datahealth-cronjob-file-mount.yaml

# Create a namespace for Bigeye resources
kubectl create namespace bigeye

# Set current namespace to bigeye (optional)
kubectl config set-context --current --namespace=bigeye

# Create a docker-registry secret (the default in the yaml file is bigeyecred) 
kubectl create secret \
docker-registry bigeyecred \
--docker-username=bigeyedata \
--docker-password=<docker PAT provided by Bigeye>
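You can verify that the secret was created:

kubectl get secret bigeyecred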

Running the integration

The integration will run on the schedule specified in the YAML files. You may duplicate these files to create additional jobs to run other features of the integration.

  1. If you are using credential files, the YAML file above references a secret called datahealth-config; create it as shown below. NOTE: You may also use a ConfigMap instead (see the sketch after this list). If so, you must create it and update the YAML file to reference it.
# Create a secret for the configuration files above
kubectl create secret \
generic datahealth-config \
--from-file=bigeye_datahealth_config/

# Create the job:
kubectl create -f bigeye-datahealth-cronjob-file-mount.yaml
  2. If you are using AWS, update bigeye-datahealth-cronjob-aws.yaml to reference the correct secret name
    # Create the job:
    kubectl create -f bigeye-datahealth-cronjob-aws.yaml
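If you prefer a ConfigMap over a secret for the configuration files (as noted in step 1), a sketch of the equivalent creation command is below; remember to update the CronJob YAML to reference the ConfigMap instead:

# Create a ConfigMap from the configuration directory (the name datahealth-config mirrors the secret above)
kubectl create configmap datahealth-config --from-file=bigeye_datahealth_config/

In either case, you can confirm the CronJob was registered:

kubectl get cronjobs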
    

Logging with Kubernetes

Logs for a CronJob can be retrieved only while the pod is available. To retain logs, review Logging Architecture for Kubernetes and implement the solution that works best for you. To view logs from pods that are still available, see the example below.

# See pods that are available.
kubectl get pods

NAME                                       READY   STATUS      RESTARTS   AGE
demo-alation-integration-28877250-96gzp    0/1     Completed   0          30m
demo-alation-integration-28877220-zwg8t    0/1     Completed   0          60m
demo-alation-integration-28877235-jqkfq    0/1     Completed   0          45m

# View logs for specific pod
kubectl logs demo-alation-integration-28877235-jqkfq

Troubleshooting

If you've redirected the cron log to a file, viewing the file will show successful runs as well as error messages. A successful run logs a message like the one below:

{"status":"Bigeye: https://app.bigeye.com sync to Target: <catalog URL> successful."}

Appendix

Environment variable for AWS access keys

When running via docker compose, you may need to provide AWS access to the container in order to retrieve secrets from Secrets Manager. You can set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables and reference them in docker-compose.yaml. The variables are already in the file but commented out; remove the comments and set the variables.

Write a script that can retrieve the keys

Write a script that can retrieve secrets from your password keeper and use that to populate the environment variables.

export AWS_ACCESS_KEY_ID=$(access_key_script.sh)  
export AWS_SECRET_ACCESS_KEY=$(access_secret_script.sh)

Import Image Archive (if docker pull was not used)

Save the container image as a file

Follow the docker compose pull instructions on a test system to pull the integration image. Then export the image to a file using docker save.

# Export image to file
docker save docker.io/bigeyedata/datahealth-integration:latest > datahealth_integration_$(date '+%Y-%m-%d').tar

Import the container image on the production system

Copy the image archive file to the host where the integration will run, then import it using the commands below.
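For example, the archive can be copied with scp (the host name and destination path are illustrative):

# Copy the archive to the production host
scp datahealth_integration_<date>.tar user@<production-host>:/tmp/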

# Import image
docker load -i <path to image archive>

# Verify the image was loaded successfully
docker image ls docker.io/bigeyedata/datahealth-integration:latest