Integration Implementation
The integration can be deployed as a standard Docker container using docker compose run, or on Kubernetes. Bigeye provides instructions for both options below.
Requirements
Pre-Install Checklist
You will need the following information to install the Integration.
- Set up a VM / host
  - Minimum HW size: 1 CPU, 2GB mem. (If the Bigeye agent and the data health integration will run on the same host, the minimum is 4 CPU, 16GB mem.) Example instance sizes:
    i. AWS t3.small
    ii. GCP e2-small
    iii. Azure B1ms
  - Network access for agent subnets:
    i. API access to Bigeye
    ii. API access to the destination catalog
    iii. Access to pull the Datahealth Integration image from docker.io
- Bigeye credentials
- You will need a Bigeye account
- To fully configure the integration, the user must be an Admin in Bigeye
- To run the integration, the user only needs to have edit permissions
- Docker access token
- Bigeye will provide this
- OPTIONAL: Work with your security team to verify whether a custom certificate authority is required to access the destination catalog.
  - A custom SSL CA cert is needed if a custom SSL cert is installed on the destination
Infrastructure Requirements
Other Docker platforms, such as AWS ECS, and orchestration systems, such as Nomad, will work, but their installation is not supported by Bigeye.
Host with docker
- Install docker if it is not already installed [CentOS][RHEL][Debian][Ubuntu]
- NOTE: For RHEL, the official docker instructions don't quite work. Using docker's CentOS yum repository instead is a good workaround.
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl start docker
# test it
sudo docker run hello-world
- You can test whether docker is running by running docker info and getting a non-error response.
- docker compose should already work if you've installed docker with the above instructions. If not, install docker compose (Instructions).
- Run docker compose version to test that it is installed correctly. The version should be 2.0 or higher.
Host with Kubernetes
- Kubernetes is available on different cloud platforms; e.g. AWS EKS, Azure AKS, GCP GKE. It's also available through Rancher, Red Hat OpenShift, etc.
Configuration
Installs with custom SSL certs (optional)
If your catalog is using a custom SSL certificate, place the SSL root CA into a file named private_ca.pem inside the bigeye_datahealth_config directory, as shown below.
mkdir bigeye_datahealth_config
cp my_root_ca.pem bigeye_datahealth_config/private_ca.pem
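Before moving on, an optional sanity check (assuming openssl is installed on the host) is to confirm the file parses as a valid PEM certificate:
# Optional: confirm the CA file is a valid PEM certificate
openssl x509 -in bigeye_datahealth_config/private_ca.pem -noout -subject -dates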
Create docker-compose.yaml
Download datahealth-integration-docker-compose.yaml as docker-compose.yaml and pull the latest image.
docker logout
docker login --username bigeyedata --password <docker access token>
wget https://bigeye-public-web.s3.amazonaws.com/datahealth-integration-docker-compose.yaml --output-document=docker-compose.yaml
docker compose pull
The data health CLI comes with a configure command to create the necessary file for the destination catalog.
# If not created already
mkdir bigeye_datahealth_config
# Run the configure command
docker compose run --rm datahealth-integration configure
────────────────────────────────────────────────────────────────────── Bigeye Integration Setup ───────────────────────────────────────────────────────────────────────
Steps to configure the integration between Bigeye and a destination catalog
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Please choose an option from the list below
1) ALATION
2) ATLAN
3) DATADOTWORLD
(0 to exit) [1/2/3/0] (1):
You may also create the file yourself. See docs.bigeye.com for how to format the datahealth.json file.
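As a quick sanity check, you can confirm that the credential file was written into the mounted config directory (the assumption here is that configure writes datahealth.json into bigeye_datahealth_config/; adjust the path if your setup differs):
# Confirm the credential file exists (path is an assumption based on the mount above)
ls -l bigeye_datahealth_config/datahealth.json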
Running the CLI
All commands other than configure require the credential file referenced above. You have two options for how to access this file when a command is invoked:
- Mount the file(s) to the container or pod
- Store the contents of the file in AWS Secrets Manager
Docker
The CLI can be run as a container using docker compose run. You can use the --help flag to see the available commands, or pass it to a specific command to see the available arguments. NOTE: The commands below demonstrate using files mounted to the container.
# View available commands
docker compose run --rm datahealth-integration --help
╭─ All catalogs ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ configure Configure an integration by creating the credential file required to run the different features available. │
│ run-integration Run the main integration between Bigeye and a destination data catalog. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Alation ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ sync-deltas Run the integration to create a Bigeye Delta Summary for tables in Alation. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager. │
│ sync-tags Run the integration to show Alation tags as Bigeye table attributes. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Alation and Atlan ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ sync-entities Run the integration to create links from Bigeye table attributes to a destination data catalog. It can be run from a local file mounted to the container or pod, or by pulling from AWS Secrets Manager. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
# View help for specific command
docker compose run --rm datahealth-integration run-integration --help
╭─ Mounted file configuration ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --file-name TEXT The name of the mounted file containing the configuration for an integration. [default: None] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ AWS configuration ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --aws-secret-name TEXT The name in AWS Secrets Manager containing the configuration for an integration [default: None] │
│ --aws-region TEXT The region in AWS to access Secrets Manager [default: None] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
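For example, to run the main integration with configuration pulled from AWS Secrets Manager, pass the flags documented above (the secret name and region below are placeholder values):
# Run the integration with configuration stored in AWS Secrets Manager
# (secret name and region are placeholders; substitute your own)
docker compose run --rm datahealth-integration run-integration \
  --aws-secret-name my-datahealth-secret \
  --aws-region us-east-1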
Adding a scheduled sync
To schedule a sync every hour, add the docker compose run command to cron. The line below runs it every hour, at the top of the hour. Use crontab -e to edit cron jobs and add the line below to the crontab file.
0 * * * * docker compose run --rm datahealth-integration run-integration --file-name datahealth.json >> /tmp/datahealth_integration.log 2>&1
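The schedule prefix is standard cron syntax, so other cadences are easy; for example, 0 2 * * * would run the sync once a day at 2 AM. Because the entry above redirects output to a log file, you can confirm recent runs with:
# Check recent integration runs
tail -n 50 /tmp/datahealth_integration.log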
Kubernetes
The CLI can be run on Kubernetes using a CronJob.
- Download bigeye-datahealth-cronjob-file-mount.yaml
- Download bigeye-datahealth-cronjob-aws.yaml
Configure Kubernetes
# Download yaml files
wget https://bigeye-public-web.s3.amazonaws.com/bigeye-datahealth-cronjob-aws.yaml
wget https://bigeye-public-web.s3.amazonaws.com/bigeye-datahealth-cronjob-file-mount.yaml
# Create a namespace for Bigeye resources
kubectl create namespace bigeye
# Set current namespace to bigeye (optional)
kubectl config set-context --current --namespace=bigeye
# Create a docker-registry secret (the default in the yaml file is bigeyecred)
kubectl create secret \
docker-registry bigeyecred \
--docker-username=bigeyedata \
--docker-password=<docker PAT provided by Bigeye>
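You can verify the registry secret was created correctly (assuming the bigeye namespace from the earlier step):
# Confirm the docker-registry secret exists in the bigeye namespace
kubectl get secret bigeyecred --namespace bigeye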
Running the integration
The integration will run on the schedule specified in the YAML files. You may duplicate these files to create additional jobs to run other features of the integration.
- If you are using credential files, bigeye-datahealth-cronjob-file-mount.yaml references a secret called datahealth-config; see how to create it below. NOTE: You may also use a ConfigMap. If so, you must create it and update the file to reference it.
# Create a secret for the configuration files above
kubectl create secret \
generic datahealth-config \
--from-file=bigeye_datahealth_config/
# Create the job:
kubectl create -f bigeye-datahealth-cronjob-file-mount.yaml
- If you are using AWS, update bigeye-datahealth-cronjob-aws.yaml to reference the correct secret name.
# Create the job:
kubectl create -f bigeye-datahealth-cronjob-aws.yaml
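After creating either job, you can confirm it is registered and, if needed, trigger a one-off run without waiting for the schedule. The CronJob name below is an assumption based on the example pod names later in this guide; substitute the name defined in your yaml file.
# List the scheduled jobs
kubectl get cronjobs --namespace bigeye
# Trigger a one-off run from the CronJob (name is a placeholder)
kubectl create job --from=cronjob/demo-alation-integration manual-run-1 --namespace bigeye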
Logging with Kubernetes
Logs for a CronJob can be retrieved only while the pod is available. To retain logs, it is recommended to review the Kubernetes Logging Architecture documentation and implement the solution that works best for you. To view logs from pods that are still available, see the example below.
# See pods that are available.
kubectl get pods
NAME READY STATUS RESTARTS AGE
demo-alation-integration-28877250-96gzp 0/1 Completed 0 30m
demo-alation-integration-28877220-zwg8t 0/1 Completed 0 60m
demo-alation-integration-28877235-jqkfq 0/1 Completed 0 45m
# View logs for specific pod
kubectl logs demo-alation-integration-28877235-jqkfq
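If you need a lightweight stopgap before a full logging solution is in place, a minimal sketch like the following saves logs from completed pods to timestamped files (the pod selector and output path are assumptions; adapt them to your environment):
# Minimal sketch: save logs from completed pods to timestamped files
for pod in $(kubectl get pods --field-selector=status.phase=Succeeded -o name); do
  kubectl logs "$pod" > "/tmp/${pod#pod/}_$(date '+%Y%m%d').log"
done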
Troubleshooting
If you've redirected the cron log to a file, viewing the file will show successful runs as well as error messages. Below is an example of a successful run:
{"status":"Bigeye: https://app.bigeye.com sync to Target: <catalog URL> successful."}
Appendix
Environment variable for AWS access keys
When running via docker compose, you may need to provide AWS access to the container in order to retrieve secrets from Secrets Manager. You can set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables and reference them in docker-compose.yaml. The variables are already in the file but commented out; remove the comments and set the variables.
Write a script that can retrieve the keys
Write a script that can retrieve secrets from your password keeper and use that to populate the environment variables.
export AWS_ACCESS_KEY_ID=$(access_key_script.sh)
export AWS_SECRET_ACCESS_KEY=$(access_secret_script.sh)
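A minimal sketch of such a script, assuming the pass password manager as the secret store (the tool and the secret path are both assumptions; substitute your organization's password keeper and its CLI):
#!/bin/sh
# access_key_script.sh -- hypothetical example; "pass" and the secret path are assumptions
pass show bigeye/aws_access_key_id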
Import Image Archive (if docker pull was not used)
Save the container image as a file
Follow the docker compose pull instructions on a test system to pull the integration image. Then export the image to a file using docker save.
# Export image to file
docker save docker.io/bigeyedata/datahealth-integration:latest > datahealth_integration_$(date '+%Y-%m-%d').tar
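Optionally, you can compress the archive before copying it to the production system; docker load accepts gzip-compressed archives directly:
# Optional: compress the archive before transfer
gzip datahealth_integration_$(date '+%Y-%m-%d').tar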
Import the container image on the production system
Copy the image archive file to the system where the integration will be run. Import it onto the host that will be running the integration using the commands below.
# Import image
docker load -i <path to image archive>
# Verify the image was loaded successfully
docker image ls docker.io/bigeyedata/datahealth-integration:latest