Lineage Plus Agent
Managed by the Bigeye agent installer
Requirements
Pre-install checklist
- Set up a VM / host (Ubuntu 20.04+ or Red Hat Enterprise Linux (RHEL) 8+ preferred)
  - Minimum hardware size
    - 25 GB of RAM
    - 4 CPUs
    - 55 GB disk space
  - Networking
    - Firewall access to the hostnames listed below:
      - app-metacenter-portal.bigeye.com
      - app-metacenter-solr.bigeye.com
    - The firewall rules should NOT strip any Authorization headers for the hostnames listed above.
    - Egress (outbound) access to the data sources you wish to add to track lineage in Bigeye
    - Egress (outbound) access to the Bigeye SaaS environment
      - app.bigeye.com
    - Ingress (inbound) access to retrieve licenses from Bigeye for the Agent CLI
    - Access to pull images from docker.io
- Bigeye information (provided by Bigeye)
  - The company name associated with your agent installs
  - The password to authenticate to your tenant and get your associated Lineage Plus license
- Docker PAT
  - Provided by Bigeye
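The hardware and networking items in the checklist can be verified before installing. The following is a minimal sketch for a Linux host, not part of the Bigeye tooling: the function names are hypothetical, the thresholds are copied from the checklist, and probing the hosts over HTTPS with curl is an assumption about how they are reached.

```shell
#!/bin/sh
# Pre-install sketch for a Linux host. Thresholds come from the checklist;
# the function names and the HTTPS probe are assumptions for illustration.

# Succeeds when the measured value ($1) is at least the required value ($2).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Prints a PASS/FAIL line. $1 = label, $2 = measured value, $3 = required value.
report() {
    if meets_minimum "$2" "$3"; then
        echo "PASS $1: $2 (need $3)"
    else
        echo "FAIL $1: $2 (need $3)"
    fi
}

# Checks CPU count, total RAM, and free disk space under the given directory.
check_hardware() {
    dir="${1:-.}"
    cpus=$(nproc)
    # MemTotal is reported in KiB; convert to whole GiB (rounds down, so a
    # host sized at exactly the minimum may report slightly less).
    ram_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
    # df -Pk prints available space in 1 KiB blocks in column 4.
    disk_gb=$(df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024 / 1024) }')
    report "CPU cores" "$cpus" 4
    report "RAM (GB)" "$ram_gb" 25
    report "Free disk (GB)" "$disk_gb" 55
}

# Probes a host over HTTPS; any completed HTTP exchange counts as reachable.
check_host() {
    if curl -sS -o /dev/null --max-time 10 "https://$1"; then
        echo "PASS reach $1"
    else
        echo "FAIL reach $1"
    fi
}
```

After sourcing the file, `check_hardware /opt` reports on the host and the disk that holds `/opt`, and `check_host app.bigeye.com` probes one of the listed hosts. A successful probe only shows that the host is reachable; it does not show that Authorization headers pass through the firewall intact.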
Updating the Lineage Plus Agent in the CLI
Use the command below to update the Lineage Plus agent through the Agent CLI.
./bigeye-agent lineage upgrade
Job Scheduling for Lineage
This guide explains how to schedule lineage jobs to run automatically using the Bigeye Agent CLI.
Prerequisites
- Job scheduling is supported on Linux and macOS only (not Windows)
- The Bigeye Agent CLI must be installed and configured with your lineage connectors
Commands
List Scheduled Jobs
View all currently scheduled jobs:
./bigeye-agent jobs list
Schedule a Lineage Job
Create or update a scheduled job to run lineage collection:
./bigeye-agent jobs upsert --name "my-lineage-job"
You'll be prompted to provide:
- Cron schedule: When the job should run (e.g., 0 2 * * * for daily at 2 AM)
- Command: The lineage command to execute
Example Command
To schedule a daily lineage run for Snowflake at 2 AM:
./bigeye-agent jobs upsert --name "snowflake-lineage"
When prompted, enter:
- Cron schedule: 0 2 * * *
- Command: ./bigeye-agent lineage run -c snowflake
Remove a Scheduled Job
Delete a scheduled job:
./bigeye-agent jobs remove --name "my-lineage-job"
Cron Schedule Examples
- Every day at 2 AM: 0 2 * * *
- Every 6 hours: 0 */6 * * *
- Every Monday at 1 AM: 0 1 * * 1
- Every hour: 0 * * * *
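Each schedule above has five whitespace-separated fields: minute, hour, day of month, month, and day of week. A quick sanity check before typing a schedule at the prompt could look like the sketch below. The function names are hypothetical, and the check only counts fields; it does not validate the values, and it says nothing about what the Agent CLI itself accepts.

```shell
#!/bin/sh
# Counts whitespace-separated fields in a cron expression.
# The argument is quoted throughout, so "*" is never expanded by the shell.
cron_field_count() {
    printf '%s\n' "$1" | awk 'NR == 1 { print NF }'
}

# Succeeds only for five-field expressions
# (minute, hour, day of month, month, day of week).
is_five_field_cron() {
    [ "$(cron_field_count "$1")" -eq 5 ]
}
```

For example, `is_five_field_cron "0 */6 * * *" && echo ok` prints `ok`, while an expression with a dropped field such as `"0 2 * *"` fails the check.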
How It Works
The job scheduler uses cron to automatically run lineage collection at specified intervals. Jobs are:
- Saved to your unified configuration file
- Written to your system's crontab
- Executed automatically by cron at the scheduled times
Note: If you update your agent configuration, existing jobs will continue to run with the updated settings.
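Because jobs are written to the system crontab, the standard `crontab -l` command can confirm that a job was registered. The sketch below filters crontab text for active entries that mention the agent. The function name is hypothetical, and matching on the string `bigeye-agent` is an assumption about how the entries are written, based on the example command in this guide.

```shell
#!/bin/sh
# Reads crontab text on stdin and prints active (uncommented) lines that
# mention the agent binary. Matching on "bigeye-agent" is an assumption.
bigeye_cron_entries() {
    grep -v '^[[:space:]]*#' | grep 'bigeye-agent'
}
```

Usage: `crontab -l | bigeye_cron_entries`. Empty output means no matching entry was found for the current user.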
Run on Kubernetes
Partial Support: The steps below, and the chart provided, do not support scenarios where custom jars are required for lineage collection, or scenarios where any customization exceeds the size limit of a Kubernetes ConfigMap (1 MiB). Full support for Lineage Plus on Kubernetes is still in development.
To run on Kubernetes, the Agent CLI is required. Follow the steps of the setup section, and then return here to complete the following prerequisites:
- Install the Lineage Plus agent
- Add any connectors for lineage collection (this generates the files that lineage collection needs)
# Install the agent with the for-kubernetes flag (only valid for Lineage Plus agent)
./bigeye-agent install --for-kubernetes
# Add the necessary connectors
./bigeye-agent add-connector
What to expect: Running the install command creates a file called bigeye_agent.yml. It stores information for Bigeye, the Lineage Plus agent, and connection information for the sources where lineage will be collected. The add-connector command creates a directory called lineage_config, which contains all the files the lineage process needs. These files are used to run the process as a Kubernetes job.
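Before moving on to the Kubernetes configuration, it can help to confirm that both outputs exist. This is a minimal sketch with a hypothetical function name; it assumes the file and the directory are created in the directory where the Agent CLI was run, which is not stated above.

```shell
#!/bin/sh
# Prints each expected path that is missing under the given directory
# (default: current directory) and returns non-zero if anything is missing.
missing_install_outputs() {
    base="${1:-.}"
    status=0
    if [ ! -f "$base/bigeye_agent.yml" ]; then
        echo "missing: $base/bigeye_agent.yml"
        status=1
    fi
    if [ ! -d "$base/lineage_config" ]; then
        echo "missing: $base/lineage_config"
        status=1
    fi
    return "$status"
}
```

Running `missing_install_outputs .` prints nothing and returns success when both the file and the directory are present.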
Configure Kubernetes
Download the chart for Lineage Plus on Kubernetes. It can be run as an ad hoc Job or as a CronJob.
# Download K8s ad hoc Job yaml (the namespace set in the file is bigeye)
wget https://bigeye-public-web.s3.amazonaws.com/lineage-plus-kubernetes.yaml
# Download CronJob yaml
wget https://bigeye-public-web.s3.us-west-2.amazonaws.com/lineage-plus-kubernetes-cronjob.yaml
- Update the resource limits in lineage-plus-kubernetes.yaml to match the value entered during the Lineage Plus installation. This is the value of the max_memory parameter in bigeye_agent.yml.
- Update the log_dir variable in lineage_config/global_settings.sh to match the mountPath of the logs-vol volume mount of the container.
- Create a configMap of the necessary files.
Each connector type is different: The first two files listed in the command below have paths and file names that depend on the connector type specified. For example, a connector for Postgres would have postgresql in the path, with files named postgresql.properties and postgresql.sh. The example below shows a connector for Snowflake. Verify paths and names by looking at the lineage_config/connectors directory.
# Example configMap for Snowflake
kubectl create configmap -n bigeye tmp-lineage-plus-config \
  --from-file=lineage_config/connectors/snowflake/snowflake.properties \
  --from-file=lineage_config/connectors/snowflake/snowflake.sh \
  --from-file=lineage_config/lineage_plus.properties \
  --from-file=lineage_config/global_settings.sh \
  --from-file=lineage_config/system.properties \
  --from-file=lineage_config/lineage_plus.lic \
  --from-file=lineage_config/application-context.xml
- Verify in lineage-plus-kubernetes.yaml that the container command executes /app/lineage_plus/scripts/<connector_type>.sh. Also, there are mount paths in the initContainer that need to reference the correct connector type. These will look like mountPath: /tmp/lineage_plus/scripts/<connector_type>.sh and mountPath: /tmp/lineage_plus/config/snapshot/<connector_type>.properties.
- Run Lineage Plus
# Apply the job
kubectl apply -f lineage-plus-kubernetes.yaml
# View pods
kubectl get pods -n bigeye
NAME                        READY   STATUS    RESTARTS   AGE
bigeye-lineage-plus-dgwn3   0/1     Running   0          16m
# View logs (add -f to tail)
kubectl logs -n bigeye bigeye-lineage-plus-dgwn3
# Delete the job when it completes
kubectl delete -f lineage-plus-kubernetes.yaml
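Since the ConfigMap step depends on the connector type and is bounded by the 1 MiB ConfigMap limit, a rough pre-check of the file list and its total size can catch problems before the job is applied. The sketch below is illustrative only: the function names are hypothetical, the file layout is taken from the Snowflake and Postgres examples in this guide, and paths are assumed to contain no spaces. It sums file sizes only, so it does not account for key names or other object overhead.

```shell
#!/bin/sh
# Kubernetes caps the data stored in a ConfigMap at 1 MiB.
CONFIGMAP_LIMIT_BYTES=1048576

# Prints the seven files used in the ConfigMap step.
# $1 = connector type (e.g. snowflake), $2 = config directory.
configmap_files() {
    ctype="$1"
    dir="${2:-lineage_config}"
    echo "$dir/connectors/$ctype/$ctype.properties"
    echo "$dir/connectors/$ctype/$ctype.sh"
    echo "$dir/lineage_plus.properties"
    echo "$dir/global_settings.sh"
    echo "$dir/system.properties"
    echo "$dir/lineage_plus.lic"
    echo "$dir/application-context.xml"
}

# Prints the summed byte size of the files; fails if any file is missing.
configmap_total_bytes() {
    total=0
    for f in $(configmap_files "$1" "$2"); do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Succeeds when all files exist and their total size is under the limit.
fits_in_configmap() {
    bytes=$(configmap_total_bytes "$1" "$2") || return 1
    [ "$bytes" -lt "$CONFIGMAP_LIMIT_BYTES" ]
}
```

For example, `fits_in_configmap snowflake lineage_config && echo "ok to create the ConfigMap"` runs the check from the directory that contains lineage_config. A missing file is reported on standard error, which also flags a connector type that does not match the names under lineage_config/connectors.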