Agent Connection

Bigeye Agent Installation Guide

Bigeye provides a Docker image for hosting the Bigeye Data Source Agent on your infrastructure. The agent may be deployed as a standard Docker container instance using tools such as docker run or docker-compose. You may also use container orchestration systems such as Kubernetes.

The agent itself only makes outbound connections to send metric results to Bigeye's SaaS infrastructure, so no inbound access to your infrastructure is required.

Requirements

Pre-Install Checklist

You will need the following information to install the Bigeye agent.

  1. Set up a VM / host

    • Minimum HW size: 4 CPU, 8GB mem, 25GB disk space on /var is recommended.
      i. AWS t3.xlarge
      ii. GCP e2-standard-4
      iii. Azure B4/D4
    • Network access for agent subnets
      i. No ingress (inbound) network access is required for the agent (other than for internal administration, e.g. SSH)
      ii. Egress (outbound) access to the data sources you wish to add to Bigeye for monitoring
      iii. Egress (outbound) access to the Bigeye SaaS environment to pull work
      iv. Egress (outbound) access to pull the agent image from docker.io
  2. Bigeye credentials

  3. Company prefix

    • This is the first part of your Bigeye URL (e.g. “app” for app.bigeye.com)
  4. Docker access token

    • Bigeye will provide this
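
The hardware minimums in the checklist can be checked with a small script before installing. The following is a minimal sketch; meets_minimum is a hypothetical helper name (not part of any Bigeye tooling), and the thresholds are simply the 4 CPU / 8GB / 25GB figures listed above.

```shell
# meets_minimum is a hypothetical helper, not part of any Bigeye tooling.
# Arguments: CPU count, memory in GB, disk space on /var in GB.
# Thresholds are the checklist minimums: 4 CPUs, 8 GB memory, 25 GB disk.
meets_minimum() {
  cpus=$1; mem_gb=$2; disk_gb=$3
  ok=0
  [ "$cpus" -ge 4 ]     || { echo "CPU: have $cpus, need 4"; ok=1; }
  [ "$mem_gb" -ge 8 ]   || { echo "Memory: have ${mem_gb}GB, need 8GB"; ok=1; }
  [ "$disk_gb" -ge 25 ] || { echo "Disk: have ${disk_gb}GB, need 25GB"; ok=1; }
  return "$ok"
}
```

On a Linux host the three values could be taken from nproc, /proc/meminfo, and df for /var. For example, meets_minimum 4 8 25 succeeds silently, while meets_minimum 4 4 25 reports the memory shortfall.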

Infrastructure Requirements

Other Docker platforms such as AWS ECS, orchestration systems such as Nomad, or container solutions like podman will work, but their installation is not supported by Bigeye.

Host with docker

  1. Install docker if it is not already installed [CentOS][RHEL][Debian][Ubuntu]
    1. NOTE: For RHEL, the official docker instructions don't quite work. Using docker's CentOS yum repository instead is a good workaround:
      sudo yum install -y yum-utils
      sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
      sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
      sudo systemctl start docker
      # test it
      sudo docker run hello-world
      
  2. To test whether docker is running, run docker info and confirm that it returns without an error
  3. docker compose should already work if you've installed docker with the above instructions. If not, install docker compose (Instructions)
    1. Run docker compose version to test if it is installed correctly. The version should be 2.0 or higher.
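
The version requirement above can be checked in a script. This is a minimal sketch; compose_version_ok is a hypothetical helper name, and it only inspects the major version number of whatever version string it is given.

```shell
# compose_version_ok is a hypothetical helper: it succeeds when the given
# version string (e.g. "2.17.3" or "v2.17.3") has a major version of 2 or more.
compose_version_ok() {
  version=${1#v}          # drop an optional leading "v"
  major=${version%%.*}    # keep everything before the first dot
  case $major in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not installed
  esac
  [ "$major" -ge 2 ]
}
```

One way to use it is compose_version_ok "$(docker compose version --short)" && echo "compose OK", where --short prints only the version number.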

Host with Kubernetes

  1. Kubernetes is available on different cloud platforms; e.g. AWS EKS, Azure AKS, GCP GKE. It's also available through Rancher, Red Hat OpenShift, etc.
    NOTE: Even if deployed on Kubernetes, Docker is required for a one-time configuration of the agent (steps below).

Configuration

1. Generate agent encryption keys and configuration

docker logout
docker login --username bigeyedata --password <docker access token>
mkdir agent_config

# Run setup script
# -p app - single-tenant saas customers will use something different (ask Bigeye)

docker run --rm -it -v "${PWD}/agent_config:/app/config" --entrypoint setup_agent.sh docker.io/bigeyedata/agent:latest -p app

The setup script will prompt you for your Agent API Key. This is used to exchange Encryption certs with Bigeye's infrastructure.

A description of the files that are generated by the above script can be found in the Appendix.

2. Add data sources to the agent

Edit agent_config/agent.yaml and add each source. For each source you will need:

  • Warehouse type (MySQL, Oracle, Redshift, etc)
  • Hostname or IP address
  • Port number
  • Username
  • Password (double quotes in the password should be escaped with a backslash (\), e.g. "my_pass\"_with_a_double_quote")
  • Database name
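
The password escaping rule above can be applied mechanically. The sketch below uses a hypothetical helper, yaml_escape_dq. Escaping backslashes as well as double quotes is an assumption based on how double-quoted YAML strings work in general; the guide itself only mentions double quotes.

```shell
# yaml_escape_dq is a hypothetical helper. It escapes backslashes first
# (an assumption based on general YAML double-quoted string rules), then
# double quotes, so the result can be placed between double quotes.
yaml_escape_dq() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

For example, yaml_escape_dq 'my_pass"_with_a_double_quote' prints my_pass\"_with_a_double_quote, which can then be wrapped in double quotes in agent.yaml.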

Workspaces

To limit agent sources to certain workspaces, configure the allowed workspace IDs in the agent's agent.yaml. Workspaces can be set at the agent level or at the source level. The agent-level setting is used as a fallback when a source has no source-level setting. If neither is set, all workspaces are allowed, as before. These settings are optional, but highly recommended.

An example of such a config at the agent level is given below, under the defaultAllowedWorkspaces key:

companyUuid: <company_uuid>

defaultAllowedWorkspaces: [1, 2]
workflowUrl: app-workflows.bigeye.com

agentApiKey: bigeye_agent_*********

publicEncryptionKeyDir: /app/config
privateEncryptionKeyPath: /app/config/private.pem

logQueries: false
healthLoggerEnabled: true

Another example of a config at the source level is given below, under the allowedWorkspaces key:

sources:
  - identifier: 'Oracle test agent'
    allowedWorkspaces: [2, 3]
    connectionFactory:
      type: oracle
      host: test.test
      port: 1521
      user: <username>
      password: <password>
      databaseName: <databaseName>
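
The fallback order described in this section can be summarised in a few lines. This is only an illustration of the documented precedence, not the agent's actual code; effective_workspaces is a hypothetical name, and an empty argument stands for "not set".

```shell
# effective_workspaces is a hypothetical illustration of the documented
# fallback order; it is not the agent's implementation.
#   $1 = source-level allowedWorkspaces (empty if unset)
#   $2 = agent-level defaultAllowedWorkspaces (empty if unset)
effective_workspaces() {
  if [ -n "$1" ]; then
    echo "$1"          # source-level setting wins
  elif [ -n "$2" ]; then
    echo "$2"          # fall back to the agent-level default
  else
    echo "all"         # neither set: every workspace is allowed
  fi
}
```

With the two example configs combined, the Oracle source would resolve to workspaces 2 and 3, while a source without allowedWorkspaces would resolve to the agent-level 1 and 2.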

Install

Docker

Docker compose is a docker-native tool for defining containers in YAML configuration rather than managing them on the command line with flags. It is used throughout this guide to install and run the agent.

Create docker-compose.yaml

Download bigeye-agent-docker-compose.yaml as docker-compose.yaml and pull the latest agent image.

wget https://bigeye-public-web.s3.amazonaws.com/bigeye-agent-docker-compose.yaml --output-document=docker-compose.yaml
docker compose pull

Anytime you wish to install the latest version of the agent, docker compose pull can be run again to pull the latest image.

If docker compose cannot be installed, the equivalent native docker commands are in the Appendix and can be used instead.

If docker pull is not possible on the system where you will be running the agent, there are instructions on how to import/export the container from a test system instead in the Appendix.

Running the Agent

Note: All configuration steps above must be complete before deploying the agent.

# The -d flag (detach) runs the container in the background
docker compose up -d

# Check that the container is running
docker ps -f name=bigeye-agent

# View container logs (-f can be used to “tail” the logs)
docker compose logs

# Stopping the agent if desired
docker compose down

To confirm that the container started successfully, look for a line in the Docker log similar to the following:

INFO Agent started successfully in 4076ms. Polling for activities...
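
The log check above can be scripted, for example in a deployment pipeline. This is a minimal sketch; agent_started is a hypothetical helper name, and it matches only the fixed part of the startup line shown above.

```shell
# agent_started is a hypothetical helper: it reads log text on stdin and
# succeeds if the startup line quoted above is present.
agent_started() {
  grep -q 'Agent started successfully'
}
```

One way to use it is docker compose logs | agent_started && echo "agent is up".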

Note that you can launch more instances of the agent on the same machine by copying the bigeye-agent block in docker-compose.yaml (be sure to give each copy a different name, e.g. bigeye-agent-xxx).

Kubernetes

Download bigeye-agent-kubernetes.yaml

Configure Kubernetes

# Download K8s yaml
wget https://bigeye-public-web.s3.amazonaws.com/bigeye-agent-kubernetes.yaml 

# Create a namespace for Bigeye resources (the default in the yaml file is bigeye)
kubectl create namespace bigeye

# Create a docker-registry secret (the default in the yaml file is bigeyecred) 
kubectl create secret docker-registry bigeyecred --docker-username=bigeyedata --docker-password=<docker access token> -n bigeye

# Create a configmap for the configuration files above (the default in the yaml file is agent-config)
kubectl create configmap agent-config --from-file=agent_config/ -n bigeye

Running the Agent

Note: All configuration steps above must be complete before deploying the agent.

# Create the deployment with the yaml file
kubectl create -f bigeye-agent-kubernetes.yaml

# Check that the pod is running.
kubectl get pods -n bigeye

NAME                            READY   STATUS    RESTARTS   AGE
bigeye-agent-1abg9c7987-l72xq   1/1     Running   0           1m

# View logs (-f can be used to “tail” the logs)
kubectl logs bigeye-agent-1abg9c7987-l72xq -n bigeye --previous=false

# Delete the deployment if desired
kubectl delete -f bigeye-agent-kubernetes.yaml

To confirm that the agent started successfully, look for a line in the log similar to the following:

INFO Agent started successfully in 4076ms. Polling for activities...
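
The pod status check above can also be scripted. This is a minimal sketch that parses the table printed by kubectl get pods; all_pods_running is a hypothetical helper name, and it assumes the default column layout shown above (STATUS in the third column).

```shell
# all_pods_running is a hypothetical helper: it reads `kubectl get pods`
# table output on stdin, skips the header row, and succeeds only if at least
# one pod is listed and every pod's STATUS column is "Running".
all_pods_running() {
  awk 'NR > 1 && NF { seen = 1; if ($3 != "Running") bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}
```

One way to use it is kubectl get pods -n bigeye | all_pods_running && echo "all pods running".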

Note: A secret can be used for the agent configuration, as well.

# Create generic secret
kubectl create secret generic agent-config --from-file=agent_config/ -n bigeye

The yaml file would need to be updated to reference the secret instead of the configmap.

volumes:
  # - name: agent-config
  #   configMap:
  #     name: agent-config
  - name: agent-config
    secret:
      secretName: agent-config

Add data sources in Bigeye UI

Log into the Bigeye UI, i.e. https://app.bigeye.com for multi-tenant SaaS or https://.bigeye.com for single-tenant SaaS.

  1. Go to the Catalogs view
  2. Select the Add source button in the upper right-hand corner
  3. Select the data source type (e.g. Snowflake)
  4. Select Connect with in-VPC agent
  5. The Name field on the next screen must exactly match the identifier field that was used in agent.yaml
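
Because the name must match exactly, it can help to print the identifiers straight from agent.yaml and copy them into the UI. This is a minimal, line-based sketch rather than a real YAML parser; list_identifiers is a hypothetical helper name, and it assumes each identifier is written on a single line as in the examples above.

```shell
# list_identifiers is a hypothetical helper: it prints each "identifier:"
# value found in the given agent.yaml, with surrounding quotes removed.
# It is a line-based sketch, not a YAML parser.
list_identifiers() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*identifier:[[:space:]]*//p' "$1" |
    sed -e 's/[[:space:]]*$//' -e "s/^'\(.*\)'\$/\1/" -e 's/^"\(.*\)"$/\1/'
}
```

One way to use it is list_identifiers agent_config/agent.yaml, which prints one identifier per line.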

Troubleshooting

Common agent errors in agent log

io.grpc.StatusRuntimeException: PERMISSION_DENIED: Request unauthorized

javax.net.ssl.SSLException: error:10000412:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE

  • Description: Certificate given to Bigeye in “Step 2. Upload agent authentication key” does not match.
  • Fix: Work with Bigeye support to resolve this (the md5 of config/mtls_ca.pem should be compared with the certificate registered in Bigeye’s workflow service)

Deadline exceeded (DEADLINE_EXCEEDED)

  • Description: Network connectivity between the Bigeye agent and the Bigeye workflow service does not work
  • Fix:
    • Check that workflowUrl is set correctly in ./config/agent.yaml
    • Check that the port is open, e.g. nmap -Pn -p 443
    • Nmap should return “open” as the state of the port and not “filtered”
      PORT    STATE SERVICE
      443/tcp open  https
      
    • Find out why network connectivity is not working and resolve
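
The open-versus-filtered distinction above can be read from nmap's output in a script. This is a minimal sketch; port_state is a hypothetical helper name, and it assumes the PORT / STATE / SERVICE table layout shown above.

```shell
# port_state is a hypothetical helper: it reads nmap output on stdin and
# prints the STATE column for the given port/protocol (default 443/tcp).
port_state() {
  awk -v port="${1:-443/tcp}" '$1 == port { print $2 }'
}
```

Piping the nmap command's output into port_state prints open when the workflow service is reachable, and filtered when something such as a firewall is dropping the traffic.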

Exception in thread "main" io.grpc.StatusRuntimeException: CANCELLED: RST_STREAM closed stream. HTTP/2 error code: CANCEL

  • Description: Something in the networking infrastructure is closing the connection between the agent and Bigeye SaaS.
  • Fix: Work with your security / networking team to find where this is happening. It helps greatly to prove that the agent and configs work from outside of your production network. For example, start the agent with the same configuration files on a laptop at home or in the office (somewhere not in the production network).

jdbc timeout / error reaching host:port

  • Description: Networking connectivity between the agent and the data source is not set up.
  • Fix: Likely culprits are firewalls or subnets that do not have a route to reach the subnet that the data source is running on

io.grpc.StatusRuntimeException: UNAVAILABLE: error reading from server: EOF

  • Description: Yaml formatting error in ./config/agent.yaml
  • Fix: Typically this will be improper indentation or a missing quote in the agent config file.

Http2Exception: First received frame was not SETTINGS. Hex dump for first 5 bytes: 485454502f

  • Description: The agent is detecting that a plain "HTTP" connection is being established on the server side, but the agent uses gRPC for communication, which requires HTTP/2. This can happen if there is a proxy in the self-hosted environment between the agent and Bigeye infrastructure.
  • Fix: Ensure that the proxy can handle HTTP/2 connections, or route agent connections directly to Bigeye infrastructure so that they bypass the proxy.
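
Decoding the hex dump in this error message shows why it points at a non-HTTP/2 response. The sketch below uses a hypothetical helper, hex_to_ascii, to turn the five bytes into text; they spell the start of an HTTP/1.x response line.

```shell
# hex_to_ascii is a hypothetical helper: it decodes a string of hex byte
# pairs into the characters they represent.
hex_to_ascii() {
  hex=$1
  while [ -n "$hex" ]; do
    rest=${hex#??}                      # everything after the first two characters
    [ "$rest" != "$hex" ] || return 1   # odd trailing character: give up
    byte=${hex%"$rest"}                 # the first two characters
    # Convert the hex byte to octal, then let printf emit that character.
    printf "\\$(printf '%03o' "0x$byte")"
    hex=$rest
  done
  echo
}
```

Running hex_to_ascii 485454502f prints HTTP/, i.e. the first frame the agent received was the beginning of an HTTP/1.x status line rather than an HTTP/2 SETTINGS frame.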

Exception: java.lang.OutOfMemoryError

java.sql.SQLException: Unable to load class: com.teradata.jdbc.TeraDriver

Teradata requires downloading their JDBC driver directly. See instructions

Appendix

Import Image Archive

If docker pull can not be used on the production system where the agent will be run, use the following instructions to save and transfer the docker image from a test system / local workstation.

Save the agent image as a file

Follow the docker compose pull instructions on a test system to pull the agent image. Then export the image to a file using docker save.

# Export image to file
docker save docker.io/bigeyedata/agent:latest > bigeye_agent_$(date '+%Y-%m-%d').tar

Import the agent image on the production system

Copy the image archive file to the system where the agent will be run. Import it onto the host that will be running the agent using the commands below.

# Import image
docker load -i <path to image archive>

# Verify the image was loaded successfully
docker image ls bigeyedata/agent:latest

Running without docker compose

It's possible to use docker run directly if a newer version of docker that has docker compose is not available.

# “docker logout” is required first if you are already logged in as a different account

docker login --username bigeyedata --password <docker access token>

docker pull docker.io/bigeyedata/agent:latest

docker run -d --name bigeye-agent -v "${PWD}/agent_config:/app/config" --restart always docker.io/bigeyedata/agent:latest

docker logs bigeye-agent

AWS Secrets manager

The agent can read your data source password from AWS Secrets Manager instead of having the password in plain text in agent.yaml. Use the asmSecretNameForPassword key instead of password within the connectionFactory. By default, the agent will use the AWS IAM Role associated with your EC2 instance or container. If you choose to use AWS IAM User credentials instead, you must also specify the following values in agent.yaml:

  • awsAccessKey - from the IAM User
  • awsSecretKey - from the IAM User
  • region - the AWS region where the secrets are

sources:
  - identifier: some snowflake db1 using aws secrets manager for password
    connectionFactory:
      type: snowflake
      host: mycompany.snowflakecomputing.com
      port:
      user: user
      # AWS Secrets Manager secret name
      asmSecretNameForPassword: my/aws/secret_manager/secret_name
      databaseName: somedb1

# Only specify the following 3 if you are using an AWS IAM User instead of AWS IAM Role
awsAccessKey: AKIAIOSFODNN7EXAMPLE  # what normally goes in AWS_ACCESS_KEY_ID
awsSecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY  # what normally goes in AWS_SECRET_ACCESS_KEY
region: us-east-1  # AWS region where the AWS Secrets Manager secret resides

Environment variable for password

agent.yaml can accept environment variables in place of plain-text values. This is useful for things like data source credentials. Update agent.yaml to use environment variables for the user and password as shown below. The syntax is strict: do not wrap the variable in quotes. ${good} works; "${bad}" does not.

# Example data source using environment vars for credentials
  - identifier: db with env vars for creds
    connectionFactory:
      type: snowflake
      host: mycompany.snowflakecomputing.com
      port: 443
      user: ${SECRET_USERNAME}
      password: ${SECRET_PASSWORD}
      databaseName: somedb

Then update docker-compose.yaml to pass through the environment variables (last 3 lines):

version: "3.9"
services:
  bigeye-agent:
    image: "docker.io/bigeyedata/agent:latest"
    restart: always
    volumes:
      - "${PWD}/agent_config:/app/config"

    environment:
      - SECRET_USERNAME
      - SECRET_PASSWORD

Write a script that can retrieve the credentials

Finally, write a script that can retrieve secrets from your password keeper (HashiCorp Vault, Azure Key Vault, etc.) and use that to populate the environment variables.

export SECRET_USERNAME=$(username_script.sh)
export SECRET_PASSWORD=$(password_script.sh)

docker compose up
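
If one of the retrieval scripts fails silently, the agent would start with an empty credential. A guard like the following can catch that before starting the container. This is a minimal sketch; require_env is a hypothetical helper name, and it relies on the standard printenv utility to see only exported variables.

```shell
# require_env is a hypothetical helper: it succeeds only if every named
# variable is exported with a non-empty value.
require_env() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or not exported: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it is require_env SECRET_USERNAME SECRET_PASSWORD && docker compose up.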

(Optional) Automate

There are many ways of automating this. A low effort way would be to put the variable exports into your .bashrc or .bash_profile so the environment variable is populated on login, then you don't have to remember to set it before running docker compose up.

Description of files in the agent_config directory

This is a catalog of the files that get created during agent setup in agent_config and a description of what they are used for.

Configuration

  • agent.yaml - Configuration file for the agent. Datasource connection info goes here, along with flags for the agent to enable query logging, setting company uuid etc.

Encryption certs (encryption at rest)

  • <company_uuid>.pem, bigeye.pem - these are used for signing/verifying signing of encryption key pair
  • private.pem - public certificate used to encrypt payloads for the agent
  • private.key - private key used to decrypt payloads from Bigeye

mTLS certs (If you have moved to API Keys, you may delete these)

  • mtls.key - private key used in mTLS connections with Bigeye infrastructure
  • mtls.pem - public certificate used in mTLS connections with Bigeye infrastructure
  • mtls_ca.key - private key to use as a certificate authority for self-signing mtls certs
  • mtls_ca.pem - public certificate for local certificate authority. This is sent to Bigeye so it can be trusted from Bigeye's infrastructure. Bigeye's infrastructure will then trust connections using mtls key pairs that are generated from the mtls_ca.key
  • bigeye_workflows_ca.pem - public certificate for Bigeye's CA. This allows the agent to trust connections from Bigeye infrastructure.

Use custom trusted CA certificates

You may wish to use mTLS certificates signed by your own internal certificate authority for connecting to data sources. The agent supports adding additional certificates to the trusted CA list as of agent version 1.0.6.

To use additional CA trust certificates with the agent, put your custom certs in a new subdirectory agent_config/user_ca and restart the agent. The agent will import all CA certificates in that directory into its trust list.

mkdir agent_config/user_ca

# Copy the CA certificate file to the new directory
cp <path to CA certificate file> agent_config/user_ca/

# Restart the container to load the new CA certificates
docker compose restart 

Note: The CA certificate file names must end in .pem (e.g. custom_ca.pem).
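
Since only files ending in .pem are expected in this directory, a quick listing of anything else can catch a misnamed certificate before restarting. This is a minimal sketch; non_pem_files is a hypothetical helper name.

```shell
# non_pem_files is a hypothetical helper: it prints any regular file directly
# inside the given directory whose name does not end in .pem.
non_pem_files() {
  find "$1" -maxdepth 1 -type f ! -name '*.pem'
}
```

Running non_pem_files agent_config/user_ca prints nothing when every certificate file is named correctly.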

Use a proxy

If all requests on your VM for the Bigeye agent are required to go through a proxy, then there may be some additional steps required to get everything installed successfully.

  1. Configure docker daemon with systemd file

    1. See official docker documentation here - This is required to be able to login and pull images from our private docker repository where the bigeye agent images are hosted.
    2. If this step is missed, then when running docker login you will receive this error message.
      Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: TLS handshake timeout
  2. Set the HTTP_PROXY and HTTPS_PROXY environment variables for the VM

    1. In order to download our docker compose file, you will need to set these environment variables using the commands below. This will allow you to run the wget command in our instructions.
      export HTTP_PROXY=http://someproxy.etc.com:8080
      export HTTPS_PROXY=http://someproxy.etc.com:8080
      
  3. Configure your docker config.json to use a proxy server

    1. See official docker documentation here - This is required to configure the Docker CLI to use proxies via environment variables in containers.
    2. If this step is missed, then you will have errors while running the setup script for the bigeye agent.
  4. Update the Bigeye docker-compose.yaml file

    1. Set the proxy settings in the Bigeye docker-compose.yaml file so that the Bigeye agent knows to forward all requests through that proxy. In the example below, the proxy settings are passed in the JDK_JAVA_OPTIONS variable under the environment section of the docker-compose.yaml file.
    JDK_JAVA_OPTIONS=-Dhttp.proxyHost=someproxy.etc.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=someproxy.etc.com -Dhttps.proxyPort=8080
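
The JDK_JAVA_OPTIONS value can be derived from the same proxy URL used for HTTP_PROXY. This is a minimal sketch; proxy_java_opts is a hypothetical helper name, and it assumes a URL of the form scheme://host:port with no credentials embedded.

```shell
# proxy_java_opts is a hypothetical helper: it turns a proxy URL of the form
# http://host:port into the JDK_JAVA_OPTIONS value used in docker-compose.yaml.
# Assumes the URL includes a port and no user:password@ prefix.
proxy_java_opts() {
  hostport=${1#*://}        # strip the scheme
  hostport=${hostport%%/*}  # strip any trailing path
  host=${hostport%:*}       # part before the last colon
  port=${hostport##*:}      # part after the last colon
  echo "-Dhttp.proxyHost=$host -Dhttp.proxyPort=$port -Dhttps.proxyHost=$host -Dhttps.proxyPort=$port"
}
```

For example, proxy_java_opts http://someproxy.etc.com:8080 prints the same option string as the example above.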
    

Set the amount of memory allocated by the agent

By default, the agent allocates 75% of system memory to the JVM. If other processes run on the host, this can lead to an out-of-memory condition. You can change the amount of memory allocated to the JVM by setting the MAX_RAM_PERCENTAGE environment variable to a value smaller than the default of 75. For example, 25 translates to 25% of system memory.

Setting this to something smaller than the default is not recommended unless other processes share the host. If the agent is the only thing running on the host, increase the memory on the host instead.

An example bigeye-agent-docker-compose.yaml that sets the MAX_RAM_PERCENTAGE environment variable to 25% of system memory is below:

version: "3.9"
services:

  bigeye-agent:
    # defines the image being used (i.e. what would normally be done with docker pull <image>)
    image: "docker.io/bigeyedata/agent:latest"

    # This tells docker to restart if the container fails, the OS is restarted etc
    restart: always

    # This is how we will pass configuration files into the container, by mounting a directory.
    # The path for the source directory can be changed as desired, but the path on the docker
    # container that is being mounted must remain /app/config
    volumes:
      - "${PWD}/agent_config:/app/config"
    # Rotate logs to restrict total size to 1GB
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "10"

    environment:
      MAX_RAM_PERCENTAGE: "25"
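
To see what a given percentage means in practice, the arithmetic can be sketched as follows. jvm_heap_mb is a hypothetical helper name; it assumes the percentage is applied to the total memory figure passed in, and uses integer division.

```shell
# jvm_heap_mb is a hypothetical helper: given total memory in MB and a
# MAX_RAM_PERCENTAGE value, it prints the resulting JVM memory budget in MB.
jvm_heap_mb() {
  echo $(( $1 * $2 / 100 ))
}
```

On the 8GB minimum host (8192 MB), jvm_heap_mb 8192 75 prints 6144 for the default setting, and jvm_heap_mb 8192 25 prints 2048.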

Teradata JDBC driver

It is possible to include custom JDBC drivers for the agent to load at runtime. Teradata, for example, does not allow redistribution of its JDBC driver, so you will need to download it and make it available to the Bigeye Data Source Agent.

Download the driver into the agent_config directory. It will be loaded automatically when the agent is started.


curl -O --output-dir ./agent_config https://repo1.maven.org/maven2/com/teradata/jdbc/terajdbc/20.00.00.25/terajdbc-20.00.00.25.jar

Custom path to JDBC driver

If you would like to place custom JDBC drivers in a directory other than the agent_config directory, you can do so as follows:

  1. Add your custom directory to the list of volumes mounted by the container in bigeye-agent-docker-compose.yaml
    volumes:
      - "${PWD}/agent_config:/app/config"
      - "/lib/custom_jdbc_drivers:/app/custom_jdbc_drivers"
  2. Set the CLASSPATH environment variable to include the directory. A wildcard is more convenient than listing every jar file. Note that the driver files must have a .jar extension.
    environment:
      CLASSPATH: "/app/custom_jdbc_drivers/*"

Custom JDBC parameters

Requires agent version 1.20.0 or later.

If you need custom parameters to be added to the JDBC URL, they can be configured in agent_config/agent.yaml:

# defaults to false; when true logs the urls to the console 
logUrls: true

sources:
  - identifier: snowflake agent
    connectionFactory:
      type: snowflake
      host: mysubdomain.snowflakecomputing.com
      user: myuser
      password: mypass
      port: 443
      databaseName: myWarehouse
      additionalJdbcProperties:
        customIntProperty: 0
        customBooleanProperty: TRUE
        customStringProperty: "some_string"

This will append the custom properties to the JDBC url according to the appropriate syntax for that driver.

To validate, you can add logUrls: true to the agent_config/agent.yaml to log the urls that we are using. This will produce a line like the following:

13:50:58.290 INFO  c.b.agent.BigeyeAgentApplication - Connecting to source "snowflake agent" with url "jdbc:snowflake://mysubdomain.snowflakecomputing.com/?user=<username>&warehouse=myWarehouse&STATEMENT_TIMEOUT_IN_SECONDS=600&JDBC_QUERY_RESULT_FORMAT=JSON&customIntProperty=0&customBooleanProperty=TRUE&customStringProperty=some_string"

Disabling the flag (the default) will disable those logs again.
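
When validating from the logged URL, a small check can confirm that a specific property made it into the query string. This is a minimal sketch; url_has_param is a hypothetical helper name, and it assumes the ?key=value&key=value form shown in the Snowflake example above (other drivers may use a different syntax).

```shell
# url_has_param is a hypothetical helper: it succeeds if the query string of
# the given URL contains the exact key=value pair.
#   $1 = URL, $2 = key, $3 = value
url_has_param() {
  query=${1#*\?}               # keep everything after the first "?"
  case "&$query&" in
    *"&$2=$3&"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, calling url_has_param with the logged URL, customIntProperty, and 0 succeeds for the log line shown above.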