Matillion ETL

The Matillion ETL connector extracts lineage from Matillion Orchestration and Transformation jobs. It supports both REST API integration and file-based operation, building lineage across Matillion pipelines at the table and column level.

Prerequisites

  • Matillion ETL version 1.54.7 or later, with Enterprise Mode enabled
  • Matillion user credentials with API access
  • Supported dialects: Snowflake, Delta Lake on Databricks, Amazon Redshift

Supported Features

REST API-based metadata extraction

  • Connects to Matillion ETL via REST API using username/password authentication
  • Extracts groups, projects, jobs (Orchestration and Transformation), components, and steps
  • Retrieves lineage at table and field level where metadata is available
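
Before running the connector, it can help to confirm that the service account's credentials are accepted by the instance. The following is a minimal sketch, not the connector's own logic. It assumes the instance exposes a group listing at /rest/<version>/group and accepts HTTP Basic authentication; both should be checked against your instance's API documentation. The function names and the MATILLION_PASSWORD environment variable are hypothetical.

```shell
# Hypothetical helper: build the group-listing URL for a Matillion ETL instance.
# The /rest/<version>/group path is an assumption; verify it for your instance.
matillion_group_url() {
  local base="${1%/}"        # strip one trailing slash, if present
  local version="${2:-v1}"   # same value as the matillion.api.version example
  printf '%s/rest/%s/group\n' "$base" "$version"
}

# Hypothetical helper: print the HTTP status code returned for the group listing.
# Arguments: instance URL, username, optional API version.
# The password is read from MATILLION_PASSWORD rather than typed on the command line.
matillion_check_access() {
  local url
  url="$(matillion_group_url "$1" "${3:-v1}")"
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
       --user "$2:${MATILLION_PASSWORD:?MATILLION_PASSWORD is not set}" "$url"
}
```

For example, `matillion_check_access https://matillion.company.com bigeye_service` prints the status code of the request. A 200 would suggest the credentials have API access, while a 401 or 403 would suggest they do not.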

File-based lineage extraction

  • Runs the connector on exported job JSON files
  • Produces lineage results consistent with API-based extraction
  • Useful for POC scenarios or security-restricted environments

Internal lineage modeling

  • Tracks data flow within jobs and across jobs
  • Establishes source-to-target mapping relationships and directed transformation graphs
  • Uses SQL parsing to extract field-level lineage from embedded queries

Known Limitations

  • Stored procedures — Lineage within stored procedures is not supported in the current release
  • Non-SQL components — Components with logic in arbitrary scripting languages are not parsed; only components that expose SQL or metadata are supported
  • File-based sources/sinks — Lineage for sources like Amazon S3 or other file storage is not yet supported

Configuration Parameters

Create a properties file (for example, matillion.properties) with your connection configuration:

| Property | Type | Required | Description | Example |
|---|---|---|---|---|
| environment.name.N | String | Yes | Environment identifier used to group projects | Prod |
| bigeye.host.N | URL | Yes | Bigeye instance URL | https://app.bigeye.com |
| bigeye.apikey.N | String | Yes | Bigeye API key | bigeye_pak_abc123 |
| bigeye.allowed.workspaces.N | Integer List | Yes | Comma-separated workspace IDs | 123 |
| matillion.instance.url.N | URL | Yes | URL of the Matillion ETL instance | https://matillion.company.com |
| matillion.api.version.N | String | No | Matillion API version | v1 |
| matillion.username.N | String | Yes | Matillion username | bigeye_service |
| matillion.password.N | String | Yes | Matillion password | |
| matillion.environment.N | String | No | Matillion environment name | Production |
| matillion.include.groups.N | String List | No | Groups to include (comma-separated) | ETL,Analytics |
| matillion.exclude.groups.N | String List | No | Groups to exclude (comma-separated) | Dev,Test |
| matillion.include.projects.N | String List | No | Projects to include (comma-separated) | DW_Load |
| matillion.exclude.projects.N | String List | No | Projects to exclude (comma-separated) | Sandbox |
| matillion.startTimestamp.N | Long | No | Start time in UTC milliseconds. Defaults to start of current day | 1700000000000 |
| matillion.endTimestamp.N | Long | No | End time in UTC milliseconds. Defaults to current time | 1700086400000 |
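
The two timestamp properties take millisecond values, which are easy to get wrong by a factor of 1000. As a rough sketch, the values can be computed in the shell as shown here. It assumes that "start of current day" means midnight UTC and that the values are milliseconds since the Unix epoch, which is what the example values suggest. The helper name utc_day_start_ms and the index 1 are illustrative only.

```shell
# Hypothetical helper: floor an epoch-seconds value to 00:00:00 UTC of the
# same day and convert the result to milliseconds.
utc_day_start_ms() {
  local secs="$1"
  echo $(( (secs - secs % 86400) * 1000 ))
}

now_s="$(date -u +%s)"                    # current time in epoch seconds
start_ms="$(utc_day_start_ms "$now_s")"   # midnight UTC today, in milliseconds
end_ms=$(( now_s * 1000 ))                # current time, in milliseconds

# Print lines that can be pasted into the properties file.
printf 'matillion.startTimestamp.1=%s\n' "$start_ms"
printf 'matillion.endTimestamp.1=%s\n' "$end_ms"
```

Passing a fixed value such as `utc_day_start_ms 1700000000` makes the window reproducible, which may be useful when re-running the connector for a past day.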

Sample Properties File

environment.name.1=Matillion Production
bigeye.host.1=https://app.bigeye.com
bigeye.apikey.1=bigeye_pak_acbdefg123456
bigeye.allowed.workspaces.1=123
matillion.instance.url.1=https://matillion.company.com
matillion.username.1=bigeye_service
matillion.password.1=${MATILLION_PASSWORD}
matillion.environment.1=Production
matillion.include.groups.1=ETL,Analytics
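
A quick pre-flight check can catch a missing required property before the connector runs. The sketch below is not part of the connector or the agent. It lists the keys marked as required in the parameter table that are absent for a given index N. The function name missing_required_props is hypothetical, and the check looks only at key names, not at whether the values are valid.

```shell
# Hypothetical helper: print every required key that is missing from a
# properties file for the given index (the N suffix).
# Arguments: path to the properties file, index.
missing_required_props() {
  local file="$1" index="$2" key
  for key in environment.name bigeye.host bigeye.apikey bigeye.allowed.workspaces \
             matillion.instance.url matillion.username matillion.password; do
    # Take the text before the first "=" on each line and require an exact,
    # literal match on the full key name (-x whole line, -F fixed string).
    if ! cut -d= -f1 "$file" | grep -qxF -- "${key}.${index}"; then
      echo "${key}.${index}"
    fi
  done
}
```

Running `missing_required_props matillion.properties 1` against the sample file above should print nothing. Any key it does print needs to be added before the connector is run.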

Running the Connector

With the Agent CLI (recommended)

# Install and configure the Lineage Plus agent
./bigeye-agent install

# Add the Matillion connector
./bigeye-agent add-connector -c matillion

# Run the connector
./bigeye-agent lineage run -c matillion

With Docker

docker run --rm \
  -v /path/to/config:/app/config \
  --entrypoint bash bigeyedata/source-connector:latest \
  -c "bigeye-connector run -c matillion -p /app/config/matillion.properties"

Performance Considerations

  • The connector queries metadata once per job, so runtime grows with the number of jobs and may be noticeable in large environments
  • Field-level detail varies: some transformations expose fields clearly, while others (for example, SQL blocks) require parsing or inference
  • Lineage completeness depends on the level of detail Matillion exposes per component