Matillion ETL Lineage Connector

Overview

The Matillion ETL Lineage Connector enables automated extraction of lineage from Matillion Orchestration and Transformation jobs. It supports both live API integration and file-based operation, helping teams visualize data flow across Matillion pipelines at the table and column level.

This document outlines the connector's current capabilities, limitations, and other details relevant to customer and sales conversations.


Supported Features (Current State)

1. REST API-Based Metadata Extraction

  • Connects to Matillion ETL via REST API using:
    • Username/password authentication
    • Configurable parameters for group, project, environment, and time window
  • Supported object types:
    • Groups
    • Projects
    • Jobs (Orchestration & Transformation)
    • Components (e.g., SQL scripts, table input/output)
    • Steps (sub-SQL or sub-component breakdown)
  • Retrieves lineage at table and field level where metadata is available
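
The connection details above can be sketched in Python as follows. This is an illustration, not the connector's implementation: the helper names are hypothetical, and the resource path layout (/rest/<version>/group/name/<group>/project) is an assumption modeled on Matillion ETL's v1 REST API that should be checked against the API documentation for your instance. The sketch only builds the request URL and the HTTP Basic authentication header; it sends nothing.

```python
import base64
from urllib.parse import quote


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header from a username and password."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def project_list_url(instance_url, api_version, group):
    """Build the URL that would list the projects in a group.

    Assumption: the path layout mirrors Matillion ETL's v1 REST API.
    """
    base = instance_url.rstrip("/")
    # Percent-encode the group name so spaces and slashes are safe in a path.
    return f"{base}/rest/{api_version}/group/name/{quote(group, safe='')}/project"


# Hypothetical values standing in for matillion.instance.url,
# matillion.api.version, matillion.username and matillion.password.
url = project_list_url("https://matillion.example.com/", "v1", "My Group")
headers = basic_auth_header("user", "pass")
```

The URL and headers can then be passed to any HTTP client. In practice, credentials should come from a secret store rather than source code.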

2. File-Based Lineage Extraction

  • Supports running the connector on exported job JSON files
  • Enables usage in POC scenarios or security-restricted environments
  • Produces lineage results consistent with API-based extraction
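
As a rough sketch of how exported job files might be collected before a file-mode run, the Python snippet below loads every .json file in a directory. The function name and the directory name are hypothetical, and the snippet makes no assumption about the internal structure of a Matillion export beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_job_exports(directory):
    """Load every *.json file in `directory`, keyed by file name."""
    exports = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            exports[path.name] = json.load(handle)
    return exports


# Usage (hypothetical directory of exported job files):
# exports = load_job_exports("matillion_exports")
# print(f"Loaded {len(exports)} export file(s)")
```

A file that is not valid JSON raises json.JSONDecodeError, which surfaces a damaged export early instead of letting it silently produce incomplete lineage.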

3. Internal Lineage Modeling

  • Tracks data flow within jobs and across jobs
  • Establishes:
    • Mapping Source → Mapping Target relationships
    • Directed graphs of transformation pipelines
  • Uses SQL parsing to extract field-level lineage from embedded queries
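
To make the graph model concrete, here is a minimal Python sketch of source-to-target mappings stored as a directed graph, with a traversal that answers "what is downstream of this field?". The table and field names are invented for illustration, and the connector's internal representation may differ.

```python
from collections import defaultdict, deque


def build_graph(mappings):
    """Turn (source, target) pairs into an adjacency map of a directed graph."""
    graph = defaultdict(set)
    for source, target in mappings:
        graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every node reachable from `start`, using breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a cycle must not make a node its own descendant
    return seen


# Hypothetical field-level mappings spanning two jobs.
mappings = [
    ("raw.orders.amount", "staging.orders.amount"),
    ("staging.orders.amount", "mart.revenue.total"),
    ("raw.customers.id", "staging.customers.id"),
]
graph = build_graph(mappings)
impacted = downstream(graph, "raw.orders.amount")
```

Here impacted contains staging.orders.amount and mart.revenue.total, which is the kind of answer an impact analysis needs. Reversing each pair before building the graph gives upstream lineage instead.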

4. Supported Versions

  • The connector supports Matillion ETL version 1.54.7 or higher with the Enterprise Mode feature enabled

5. Supported Dialects

  • The connector supports the Snowflake, Delta Lake on Databricks, and Amazon Redshift dialects of Matillion ETL

Known Limitations

The connector does not currently support:

1. Stored Procedures

  • Lineage within stored procedures is out of scope in the current release

2. Non-SQL Component Parsing

  • Components with logic in arbitrary scripting languages are not supported
  • Only components that expose SQL or metadata are parsed

3. File-Based Source/Sink Lineage

  • Lineage for file-based sources and sinks, such as Amazon S3, is not yet supported
  • This requires File Data Service (FDS) support, which is currently unavailable in Lineage Plus

Configuration Parameters

The following fields can be configured:

  • matillion.instance.url
  • matillion.api.version
  • matillion.username
  • matillion.password
  • matillion.environment
  • matillion.include.groups / exclude.groups
  • matillion.include.projects / exclude.projects
  • matillion.startTimestamp / endTimestamp

Timestamps are specified in UTC milliseconds. If they are not provided, the time window defaults to the current day.
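
As an illustration of the timestamp format, the Python sketch below computes a one-day window in milliseconds. It rests on several assumptions that the connector may not share: that "UTC milliseconds" means milliseconds since the Unix epoch, that the current day is the UTC calendar day, that the end of the window is inclusive, and that the end key is spelled matillion.endTimestamp. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_day_window_ms(now=None):
    """Return (start_ms, end_ms) for the UTC calendar day containing `now`.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = datetime.combine(now.date(), time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    one_ms = timedelta(milliseconds=1)
    # Integer division of timedeltas avoids floating-point rounding.
    start_ms = (start - EPOCH) // one_ms
    end_ms = (end - EPOCH) // one_ms - 1  # last millisecond of the day
    return start_ms, end_ms


start_ms, end_ms = utc_day_window_ms()
window = {
    "matillion.startTimestamp": start_ms,
    "matillion.endTimestamp": end_ms,  # assumed full key name
}
```

Passing an explicit datetime, for example one from seven days ago, gives a window for a different day. Widening the gap between the two values gives a longer window.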


Performance Considerations

  • Job metadata is queried once per job, so runtime grows with the number of jobs and may be noticeable in large environments.
  • Field-level detail varies:
    • Some transformations expose fields clearly
    • Others (e.g., SQL blocks) require parsing or inference
  • Lineage completeness depends on the level of detail Matillion exposes per component

Use Cases

The connector is well-suited for:

  • Data observability and impact analysis within Matillion jobs
  • Proof-of-concept installations in secure environments using file mode
  • Visualizing job-to-job and field-level transformations

Roadmap (Planned / Future Capabilities)

  • Support for stored procedures
  • Enhanced parsing of complex SQL and scripting logic
  • Integration with File Data Service (FDS) for file lineage
  • Additional Matillion component support based on real customer data

Have Questions?

We’re happy to help evaluate whether your Matillion setup is compatible with the current version of the connector. Reach out with sample jobs or component details.