Azure Data Factory

The Azure Data Factory (ADF) Lineage Connector enables Bigeye to visualize and understand data movement and transformations within your ADF pipelines, including column-level lineage.

Connection Modes

The connector supports two modes for extracting ADF metadata:

API mode

Connects directly to Azure Data Factory using service principal credentials. This is the recommended approach for automated, recurring lineage collection.

Requirements:

  • Azure service principal with read access to your ADF instance
  • Tenant ID, Client ID, and Client Secret
  • Subscription ID, Resource Group, and Factory Name

File mode

Processes exported ADF pipeline JSON files from a local directory. Use this when direct API access is not available or for one-time imports.

Requirements:

  • Exported ADF pipeline definitions (JSON) in a local directory
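In File mode the connector only needs files that follow the standard exported ADF pipeline shape (a top-level name plus properties.activities). As a rough illustration of that check, the sketch below scans a directory for candidate pipeline files; the helper name and validation logic are illustrative, not the connector's actual implementation.

```python
import json
from pathlib import Path

def find_pipeline_files(directory: str) -> list[Path]:
    """Return JSON files that look like exported ADF pipeline definitions."""
    candidates = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            doc = json.loads(path.read_text())
        except (json.JSONDecodeError, OSError):
            continue  # skip unreadable or malformed files
        # Exported pipelines carry a name and an activities list under "properties"
        if "name" in doc and isinstance(doc.get("properties", {}).get("activities"), list):
            candidates.append(path)
    return candidates
```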

Supported Capabilities

Lineage patterns

The connector extracts column-level lineage for the following ADF activity types:

| Activity Type | Lineage Support |
| --- | --- |
| Copy Activity (with explicit column mappings) | Column-level lineage between source and target |
| Copy Activity (passthrough, matching column names) | Inferred column-level lineage |
| Mapping Data Flow | Column-level lineage through dataflow transformations |
| Script Activity (SQL-based) | Column-level lineage from INSERT...SELECT statements |
| ExecutePipeline | Cross-pipeline lineage links |
| ForEach | Lineage for inner activities with static definitions; dynamic table lists resolved at runtime are not supported |
| IfCondition | Lineage for activities in each branch |
| Until | Lineage for activities within the loop body |
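To make the explicit-mapping Copy Activity case concrete, here is a minimal sketch of pulling (source, sink) column pairs out of a Copy Activity's TabularTranslator mappings. The JSON shape follows ADF's exported pipeline format; the function name, sample activity, and error handling are illustrative, not the connector's actual code.

```python
def extract_copy_mappings(activity: dict) -> list[tuple[str, str]]:
    """Pull explicit (source column, sink column) pairs from a Copy Activity.

    Explicit mappings live under typeProperties.translator.mappings
    in the exported pipeline JSON.
    """
    if activity.get("type") != "Copy":
        return []
    translator = activity.get("typeProperties", {}).get("translator", {})
    pairs = []
    for m in translator.get("mappings", []):
        src = m.get("source", {}).get("name")
        sink = m.get("sink", {}).get("name")
        if src and sink:
            pairs.append((src, sink))
    return pairs

# Example: a trimmed Copy Activity with two mapped columns (hypothetical names)
copy_activity = {
    "name": "CopyOrders",
    "type": "Copy",
    "typeProperties": {
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": "order_id"}, "sink": {"name": "OrderId"}},
                {"source": {"name": "order_ts"}, "sink": {"name": "OrderTimestamp"}},
            ],
        }
    },
}
```

Each extracted pair becomes one column-level lineage edge between the source and target datasets.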

Metadata extraction

In API mode, the connector also captures:

  • Pipeline definitions and folder structure
  • Dataset and linked service configurations
  • Pipeline run history (last 30 days, up to 5 runs per pipeline)
  • Activity execution metrics for successful runs

Filtering

You can control which pipelines are processed using include/exclude filters:

  • Pipeline name — Substring matching
  • Folder path — Glob pattern matching
  • Annotations — Exact match (case-insensitive)
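As a rough model of how these three filter kinds might combine, the sketch below assumes excludes take precedence over includes and that an empty include list places no restriction; that precedence is an assumption here, not documented connector behavior.

```python
from fnmatch import fnmatch

def pipeline_selected(name, folder, annotations,
                      name_inc=(), name_exc=(),
                      folder_inc=(), folder_exc=(),
                      ann_inc=(), ann_exc=()):
    """Combine substring (name), glob (folder), and exact-match (annotation) filters."""
    anns = {a.lower() for a in annotations}
    # Excludes take precedence over includes (assumed)
    if any(sub in name for sub in name_exc):
        return False
    if any(fnmatch(folder, pat) for pat in folder_exc):
        return False
    if any(a.lower() in anns for a in ann_exc):
        return False
    # An empty include list admits everything (assumed)
    if name_inc and not any(sub in name for sub in name_inc):
        return False
    if folder_inc and not any(fnmatch(folder, pat) for pat in folder_inc):
        return False
    if ann_inc and not any(a.lower() in anns for a in ann_inc):
        return False
    return True
```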

Configuration Parameters

Create a properties file (for example, adf.properties) with your connection configuration:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| environment.name.N | String | Yes | Environment identifier |
| bigeye.host.N | URL | Yes | Bigeye instance URL |
| bigeye.apikey.N | String | Yes | Bigeye API key |
| bigeye.allowed.workspaces.N | Integer List | Yes | Comma-separated workspace IDs |
| adf.connection.type.N | String | Yes | API or File |
| adf.tenant.id.N | String | If API | Azure tenant ID |
| adf.client.id.N | String | If API | Azure service principal client ID |
| adf.client.secret.N | String | If API | Azure service principal client secret |
| adf.subscription.id.N | String | If API | Azure subscription ID |
| adf.resource.group.N | String | If API | Azure resource group name |
| adf.factory.name.N | String | If API | ADF factory name |
| adf.pipeline.files.location.N | Path | If File | Path to directory containing exported pipeline JSON files |
| adf.pipelines.include.N | String List | No | Pipeline name substrings to include |
| adf.pipelines.exclude.N | String List | No | Pipeline name substrings to exclude |
| adf.folders.include.N | String List | No | Folder path glob patterns to include |
| adf.folders.exclude.N | String List | No | Folder path glob patterns to exclude |
| adf.annotations.include.N | String List | No | Annotation values to include (case-insensitive) |
| adf.annotations.exclude.N | String List | No | Annotation values to exclude (case-insensitive) |
| adf.default.warehouse.id.N | Integer | No | Default Bigeye warehouse ID for column-level lineage resolution |

Sample properties file (API mode)

environment.name.1=ADF Production
bigeye.host.1=https://app.bigeye.com
bigeye.apikey.1=bigeye_pak_acbdefg123456
bigeye.allowed.workspaces.1=123
adf.connection.type.1=API
adf.tenant.id.1=12345678-abcd-efgh-ijkl-123456789012
adf.client.id.1=abcdefgh-1234-5678-9012-abcdefghijkl
adf.client.secret.1=your-client-secret
adf.subscription.id.1=sub-12345678-abcd-efgh-ijkl
adf.resource.group.1=my-resource-group
adf.factory.name.1=my-adf-factory

Sample properties file (File mode)

environment.name.1=ADF Import
bigeye.host.1=https://app.bigeye.com
bigeye.apikey.1=bigeye_pak_acbdefg123456
bigeye.allowed.workspaces.1=123
adf.connection.type.1=File
adf.pipeline.files.location.1=/path/to/exported/adf/pipelines

Running the Connector

With the Agent CLI (recommended)

# Install and configure the Lineage Plus agent
./bigeye-agent install

# Add the ADF connector
./bigeye-agent add-connector -c adf

# Run the connector
./bigeye-agent lineage run -c adf

With Docker

docker run --rm \
  -v /path/to/config:/app/config \
  --entrypoint bash bigeyedata/source-connector:latest \
  -c "bigeye-connector run -c adf -p /app/config/adf.properties"

Known Limitations

The following ADF patterns are not supported, or are only partially supported:

  • Stored Procedure activities — No source/sink metadata available for lineage extraction
  • Web/Webhook activities — Not data movement activities
  • Runtime expression resolution — Dynamic expressions that require runtime context (for example, parameterized table names in ForEach, IfCondition, or Until activities) have limited support
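Dynamic references appear in pipeline JSON as ADF expression strings such as @item().tableName, or embedded via string interpolation as @{pipeline().parameters.schema}.orders; a connector can detect these but cannot resolve them without a pipeline run. A minimal sketch of such a detection check (the regex and function name are illustrative):

```python
import re

# ADF expressions are either whole-string values starting with a single "@"
# (a literal "@" is escaped as "@@") or embedded via "@{...}" interpolation.
_EXPRESSION = re.compile(r"^@(?!@)|@\{[^}]*\}")

def is_dynamic(value: str) -> bool:
    """True if a dataset/table reference would need runtime evaluation."""
    return bool(_EXPRESSION.search(value))
```

References flagged this way would yield table-level lineage at best, since the concrete table name is only known at run time.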