Azure Data Factory
The Azure Data Factory (ADF) Lineage Connector enables Bigeye to visualize and understand data movement and transformations within your ADF pipelines, including column-level lineage.
Connection Modes
The connector supports two modes for extracting ADF metadata:
API mode
Connects directly to Azure Data Factory using service principal credentials. This is the recommended approach for automated, recurring lineage collection.
Requirements:
- Azure service principal with read access to your ADF instance
- Tenant ID, Client ID, and Client Secret
- Subscription ID, Resource Group, and Factory Name
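As a rough illustration of why all six values are needed, the sketch below shows where each one appears in a standard Azure client-credentials token request and in the Azure Resource Manager URL that lists a factory's pipelines. The function names are hypothetical, and nothing here is confirmed to be how the connector makes its calls internally; it only builds the request pieces and sends nothing.

```python
from urllib.parse import quote, urlencode

ARM_SCOPE = "https://management.azure.com/.default"
API_VERSION = "2018-06-01"  # Data Factory REST API version


def token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for an OAuth2 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{quote(tenant_id)}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ARM_SCOPE,
    })
    return url, body


def pipelines_url(subscription_id, resource_group, factory_name):
    """Return the ARM URL that lists the pipelines in one factory."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory_name)}"
        f"/pipelines?api-version={API_VERSION}"
    )
```

The tenant ID and client credentials identify the service principal, while the subscription, resource group, and factory name locate the ADF instance it needs to read.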
File mode
Processes exported ADF pipeline JSON files from a local directory. Use this when direct API access is not available or for one-time imports.
Requirements:
- Exported ADF pipeline definitions (JSON) in a local directory
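Before pointing the connector at a directory, it can help to confirm that the exported files parse and contain the activities you expect. The following is a minimal sketch, assuming each file holds one pipeline in the usual ADF layout (a top-level name plus properties.activities); summarize_exports is a hypothetical helper, not part of the connector.

```python
import json
from pathlib import Path


def summarize_exports(directory):
    """Map each pipeline name to the activity types found in its JSON file."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            doc = json.load(handle)
        # Assumed layout: {"name": ..., "properties": {"activities": [...]}}
        name = doc.get("name", path.stem)
        activities = doc.get("properties", {}).get("activities", [])
        summary[name] = sorted({a.get("type", "Unknown") for a in activities})
    return summary
```

A file that fails to parse raises json.JSONDecodeError, which is usually the quickest way to spot a truncated export.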
Supported Capabilities
Lineage patterns
The connector extracts column-level lineage for the following ADF activity types:
| Activity Type | Lineage Support |
|---|---|
| Copy Activity (with explicit column mappings) | Column-level lineage between source and target |
| Copy Activity (passthrough, matching column names) | Inferred column-level lineage |
| Mapping Data Flow | Column-level lineage through dataflow transformations |
| Script Activity (SQL-based) | Column-level lineage from INSERT...SELECT statements |
| ExecutePipeline | Cross-pipeline lineage links |
| ForEach | Lineage for inner activities with static definitions. Dynamic table lists resolved at runtime are not supported. |
| IfCondition | Lineage for activities in each branch |
| Until | Lineage for activities within the loop body |
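To make the first two rows concrete, here is a minimal sketch of how column pairs could be derived from a Copy activity definition. It assumes explicit mappings appear under typeProperties.translator.mappings with named source and sink columns, and that passthrough inference means exact name equality; the connector's actual matching rules (for example, case handling) are not documented here, and copy_column_lineage is a hypothetical name.

```python
def copy_column_lineage(activity, source_columns=(), sink_columns=()):
    """Return (source_column, sink_column) pairs for one Copy activity."""
    translator = activity.get("typeProperties", {}).get("translator", {})
    pairs = []
    for mapping in translator.get("mappings", []):
        # Explicit mappings: keep pairs where both sides are named columns.
        source = mapping.get("source", {}).get("name")
        sink = mapping.get("sink", {}).get("name")
        if source and sink:
            pairs.append((source, sink))
    if pairs:
        return pairs
    # Passthrough (assumption): a column maps to the sink column of the same name.
    sink_names = set(sink_columns)
    return [(c, c) for c in source_columns if c in sink_names]
```

With explicit mappings the dataset schemas are not needed; in the passthrough case the column lists would have to come from the source and sink datasets.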
Metadata extraction
In API mode, the connector also captures:
- Pipeline definitions and folder structure
- Dataset and linked service configurations
- Pipeline run history (last 30 days, up to 5 runs per pipeline)
- Activity execution metrics for successful runs
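The run-history limits can be pictured as a simple selection over pipeline runs. This sketch assumes that "up to 5 runs" means the five most recent runs inside the 30-day window, which the text above does not state explicitly; select_runs and the dictionary shape (pipelineName, runStart as a datetime) are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def select_runs(runs, now=None, window_days=30, max_per_pipeline=5):
    """Keep the newest runs per pipeline that fall inside the time window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    selected = {}
    # Newest first, so the first runs seen for a pipeline are the ones kept.
    for run in sorted(runs, key=lambda r: r["runStart"], reverse=True):
        if run["runStart"] < cutoff:
            continue
        kept = selected.setdefault(run["pipelineName"], [])
        if len(kept) < max_per_pipeline:
            kept.append(run)
    return selected
```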
Filtering
You can control which pipelines are processed using include/exclude filters:
- Pipeline name — Substring matching
- Folder path — Glob pattern matching
- Annotations — Exact match (case-insensitive)
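The three matching styles behave differently, so a small sketch may help when choosing filter values. It makes several assumptions that the list above does not settle: name substrings are compared case-sensitively, a pipeline passes an include filter if it matches any one of the lists, and an exclude match always wins. The function names and the dictionary shape are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(pipeline, names, folders, annotations):
    """True if the pipeline matches any pattern in any of the three lists."""
    wanted = {a.lower() for a in annotations}
    return (
        any(n in pipeline["name"] for n in names)  # substring match
        or any(fnmatchcase(pipeline.get("folder", ""), g) for g in folders)  # glob
        or any(a.lower() in wanted for a in pipeline.get("annotations", []))  # exact
    )


def is_selected(pipeline, include=None, exclude=None):
    """Apply include filters first, then let exclude filters veto."""
    include = include or {}
    exclude = exclude or {}

    def args(f):
        return f.get("names", []), f.get("folders", []), f.get("annotations", [])

    # An empty include filter means "include everything".
    if any(args(include)) and not _matches(pipeline, *args(include)):
        return False
    return not _matches(pipeline, *args(exclude))
```

For example, a pipeline named load_sales_daily in folder prod/sales would pass an include filter of names ["sales"] or folders ["prod/*"], and an annotation filter of "critical" would match an annotation written as "Critical".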
Configuration Parameters
Create a properties file (for example, adf.properties) with your connection configuration. In each property name, replace N with a number; the sample files in this section use 1.
| Property | Type | Required | Description |
|---|---|---|---|
| environment.name.N | String | Yes | Environment identifier |
| bigeye.host.N | URL | Yes | Bigeye instance URL |
| bigeye.apikey.N | String | Yes | Bigeye API key |
| bigeye.allowed.workspaces.N | Integer List | Yes | Comma-separated workspace IDs |
| adf.connection.type.N | String | Yes | API or File |
| adf.tenant.id.N | String | If API | Azure tenant ID |
| adf.client.id.N | String | If API | Azure service principal client ID |
| adf.client.secret.N | String | If API | Azure service principal client secret |
| adf.subscription.id.N | String | If API | Azure subscription ID |
| adf.resource.group.N | String | If API | Azure resource group name |
| adf.factory.name.N | String | If API | ADF factory name |
| adf.pipeline.files.location.N | Path | If File | Path to directory containing exported pipeline JSON files |
| adf.pipelines.include.N | String List | No | Pipeline name substrings to include |
| adf.pipelines.exclude.N | String List | No | Pipeline name substrings to exclude |
| adf.folders.include.N | String List | No | Folder path glob patterns to include |
| adf.folders.exclude.N | String List | No | Folder path glob patterns to exclude |
| adf.annotations.include.N | String List | No | Annotation values to include (case-insensitive) |
| adf.annotations.exclude.N | String List | No | Annotation values to exclude (case-insensitive) |
| adf.default.warehouse.id.N | Integer | No | Default Bigeye warehouse ID for column-level lineage resolution |
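Because the required properties depend on the connection type, a quick pre-flight check can catch a missing key before a run. The sketch below is an illustration built only from the Required column of the table; parse_properties and missing_keys are hypothetical helpers, and treating the connection type case-insensitively is an assumption.

```python
COMMON = ["environment.name", "bigeye.host", "bigeye.apikey",
          "bigeye.allowed.workspaces", "adf.connection.type"]
BY_MODE = {
    "api": ["adf.tenant.id", "adf.client.id", "adf.client.secret",
            "adf.subscription.id", "adf.resource.group", "adf.factory.name"],
    "file": ["adf.pipeline.files.location"],
}


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def missing_keys(props, n=1):
    """List required properties that are absent or empty for suffix n."""
    # Assumption: the connection type is compared case-insensitively.
    mode = props.get(f"adf.connection.type.{n}", "").lower()
    required = COMMON + BY_MODE.get(mode, [])
    return [f"{k}.{n}" for k in required if not props.get(f"{k}.{n}")]
```

An empty result from missing_keys means every required property for the chosen mode is present; it does not check that the values themselves are valid.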
Sample properties file (API mode)
environment.name.1=ADF Production
bigeye.host.1=https://app.bigeye.com
bigeye.apikey.1=bigeye_pak_acbdefg123456
bigeye.allowed.workspaces.1=123
adf.connection.type.1=API
adf.tenant.id.1=12345678-abcd-efgh-ijkl-123456789012
adf.client.id.1=abcdefgh-1234-5678-9012-abcdefghijkl
adf.client.secret.1=your-client-secret
adf.subscription.id.1=sub-12345678-abcd-efgh-ijkl
adf.resource.group.1=my-resource-group
adf.factory.name.1=my-adf-factory
Sample properties file (File mode)
environment.name.1=ADF Import
bigeye.host.1=https://app.bigeye.com
bigeye.apikey.1=bigeye_pak_acbdefg123456
bigeye.allowed.workspaces.1=123
adf.connection.type.1=File
adf.pipeline.files.location.1=/path/to/exported/adf/pipelines
Running the Connector
With the Agent CLI (recommended)
# Install and configure the Lineage Plus agent
./bigeye-agent install
# Add the ADF connector
./bigeye-agent add-connector -c adf
# Run the connector
./bigeye-agent lineage run -c adf
With Docker
docker run --rm \
-v /path/to/config:/app/config \
--entrypoint bash bigeyedata/source-connector:latest \
-c "bigeye-connector run -c adf -p /app/config/adf.properties"
Known Limitations
The following ADF patterns are not currently supported:
- Stored Procedure activities — No source/sink metadata available for lineage extraction
- Web/Webhook activities — Not data movement activities
- Runtime expression resolution — Dynamic expressions that require runtime context (for example, parameterized table names in ForEach, IfCondition, or Until activities) have limited support
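To gauge how much of a pipeline depends on runtime expressions, you can scan its JSON for dynamic content. This is a minimal sketch, assuming the usual ADF conventions that an expression is either an object with "type": "Expression" or a string beginning with a single @ (a leading @@ is an escaped literal); it does not detect @{...} interpolation inside longer strings, and find_expressions is a hypothetical helper rather than something the connector provides.

```python
def find_expressions(node, path="$"):
    """Yield (json_path, expression) for every dynamic expression found."""
    if isinstance(node, dict):
        if node.get("type") == "Expression" and isinstance(node.get("value"), str):
            yield path, node["value"]
            return
        for key, value in node.items():
            yield from find_expressions(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_expressions(item, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith("@") and not node.startswith("@@"):
        # A bare string starting with a single @ is treated as an expression.
        yield path, node
```

Pipelines where table or dataset names show up in this list are the ones most likely to produce incomplete lineage.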
