IBM DataStage Connector
Overview
The IBM DataStage connector captures job metadata and lineage from InfoSphere DataStage. It ingests job definitions, parameters, and dependencies to build end-to-end lineage.
The connector supports two distinct approaches:
- Approach 1 -- Active Extraction
  The connector connects directly to the DataStage metadata repository and engine, invoking DataStage client utilities (dsjob, dsexport) to export job XML definitions, then ingests those XMLs.
- Approach 2 -- Passive Mode
  If job XML files are already available (e.g., exported by administrators or stored in version control), the connector skips active export and directly ingests the files.
Choose the approach that best fits your environment and security model.
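The choice between the two approaches is expressed through the datastage.file.extract.1 flag listed under Configurable Parameters. The following is a minimal sketch of that decision, assuming the properties are already loaded into a dictionary and that the flag is compared case-insensitively. The helper name extraction_mode is hypothetical and not part of the connector.

```python
def extraction_mode(props, index=1):
    """Return 'active' or 'passive' based on datastage.file.extract.<index>.

    Hypothetical helper for illustration; the connector's own logic may differ.
    """
    key = f"datastage.file.extract.{index}"
    flag = props.get(key, "").strip().lower()
    if flag == "yes":
        return "active"   # Approach 1: run the export script, then ingest
    if flag == "no":
        return "passive"  # Approach 2: ingest XML files that already exist
    raise ValueError(f"{key} must be Yes or No, got {props.get(key)!r}")


print(extraction_mode({"datastage.file.extract.1": "Yes"}))  # active
print(extraction_mode({"datastage.file.extract.1": "No"}))   # passive
```

Raising an error on any other value is a design choice in this sketch; it makes a misspelled flag visible early instead of silently falling back to one mode.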
Setup Instructions
Approach 1 -- Active Extraction (Direct Connection)
Use this approach when you want the connector to drive DataStage export automatically. The connector should run under a dedicated service account with the lowest privileges needed to read job metadata.
- Suite Component Role: DataStage and QualityStage User (required to access DataStage modules).
- Project Role: DataStage Super Operator (allows viewing jobs, logs, and objects but not modifying them, which is sufficient for lineage extraction).
Steps to Create and Assign the Service Account
- Create the service user in Information Server
  - Log in to the Information Server Web Console.
  - Go to Administration → Users and Groups.
  - Create a new user (e.g., ds_lineage_service).
  - Assign the DataStage and QualityStage User suite role.
- Assign project-level roles
  - Navigate to each target DataStage project.
  - Add the ds_lineage_service account.
  - Assign the DataStage Super Operator role.
- Configure Engine Credentials
  - Go to Administration → Domain Management → Engine Credentials.
  - Map ds_lineage_service to an OS-level account (e.g., ds_exec).
  - Provide the OS account's username/password.
  - Ensure the OS account has:
    - Permission to run DataStage client utilities.
    - Read/write access to XML export directories.
    - "Log on as a batch job" rights (Windows) or equivalent execution rights (Linux).
Remaining Steps (post service account creation)
- Install DataStage client utilities (dsjob, dsexport) on the Lineage Plus host.
- Validate connectivity
  - Ensure JDBC access to the metadata repository.
  - Confirm firewall policies allow shell command execution.
- Configure parameters
  - Set:
    - datastage.db.username.1, datastage.db.password.1, datastage.hibernate.connection.url.1
    - datastage.exe.location.1, datastage.script.1, datastage.command.1, datastage.terminate.command.1
    - datastage.file.extract.1=Yes
  - Configure output locations (datastage.job.file.path.1, datastage.project.joblist.path.1).
  - Set datastage.project.name.1 with the project(s) to extract.
- Test script execution
  - Run cmd.exe /C dagdatastagerun.bat (Windows) or bash dagdatastagerun.sh (Linux) manually to confirm XML export works.
- Run the connector
  - The connector will execute the script, export job XMLs, and ingest them into lineage.
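The manual test command in the steps above appears to be composed from datastage.command.1, datastage.terminate.command.1, and datastage.script.1. The sketch below shows one way to assemble and run that command for a dry run outside the connector. The order of the parts is inferred from the example command line, and the function names and the working_dir argument (the directory that holds the script) are assumptions for illustration only.

```python
import subprocess
from pathlib import Path


def build_export_command(props, index=1):
    """Assemble the argv list, e.g. ['cmd.exe', '/C', 'dagdatastagerun.bat']."""
    command = props[f"datastage.command.{index}"]
    flag = props.get(f"datastage.terminate.command.{index}", "")
    script = props[f"datastage.script.{index}"]
    # Drop the flag when it is empty, as in 'bash dagdatastagerun.sh'.
    return [part for part in (command, flag, script) if part]


def run_export(props, working_dir, index=1):
    """Run the export script once and return its exit code (hypothetical helper)."""
    if not Path(working_dir).is_dir():
        raise FileNotFoundError(f"working directory not found: {working_dir}")
    argv = build_export_command(props, index)
    completed = subprocess.run(argv, cwd=working_dir)
    return completed.returncode


windows_props = {
    "datastage.command.1": "cmd.exe",
    "datastage.terminate.command.1": "/C",
    "datastage.script.1": "dagdatastagerun.bat",
}
print(build_export_command(windows_props))
```

Passing the command as a list avoids shell quoting problems when paths contain spaces. A non-zero return value from run_export is a reasonable first signal that the OS account or the script path needs attention.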
Approach 2 -- Passive Mode (Admin-Provided Files)
Use this approach if admins provide XML files or you manage your own export process (e.g., through version control).
- Arrange XML file exports
  - Have DataStage administrators export jobs to XML using Designer, dsjob, or dsexport.
  - Ensure job list and parameter files are also produced if needed.
- Make XMLs accessible
  - Place the XMLs on a shared file system accessible by the connector.
  - Optionally, organize them in a root folder structure for version control.
- Configure parameters
  - Set datastage.file.extract.1=No (disables active export).
  - Point the connector to the XML file location:
    - datastage.job.root.folder.1 or datastage.job.file.path.1
    - datastage.project.joblist.path.1 (if using job lists)
  - Optional: configure datastage.paramfile.1 and datastage.parameterset.files.1 for parameter resolution.
- Validate access
  - Confirm the service account has read access to the XML directories.
  - Run the connector; it will skip export and parse existing files directly.
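Before running the connector in passive mode, it can help to confirm that the XML files are reachable and well-formed when read as the service account. The sketch below walks a root folder recursively and reports files that cannot be read or parsed. It is an independent pre-check, not the connector's parser; the function names are hypothetical and the "*.xml" file pattern is an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_job_xml_files(root_folder):
    """Return all *.xml files below root_folder, searched recursively."""
    root = Path(root_folder)
    return sorted(p for p in root.rglob("*.xml") if p.is_file())


def check_xml_files(paths):
    """Map each problematic file to a short reason; an empty dict means all OK."""
    problems = {}
    for path in paths:
        try:
            ET.parse(path)  # opens the file and checks that it is well-formed
        except OSError as exc:
            problems[str(path)] = f"cannot read: {exc}"
        except ET.ParseError as exc:
            problems[str(path)] = f"not well-formed XML: {exc}"
    return problems
```

A typical call would be check_xml_files(find_job_xml_files(root)), with root set to the value of datastage.job.root.folder.1. Note that the "*.xml" match is case-sensitive on Linux, so files named with an uppercase extension would need a second pattern.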
Configurable Parameters
Key | Example | Description |
---|---|---|
Database Connection Parameters | ||
datastage.db.username.1 | db_user | Username for the DataStage metadata database. |
datastage.db.password.1 | db_pass | Password for the DataStage metadata database. |
db.type.1 | db2 | Type of database (e.g., db2, oracle). |
datastage.hibernate.connection.url.1 | jdbc:db2://host:50001/xmeta | JDBC URL to the DataStage metadata repository. |
Active Extraction Parameters | ||
datastage.file.extract.1 | Yes | Set to Yes to actively export XML files. |
datastage.command.1 | cmd.exe | Shell or command interpreter to run the DataStage script. |
datastage.terminate.command.1 | /C | Flag passed to the command interpreter so that it runs the script and then terminates (e.g., /C for cmd.exe). |
datastage.script.1 | dagdatastagerun.bat | Script used to extract jobs and generate XML exports. |
datastage.exe.location.1 | C:\IBM\InformationServer1\Clients\Classic\ | Path to the DataStage client directory containing dsexport, dsjob, etc. |
datastage.domain.name.1 | VM-DATASTAGE81:9080 | Domain and port of the DataStage installation. |
datastage.server.name.1 | VM-DATASTAGE81 | Server hostname where DataStage is installed. |
datastage.username.1 | datastage_user | DataStage user account. |
datastage.password.1 | datastage_user_password | DataStage user password. |
datastage.project.name.1 | MyFirstProject | Comma-delimited list of projects whose jobs will be processed. |
datastage.project.joblist.path.1 | C:\DataStage\jobs\test<projectname>.txt | Path of text file with the list of jobs to be processed. |
datastage.job.file.path.1 | C:\DataStage\jobs\test<jobname>.xml | Path where job XML files are written. |
Passive Mode Parameters | ||
datastage.file.extract.1 | No | Set to No to disable export and rely on pre-existing XML files. |
datastage.job.root.folder.1 | C:\metacenter_home\resources\DataStage\Jobs\SERVICES | Root folder for recursive job XML file definitions. |
datastage.job.file.path.1 | C:\DataStage\jobs\test<jobname>.xml | Path where job XML files are stored. |
datastage.project.joblist.path.1 | C:\DataStage\jobs\test<projectname>.txt | Path of text file with the list of jobs to be processed. |
datastage.project.name.1 | MyFirstProject | Comma-delimited list of projects whose jobs will be processed. |
Parameter & Variable Resolution | ||
datastage.paramfile.1 | C:\metacenter_home\resources\DataStage\DSParamFiles\DSParams | Directory containing parameter files. |
datastage.parameterset.files.1 | C:\metacenter_home\resources\DataStage\SERVICES\ParameterSets.xml | Path to the project parameter set file. |
datastage.parameterset.valuefile.mapping.1 | *:Db2ConnectDom | Mapping of override value files when not specified in snapshot. |
datastage.job.valuefilename.path.1 | C:\metacenter_home\resources\DataStage\SERVICES\jobs.txt | File listing command-line options for job execution. |
datastage.variables.to.not.resolve.1 | pCurrDate,pSampleDtMinus3 | Variables the snapshot should not resolve. |
Filtering & Exclusion | ||
datastage.jobname.exclude.1 | Copy,Test | Comma-separated list of jobs to exclude. |
datastage.folderpath.exclude.1 | /Developer_,/Jobs/Monthly_ | Comma-separated list of folder paths to exclude (wildcards supported). |
Schema Files & Overrides | ||
datastage.schema.files.location.1 | C:\DataStage\AcctFact\SchemaFiles | Locations where schema files are stored. |
datastage.job.component.schema.file.1 | "regular_inspect.Sequential_File=schemafile.osd" | Overrides schema file for a specific component. |
Lineage Plus Config | ||
environment.name.1 | dev | Logical environment name. |
snapshot.output.1 | metacenter.jcr#Snapshots/<instId>/IBM Datastage/<repo> | Repository path for snapshot output. |
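As a quick sanity check on a configuration, the keys that the setup steps ask you to set can be compared against what is actually present. The sketch below is only an illustration: the list of active-mode keys is taken from the setup steps on this page, whether the connector treats every one of them as mandatory is an assumption, and the helper name missing_keys is hypothetical.

```python
# Keys named in the Active Extraction setup steps (without the .<n> suffix).
ACTIVE_KEYS = (
    "datastage.db.username",
    "datastage.db.password",
    "datastage.hibernate.connection.url",
    "datastage.exe.location",
    "datastage.script",
    "datastage.command",
    "datastage.terminate.command",
    "datastage.job.file.path",
    "datastage.project.joblist.path",
    "datastage.project.name",
)


def missing_keys(props, index=1):
    """List the keys that appear to be missing for the selected mode."""
    def key(name):
        return f"{name}.{index}"

    def is_set(name):
        return bool(props.get(key(name), "").strip())

    extract = props.get(key("datastage.file.extract"), "").strip().lower()
    if extract == "yes":
        return [key(name) for name in ACTIVE_KEYS if not is_set(name)]
    if extract == "no":
        # Passive mode needs at least one location for the XML files.
        if is_set("datastage.job.root.folder") or is_set("datastage.job.file.path"):
            return []
        return [key("datastage.job.root.folder") + " or " + key("datastage.job.file.path")]
    # The mode flag itself is missing or has an unexpected value.
    return [key("datastage.file.extract")]


print(missing_keys({"datastage.file.extract.1": "No"}))
```

The index argument mirrors the numeric suffix on each key, so the same check can be pointed at a second configuration block if one exists.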
Sample Properties File
datastage.db.username.1=ds_meta_user
datastage.db.password.1=ds_meta_pwd
db.type.1=db2
datastage.hibernate.connection.url.1=jdbc:db2://dsmeta.host:50001/xmeta
datastage.exe.location.1=C:\\IBM\\InformationServer1\\Clients\\Classic\\
datastage.script.1=dagdatastagerun.bat
datastage.terminate.command.1=/C
datastage.command.1=cmd.exe
datastage.file.extract.1=Yes
datastage.domain.name.1=VM-DATASTAGE81:9080
datastage.server.name.1=VM-DATASTAGE81
datastage.username.1=ds_lineage_service
datastage.password.1=secure_pw
environment.name.1=dev
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.project.joblist.path.1=C:\\DataStage\\jobs\\export\\<projectname>.txt
datastage.project.name.1=MyFirstProject
snapshot.output.1=metacenter.jcr#Snapshots/inst01/IBM Datastage/RepoA
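The doubled backslashes in the Windows paths above matter if the file is read with Java .properties rules, where a backslash starts an escape sequence. That the connector reads the file this way is an assumption, suggested by the doubled backslashes and the Hibernate-style key rather than stated on this page. The sketch below is a simplified reader that shows the effect; it does not handle Unicode escapes, line continuations, or separators other than "=".

```python
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}


def unescape(value):
    """Apply the basic backslash rules of the Java .properties format.

    Simplified: Unicode escapes and line continuations are not handled.
    """
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # character after the backslash
            out.append(_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


def load_properties(text):
    """Parse key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = unescape(value.strip())
    return props


sample = r"""
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.file.extract.1=Yes
"""
props = load_properties(sample)
print(props["datastage.job.file.path.1"])  # C:\DataStage\jobs\export\<jobname>.xml
print(repr(unescape(r"C:\temp")))          # 'C:\temp', where \t is now a tab
```

The second print shows why a single backslash is risky under these rules: the sequence backslash-t is turned into a tab character, so the path no longer points where it was meant to.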
Troubleshooting
- Connector cannot log in
  - Cause: Wrong suite/project role or missing engine credentials.
  - Resolution: Ensure the service account has DataStage and QualityStage User + Super Operator roles, and proper engine credential mapping.
- Export fails
  - Cause: Script not found or OS user lacks permissions.
  - Resolution: Validate datastage.exe.location, script path, and OS account execution rights.
- No XML files produced
  - Cause: Wrong export directory or filters applied.
  - Resolution: Check datastage.job.file.path, review filters, and rerun extraction.
- Some job definitions skipped
  - Cause: Exclusion filters or standalone job removal settings.
  - Resolution: Review datastage.folderpath.exclude, datastage.jobname.exclude, and datastage.remove.standalone.jobs.
- Connector fails to connect to DB
  - Cause: JDBC URL wrong or firewall blocked.
  - Resolution: Verify JDBC URL, port, hostname, and network access.
- Table not found or permission error
  - Cause: Insufficient DB privileges.
  - Resolution: Grant the service account SELECT rights on required tables (e.g., DATASTAGEX_XMETAGEN_DSJOBDEFC2E76D84).
- Performance slowness
  - Cause: Exporting many large jobs.
  - Resolution: Limit project scope with filters or run incremental extractions.
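For the database connection issue above, a plain TCP check from the connector host can separate network problems from credential problems. The sketch below extracts the host and port from a JDBC URL of the form jdbc:<type>://host:port/db, as in the sample properties, and tries to open a socket. It does not authenticate or speak the database protocol, the function names are hypothetical, and other URL shapes (for example Oracle thin URLs with an @ separator) are deliberately rejected.

```python
import socket
from urllib.parse import urlsplit


def jdbc_host_port(jdbc_url):
    """Extract (host, port) from a URL shaped like jdbc:<type>://host:port/db."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    if not parts.hostname or parts.port is None:
        raise ValueError(f"cannot find host and port in {jdbc_url!r}")
    return parts.hostname, parts.port


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = jdbc_host_port("jdbc:db2://dsmeta.host:50001/xmeta")
print(host, port)  # dsmeta.host 50001
```

Calling port_is_reachable(host, port) on the connector host then indicates whether the firewall allows the connection. A False result points to hostname, port, or network policy; a True result suggests looking at the JDBC credentials or database privileges instead.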