IBM DataStage Connector

Overview

The IBM DataStage connector captures job metadata and lineage from InfoSphere DataStage. It ingests job definitions, parameters, and dependencies to build end-to-end lineage.

The connector supports two distinct approaches:

Approach 1 -- Active Extraction

The connector connects directly to the DataStage metadata repository and engine, invokes the DataStage client utilities (dsjob, dsexport) to export job definitions as XML, and then ingests those XML files.

Approach 2 -- Passive Mode

If job XML files are already available (e.g., exported by administrators or stored in version control), the connector skips active export and directly ingests the files.

Choose the approach that best fits your environment and security model.

Setup Instructions

Approach 1 -- Active Extraction (Direct Connection)

Use this approach when you want the connector to drive the DataStage export automatically. Run the connector under a dedicated service account with the minimum privileges needed to read job metadata. The account needs the following roles:

Suite Component Role: DataStage and QualityStage User (required to access DataStage modules).
Project Role: DataStage Super Operator (allows viewing jobs, logs, and objects but not modifying them, which is sufficient for lineage extraction).

Steps to Create and Assign the Service Account

Create the service user in Information Server
- Log in to the Information Server Web Console.
- Go to Administration → Users and Groups.
- Create a new user (e.g., ds_lineage_service).
- Assign the DataStage and QualityStage User suite role.
Assign project-level roles
- Navigate to each target DataStage project.
- Add the ds_lineage_service account.
- Assign the DataStage Super Operator role.
Configure Engine Credentials
- Go to Administration → Domain Management → Engine Credentials.
- Map ds_lineage_service to an OS-level account (e.g., ds_exec).
- Provide the OS account's username/password.
- Ensure the OS account has:
  - Permission to run DataStage client utilities.
  - Read/write access to XML export directories.
  - "Log on as a batch job" rights (Windows) or equivalent execution rights (Linux).

Remaining Steps (After Service Account Creation)

Install DataStage client utilities (dsjob, dsexport) on the Lineage Plus host.
Validate connectivity
- Ensure JDBC access to the metadata repository.
- Confirm that firewall and host security policies permit the connector to run shell commands.
Configure parameters
- Set:
  - datastage.db.username.1, datastage.db.password.1, datastage.hibernate.connection.url.1
  - datastage.exe.location.1, datastage.script.1, datastage.command.1, datastage.terminate.command.1
  - datastage.file.extract.1=Yes
- Configure output locations (datastage.job.file.path.1, datastage.project.joblist.path.1).
- Set datastage.project.name.1 to the project(s) to extract.
Test script execution
- Run cmd.exe /C dagdatastagerun.bat (Windows) or bash dagdatastagerun.sh (Linux) manually to confirm XML export works.
Run the connector
- The connector will execute the script, export job XMLs, and ingest them into lineage.
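
Before running the connector, it can help to confirm that the properties file contains every key named in the steps above. The following is a minimal sketch, not part of the connector: the function names are hypothetical, and the parser handles only plain key=value lines (it does not unescape doubled backslashes or continuation lines the way a full Java properties loader would).

```python
# Keys that the setup steps above list for active extraction.
REQUIRED_ACTIVE_KEYS = [
    "datastage.db.username.1",
    "datastage.db.password.1",
    "datastage.hibernate.connection.url.1",
    "datastage.exe.location.1",
    "datastage.script.1",
    "datastage.command.1",
    "datastage.terminate.command.1",
    "datastage.file.extract.1",
    "datastage.job.file.path.1",
    "datastage.project.joblist.path.1",
    "datastage.project.name.1",
]


def load_properties(text):
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_active_keys(props):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_ACTIVE_KEYS if not props.get(k)]


def check_active_config(props):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = [f"missing: {k}" for k in missing_active_keys(props)]
    extract = props.get("datastage.file.extract.1")
    if extract and extract.lower() != "yes":
        problems.append("datastage.file.extract.1 should be Yes for active extraction")
    return problems
```

To use it, read the properties file into a string, pass it to load_properties, and review whatever check_active_config returns. The check only confirms that values are present; it does not validate paths or credentials.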

Approach 2 -- Passive Mode (Admin-Provided Files)

Use this approach if admins provide XML files or you manage your own export process (e.g., through version control).

Arrange XML file exports
- Have DataStage administrators export jobs to XML using Designer, dsjob, or dsexport.
- Ensure job list and parameter files are also produced if needed.
Make XMLs accessible
- Place the XMLs on a shared file system accessible by the connector.
- Optionally, organize them in a root folder structure for version control.
Configure parameters
- Set datastage.file.extract.1=No (disables active export).
- Point connector to the XML file location:
  - datastage.job.root.folder.1 or datastage.job.file.path.1
  - datastage.project.joblist.path.1 (if using job lists)
- Optional: configure datastage.paramfile.1 and datastage.parameterset.files.1 for parameter resolution.
Validate access
- Confirm the service account has read access to the XML directories.
- Run the connector. It skips the export step and parses the existing files directly.
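
The "Validate access" step can be partly automated. The sketch below walks a folder recursively, as one might for the location given in datastage.job.root.folder.1, and reports which XML files can be read and parsed. It is an illustration under assumptions: the function names are hypothetical, and it checks only that each file is well-formed XML, not that it is a valid DataStage export.

```python
import os
import xml.etree.ElementTree as ET


def find_job_xmls(root_folder):
    """Recursively collect the paths of .xml files under root_folder."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            if name.lower().endswith(".xml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def check_job_xmls(root_folder):
    """Return (ok, problems): parseable paths and (path, error message) pairs."""
    ok, problems = [], []
    for path in find_job_xmls(root_folder):
        try:
            # ParseError signals malformed XML; OSError covers permission
            # and other read failures.
            ET.parse(path)
            ok.append(path)
        except (ET.ParseError, OSError) as exc:
            problems.append((path, str(exc)))
    return ok, problems
```

Run it as the same OS account the connector uses, so that permission errors surface here rather than during ingestion.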

Configurable Parameters

Key	Example	Description
Database Connection Parameters
datastage.db.username.1	db_user	Username for the DataStage metadata database.
datastage.db.password.1	db_pass	Password for the DataStage metadata database.
db.type.1	db2	Type of database (e.g., db2, oracle).
datastage.hibernate.connection.url.1	jdbc:db2://host:50001/xmeta	JDBC URL to the DataStage metadata repository.
Active Extraction Parameters
datastage.file.extract.1	Yes	Set to Yes to actively export XML files.
datastage.command.1	cmd.exe	Shell or command interpreter to run the DataStage script.
datastage.terminate.command.1	/C	Flag passed to the command interpreter so that it exits once the script finishes (for example, /C for cmd.exe).
datastage.script.1	dagdatastagerun.bat	Script used to extract jobs and generate XML exports.
datastage.exe.location.1	C:\IBM\InformationServer1\Clients\Classic\	Path to DataStage client directory containing `dsexport`, `dsjob`, etc.
datastage.domain.name.1	VM-DATASTAGE81:9080	Domain and port of the DataStage installation.
datastage.server.name.1	VM-DATASTAGE81	Server hostname where DataStage is installed.
datastage.username.1	datastage_user	DataStage user account.
datastage.password.1	datastage_user_password	DataStage user password.
datastage.project.name.1	MyFirstProject	Comma-delimited list of projects whose jobs will be processed.
datastage.project.joblist.path.1	C:\DataStage\jobs\test<projectname>.txt	Path of text file with the list of jobs to be processed.
datastage.job.file.path.1	C:\DataStage\jobs\test<jobname>.xml	Path where job XML files are written.
Passive Mode Parameters
datastage.file.extract.1	No	Set to No to disable export and rely on pre-existing XML files.
datastage.job.root.folder.1	C:\metacenter_home\resources\DataStage\Jobs\SERVICES	Root folder for recursive job XML file definitions.
datastage.job.file.path.1	C:\DataStage\jobs\test<jobname>.xml	Path where job XML files are stored.
datastage.project.joblist.path.1	C:\DataStage\jobs\test<projectname>.txt	Path of text file with the list of jobs to be processed.
datastage.project.name.1	MyFirstProject	Comma-delimited list of projects whose jobs will be processed.
Parameter & Variable Resolution
datastage.paramfile.1	C:\metacenter_home\resources\DataStage\DSParamFiles\DSParams	Directory containing parameter files.
datastage.parameterset.files.1	C:\metacenter_home\resources\DataStage\SERVICES\ParameterSets.xml	Path to the project parameter set file.
datastage.parameterset.valuefile.mapping.1	*:Db2ConnectDom	Mapping of override value files when not specified in snapshot.
datastage.job.valuefilename.path.1	C:\metacenter_home\resources\DataStage\SERVICES\jobs.txt	File listing command-line options for job execution.
datastage.variables.to.not.resolve.1	pCurrDate,pSampleDtMinus3	Variables the snapshot should not resolve.
Filtering & Exclusion
datastage.jobname.exclude.1	Copy,Test	Comma-separated list of jobs to exclude.
datastage.folderpath.exclude.1	/Developer_,/Jobs/Monthly_	Comma-separated list of folder paths to exclude (wildcards supported).
Schema Files & Overrides
datastage.schema.files.location.1	C:\DataStage\AcctFact\SchemaFiles	Locations where schema files are stored.
datastage.job.component.schema.file.1	"regular_inspect.Sequential_File=schemafile.osd"	Overrides schema file for a specific component.
Lineage Plus Config
environment.name.1	dev	Logical environment name.
snapshot.output.1	`metacenter.jcr#Snapshots/<instId>/IBM Datastage/<repo>`	Repository path for snapshot output.

Sample Properties File

datastage.db.username.1=ds_meta_user
datastage.db.password.1=ds_meta_pwd
db.type.1=db2
datastage.hibernate.connection.url.1=jdbc:db2://dsmeta.host:50001/xmeta

datastage.exe.location.1=C:\\IBM\\InformationServer1\\Clients\\Classic\\
datastage.script.1=dagdatastagerun.bat
datastage.terminate.command.1=/C
datastage.command.1=cmd.exe
datastage.file.extract.1=Yes

datastage.domain.name.1=VM-DATASTAGE81:9080
datastage.server.name.1=VM-DATASTAGE81
datastage.username.1=ds_lineage_service
datastage.password.1=secure_pw

environment.name.1=dev
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.project.joblist.path.1=C:\\DataStage\\jobs\\export\\<projectname>.txt
datastage.project.name.1=MyFirstProject

snapshot.output.1=metacenter.jcr#Snapshots/inst01/IBM Datastage/RepoA
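
The three command-related properties in the sample appear to combine into the command used in the manual test step (cmd.exe /C dagdatastagerun.bat). The sketch below shows that composition. The ordering is an assumption inferred from that test step rather than documented connector behavior, the function name is hypothetical, and the handling of an empty flag for the Linux form (bash dagdatastagerun.sh) is likewise assumed.

```python
def build_export_command(props):
    """Combine interpreter, terminate flag, and script into an argument list.

    Empty or missing parts are dropped, so a configuration without a
    terminate flag yields just the interpreter and the script.
    """
    parts = [
        props.get("datastage.command.1", ""),
        props.get("datastage.terminate.command.1", ""),
        props.get("datastage.script.1", ""),
    ]
    return [p for p in parts if p]


# Values taken from the sample properties file above.
windows_props = {
    "datastage.command.1": "cmd.exe",
    "datastage.terminate.command.1": "/C",
    "datastage.script.1": "dagdatastagerun.bat",
}
windows_command = build_export_command(windows_props)
```

The resulting list can be passed to subprocess.run to reproduce the manual test from a script, which makes it easier to capture the exit code and output when the export fails.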

Troubleshooting

Connector cannot log in
- Cause: Wrong suite/project role or missing engine credentials.
- Resolution: Ensure the service account has the DataStage and QualityStage User suite role and the DataStage Super Operator project role, and that its engine credentials are mapped correctly.
Export fails
- Cause: Script not found or OS user lacks permissions.
- Resolution: Validate datastage.exe.location, script path, and OS account execution rights.
No XML files produced
- Cause: Wrong export directory or filters applied.
- Resolution: Check datastage.job.file.path, review filters, and rerun extraction.
Some job definitions skipped
- Cause: Exclusion filters or standalone job removal settings.
- Resolution: Review datastage.folderpath.exclude, datastage.jobname.exclude, and datastage.remove.standalone.jobs.
Connector fails to connect to DB
- Cause: Incorrect JDBC URL, or the connection is blocked by a firewall.
- Resolution: Verify JDBC URL, port, hostname, and network access.
Table not found or permission error
- Cause: Insufficient DB privileges.
- Resolution: Grant the service account SELECT rights on required tables (e.g. DATASTAGEX_XMETAGEN_DSJOBDEFC2E76D84).
Slow performance
- Cause: Exporting many large jobs.
- Resolution: Limit project scope with filters or run incremental extractions.
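
For the "Connector fails to connect to DB" case, a quick way to separate a wrong URL from a network block is to extract the host and port from the JDBC URL and attempt a plain TCP connection. This is a minimal sketch with hypothetical function names. It recognizes only the jdbc:<type>://host:port/... form used in this document, so other URL styles (such as Oracle thin URLs) are rejected rather than guessed at.

```python
import re
import socket

# Matches URLs such as jdbc:db2://host:50001/xmeta
JDBC_HOST_PORT = re.compile(r"^jdbc:[a-z0-9]+://(?P<host>[^:/;]+):(?P<port>\d+)")


def parse_jdbc_host_port(url):
    """Return (host, port) from a jdbc:<type>://host:port/... URL."""
    match = JDBC_HOST_PORT.match(url)
    if not match:
        raise ValueError(f"unrecognized JDBC URL: {url}")
    return match.group("host"), int(match.group("port"))


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If parse_jdbc_host_port succeeds but can_reach returns False from the connector host, the cause is more likely a firewall or hostname issue than the database credentials. A successful TCP connection does not prove that the database accepts the login.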