IBM DataStage

The IBM DataStage connector captures job metadata and lineage from InfoSphere DataStage by ingesting job XML definitions, parameters, and dependencies to build end-to-end lineage.

Connection Modes

The connector supports two distinct modes:

  • Active extraction — The connector connects directly to the DataStage metadata repository and engine, invoking client utilities (dsjob, dsexport) to export job XML definitions, then ingests those XMLs.
  • Passive mode — If job XML files are already available (for example, exported by administrators or stored in version control), the connector skips active export and directly ingests the files.

Choose the approach that best fits your environment and security model.

Setup

Active extraction (direct connection)

Use this approach when you want the connector to drive DataStage export automatically. The connector should run under a dedicated service account with the lowest privileges needed to read job metadata.

  • Suite Component Role: DataStage and QualityStage User — required to access DataStage modules
  • Project Role: DataStage Super Operator — allows viewing jobs, logs, and objects but not modifying them

Steps to create and assign the service account

  1. Create the service user in Information Server

    • Log in to the Information Server Web Console
    • Go to Administration > Users and Groups
    • Create a new user (for example, ds_lineage_service)
    • Assign the DataStage and QualityStage User suite role
  2. Assign project-level roles

    • Navigate to each target DataStage project
    • Add the ds_lineage_service account
    • Assign the DataStage Super Operator role
  3. Configure engine credentials

    • Go to Administration > Domain Management > Engine Credentials
    • Map ds_lineage_service to an OS-level account (for example, ds_exec)
    • Provide the OS account's username/password
    • Ensure the OS account has:
      • Permission to run DataStage client utilities
      • Read/write access to XML export directories
      • "Log on as a batch job" rights (Windows) or equivalent execution rights (Linux)
  4. Install DataStage client utilities (dsjob, dsexport) on the Lineage Plus host

  5. Validate connectivity

    • Ensure JDBC access to the metadata repository
    • Confirm firewall policies allow shell command execution
  6. Test script execution

    • Run cmd.exe /C dagdatastagerun.bat (Windows) or bash dagdatastagerun.sh (Linux) manually to confirm XML export works
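
As a sanity check on the pieces above, the interpreter (datastage.command), terminate flag (datastage.terminate.command), and script (datastage.script) combine into the command line the connector runs. A minimal Python sketch, assuming the connector simply concatenates these values in order (the helper name and assembly logic are illustrative, not the connector's actual implementation):

```python
# Illustrative only: shows how the three extraction properties combine
# into the argv the connector would execute on the DataStage client host.
props = {
    "datastage.command.1": "cmd.exe",             # shell / command interpreter
    "datastage.terminate.command.1": "/C",        # terminate-after-completion flag
    "datastage.script.1": "dagdatastagerun.bat",  # export script
}

def build_export_command(props: dict, n: int = 1) -> list:
    """Assemble interpreter, terminate flag, and script into an argv list."""
    return [
        props["datastage.command.%d" % n],
        props["datastage.terminate.command.%d" % n],
        props["datastage.script.%d" % n],
    ]

cmd = build_export_command(props)
print(" ".join(cmd))  # cmd.exe /C dagdatastagerun.bat
# For the manual validation in step 6, you would run this argv, e.g. with
# subprocess.run(cmd, check=True), and confirm XML files appear in the
# export directory.
```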

Passive mode (admin-provided files)

Use this approach if DataStage administrators provide the XML files or you manage your own export process.

  1. Have DataStage administrators export jobs to XML using Designer, dsjob, or dsexport
  2. Place the XMLs on a shared file system accessible by the connector
  3. Set datastage.file.extract.1=No in your properties file
  4. Point the connector to the XML file location using datastage.job.root.folder or datastage.job.file.path
  5. Confirm the service account has read access to the XML directories
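
The steps above map onto a small properties fragment; a sketch, using illustrative paths (a complete file also needs the database-connection and output properties described in the next section):

```properties
# Passive mode: skip active export and ingest admin-provided XML files
datastage.file.extract.1=No
# Root folder scanned recursively for job XML exports (illustrative path)
datastage.job.root.folder.1=C:\\DataStage\\jobs\\export\\
datastage.project.name.1=MyFirstProject
environment.name.1=dev
```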

Configuration Parameters

Create a properties file with your connection configuration:

Database connection

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.db.username.N | Yes | Username for the DataStage metadata database | db_user |
| datastage.db.password.N | Yes | Password for the DataStage metadata database | |
| db.type.N | Yes | Type of database (db2, oracle) | db2 |
| datastage.hibernate.connection.url.N | Yes | JDBC URL to the DataStage metadata repository | jdbc:db2://host:50001/xmeta |

Extraction mode

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.file.extract.N | Yes | Yes for active extraction, No for passive mode | Yes |
| datastage.command.N | If active | Shell or command interpreter | cmd.exe |
| datastage.terminate.command.N | If active | Flag telling the OS to terminate the process after completion | /C |
| datastage.script.N | If active | Script used to extract jobs and generate XML exports | dagdatastagerun.bat |
| datastage.exe.location.N | If active | Path to the DataStage client directory containing dsexport and dsjob | C:\IBM\InformationServer1\Clients\Classic\ |
| datastage.domain.name.N | If active | Domain and port of the DataStage installation | VM-DATASTAGE81:9080 |
| datastage.server.name.N | If active | Server hostname where DataStage is installed | VM-DATASTAGE81 |
| datastage.username.N | If active | DataStage user account | datastage_user |
| datastage.password.N | If active | DataStage user password | |

Project and job selection

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.project.name.N | Yes | Comma-delimited list of projects to process | MyFirstProject |
| datastage.project.joblist.path.N | No | Path to a text file listing jobs to process | C:\DataStage\jobs\test\<projectname>.txt |
| datastage.job.file.path.N | Yes | Path where job XML files are written (active) or stored (passive) | C:\DataStage\jobs\test\<jobname>.xml |
| datastage.job.root.folder.N | If passive | Root folder for recursive job XML file discovery | C:\metacenter_home\resources\DataStage\Jobs\ |

Parameter resolution

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.paramfile.N | No | Directory containing parameter files | C:\metacenter_home\resources\DataStage\DSParamFiles\DSParams |
| datastage.parameterset.files.N | No | Path to the project parameter set file | C:\metacenter_home\resources\DataStage\ParameterSets.xml |
| datastage.parameterset.valuefile.mapping.N | No | Mapping of override value files | *:Db2ConnectDom |
| datastage.job.valuefilename.path.N | No | File listing command-line options for job execution | |
| datastage.variables.to.not.resolve.N | No | Variables the snapshot should not resolve | pCurrDate,pSampleDtMinus3 |

Filtering

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.jobname.exclude.N | No | Comma-separated list of jobs to exclude | _Copy_,_Test_ |
| datastage.folderpath.exclude.N | No | Comma-separated list of folder paths to exclude (wildcards supported) | /Developer_,/Jobs/Monthly_ |

Schema files

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| datastage.schema.files.location.N | No | Location where schema files are stored | C:\DataStage\AcctFact\SchemaFiles |
| datastage.job.component.schema.file.N | No | Override schema file for a specific component | regular_inspect.Sequential_File=schemafile.osd |

Output

| Property | Required | Description | Example |
| --- | --- | --- | --- |
| environment.name.N | Yes | Logical environment name | dev |
| snapshot.output.N | Yes | Repository path for snapshot output | metacenter.jcr#Snapshots/<instId>/IBM Datastage/<repo> |
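
To make the exclude filters concrete, here is a Python sketch of one plausible matching behavior, assuming job-name patterns match as substrings and folder-path patterns match as shell-style wildcards (the connector's exact matching semantics may differ; the trailing * on the folder patterns is added here for illustration):

```python
# Sketch of exclude-filter semantics for datastage.jobname.exclude and
# datastage.folderpath.exclude. Assumptions: job-name patterns match as
# substrings of the job name; folder patterns match the full folder path
# via fnmatch-style wildcards.
from fnmatch import fnmatch

jobname_exclude = "_Copy_,_Test_".split(",")
folderpath_exclude = "/Developer_*,/Jobs/Monthly_*".split(",")

def is_excluded(folder: str, job: str) -> bool:
    """Return True if the job should be skipped by either filter."""
    if any(pattern in job for pattern in jobname_exclude):
        return True
    return any(fnmatch(folder, pattern) for pattern in folderpath_exclude)

print(is_excluded("/Jobs/Daily", "Load_Copy_Accounts"))    # True: job-name match
print(is_excluded("/Jobs/Monthly_Close", "LoadAccounts"))  # True: folder match
print(is_excluded("/Jobs/Daily", "LoadAccounts"))          # False
```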

Sample Properties File

datastage.db.username.1=ds_meta_user
datastage.db.password.1=${DS_DB_PASSWORD}
db.type.1=db2
datastage.hibernate.connection.url.1=jdbc:db2://dsmeta.host:50001/xmeta

datastage.exe.location.1=C:\\IBM\\InformationServer1\\Clients\\Classic\\
datastage.script.1=dagdatastagerun.bat
datastage.terminate.command.1=/C
datastage.command.1=cmd.exe
datastage.file.extract.1=Yes

datastage.domain.name.1=VM-DATASTAGE81:9080
datastage.server.name.1=VM-DATASTAGE81
datastage.username.1=ds_lineage_service
datastage.password.1=${DS_PASSWORD}

environment.name.1=dev
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.project.joblist.path.1=C:\\DataStage\\jobs\\export\\<projectname>.txt
datastage.project.name.1=MyFirstProject

snapshot.output.1=metacenter.jcr#Snapshots/inst01/IBM Datastage/RepoA

Troubleshooting

  • Connector cannot log in — Ensure the service account has DataStage and QualityStage User and Super Operator roles, and proper engine credential mapping.
  • Export fails — Validate datastage.exe.location, script path, and OS account execution rights.
  • No XML files produced — Check datastage.job.file.path, review filters, and rerun extraction.
  • Some job definitions skipped — Review datastage.folderpath.exclude, datastage.jobname.exclude, and datastage.remove.standalone.jobs.
  • Cannot connect to DB — Verify JDBC URL, port, hostname, and network access.
  • Table not found or permission error — Grant the service account SELECT rights on required tables.
  • Slow extraction — Limit project scope with filters or run incremental extractions.