IBM DataStage
The IBM DataStage connector captures job metadata and lineage from InfoSphere DataStage. It ingests job XML definitions, parameters, and dependencies to build end-to-end lineage.
Connection Modes
The connector supports two distinct modes:
- Active extraction — The connector connects directly to the DataStage metadata repository and engine, invokes the client utilities (`dsjob`, `dsexport`) to export job XML definitions, and then ingests those XMLs.
- Passive mode — If job XML files are already available (for example, exported by administrators or stored in version control), the connector skips the export step and ingests the files directly.
Choose the approach that best fits your environment and security model.
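The two modes are selected by a single property, shown here for the first configured connection (suffix `.1`):

```properties
# Active extraction: the connector runs the export itself
datastage.file.extract.1=Yes

# Passive mode: ingest pre-exported XML files instead
# datastage.file.extract.1=No
```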
Setup
Active extraction (direct connection)
Use this approach when you want the connector to drive the DataStage export automatically. Run the connector under a dedicated service account with the lowest privileges needed to read job metadata; the account requires the following roles:
- Suite Component Role: DataStage and QualityStage User — required to access DataStage modules
- Project Role: DataStage Super Operator — allows viewing jobs, logs, and objects but not modifying them
Steps to create and assign the service account

- Create the service user in Information Server
  - Log in to the Information Server Web Console
  - Go to Administration > Users and Groups
  - Create a new user (for example, `ds_lineage_service`)
  - Assign the DataStage and QualityStage User suite role
- Assign project-level roles
  - Navigate to each target DataStage project
  - Add the `ds_lineage_service` account
  - Assign the DataStage Super Operator role
- Configure engine credentials
  - Go to Administration > Domain Management > Engine Credentials
  - Map `ds_lineage_service` to an OS-level account (for example, `ds_exec`)
  - Provide the OS account's username/password
  - Ensure the OS account has:
    - Permission to run DataStage client utilities
    - Read/write access to XML export directories
    - "Log on as a batch job" rights (Windows) or equivalent execution rights (Linux)
- Install the DataStage client utilities (`dsjob`, `dsexport`) on the Lineage Plus host
- Validate connectivity
  - Ensure JDBC access to the metadata repository
  - Confirm firewall policies allow shell command execution
- Test script execution
  - Run `cmd.exe /C dagdatastagerun.bat` (Windows) or `bash dagdatastagerun.sh` (Linux) manually to confirm XML export works
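The manual test in the last step can also be scripted. The sketch below is a hypothetical wrapper (the `run_export` helper and its one-hour timeout are not part of the connector) around the same command the connector issues:

```python
import subprocess
import sys

def run_export(argv, timeout=3600):
    """Run an export command and return True on a zero exit code.

    Hypothetical helper; the timeout value is an assumption.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # Surface dsjob/dsexport errors for troubleshooting
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0

# Windows: run_export(["cmd.exe", "/C", "dagdatastagerun.bat"])
# Linux:   run_export(["bash", "dagdatastagerun.sh"])
```

A non-zero exit code usually points at missing client utilities or engine-credential problems, covered in Troubleshooting below.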
Passive mode (admin-provided files)
Use this approach if admins provide XML files or you manage your own export process.
- Have DataStage administrators export jobs to XML using Designer, `dsjob`, or `dsexport`
- Place the XMLs on a shared file system accessible by the connector
- Set `datastage.file.extract.1=No` in your properties file
- Point the connector to the XML file location using `datastage.job.root.folder` or `datastage.job.file.path`
- Confirm the service account has read access to the XML directories
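In passive mode the connector walks the configured root folder for exports. A minimal sketch of that discovery step (the `find_job_xmls` name is illustrative, not the connector's API):

```python
from pathlib import Path

def find_job_xmls(root):
    """Recursively collect job XML exports below root, mirroring how the
    connector scans the folder named by datastage.job.root.folder."""
    return sorted(p for p in Path(root).rglob("*.xml") if p.is_file())
```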
Configuration Parameters
Create a properties file with your connection configuration:
| Property | Required | Description | Example |
|---|---|---|---|
| Database connection | | | |
| `datastage.db.username.N` | Yes | Username for the DataStage metadata database | `db_user` |
| `datastage.db.password.N` | Yes | Password for the DataStage metadata database | |
| `db.type.N` | Yes | Type of database (`db2`, `oracle`) | `db2` |
| `datastage.hibernate.connection.url.N` | Yes | JDBC URL to the DataStage metadata repository | `jdbc:db2://host:50001/xmeta` |
| Extraction mode | | | |
| `datastage.file.extract.N` | Yes | `Yes` for active extraction, `No` for passive mode | `Yes` |
| `datastage.command.N` | If active | Shell or command interpreter | `cmd.exe` |
| `datastage.terminate.command.N` | If active | Flag telling the OS to terminate the process after completion | `/C` |
| `datastage.script.N` | If active | Script used to extract jobs and generate XML exports | `dagdatastagerun.bat` |
| `datastage.exe.location.N` | If active | Path to the DataStage client directory containing `dsexport`, `dsjob` | `C:\IBM\InformationServer1\Clients\Classic\` |
| `datastage.domain.name.N` | If active | Domain and port of the DataStage installation | `VM-DATASTAGE81:9080` |
| `datastage.server.name.N` | If active | Server hostname where DataStage is installed | `VM-DATASTAGE81` |
| `datastage.username.N` | If active | DataStage user account | `datastage_user` |
| `datastage.password.N` | If active | DataStage user password | |
| Project and job selection | | | |
| `datastage.project.name.N` | Yes | Comma-delimited list of projects to process | `MyFirstProject` |
| `datastage.project.joblist.path.N` | No | Path to a text file listing jobs to process | `C:\DataStage\jobs\test\<projectname>.txt` |
| `datastage.job.file.path.N` | Yes | Path where job XML files are written (active) or stored (passive) | `C:\DataStage\jobs\test\<jobname>.xml` |
| `datastage.job.root.folder.N` | If passive | Root folder for recursive job XML file discovery | `C:\metacenter_home\resources\DataStage\Jobs\` |
| Parameter resolution | | | |
| `datastage.paramfile.N` | No | Directory containing parameter files | `C:\metacenter_home\resources\DataStage\DSParamFiles\DSParams` |
| `datastage.parameterset.files.N` | No | Path to the project parameter set file | `C:\metacenter_home\resources\DataStage\ParameterSets.xml` |
| `datastage.parameterset.valuefile.mapping.N` | No | Mapping of override value files | `*:Db2ConnectDom` |
| `datastage.job.valuefilename.path.N` | No | File listing command-line options for job execution | |
| `datastage.variables.to.not.resolve.N` | No | Variables the snapshot should not resolve | `pCurrDate,pSampleDtMinus3` |
| Filtering | | | |
| `datastage.jobname.exclude.N` | No | Comma-separated list of jobs to exclude | `_Copy_,_Test_` |
| `datastage.folderpath.exclude.N` | No | Comma-separated list of folder paths to exclude (wildcards supported) | `/Developer_,/Jobs/Monthly_` |
| Schema files | | | |
| `datastage.schema.files.location.N` | No | Location where schema files are stored | `C:\DataStage\AcctFact\SchemaFiles` |
| `datastage.job.component.schema.file.N` | No | Override schema file for a specific component | `regular_inspect.Sequential_File=schemafile.osd` |
| Output | | | |
| `environment.name.N` | Yes | Logical environment name | `dev` |
| `snapshot.output.N` | Yes | Repository path for snapshot output | `metacenter.jcr#Snapshots/<instId>/IBM Datastage/<repo>` |
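A properties file can be sanity-checked before a run. The sketch below is a hypothetical pre-flight check, not part of the connector; the key names come from the table above, and the simple `key=value` parsing with `#` comments is an assumption about the file format:

```python
def load_properties(text):
    """Parse key=value lines, skipping blanks and # comments (assumed format)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Keys the table above marks as required for every run (first connection, .1)
REQUIRED = [
    "datastage.db.username.1", "datastage.db.password.1", "db.type.1",
    "datastage.hibernate.connection.url.1", "datastage.project.name.1",
    "datastage.job.file.path.1", "environment.name.1", "snapshot.output.1",
]

def missing_keys(props):
    """Return required keys absent from props, per the chosen extraction mode."""
    required = list(REQUIRED)
    if props.get("datastage.file.extract.1", "").lower() == "yes":
        required += [
            "datastage.command.1", "datastage.terminate.command.1",
            "datastage.script.1", "datastage.exe.location.1",
            "datastage.domain.name.1", "datastage.server.name.1",
            "datastage.username.1", "datastage.password.1",
        ]
    else:
        required.append("datastage.job.root.folder.1")
    return [k for k in required if k not in props]
```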
Sample Properties File

```properties
datastage.db.username.1=ds_meta_user
datastage.db.password.1=${DS_DB_PASSWORD}
db.type.1=db2
datastage.hibernate.connection.url.1=jdbc:db2://dsmeta.host:50001/xmeta
datastage.exe.location.1=C:\\IBM\\InformationServer1\\Clients\\Classic\\
datastage.script.1=dagdatastagerun.bat
datastage.terminate.command.1=/C
datastage.command.1=cmd.exe
datastage.file.extract.1=Yes
datastage.domain.name.1=VM-DATASTAGE81:9080
datastage.server.name.1=VM-DATASTAGE81
datastage.username.1=ds_lineage_service
datastage.password.1=${DS_PASSWORD}
environment.name.1=dev
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.project.joblist.path.1=C:\\DataStage\\jobs\\export\\<projectname>.txt
datastage.project.name.1=MyFirstProject
snapshot.output.1=metacenter.jcr#Snapshots/inst01/IBM Datastage/RepoA
```

Troubleshooting
- Connector cannot log in — Ensure the service account has the DataStage and QualityStage User and DataStage Super Operator roles, and proper engine credential mapping.
- Export fails — Validate `datastage.exe.location`, the script path, and OS account execution rights.
- No XML files produced — Check `datastage.job.file.path`, review filters, and rerun extraction.
- Some job definitions skipped — Review `datastage.folderpath.exclude`, `datastage.jobname.exclude`, and `datastage.remove.standalone.jobs`.
- Cannot connect to DB — Verify JDBC URL, port, hostname, and network access.
- Table not found or permission error — Grant the service account SELECT rights on required tables.
- Performance slowness — Limit the project scope with filters or run incremental extractions.
