IBM DataStage Connector

Overview

The IBM DataStage connector captures job metadata and lineage from InfoSphere DataStage. It ingests job definitions, parameters, and dependencies to build end-to-end lineage.

The connector supports two distinct approaches:

  • Approach 1 -- Active Extraction

    The connector connects directly to the DataStage metadata repository and engine, invoking DataStage client utilities (dsjob, dsexport) to export job XML definitions, then ingests those XMLs.

  • Approach 2 -- Passive Mode

    If job XML files are already available (e.g., exported by administrators or stored in version control), the connector skips active export and directly ingests the files.

Choose the approach that best fits your environment and security model.

Setup Instructions

Approach 1 -- Active Extraction (Direct Connection)

Use this approach when you want the connector to drive the DataStage export automatically. Run the connector under a dedicated service account that follows the principle of least privilege: grant only what is needed to read job metadata. The account needs the following roles:

  • Suite Component Role: DataStage and QualityStage User. Required to access DataStage modules.

  • Project Role: DataStage Super Operator. Allows viewing jobs, logs, and objects but not modifying them, which is sufficient for lineage extraction.

Steps to Create and Assign the Service Account

  1. Create the service user in Information Server

    • Log in to the Information Server Web Console.

    • Go to Administration → Users and Groups.

    • Create a new user (e.g., ds_lineage_service).

    • Assign the DataStage and QualityStage User suite role.

  2. Assign project-level roles

    • Navigate to each target DataStage project.

    • Add the ds_lineage_service account.

    • Assign the DataStage Super Operator role.

  3. Configure Engine Credentials

    • Go to Administration → Domain Management → Engine Credentials.

    • Map ds_lineage_service to an OS-level account (e.g., ds_exec).

    • Provide the OS account's username/password.

    • Ensure the OS account has:

      • Permission to run DataStage client utilities.

      • Read/write access to XML export directories.

      • "Log on as a batch job" rights (Windows) or equivalent execution rights (Linux).
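
The OS-level requirements above can be spot-checked before the first connector run. The following is a minimal sketch, not part of the connector: the function names are hypothetical, and it assumes the script is run while logged in as the mapped OS account (for example ds_exec), because the checks apply to the current user only.

```python
import os
import shutil


def missing_utilities(exe_location: str, names=("dsjob", "dsexport")) -> list[str]:
    """Return the client utilities that are not found as executables in exe_location."""
    # shutil.which checks for an executable file and, on Windows, also tries
    # the PATHEXT extensions (for example .exe).
    return [name for name in names if shutil.which(name, path=exe_location) is None]


def check_export_dir(path: str) -> list[str]:
    """Return a list of problems found with an XML export directory."""
    if not os.path.isdir(path):
        return [f"{path} is not an existing directory"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems
```

An empty list from both functions suggests that the account can see the utilities and use the export directory. It does not verify DataStage roles or the "Log on as a batch job" right, which still need to be checked in the respective administration tools.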

Remaining Steps (post service account creation)

  1. Install DataStage client utilities (dsjob, dsexport) on the Lineage Plus host.

  2. Validate connectivity

    • Ensure JDBC access to the metadata repository.

    • Confirm that firewall and host security policies allow the connector host to reach the DataStage server and to execute shell commands.

  3. Configure parameters

    • Set:

      • datastage.db.username.1, datastage.db.password.1, datastage.hibernate.connection.url.1

      • datastage.exe.location.1, datastage.script.1, datastage.command.1, datastage.terminate.command.1

      • datastage.file.extract.1=Yes

    • Configure output locations (datastage.job.file.path.1, datastage.project.joblist.path.1).

    • Set datastage.project.name.1 with the project(s) to extract.

  4. Test script execution

    • Run cmd.exe /C dagdatastagerun.bat (Windows) or bash dagdatastagerun.sh (Linux) manually to confirm that the XML export works.

  5. Run the connector

    • The connector will execute the script, export job XMLs, and ingest them into lineage.
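
Steps 3 and 4 can be combined into a small pre-flight check. The sketch below is illustrative only: it takes the properties as an already-parsed dictionary, treats every key listed in step 3 as mandatory (an assumption), and assumes the export command is assembled as command, terminate flag, script, which mirrors the manual test command in step 4 but is not confirmed connector behavior. All function names are hypothetical.

```python
import subprocess

# Keys listed under "Configure parameters" for Approach 1. Treating all of
# them as mandatory is an assumption made for this sketch.
ACTIVE_KEYS = [
    "datastage.db.username.1",
    "datastage.db.password.1",
    "datastage.hibernate.connection.url.1",
    "datastage.exe.location.1",
    "datastage.script.1",
    "datastage.command.1",
    "datastage.terminate.command.1",
    "datastage.file.extract.1",
    "datastage.job.file.path.1",
    "datastage.project.joblist.path.1",
    "datastage.project.name.1",
]


def is_active_extraction(props: dict[str, str]) -> bool:
    """True when datastage.file.extract.1 is Yes (case-insensitive by assumption)."""
    return props.get("datastage.file.extract.1", "").strip().lower() == "yes"


def missing_active_keys(props: dict[str, str]) -> list[str]:
    """Return the Approach 1 keys that are absent from the properties."""
    return [key for key in ACTIVE_KEYS if key not in props]


def build_export_command(props: dict[str, str]) -> list[str]:
    """Assemble the command line used in the manual test, skipping empty parts."""
    parts = [
        props["datastage.command.1"],
        props.get("datastage.terminate.command.1", ""),
        props["datastage.script.1"],
    ]
    return [part for part in parts if part]


def run_export(props: dict[str, str], timeout_s: int = 3600) -> subprocess.CompletedProcess:
    """Run the export script once and capture its output for inspection."""
    return subprocess.run(
        build_export_command(props),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

With the values from the sample properties file, build_export_command yields cmd.exe, /C, and dagdatastagerun.bat as separate arguments. A non-zero returncode or text in stderr from run_export usually points to the script path or to the OS account's execution rights.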

Approach 2 -- Passive Mode (Admin-Provided Files)

Use this approach if admins provide XML files or you manage your own export process (e.g., through version control).

  1. Arrange XML file exports

    • Have DataStage administrators export jobs to XML using Designer, dsjob, or dsexport.

    • Ensure job list and parameter files are also produced if needed.

  2. Make XMLs accessible

    • Place the XMLs on a shared file system accessible by the connector.

    • Optionally, organize them in a root folder structure for version control.

  3. Configure parameters

    • Set datastage.file.extract.1=No (disables active export).

    • Point connector to the XML file location:

      • datastage.job.root.folder.1 or datastage.job.file.path.1

      • datastage.project.joblist.path.1 (if using job lists)

    • Optional: configure datastage.paramfile.1 and datastage.parameterset.files.1 for parameter resolution.

  4. Validate access

    • Confirm the service account has read access to the XML directories.

    • Run the connector. It skips the export and parses the existing files directly.
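
Before running in passive mode, it can help to confirm that the files are readable and well-formed. The following sketch walks a root folder (the value you would set in datastage.job.root.folder.1) and tries to parse every .xml file it finds. It is a standalone check with hypothetical function names; it only tests XML well-formedness and does not validate that a file is a real DataStage job export.

```python
import os
import xml.etree.ElementTree as ET


def find_job_xmls(root_folder: str) -> list[str]:
    """Recursively collect paths of files ending in .xml under root_folder."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_folder):
        for name in filenames:
            if name.lower().endswith(".xml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def unparseable_files(paths: list[str]) -> list[tuple[str, str]]:
    """Return (path, error message) for every file that cannot be read or parsed."""
    bad = []
    for path in paths:
        try:
            ET.parse(path)
        except (ET.ParseError, OSError) as exc:
            bad.append((path, str(exc)))
    return bad
```

Run it under the same account the connector uses, so that a permission problem shows up as an OSError entry rather than during ingestion.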

Configurable Parameters

Key | Example | Description

Database Connection Parameters
datastage.db.username.1 | db_user | Username for the DataStage metadata database.
datastage.db.password.1 | db_pass | Password for the DataStage metadata database.
db.type.1 | db2 | Type of database (e.g., db2, oracle).
datastage.hibernate.connection.url.1 | jdbc:db2://host:50001/xmeta | JDBC URL of the DataStage metadata repository.

Active Extraction Parameters
datastage.file.extract.1 | Yes | Set to Yes to actively export XML files.
datastage.command.1 | cmd.exe | Shell or command interpreter used to run the DataStage script.
datastage.terminate.command.1 | /C | Flag that tells the command interpreter to exit once the script has finished.
datastage.script.1 | dagdatastagerun.bat | Script used to extract jobs and generate XML exports.
datastage.exe.location.1 | C:\IBM\InformationServer1\Clients\Classic\ | Path to the DataStage client directory containing dsexport, dsjob, etc.
datastage.domain.name.1 | VM-DATASTAGE81:9080 | Domain and port of the DataStage installation.
datastage.server.name.1 | VM-DATASTAGE81 | Hostname of the server where DataStage is installed.
datastage.username.1 | datastage_user | DataStage user account.
datastage.password.1 | datastage_user_password | DataStage user password.
datastage.project.name.1 | MyFirstProject | Comma-delimited list of projects whose jobs will be processed.
datastage.project.joblist.path.1 | C:\DataStage\jobs\test<projectname>.txt | Path of the text file with the list of jobs to be processed.
datastage.job.file.path.1 | C:\DataStage\jobs\test<jobname>.xml | Path where job XML files are written.

Passive Mode Parameters
datastage.file.extract.1 | No | Set to No to disable export and rely on pre-existing XML files.
datastage.job.root.folder.1 | C:\metacenter_home\resources\DataStage\Jobs\SERVICES | Root folder that is read recursively for job XML definition files.
datastage.job.file.path.1 | C:\DataStage\jobs\test<jobname>.xml | Path where job XML files are stored.
datastage.project.joblist.path.1 | C:\DataStage\jobs\test<projectname>.txt | Path of the text file with the list of jobs to be processed.
datastage.project.name.1 | MyFirstProject | Comma-delimited list of projects whose jobs will be processed.

Parameter & Variable Resolution
datastage.paramfile.1 | C:\metacenter_home\resources\DataStage\DSParamFiles\DSParams | Directory containing parameter files.
datastage.parameterset.files.1 | C:\metacenter_home\resources\DataStage\SERVICES\ParameterSets.xml | Path to the project parameter set file.
datastage.parameterset.valuefile.mapping.1 | *:Db2ConnectDom | Mapping of override value files when not specified in the snapshot.
datastage.job.valuefilename.path.1 | C:\metacenter_home\resources\DataStage\SERVICES\jobs.txt | File listing command-line options for job execution.
datastage.variables.to.not.resolve.1 | pCurrDate,pSampleDtMinus3 | Variables the snapshot should not resolve.

Filtering & Exclusion
datastage.jobname.exclude.1 | Copy,Test | Comma-separated list of jobs to exclude.
datastage.folderpath.exclude.1 | /Developer_,/Jobs/Monthly_ | Comma-separated list of folder paths to exclude (wildcards supported).

Schema Files & Overrides
datastage.schema.files.location.1 | C:\DataStage\AcctFact\SchemaFiles | Locations where schema files are stored.
datastage.job.component.schema.file.1 | "regular_inspect.Sequential_File=schemafile.osd" | Overrides the schema file for a specific component.

Lineage Plus Config
environment.name.1 | dev | Logical environment name.
snapshot.output.1 | metacenter.jcr#Snapshots/<instId>/IBM Datastage/<repo> | Repository path for snapshot output.

Sample Properties File

datastage.db.username.1=ds_meta_user
datastage.db.password.1=ds_meta_pwd
db.type.1=db2
datastage.hibernate.connection.url.1=jdbc:db2://dsmeta.host:50001/xmeta

datastage.exe.location.1=C:\\IBM\\InformationServer1\\Clients\\Classic\\
datastage.script.1=dagdatastagerun.bat
datastage.terminate.command.1=/C
datastage.command.1=cmd.exe
datastage.file.extract.1=Yes

datastage.domain.name.1=VM-DATASTAGE81:9080
datastage.server.name.1=VM-DATASTAGE81
datastage.username.1=ds_lineage_service
datastage.password.1=secure_pw

environment.name.1=dev
datastage.job.file.path.1=C:\\DataStage\\jobs\\export\\<jobname>.xml
datastage.project.joblist.path.1=C:\\DataStage\\jobs\\export\\<projectname>.txt
datastage.project.name.1=MyFirstProject

snapshot.output.1=metacenter.jcr#Snapshots/inst01/IBM Datastage/RepoA
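
The doubled backslashes in the Windows paths above suggest that the file uses Java-style .properties escaping, where \\ stands for a single backslash; that is an assumption, not something the connector documentation states here. Under that assumption, a minimal Python sketch for reading such a file in your own validation scripts could look as follows. It is deliberately simplified: it splits on the first "=" only and ignores line continuations, Unicode escapes, and the ":" separator that Java also accepts.

```python
def load_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks and comments, and unescape \\\\ to \\."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with # or ! are ignored.
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # A "#" inside a value (as in snapshot.output.1) is kept as-is.
        props[key.strip()] = value.strip().replace("\\\\", "\\")
    return props
```

Applied to the sample above, datastage.exe.location.1 comes back with single backslashes, and snapshot.output.1 keeps its "#" because comments are only recognized at the start of a line.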

Troubleshooting

  • Connector cannot log in

    • Cause: Wrong suite/project role or missing engine credentials.
    • Resolution: Ensure the service account has the DataStage and QualityStage User suite role and the DataStage Super Operator project role, and that its engine credential mapping is correct.
  • Export fails

    • Cause: Script not found or OS user lacks permissions.
    • Resolution: Validate datastage.exe.location, script path, and OS account execution rights.
  • No XML files produced

    • Cause: Wrong export directory or filters applied.
    • Resolution: Check datastage.job.file.path, review filters, and rerun extraction.
  • Some job definitions skipped

    • Cause: Exclusion filters or standalone job removal settings.
    • Resolution: Review datastage.folderpath.exclude, datastage.jobname.exclude, and datastage.remove.standalone.jobs.
  • Connector fails to connect to DB

    • Cause: The JDBC URL is wrong or a firewall is blocking the connection.
    • Resolution: Verify JDBC URL, port, hostname, and network access.
  • Table not found or permission error

    • Cause: Insufficient DB privileges.
    • Resolution: Grant the service account SELECT rights on required tables (e.g. DATASTAGEX_XMETAGEN_DSJOBDEFC2E76D84).
  • Slow performance

    • Cause: Exporting many large jobs.
    • Resolution: Limit project scope with filters or run incremental extractions.
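
For the "Connector fails to connect to DB" case, a plain TCP check from the connector host helps separate network problems from credential or driver problems. The sketch below is a hypothetical helper, not connector code. It only understands URLs of the form jdbc:<type>://host:port/..., as in the sample value for datastage.hibernate.connection.url.1; other JDBC URL styles (for example some Oracle formats) are not handled.

```python
import re
import socket


def jdbc_host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from a jdbc:<type>://host:port/... URL."""
    match = re.match(r"^jdbc:[a-z0-9]+://([^:/]+):(\d+)", url, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unsupported JDBC URL format: {url}")
    return match.group(1), int(match.group(2))


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

If can_connect returns False for the host and port taken from the JDBC URL, look at hostname resolution and firewall rules first. If it returns True but the connector still fails, the cause is more likely the credentials, the database name in the URL, or database privileges.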