IBM InfoSphere DataStage · DSX / ISX · Parser Engine

Everything MigryX ingests from DataStage, the engine that converts it, and every target it produces.

DataStage Jobs & Exports

DSX (.dsx) — parallel & server jobs
ISX (.isx) interchange & bundles
Job sequences & orchestration
Containers (local / shared)
Stages, links, constraints & pivots
Parameter sets, job & env parameters
Before / after subroutines & routines
Director job runs & schedules
Connector & ODBC stages

Data Sources & Landing Zones

Oracle, SQL Server, Snowflake
Teradata, Db2, Netezza
Sequential files, datasets & blobs
Redshift, BigQuery, ADLS/S3
InfoSphere connectors & ODBC
Lineage across lookups, merges & joins

→

MigryX Parser

Deployment

dbt
Airflow
Openflow
Git / CI

→

Python Ecosystem

PySpark
Snowpark
Databricks
Dataproc
Fabric
EMR
Cloudera

Modern Warehouse

Snowflake
BigQuery
Fabric
Databricks
Redshift
Teradata
Iceberg

Migration Process

Analyze and Insights

Automatic assessment of DataStage jobs and DSX/ISX assets for rationalization and migration planning
Comprehensive dependency mapping with data and file lineage
Development of required frameworks and standards
Workflow complexity analysis, tool chains, and usage signals
Rationalize and standardize DataStage ETL and job patterns

Convert and Migrate

Automated translation of DataStage jobs (from DSX/ISX) to Python, SQL, and Spark with modernization
Multi code conversion with enhanced optimization and unit testing
Metadata preservation and comprehensive documentation
Visual execution on Databricks, Snowflake, and cloud platforms
Native integration with dbt, Airflow & Git

Test and Validate

End-to-end automated testing of data pipelines
Comprehensive data validation and schema mapping
Side-by-side output comparison and metrics validation
Test data generation and cut over preparation
Partitioned validation with automated error detection

🚀 Go Live and Hyper Care

Streamlined transition with dedicated support and monitoring to ensure optimal performance

🧭

COMPASS

Migration Intelligence Platform

Understand Your DataStage Estate Before You Migrate

COMPASS scans your DataStage footprint — .dsx job exports, .isx interchange, parallel and server jobs, job sequences, shared containers, and stage-level dependencies — then classifies each asset as MIGRATE, ARCHIVE, or DELETE. Convert only what matters. Archive the rest. Delete the noise.

The Migration Challenge

DataStage estates grow for years: overlapping parallel jobs, copy-pasted server jobs, duplicate DSX drops, stale sequences, and unclear container reuse. Moving off Designer and Director without an inventory is high risk.

📊 Massive Scale

Hundreds of thousands of files with no visibility into what's actively used.

💸 Mounting Costs

Legacy license and storage costs keep growing, much of it for stale, unused data.

❓ Unknown Dependencies

Complex web of programs and datasets. What connects to what is a major challenge.

⚠️ Migration Risk

Can't migrate everything at once. Need data-driven decisions, not guesswork.

🕒 Time Pressure

Manual analysis takes months. Business needs a clear, prioritized plan — fast.

📉 No Visibility

Which assets drive value? Which are technical debt? Unknown without automation.

COMPASS Solves This

One system to scan, score, and classify every DataStage asset automatically — from DSX/ISX inputs and Designer project folders to Director job runs and logs.

🔍

Complete Inventory

Automated scanning of every file, every program, every execution log. Build a comprehensive catalog with metadata, dependencies, and usage patterns.

🎯

Smart Recommendations

Intelligent scoring engine evaluates each asset on multiple criteria. Get clear MIGRATE, ARCHIVE, or DELETE decisions with phased priorities.

📊

Dependency Mapping

Parse code to extract relationships and external references. Understand full impact before making any change.

💰

Cost Optimization

Identify archival and cleanup opportunities. Project savings across storage, licensing, and cloud migration costs before committing.

Intelligent Classification Logic

Multi-factor scoring balances cost reduction with migration speed

✅ Increases Migration Priority

Recent access — used in last 6 months
High frequency — accessed regularly in execution logs
Small size — easy to migrate quickly and cheaply
SQL-heavy — simpler, faster conversion path
Low complexity — fewer dependencies, lower risk
Error-free — clean execution history, no fixes needed

⚠️ Decreases Migration Priority

Stale data — no access in 2+ years → ARCHIVE candidate
Never used — zero execution in logs → DELETE candidate
Large files — higher migration cost and risk, later phases
High complexity — many dependencies, careful planning needed
Execution errors — must be fixed or reviewed before migration
Orphaned — no dependents found → safe to delete
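The rules above can be approximated in a few lines of Python. This is a minimal sketch, not the COMPASS scoring engine: the `Asset` fields, the equal weights, and the frequency and complexity thresholds are assumptions made for illustration. Only the 6-month and 2-year windows and the 1 to 5 scale come from the lists on this page.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    days_since_access: int
    runs_last_year: int
    dependency_count: int
    had_errors: bool


def priority_score(asset: Asset) -> int:
    """Return a 1..5 score; each favourable signal adds one point (assumed weights)."""
    score = 1
    score += int(asset.days_since_access <= 180)   # recent access: last 6 months
    score += int(asset.runs_last_year >= 52)       # assumed "high frequency" cutoff
    score += int(asset.dependency_count <= 5)      # assumed "low complexity" cutoff
    score += int(not asset.had_errors)             # clean execution history
    return score


def classify(asset: Asset) -> tuple:
    """Return (decision, score); the score is 0 for non-MIGRATE decisions."""
    if asset.runs_last_year == 0 and asset.days_since_access >= 730:
        return ("DELETE", 0)    # no recorded execution at all
    if asset.days_since_access >= 730:
        return ("ARCHIVE", 0)   # no access in 2+ years
    return ("MIGRATE", priority_score(asset))


daily_load = Asset("daily_load", days_since_access=3, runs_last_year=365,
                   dependency_count=2, had_errors=False)
print(classify(daily_load))
```

A score of 5 would fall in the Critical band and 3 to 4 in the Standard band described below; real scoring would also weigh size and SQL content, which this sketch leaves out.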

Classification Output

5

MIGRATE — Critical

High value, low effort → Phase 1

Daily reports, frequently accessed data, SQL-heavy code, business-critical assets

3–4

MIGRATE — Standard

Active workloads → Phase 2–3

Regular usage, moderate complexity, important but not critical

A

Typical Customer Outcomes

What teams discover when they inventory DataStage (.dsx / .isx, parallel & server jobs) before a platform change

60–80%

Data Reduction

Archive or cleanup candidates

5–10×

Analysis Speed

vs. manual assessment

Hours

To Complete Scan

Not weeks or months

100%

Asset Coverage

Complete inventory visibility

Built for Enterprise Scale

Production-ready. Handles real-world complexity.

01

File Scanner

High-performance parallel processing. Hashing detects duplicates automatically.
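Hash-based duplicate detection can be sketched as follows. This is an illustration of the general technique, not the File Scanner itself: it runs sequentially, and the choice of SHA-256, the `*.dsx` glob, and the function names are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large exports never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path, pattern: str = "*.dsx") -> list:
    """Group files under root by content hash; return groups with 2+ members."""
    groups = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("exports"))` on a hypothetical `exports` folder returns one list of paths per set of byte-identical files.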

02

Code Parser

Parses DSX/ISX job definitions, stage graphs, Transformer expressions, and embedded SQL with high accuracy.
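To make the idea concrete, here is a minimal sketch of pulling job and stage names out of a DSX-style text export. It assumes a simplified layout of nested `BEGIN ... END` blocks with `Key "value"` fields; real exports carry many more fields, and the sample content, stage type strings, and function names here are illustrative assumptions rather than the MigryX parser.

```python
import re

# Hypothetical, heavily simplified export fragment.
SAMPLE = '''\
BEGIN DSJOB
   Identifier "LoadCustomers"
   BEGIN DSRECORD
      Identifier "V0S1"
      OLEType "CTransformerStage"
      Name "xfm_clean"
   END DSRECORD
   BEGIN DSRECORD
      Identifier "V0S2"
      OLEType "CCustomStage"
      Name "lkp_region"
   END DSRECORD
END DSJOB
'''

JOB_RE = re.compile(r"BEGIN DSJOB(.*?)END DSJOB", re.S)
RECORD_RE = re.compile(r"BEGIN DSRECORD(.*?)END DSRECORD", re.S)
FIELD_RE = re.compile(r'^[ \t]*(\w+) "([^"]*)"[ \t]*$', re.M)


def parse_jobs(text: str) -> dict:
    """Map each job name to a list of (stage name, stage type) pairs."""
    jobs = {}
    for job_body in JOB_RE.findall(text):
        # Job-level fields are what remains once the nested records are removed.
        job_fields = dict(FIELD_RE.findall(RECORD_RE.sub("", job_body)))
        stages = [dict(FIELD_RE.findall(rec)) for rec in RECORD_RE.findall(job_body)]
        jobs[job_fields.get("Identifier")] = [
            (stage.get("Name"), stage.get("OLEType")) for stage in stages
        ]
    return jobs


print(parse_jobs(SAMPLE))
```

A regex pass like this is enough for an inventory; Transformer expressions and embedded SQL would need a proper grammar, which is out of scope for the sketch.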

03

Log Analyzer

Parses execution logs to track usage patterns, errors, and performance.
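A usage summary of this kind can be sketched in a few lines. The log format below (`timestamp job status`, one run per line) is a hypothetical stand-in, since actual Director log layouts differ by version and export method, and `summarize` is an illustrative name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, pre-normalized log lines.
LOG_LINES = [
    "2024-03-01T02:00:05 LoadCustomers OK",
    "2024-03-02T02:00:07 LoadCustomers OK",
    "2024-03-02T03:15:00 RebuildArchive FAILED",
]


def summarize(lines):
    """Return per-job run count, failure count, and most recent run time."""
    runs, failures, last_run = Counter(), Counter(), {}
    for line in lines:
        stamp, job, status = line.split()
        when = datetime.fromisoformat(stamp)
        runs[job] += 1
        if status != "OK":
            failures[job] += 1
        last_run[job] = max(when, last_run.get(job, when))
    return {
        job: {"runs": runs[job], "failures": failures[job], "last_run": last_run[job]}
        for job in runs
    }


summary = summarize(LOG_LINES)
print(summary["LoadCustomers"])
```

The `runs` and `last_run` values are the kind of evidence that feeds frequency and recency signals; `failures` feeds the execution-quality signal.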

04

Usage Tracking

Evidence-based decisions: which assets are active vs. never accessed.

05

Migration Scoring

Multi-factor algorithm: usage, recency, size, complexity, execution quality.

06

Phased Planning

Priority-based phase assignment optimized for risk and business continuity.

07

Rich Reports

Executive dashboards, migration plans, cost projections, exportable data.

08

Queryable Database

All analysis stored structurally. Run custom queries for any ad-hoc need.
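As an example of an ad-hoc query, the sketch below totals assets and bytes per decision. The `assets` table and its columns are assumptions for illustration, not the actual schema, and SQLite stands in for the local PostgreSQL instance so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (name TEXT, kind TEXT, decision TEXT, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [
        ("LoadCustomers", "parallel_job", "MIGRATE", 1000),
        ("BuildMart", "parallel_job", "MIGRATE", 2000),
        ("OldExtract", "server_job", "ARCHIVE", 5000),
        ("TmpCopy", "server_job", "DELETE", 200),
    ],
)

# How many assets, and how many bytes, fall under each decision?
rows = conn.execute(
    "SELECT decision, COUNT(*), SUM(size_bytes) "
    "FROM assets GROUP BY decision ORDER BY decision"
).fetchall()
print(rows)
```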

Simple Process. Powerful Results.

From deployment to actionable migration intelligence — fast.

1

Configure & Scan

Point COMPASS at Designer project exports, DSX (.dsx) bundles, ISX (.isx) drops, or source-control checkouts. Automated scanning begins immediately.

2

Analyze & Score

Parse job XML, stage links and constraints, analyze Director logs, build a dependency graph. Score every job for priority.

3

Review & Decide

Interactive reports show exactly what to migrate, archive, or delete.

4

Execute & Save

Follow the phased plan. Track progress and realize immediate cost savings.

Analyze. Inventory. Lineage.

Scan IBM DataStage parallel and server jobs from .dsx exports, .isx interchange, and Designer project metadata to build a complete inventory. Discover Transformer and Lookup chains, shared containers, partition/sort keys, and parameter propagation — plus fan-in or fan-out hot spots. Produce visual lineage and impact maps that guide the entire migration.

Inventory jobs, sequences, containers, links, and datasets
Dependency mapping with visual lineage (file + data + stages)
Complexity and usage signals from Designer and Director
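Fan-in, fan-out, and downstream impact all fall out of a simple edge list. The sketch below uses hypothetical job and dataset names and plain dictionaries; it shows the general graph technique, not the product's lineage engine.

```python
from collections import defaultdict

# (upstream, downstream) pairs; all names are hypothetical.
EDGES = [
    ("src_customers", "LoadCustomers"),
    ("LoadCustomers", "ds_customers"),
    ("ds_customers", "BuildMart"),
    ("ds_customers", "ExportCRM"),
    ("src_orders", "BuildMart"),
]


def fan_counts(edges):
    """Count incoming and outgoing edges per node."""
    fan_in, fan_out = defaultdict(int), defaultdict(int)
    for upstream, downstream_node in edges:
        fan_out[upstream] += 1
        fan_in[downstream_node] += 1
    return dict(fan_in), dict(fan_out)


def downstream(edges, start):
    """Return every node reachable from start, i.e. its impact set."""
    children = defaultdict(list)
    for upstream, downstream_node in edges:
        children[upstream].append(downstream_node)
    seen, stack = set(), [start]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


fan_in, fan_out = fan_counts(EDGES)
print(fan_out["ds_customers"], sorted(downstream(EDGES, "LoadCustomers")))
```

Nodes with high fan-out, such as `ds_customers` here, are the hot spots where a change touches the most downstream work.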

Inventory · Lineage · Complexity · Validation · Risk

Visual lineage map — Visual lineage. Precise dependency graph.

Convert. Generate modern code.

Parser conversion turns DataStage job logic (from DSX/ISX) — Transformer, Lookup, Merge, Copy, and connector stages — into Python, PySpark, Snowpark, and SQL for Snowflake, Databricks, BigQuery, Redshift, and Fabric. All translations are explainable and auditable.

Interprets parallel layouts, partitions, Lookup links, and Transformer logic for matched outputs
Translated workflows to notebooks and pipelines
Auto documentation for each converted artifact
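To illustrate the kind of mapping involved, the hand-written sketch below expresses a small Transformer-plus-Lookup flow as SQL and runs it on SQLite. It is not MigryX output. The tables, columns, and the assumption that the Lookup continues with a default on a miss are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT, region_code TEXT);
CREATE TABLE regions (region_code TEXT, region_name TEXT);
INSERT INTO customers VALUES (1, ' ada ', 'EU'), (2, 'bob', 'XX'), (3, NULL, 'US');
INSERT INTO regions VALUES ('EU', 'Europe'), ('US', 'United States');
""")

CONVERTED_SQL = """
SELECT c.id,
       UPPER(TRIM(c.name)) AS name,                        -- Transformer derivation
       COALESCE(r.region_name, 'UNKNOWN') AS region_name   -- default on lookup miss
FROM customers AS c
LEFT JOIN regions AS r ON r.region_code = c.region_code    -- Lookup stage
WHERE c.name IS NOT NULL                                   -- output link constraint
ORDER BY c.id
"""

rows = conn.execute(CONVERTED_SQL).fetchall()
print(rows)
```

Keeping one comment per originating stage is one way to make a translation traceable back to the job design.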

Python · PySpark · Snowpark · SQL · Templates · Auto docs

Targets we generate — Python and PySpark. Snowpark and SQL.

Execute. Orchestrate pipelines.

Run converted workloads in the right order with a driver notebook or job runner. Standardize on Delta and cloud storage, schedule, monitor, and auto retry with centralized logs and metrics.

Visual execution on Databricks, Snowflake
Native integration with dbt, Airflow, Git
Validate results and capture lineage
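Running workloads in dependency order with automatic retries can be sketched with the standard library alone. The task names, the `DEPENDS_ON` mapping, and the retry limit are assumptions; a production runner would add backoff, scheduling, and centralized logging.

```python
from graphlib import TopologicalSorter

# task -> set of tasks that must finish first (hypothetical names).
DEPENDS_ON = {
    "load_customers": set(),
    "load_orders": set(),
    "build_mart": {"load_customers", "load_orders"},
    "export_crm": {"build_mart"},
}


def run_with_retry(task, fn, max_attempts=3):
    """Call fn(task) until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            fn(task)
            return attempt
        except Exception as exc:
            print(f"{task}: attempt {attempt} failed: {exc}")
    raise RuntimeError(f"{task} failed after {max_attempts} attempts")


def run_pipeline(depends_on, fn):
    """Run every task after its prerequisites; return (task, attempts) pairs."""
    order = list(TopologicalSorter(depends_on).static_order())
    return [(task, run_with_retry(task, fn)) for task in order]
```

Passing a function that executes one converted notebook or job as `fn` turns this into a very small driver; `graphlib` requires Python 3.9 or later.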

Visual orchestration · Scheduling · Retries · Logs · CI ready

Execution orchestration — Visual execution with centralized logs.

Validate. Prove parity.

Partitioned validation compares row level and aggregate outputs between legacy and modern systems. Automatic schema checks, data matching reports, and exception trails give confidence to go live.

Visual execution on Snowflake and Databricks. Shows visual lineage alongside the live code in a direct session, so you see each step and the exact stop point.
Streamlines troubleshooting, cuts retesting, provides audit ready logs, lowers engineering and compute costs.
Lower risk. Visual Lineage shows upstream and downstream impact, so teams retest only what matters.

Row counts · Common columns · Mismatched columns · Evidence

Data matching validation — Data matching. Evidence your stakeholders trust.

Merlin AI. Assist and accelerate.

Context aware assistance that knows your inventory, lineage, and conversion plans. Generate unit tests, explain diffs, suggest mappings, and draft notebooks with your rules applied.

Inline explanations for converted modules
Debug errors and improve efficiency
Enterprise safe. Runs in your environment

Inline explains · Mapping assist · Test scaffold · Secure in your env

Merlin AI assistant — Developer assist powered by your context.

Execution

Visual Execution

Visual execution runs directly on Snowflake and Databricks, combining lineage and live code in one workspace with a direct warehouse session and step-by-step visibility to any failure point.

Visual execution on Snowflake and Databricks. One view shows visual lineage alongside live code in a direct session, so you see each step and the exact stop point.
Streamlines troubleshooting, cuts retesting, provides audit ready logs, lowers engineering and compute costs.
Lower risk. Visual Lineage shows upstream and downstream impact, so teams retest only what matters.

Visual Execution on Snowflake and Databricks

Modules

DataStage migration across the full lifecycle

Code Analysis

Assess thousands of parallel and server jobs from DSX and ISX inputs, map complexity across stages and containers, and flag readiness. Get clear scope, a prioritized plan, safer cutovers, and faster production.

Visual Lineage

Visualize stages, links, constraints, and SQL across jobs, sequences, and shared containers. Speeds impact checks, lowers migration risk, supports audits, and proves outputs match.

Automated DataStage conversion to Python and Snowpark

Code Conversion

Convert DataStage jobs into Python, PySpark, Snowpark, or SQL with matched outputs — preserving Transformer derivations, lookups, and pushdown SQL where it makes sense. Modernize faster, keep logic intact, and avoid risky rewrites.

Jupyter notebooks for validation and development

Data Mapper

Automatically map legacy schemas to Snowflake or Databricks with clear mappings. Cut migration risk, enforce naming and data types, and get audit-ready visibility.
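A schema mapping rule set can be as simple as a lookup table plus a naming function. The sketch below is illustrative only: the legacy type names, the choice of Snowflake target types, and the upper snake case naming standard are assumptions, and unmapped types are surfaced for manual review rather than guessed.

```python
import re

# Illustrative mapping; a real project would define this per source and target.
TYPE_MAP = {
    "VarChar": "VARCHAR",
    "Integer": "NUMBER(38,0)",
    "Decimal": "NUMBER",
    "Timestamp": "TIMESTAMP_NTZ",
}


def to_snake_upper(name: str) -> str:
    """customerId -> CUSTOMER_ID, 'Order Total' -> ORDER_TOTAL."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[^A-Za-z0-9]+", "_", spaced).strip("_").upper()


def map_column(name, legacy_type, length=None, scale=None):
    """Return (target column name, target type) for one legacy column."""
    if legacy_type not in TYPE_MAP:
        raise ValueError(f"no mapping for {legacy_type!r}; review manually")
    target = TYPE_MAP[legacy_type]
    if legacy_type == "VarChar" and length:
        target = f"VARCHAR({length})"
    elif legacy_type == "Decimal" and length:
        target = f"NUMBER({length},{scale or 0})"
    return to_snake_upper(name), target


print(map_column("Order Total", "Decimal", 12, 2))
```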

Auto Docs

Automatic documentation captures your DataStage jobs and the new target code, detailing stage types, job parameters, environment variables, and cross-job dependencies for clear traceability.

Data Matching

Compares source and target outputs at scale using configurable keys and rules. Flags mismatches, duplicates, and gaps with actionable reports for fast fixes.
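Key-based matching of this kind can be sketched with plain dictionaries. The row shape, the `match` function, and the report keys are assumptions for illustration; at scale the same comparison would run inside the warehouse or in Spark rather than in memory.

```python
from collections import Counter


def match(source_rows, target_rows, key, compare):
    """Compare two row sets on `key`; report gaps, duplicates, and mismatches."""
    def index(rows):
        counts = Counter(row[key] for row in rows)
        duplicates = {k for k, n in counts.items() if n > 1}
        return {row[key]: row for row in rows}, duplicates

    src, src_dups = index(source_rows)
    tgt, tgt_dups = index(target_rows)
    mismatched = {}
    for k in src.keys() & tgt.keys():
        columns = [c for c in compare if src[k][c] != tgt[k][c]]
        if columns:
            mismatched[k] = columns
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "duplicate_keys": sorted(src_dups | tgt_dups),
        "mismatched": mismatched,
    }


legacy = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
modern = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25},
          {"id": 4, "amt": 40}, {"id": 4, "amt": 40}]
report = match(legacy, modern, key="id", compare=["amt"])
print(report)
```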

Source: IBM DataStage

This page is dedicated to migrating IBM DataStage assets — DSX (.dsx) job exports, ISX (.isx) interchange, parallel and server jobs, job sequences, shared containers, and Designer/Director metadata — into modern Python and cloud targets. Need other legacy engines? See the full platform.

DSX exports

Parallel & server jobs, sequences, containers (.dsx)

ISX & metadata

Interchange bundles, lineage, project objects (.isx)

Runtime & Director

Schedules, credentials, run logs, job history

Targets we generate

Python (Pandas), PySpark, Snowflake/Snowpark, Databricks, and cloud platforms.

PySpark

Distributed DataFrame and SQL workloads

Snowpark

Python APIs for Snowflake compute

Databricks

Delta Lake pipelines and notebooks

Dataproc

Managed Spark on Google Cloud

Fabric

Microsoft Fabric Lakehouse and pipelines

EMR

AWS EMR Spark and Hive workloads

Cloudera

On‑prem or hybrid Hadoop distributions

Deployment

Simple, secure, on-premises deployment

Everything runs inside your network. No external connections. No data leaves your environment in any scenario.

Security posture

Fully air gapped operation supported.
Outbound connections: none. External API calls: none.
All processing occurs inside the container and host network.
SSL for VS Code, Jupyter, nginx proxy, and backend API.
Local PostgreSQL only. Logs stored on local disk.

Self-Service Pilot Options

Run the Pilot Yourself

No consultant. No RFP. Install MigryX in your environment, convert real DataStage jobs from DSX/ISX, and see results — on your own terms.

Self-service. No consultant needed. Deploy yourself in under an hour.

Convert. Generate modern code. Run on your own schedule. Iterate freely.

All data stays in your network. Air-gapped capable. Zero external calls.

Migration Readiness

1 week

Discovery & Insights

Scope: 100K LoC - Unlimited
Deliverables: Inventory workflows, macros, and configs. Map dependencies with visual data and file lineage. Analyze complexity with block labels and LoC.
Reports: Inventory, visual lineage, and risk assessment. Share via HTML reports.
Access: Self-service. No consultant required. Runs entirely in your environment.

Full Pilot

4 to 6 weeks

End-to-end

Scope: Discovery, plus 10K LoC across legacy programs or workflows.
Deliverables: Discovery, plus pilot code conversion and data matching to the target system.
Reports: Discovery, plus data matching, validation and enterprise data workflows.
Access: Self-service. No consultant required. Runs entirely in your environment.

Large Scale Pilot

2 to 4 months

Enterprise

Scope: Same as end-to-end, but with larger sets of legacy data and programs for discovery, conversion, validation, and execution on modern workloads.
Deliverables: Same as end-to-end
Reports: Same as end-to-end
Access: Self-service. No consultant required. Runs entirely in your environment.

Type	Migration Readiness	Full Pilot	Large Scale Pilot
Discovery	100,000 LoC	100,000 LoC	1 Million LoC
Conversion	N/A	10,000 LoC	100,000 LoC
Duration	1 week	4 to 6 weeks	2 to 4 months
Deliverables	Project reports, risk analysis	Full reports, executed code	Full reports, executed code
Reports	Inventory, lineage, risk	Full project	Full project and JCL
Execution	In your environment	In your environment	In your environment

These pilots are fully self-service — install, run, and evaluate independently inside your own environment. No external consultants required. Pricing and scope can be adjusted to match complexity and urgency.

Reports

Project Reports and JCL Reports

Project Reports

A compact view of what exists, how it connects, and where risk lives.

Inventory · Lineage · Complexity · Validation · Risk

Inventory summary. Files and jobs counted. Macros and includes detected. Datasets referenced.
Dependency map. Fan in and fan out. Critical hubs identified. External calls flagged.
Complexity and risk. Pattern difficulty score. Unsupported items. Remediation priority.
Validation status. Errors and warnings. Coverage progress. Open issues.

JCL Reports

Steps · PROCs · DD statements · Schedules · Datasets · Readiness

End to end view of JCL structure, datasets, and run control with conversion readiness.

Job flow. Step order. PROC usage. Condition codes.
Datasets and lineage. Reads and writes. Temporary and persisted. Upstream and downstream.
Control and schedule. Triggers and dependencies. Calendars if present. Restart points.
Conversion readiness. Unsupported patterns. Parameterization needs. Proposed target control.

Datasheets

Snowflake

MigryX → Snowflake datasheet (PDF)

Platform overview

General MigryX datasheet (PDF)

Architecture

How MigryX fits your DataStage migration

Deployment

Install on your servers or VMs. Optionally deploy inside Kubernetes or OpenShift. Use private cloud networks only.

Connectors

Secure connectors to Snowflake, Databricks, BigQuery, and Redshift. Keys managed by you.

Storage

Project data stored inside your boundary. Logs and evidence live in your storage accounts.

Security and compliance

Private by design. You hold the keys.

Data residency

Run on-premises or inside your private cloud. No data leaves your boundary.

Access control

Role based access. SSO and MFA integration. Fine grained permissions.

Auditability

Every action is logged. Evidence packs for internal and external reviews.

Governance

Templates, naming, and coding standards enforced at generate time.

Backups

Project backup and restore under your policies.

Isolation

No shared services. Your environment only.

FAQ

Answers to common questions

Where does MigryX run?

Inside your environment. On your hardware or private cloud. You hold the keys.

What code is produced?

Python, PySpark, Snowpark, SQL, dbt models, and Databricks notebooks with comments and mapping sheets.

How do we prove results?

Validation reports and Data Matching show parity. Approval records provide evidence for audits.

Can I see a demo?

Absolutely. Book a live walkthrough where we parse your own DataStage jobs in real time and show you converted output, lineage, and validation results.

What about orchestration?

Integrate with Airflow, ADF, Composer, or Control-M. Keep existing schedules or modernize them.

How do we start?

Begin with the pilot. Load a sample of DataStage jobs (DSX/ISX). Review lineage, conversion, runs, and validation. Scale with confidence.

Contact

Talk to our team

Questions about DataStage migration? Tell us about your estate and target platform.

Schedule a Demo

See MigryX parse your own code in a live walkthrough.

Book Time →

Request a POC

Submit your migration details and get a free proof of concept.

Start POC →

hello@migryx.com (617) 512-9530 Indianapolis • Boston • Hyderabad
