ICML 2026 / Calibrated DFJSP Benchmarks

DynaSchedBench

Calibrated Dynamic Flexible Job Shop Scheduling benchmarks for studying how event design, observability, and agent strategy change scheduling quality. The release packages generation, simulation, evaluation, visualization, and agents under one import: dsbx.

Python import is dsbx, the single namespace for the release code.
Release package is dsbx, published on PyPI for direct installation.
Documentation lives at dsbx.readthedocs.io and covers the full workflow.
Project site is dsbx7.github.io, the public landing page.

Why DynaSchedBench

A release stack for calibrated instances, agent rollouts, and trajectory diagnosis.

DynaSchedBench is designed to be read as a working benchmark, not a pile of scripts: the same release provides the generator, the simulator, the evaluation tools, the visualizers, and the agent interfaces used to compare scheduling policies on matched inputs.

01. Controlled Instance Generation

Generate DFJSP instances from layered input models with target utilization, due-date tightness, variability, disturbance, and calibrated event streams. The release lets you move from a paper setting to a reproducible artifact without hand-editing inputs.
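To make two of these knobs concrete, here is a minimal sketch of how target utilization and due-date tightness are often defined in the scheduling literature. It uses the textbook total-work-content (TWK) due-date rule and a plain offered-load ratio. These definitions, and the names Job, twk_due_date, and utilization, are illustrative assumptions; they are not taken from dsbx and may differ from what the generator actually does.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float           # release (arrival) time of the job
    proc_times: list[float]  # processing time of each operation


def twk_due_date(job: Job, tightness: float) -> float:
    """TWK rule (assumed here): due date = release + tightness * total work."""
    return job.release + tightness * sum(job.proc_times)


def utilization(jobs: list[Job], n_machines: int, horizon: float) -> float:
    """Offered load: total work divided by total machine capacity."""
    total_work = sum(sum(j.proc_times) for j in jobs)
    return total_work / (n_machines * horizon)


jobs = [Job(0.0, [3.0, 2.0]), Job(1.0, [4.0, 1.0])]
due_dates = [twk_due_date(j, tightness=1.5) for j in jobs]
rho = utilization(jobs, n_machines=2, horizon=10.0)
print(due_dates, rho)
```

A smaller tightness factor pulls due dates closer to the release times, and a higher utilization leaves less slack on the machines; both make an instance harder in this simplified picture.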

02. Event-Driven Simulation

Replay arrivals, breakdowns, priority changes, cancellations, maintenance, rework, processing-time changes, and route changes through explicit simulator snapshots. Every state transition is visible, so debugging a policy does not depend on guesswork.
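The replay idea can be sketched with a time-ordered event queue that records a snapshot after every event. This is a generic illustration, not the dsbx simulator: the Event class, the replay function, the state layout, and the "repair" event kind are all assumptions introduced for the example, and only four event kinds are handled.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float                         # events are ordered by time only
    kind: str = field(compare=False)    # e.g. "arrival", "breakdown"
    target: str = field(compare=False)  # job or machine identifier


def replay(events):
    """Pop events in time order and record a snapshot after each one."""
    queue = list(events)
    heapq.heapify(queue)
    state = {"jobs": set(), "down": set()}
    snapshots = []
    while queue:
        ev = heapq.heappop(queue)
        if ev.kind == "arrival":
            state["jobs"].add(ev.target)
        elif ev.kind == "cancellation":
            state["jobs"].discard(ev.target)
        elif ev.kind == "breakdown":
            state["down"].add(ev.target)
        elif ev.kind == "repair":  # hypothetical counterpart to a breakdown
            state["down"].discard(ev.target)
        # Copy the state so later events cannot mutate earlier snapshots.
        snapshots.append(
            (ev.time, ev.kind, sorted(state["jobs"]), sorted(state["down"]))
        )
    return snapshots


events = [
    Event(5.0, "breakdown", "M1"),
    Event(0.0, "arrival", "J1"),
    Event(2.0, "arrival", "J2"),
    Event(7.0, "cancellation", "J1"),
    Event(9.0, "repair", "M1"),
]
snaps = replay(events)
for snap in snaps:
    print(snap)
```

Because each snapshot is an independent copy, a policy bug can be located by comparing the state immediately before and after the event that triggered it.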

03. Agent-Centric Evaluation

Run heuristics, PDRs, evolutionary search, and LLM schedulers on the same environment interface, then compare trajectories with shared metrics and plots. That keeps the benchmark fair even when the agents themselves come from very different design families.
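As a rough illustration of why a shared interface matters, the sketch below runs two classic priority dispatching rules, SPT (shortest processing time) and EDD (earliest due date), through the same rollout function on a single machine and scores them with the same metric. The Op type, the Policy signature, and the rollout function are hypothetical and much simpler than the actual dsbx environment API.

```python
from typing import Callable, NamedTuple


class Op(NamedTuple):
    job: str
    proc_time: float
    due: float


# A policy picks the next operation from the waiting queue.
Policy = Callable[[list[Op]], Op]


def spt(queue: list[Op]) -> Op:
    """Shortest processing time first."""
    return min(queue, key=lambda op: op.proc_time)


def edd(queue: list[Op]) -> Op:
    """Earliest due date first."""
    return min(queue, key=lambda op: op.due)


def rollout(ops: list[Op], policy: Policy) -> dict[str, float]:
    """Sequence all operations on one machine; return completion times."""
    queue, clock, done = list(ops), 0.0, {}
    while queue:
        op = policy(queue)
        queue.remove(op)
        clock += op.proc_time
        done[op.job] = clock
    return done


def total_tardiness(ops: list[Op], done: dict[str, float]) -> float:
    return sum(max(0.0, done[op.job] - op.due) for op in ops)


ops = [Op("J1", 4.0, 4.0), Op("J2", 1.0, 10.0), Op("J3", 2.0, 9.0)]
results = {
    name: total_tardiness(ops, rollout(ops, policy))
    for name, policy in [("SPT", spt), ("EDD", edd)]
}
print(results)
```

Both rules see identical inputs and are scored by the same function, so any difference in tardiness comes from the rule itself.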

Benchmark Surface

Designed to make observability and difficulty measurable.

DynaSchedBench replaces uncalibrated procedural sampling with event-stream refinement and SSI-based difficulty modeling. As a result, a change in performance can be traced to the policy, the observability setting, or the event design rather than to chance.

Plant: Machines, groups, routes, families, and release pressure
SESC: Event-space refinement, target convergence, and runtime efficiency
Dynamics: Arrivals, priority shifts, cancellations, maintenance, and route changes
LLM Agents: Observability levels, tool use, and prompt refinement strategies
Calibration Accuracy: 68
SSI Coverage: 43
LLM Robustness: 76
Runtime Efficiency: 82

Experiment Pipeline

From instance design to visual diagnosis.

The workflow is intentionally linear: generate a calibrated instance, run a scheduler, evaluate the resulting trajectory, and inspect the evidence with plots or summaries.

1. Generate

Calibrate event streams, export artifacts, and freeze the instance for reuse.

2. Run

Execute heuristics, PDRs, evolutionary agents, or LLM policies through one environment.

3. Evaluate

Check constraints, aggregate metrics, and compare trajectories on matched inputs.

4. Visualize

Inspect Gantt charts, event timelines, and metric curves to explain the outcome.
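The Evaluate step can be pictured with a small feasibility check and one aggregate metric. The sketch below assumes a schedule is a list of records with "op", "machine", "start", and "end" fields; that layout and the function names are hypothetical and are not the dsbx trajectory schema.

```python
from collections import defaultdict


def machine_overlaps(schedule):
    """Return time-adjacent operation pairs that overlap on one machine.

    An empty result means no two operations overlap on any machine.
    """
    by_machine = defaultdict(list)
    for op in schedule:
        by_machine[op["machine"]].append(op)
    conflicts = []
    for ops in by_machine.values():
        ops.sort(key=lambda op: op["start"])
        for prev, cur in zip(ops, ops[1:]):
            if cur["start"] < prev["end"]:
                conflicts.append((prev["op"], cur["op"]))
    return conflicts


def makespan(schedule):
    """Completion time of the last operation."""
    return max(op["end"] for op in schedule)


schedule = [
    {"op": "J1-1", "machine": "M1", "start": 0, "end": 3},
    {"op": "J2-1", "machine": "M1", "start": 2, "end": 5},
    {"op": "J1-2", "machine": "M2", "start": 3, "end": 6},
]
conflicts = machine_overlaps(schedule)
print(conflicts, makespan(schedule))
```

Running the constraint check before aggregating metrics keeps an infeasible schedule from being reported as a good one.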

Install And Run

A compact CLI for the full benchmark loop.

The commands below mirror the release flow in the docs: install the package, generate a benchmark instance, run a policy, evaluate the trajectory, and turn it into a plot you can cite.

pip install dsbx

dsbx-gen gen \
  -i docs/examples/minimal_input_model.json \
  -o runs/minimal

dsbx-agent run \
  -d runs/minimal \
  -o runs/minimal/spt \
  -a pdr:SPT:LIT

dsbx-eval from-trajectory \
  -t runs/minimal/spt/trajectory_light.jsonl

dsbx-vis gantt \
  -t runs/minimal/spt/trajectory_light.jsonl \
  -o runs/minimal/spt/gantt.pdf
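Before writing custom analysis on top of a trajectory, it can help to look at what the file actually contains. The helper below is a schema-agnostic sketch that assumes only that a .jsonl file holds one JSON value per line; summarize_jsonl is a hypothetical name, and nothing here relies on dsbx or on any particular field in trajectory_light.jsonl.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count records and top-level keys in a JSON Lines file."""
    n_records, keys = 0, Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return n_records, keys
```

After the run step above, calling summarize_jsonl("runs/minimal/spt/trajectory_light.jsonl") would return the number of records and how often each top-level key appears, which is a reasonable starting point before consulting the documentation for the exact schema.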

dsbx-gen

Instance generation, batch generation, MOO, hybrid calibration, and Pareto replay for calibrated inputs.

dsbx-agent

Baseline, PDR, evolutionary, and LLM agent rollouts with a shared environment API.

dsbx-eval

Trajectory metrics, event validation, schedule checks, and debug files for diagnosis.

dsbx-vis

Machine Gantt, job Gantt, metric curves, summaries, and event timelines.

dsbx-sim

Snapshot inspection without running a scheduling policy, useful for debugging state transitions.

Paper And Artifact

DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents

The paper introduces SESC for calibrated event streams, SSI for difficulty stratification, and a modular simulation-evaluation stack for testing reactive and lookahead-based scheduling policies under different observability regimes.

What is released

The public package exposes the benchmark generator, simulator, evaluation helpers, visualization commands, and agent interfaces under the single Python import dsbx.

What to read first

Start with the installation and CLI pages, then follow the quickstart workflow to generate an instance, run a scheduler, and inspect the resulting trajectory.

What the paper emphasizes

The benchmark is not just a dataset. Its calibration model, observability settings, and evaluation loop are part of the experimental design.

Import: dsbx
Commands: dsbx-gen, dsbx-agent, dsbx-eval, dsbx-vis, dsbx-sim
Focus: Calibration, simulation, and observability

Citation

Cite DynaSchedBench

Use the BibTeX below when you refer to the benchmark suite, the paper, or the release artifact in your own work.

@inproceedings{dynaschedbench2026,
  title     = {DynaSchedBench: Calibrated Dynamic Scheduling Benchmarks and Observability Paradox in LLM-based Scheduling Agents},
  author    = {Shijie Cao and Yuan Yuan and Jing Liu},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year      = {2026}
}