🤖 AI Summary
This work addresses a widespread challenge in control theory research: simulation results are often difficult to reproduce because papers omit parameters and implementation details. The authors formally define the task of "paper-to-simulation recoverability" and introduce a benchmark of 500 papers from the IEEE Conference on Decision and Control (CDC). Their proposed framework, RESCORE, is a multi-agent large language model system composed of an Analyzer, a Coder, and a Verifier, which integrates iterative execution feedback and visual comparison to enable end-to-end automated code generation. Experiments show that RESCORE successfully reproduces 40.7% of the simulation tasks in the benchmark, significantly outperforming single-pass generation in accuracy and achieving roughly a tenfold speedup over manual reproduction.
📝 Abstract
Reconstructing numerical simulations from control systems research papers is often hindered by underspecified parameters and ambiguous implementation details. We define the task of Paper-to-Simulation Recoverability: the ability of an automated system to generate executable code that faithfully reproduces a paper's results. We curate a benchmark of 500 papers from the IEEE Conference on Decision and Control (CDC) and propose RESCORE, a three-component LLM agentic framework comprising an Analyzer, a Coder, and a Verifier. RESCORE uses iterative execution feedback and visual comparison to improve reconstruction fidelity. Our method recovers task-coherent simulations for 40.7% of benchmark instances, outperforming single-pass generation. Notably, the automated RESCORE pipeline achieves an estimated 10x speedup over manual human replication, drastically cutting the time and effort required to verify published control methodologies. We will release our benchmark and agents to foster community progress in automated research replication.
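The Analyzer/Coder/Verifier loop with execution feedback can be sketched in miniature. This is an illustrative assumption of how such a pipeline might be wired together, not the authors' RESCORE implementation: the function names, the feedback format, and the toy "simulation" are all hypothetical stand-ins.

```python
# Illustrative sketch of an analyze -> code -> verify loop with iterative
# execution feedback. Everything here (function names, feedback shape, the
# toy candidate code) is a hypothetical stand-in, NOT the RESCORE codebase.

def analyze(paper_text):
    """Stand-in Analyzer: extract a simulation spec from the paper text."""
    return {"task": paper_text, "params": {}}

def generate_code(spec, feedback=None):
    """Stand-in Coder: emit candidate simulation code as a string,
    revising the draft when Verifier feedback is available."""
    if feedback is None:
        return "result = 41"   # deliberately wrong first draft
    return "result = 42"       # revised draft incorporating feedback

def verify(code, expected):
    """Stand-in Verifier: execute the candidate and compare its output
    against the target value reported in the paper."""
    scope = {}
    exec(code, scope)  # in a real system this would be sandboxed
    ok = scope.get("result") == expected
    feedback = None if ok else f"expected {expected}, got {scope.get('result')}"
    return ok, feedback

def recovery_loop(paper_text, expected, max_rounds=3):
    """Iterate Coder/Verifier until the output matches or rounds run out."""
    spec = analyze(paper_text)
    feedback = None
    for _ in range(max_rounds):
        code = generate_code(spec, feedback)
        ok, feedback = verify(code, expected)
        if ok:
            return code  # a candidate the Verifier accepted
    return None  # recovery failed within the round budget

recovered = recovery_loop("toy paper", expected=42)
print(recovered)
```

In a full system the Coder and Verifier would be LLM calls and the comparison would include visual checks against the paper's figures; the loop structure (generate, execute, feed errors back) is the part this sketch illustrates.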