🤖 AI Summary
This work addresses the challenge of generating scientific workflow code when no input/output (I/O) test cases are available, which rules out optimization via execution feedback. To this end, the authors propose MOSAIC, a framework for scientific code generation without I/O supervision. MOSAIC uses a training-free multi-agent collaboration strategy that combines structured problem decomposition, domain-specific exemplar guidance, and teacher-student knowledge distillation. It also introduces a Consolidated Context Window (CCW) mechanism that keeps reasoning consistent across chained subproblems and mitigates hallucinations. On the SciCode benchmark, MOSAIC improves the accuracy, executability, and numerical precision of generated code over existing approaches.
📝 Abstract
Existing multi-agent Large Language Model (LLM) frameworks for code generation typically use execution feedback and improve iteratively using Input/Output (I/O) test cases. However, this does not work for scientific workflows, where I/O test cases do not exist, and generating them requires solving the very problem at hand. To address this, we introduce MOSAIC, a training-free multi-agent framework for scientific code generation without I/O supervision. Instead of execution feedback, MOSAIC employs a student-teacher knowledge distillation framework that grounds generation through domain-specific examples and structured problem decomposition. To further mitigate hallucinations across chained subproblems, we introduce a Consolidated Context Window (CCW) for maintaining consistent reasoning across agents. Experiments on the SciCode benchmark show that MOSAIC improves accuracy, executability, and numerical precision over existing approaches while relying on lightweight models.
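To make the idea of chaining subproblems through a shared context concrete, here is a minimal sketch under our own assumptions. It is not MOSAIC's implementation. `ConsolidatedContext`, `solve_chain`, and `fake_generate` are hypothetical names, and the student LLM agent is replaced by a stub. Each subproblem's prompt bundles domain exemplars (standing in for teacher guidance), the code already produced for earlier subproblems, and the current task description. No code is executed against I/O test cases at any point.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConsolidatedContext:
    """Hypothetical stand-in for a CCW: accumulates code from solved steps."""
    entries: List[str] = field(default_factory=list)

    def add(self, step_name: str, code: str) -> None:
        # Record each finished step so later steps see consistent definitions.
        self.entries.append(f"# Step: {step_name}\n{code}")

    def render(self) -> str:
        return "\n\n".join(self.entries)


def solve_chain(
    subproblems: List[str],
    exemplars: str,
    generate: Callable[[str], str],
) -> List[str]:
    """Solve chained subproblems in order, without any I/O test cases."""
    ccw = ConsolidatedContext()
    solutions: List[str] = []
    for i, task in enumerate(subproblems, start=1):
        prompt = (
            f"### Exemplars\n{exemplars}\n\n"
            f"### Previously solved steps\n{ccw.render()}\n\n"
            f"### Current step {i}\n{task}"
        )
        code = generate(prompt)  # assumed: a call to a student LLM agent
        ccw.add(f"step {i}", code)
        solutions.append(code)
    return solutions


def fake_generate(prompt: str) -> str:
    """Stub model: emits a function whose number reflects how many prior steps it saw."""
    n_prev = prompt.count("# Step:")
    return f"def step_{n_prev + 1}():\n    return {n_prev}"


out = solve_chain(
    ["compute A", "use A to compute B"],
    exemplars="def example():\n    ...",
    generate=fake_generate,
)
for code in out:
    print(code)
```

Because the stub counts the earlier steps in its prompt, the second call sees the first solution, which is the property a consolidated context is meant to guarantee in a real multi-agent pipeline.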