Enhancing Automated Paper Reproduction via Prompt-Free Collaborative Agents

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing automated paper reproduction frameworks lack built-in mechanisms for automatic validation and optimization during code generation, or rely excessively on manually engineered prompts, limiting their adaptability and scalability. To address this, we propose a prompt-free, dual-agent collaborative framework: a verification agent autonomously identifies output defects based solely on the original system prompt, while a refinement agent iteratively revises the generated outputs accordingly—requiring no human intervention. Integrated into the Paper2Code system, our framework achieves ~15% and ~13% performance gains on PaperBench Code-Dev and Paper2CodeBench, respectively, substantially outperforming the baseline and Self-Refine. This work marks the first end-to-end, prompt-agnostic pipeline for paper-to-executable-code reproduction with integrated self-validation.

📝 Abstract
Automated paper reproduction has emerged as a promising approach to accelerate scientific research, employing multi-step workflow frameworks to systematically convert academic papers into executable code. However, existing frameworks often lack mechanisms to verify and refine the outputs at each generation step, or rely heavily on manually designed prompts for self-refinement, which limits their adaptability and scalability. To address these limitations, we propose a prompt-free collaborative agent framework that automatically enhances the quality of paper-to-code generation. Our approach employs two collaborative agents: a verification agent that examines whether the outputs at each step satisfy the requirements specified in the corresponding system prompt, and a refinement agent that revises the outputs based on the identified issues. Unlike previous methods that require human experts to craft specific refinement prompts for each step, our framework achieves automatic verification and improvement by leveraging only the original system prompts. We integrate our collaborative agents into the Paper2Code framework and conduct comprehensive experiments on PaperBench Code-Dev and Paper2CodeBench datasets. Experimental results demonstrate that our approach significantly improves the accuracy and completeness of reproduced code, achieving performance gains of approximately 15% and 13%, respectively, compared to the baseline without our agents. Furthermore, comparative experiments against Self-Refine validate the robustness and consistency of our prompt-free approach across different datasets.
Problem

Research questions and friction points this paper is trying to address.

Automates verification and refinement in paper-to-code generation.
Eliminates reliance on manually crafted prompts for each step.
Improves accuracy and completeness of reproduced scientific code.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative agents verify outputs automatically
Refinement agent revises code without manual prompts
Leverages original system prompts for quality enhancement
Zijie Lin
University of Science and Technology of China, Hefei, China
Qilin Cai
University of Science and Technology of China, Hefei, China
Liang Shen
Meituan, Beijing, China
Mingjun Xiao
University of Science and Technology of China
Mobile Computing, Crowdsensing, Mobile Social Networks, Vehicular Networks