HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

📅 2026-04-19

🤖 AI Summary
Existing approaches to paper-to-code generation rely on fixed-order agent pipelines that lack global coordination, which limits the robustness and performance of experiment reproduction. This work proposes HiRAS, a hierarchical multi-agent framework in which supervisory manager agents coordinate specialized agents across fine-grained stages to enable end-to-end experiment reproduction. The work also introduces a hierarchical task-scheduling mechanism to improve robustness, and Paper2Code-Extra (P2C-Ex), an evaluation protocol that incorporates repository-level information to assess reproduction quality more accurately. Using open-source large language models as backbones, HiRAS achieves a relative performance improvement of more than 10% over the previous state of the art, and hallucination in evaluation outputs is substantially reduced.
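To make the contrast between a fixed sequential pipeline and manager-coordinated scheduling concrete, the toy sketch below shows a supervisor that inspects each stage's result and, on failure, re-dispatches the upstream stage before retrying, instead of halting. It is an illustration under stated assumptions, not the HiRAS implementation: the stage names (`plan`, `code`, `execute`), the `Agent` signature, the `ManagerAgent` class, the roll-back-one-stage policy, and the `max_retries` budget are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical agent signature: reads the shared context, returns (success, artifact).
Agent = Callable[[Dict[str, str]], Tuple[bool, str]]


@dataclass
class ManagerAgent:
    """Toy supervisor that schedules specialised agents stage by stage.

    On a failed stage it rolls back one stage so the upstream artifact is
    regenerated, then retries. This is an assumed stand-in for "global
    coordination"; the actual HiRAS scheduling policy is not reproduced here.
    """
    stages: List[str]
    agents: Dict[str, Agent]
    max_retries: int = 2  # hypothetical per-stage retry budget
    log: List[str] = field(default_factory=list)

    def run(self) -> Dict[str, str]:
        context: Dict[str, str] = {}
        retries = {s: 0 for s in self.stages}
        i = 0
        while i < len(self.stages):
            stage = self.stages[i]
            ok, artifact = self.agents[stage](context)
            self.log.append(f"{stage}:{'ok' if ok else 'fail'}")
            if ok:
                context[stage] = artifact  # publish the artifact for later stages
                i += 1
                continue
            retries[stage] += 1
            if retries[stage] > self.max_retries:
                raise RuntimeError(f"stage {stage!r} exceeded retry budget")
            if i > 0:
                i -= 1  # re-run the previous stage before retrying this one
        return context


def make_flaky_executor(failures: int) -> Agent:
    """Hypothetical executor agent that fails its first `failures` calls."""
    state = {"calls": 0}

    def executor(ctx: Dict[str, str]) -> Tuple[bool, str]:
        state["calls"] += 1
        if state["calls"] <= failures:
            return False, ""
        return True, f"results from {ctx['code']}"

    return executor


manager = ManagerAgent(
    stages=["plan", "code", "execute"],
    agents={
        "plan": lambda ctx: (True, "plan-v1"),
        "code": lambda ctx: (True, f"code<{ctx['plan']}>"),
        "execute": make_flaky_executor(failures=1),
    },
)
out = manager.run()
print(manager.log)
# ['plan:ok', 'code:ok', 'execute:fail', 'code:ok', 'execute:ok']
```

A fixed sequential pipeline would stop at the first `execute:fail`; here the supervisor recovers by regenerating the `code` artifact and retrying, which is the kind of robustness a coordinating manager is meant to add.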

📝 Abstract
Recent advances in large language models have highlighted their potential to automate computational research, particularly in reproducing experimental results. However, existing approaches still use fixed sequential agent pipelines with weak global coordination, which limits their robustness and overall performance. In this work, we propose the Hierarchical Research Agent System (HiRAS), a hierarchical multi-agent framework for end-to-end experiment reproduction that employs supervisory manager agents to coordinate specialised agents across fine-grained stages. We also identify limitations in the reference-free evaluation of the Paper2Code benchmark and introduce Paper2Code-Extra (P2C-Ex), a refined protocol that incorporates repository-level information and aligns better with the original reference-based metric. Extensive evaluation validates the effectiveness and robustness of the proposed methods, showing a relative performance gain of more than 10% over the previous state of the art using open-source backbone models, as well as significantly reduced hallucination in evaluation. Our work is available on GitHub: https://github.com/KOU-199024/HiRAS.

Problem

Research questions and friction points this paper is trying to address.

paper-to-code generation
experiment reproduction
multi-agent coordination
large language models
code generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Multi-Agent Framework
Paper-to-Code Generation
Manager-Agent Coordination
Paper2Code-Extra
Experiment Reproduction