🤖 AI Summary
This work addresses the challenge of end-to-end automation across the entire scientific research lifecycle, where the dynamic evolution of high-level research plans and reliable execution under uncertainty pose fundamental obstacles. We propose a dual-loop multi-agent collaboration framework: an outer loop, guided by a "Professor" agent, employs an evolutionary algorithm to iteratively optimize research strategies; an inner loop, orchestrated by a "PhD Student" agent, adapts execution in real time via context-aware pre- and post-meeting protocols, dynamic task scheduling, and environmental observation. Crucially, the framework formalizes plan evolution and execution feedback as a closed-loop co-adaptive process, which the authors present as the first such instantiation. Evaluated on benchmarks including ACLAward and Laboratory, the system achieves state-of-the-art scores for automatically generated paper quality, significantly outperforming strong baselines. Ablation studies confirm that the dual-loop design is critical to the overall performance gains.
📝 Abstract
Automating the end-to-end scientific research process poses a fundamental challenge: it requires both evolving high-level plans that are novel and sound, and executing those plans correctly amid dynamic and uncertain conditions. To address this bilevel challenge, we propose a Double-Loop Multi-Agent (DLMA) framework that solves a given research problem automatically. The leader loop, composed of professor agents, is responsible for evolving research plans: it applies an evolutionary algorithm through involvement, improvement, and integration meetings to iteratively generate and refine a pool of research proposals, exploring the solution space effectively. The follower loop, composed of doctoral-student agents, is responsible for executing the best-evolved plan: it dynamically adjusts the plan during implementation via pre-hoc and post-hoc meetings, ensuring that each step (e.g., drafting, coding) is well supported by contextual and external observations. Extensive experiments on benchmarks such as ACLAward and Laboratory show that DLMA generates research papers that achieve state-of-the-art scores under automated evaluation, significantly outperforming strong baselines. Ablation studies confirm the critical roles of both loops, with evolution driving novelty and execution ensuring soundness.
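The double-loop structure above can be sketched as follows. This is a minimal, hypothetical toy in Python: every function name, the scoring rule, and the meeting stand-ins (`review`, `refine`, `merge`, the pre-/post-hoc string tags) are illustrative assumptions, not the paper's actual agents or API — in DLMA these roles would be played by LLM agents rather than toy functions.

```python
# Hypothetical sketch of DLMA's leader/follower loops; all names and
# scoring logic are illustrative stand-ins, not the paper's API.
import random
from dataclasses import dataclass, field

@dataclass
class Proposal:
    plan: list = field(default_factory=list)  # ordered research steps
    score: float = 0.0                        # fitness from professor review

def draft_plan(problem):
    return [f"{problem}:survey", f"{problem}:experiment", f"{problem}:draft"]

def review(p):       # involvement meeting: score a proposal (toy fitness)
    return len(set(p.plan)) + random.random()

def refine(p):       # improvement meeting: mutate one step of the plan
    plan = list(p.plan)
    i = random.randrange(len(plan))
    plan[i] += "+revised"
    return Proposal(plan=plan)

def merge(a, b):     # integration meeting: crossover of two plans
    cut = len(a.plan) // 2
    return Proposal(plan=a.plan[:cut] + b.plan[cut:])

def leader_loop(problem, pool_size=4, generations=3):
    """Professor agents evolve a pool of research proposals."""
    pool = [Proposal(plan=draft_plan(problem)) for _ in range(pool_size)]
    for _ in range(generations):
        for p in pool:
            p.score = review(p)
        pool.sort(key=lambda p: p.score, reverse=True)
        parents = pool[: pool_size // 2]          # keep the fittest half
        children = [refine(random.choice(parents))
                    for _ in range(pool_size - len(parents) - 1)]
        children.append(merge(*random.sample(parents, 2)))
        pool = parents + children
    for p in pool:
        p.score = review(p)
    return max(pool, key=lambda p: p.score)       # best-evolved plan

def follower_loop(best):
    """Doctoral-student agents execute the plan, adapting step by step."""
    artifacts = []
    for step in best.plan:
        # pre-hoc meeting: adapt the step given context gathered so far
        step = f"{step}[ctx={len(artifacts)}]"
        result = f"done:{step}"                   # execute (drafting, coding, ...)
        # post-hoc meeting: record/verify the result before moving on
        artifacts.append(result)
    return artifacts

best = leader_loop("topic")
paper = follower_loop(best)
```

The key design point the sketch captures is the separation of concerns: the outer loop only ranks and recombines whole plans, while the inner loop only touches one step at a time, feeding its observations back into the next step's pre-hoc adjustment.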