🤖 AI Summary
This work addresses the limitations of existing single-agent systems in complex code generation and the shortcomings of many multi-agent approaches, which often rely on brittle prompt-driven interactions or homogeneous training that hinders effective error correction and strategic diversity. The authors propose the MARTI-MARS² framework, which models multi-agent collaboration as a learnable dynamic environment by integrating reinforcement learning with multi-agent tree search. This enables an evolution from homogeneous multi-role to heterogeneous multi-agent systems, complemented by the MARTI-MARS²-T+ inference strategy, which unlocks collaborative potential at test time. The study reveals, for the first time, a progressive scaling law — "single agent → homogeneous multi-role → heterogeneous multi-agent" — demonstrating that policy diversity is key to raising performance ceilings. On a 32B-scale model, a two-agent configuration achieves 77.7% code generation accuracy, substantially outperforming strong baselines such as GPT-5.1 and confirming the advantages of heterogeneous multi-agent systems in performance, scalability, and diversity.
📝 Abstract
While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings on complex tasks such as code generation. Multi-agent collaboration offers a promising avenue to transcend these boundaries. However, existing frameworks typically rely on prompt-based test-time interactions or multi-role configurations trained with homogeneous parameters, limiting error-correction capability and strategic diversity. In this paper, we propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS²), which integrates policy learning with multi-agent tree search by formulating the multi-agent collaborative exploration process as a dynamic and learnable environment. By allowing agents to iteratively explore and refine within this environment, the framework facilitates evolution from parameter-sharing homogeneous multi-role training to heterogeneous multi-agent training, breaking through single-agent capability limits. We also introduce an efficient inference strategy, MARTI-MARS²-T+, to fully exploit the scaling potential of multi-agent collaboration at test time. We conduct extensive experiments across varied model scales (8B, 14B, and 32B) on challenging code generation benchmarks. Utilizing two collaborating 32B models, MARTI-MARS² achieves 77.7% accuracy, outperforming strong baselines like GPT-5.1. Furthermore, MARTI-MARS² reveals a novel scaling law: shifting from single-agent to homogeneous multi-role and ultimately to heterogeneous multi-agent paradigms progressively yields higher RL performance ceilings, robust test-time scaling (TTS) capabilities, and greater policy diversity, suggesting that policy diversity is critical for scaling intelligence via multi-agent reinforcement learning.
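The collaboration loop described above — heterogeneous agents taking turns expanding a search tree of candidate solutions, with a verifier scoring candidates and the best branches refined further — can be illustrated with a minimal toy sketch. Everything here (the two toy policies, the numeric "verifier", the beam search) is an illustrative assumption, not the paper's actual implementation:

```python
# Toy sketch of heterogeneous multi-agent tree search (illustrative only).
# Two distinct policies expand each frontier node; a verifier (standing in
# for unit-test feedback on generated code) scores candidates; the best
# candidates survive to be refined at the next depth.

def agent_a(candidate):
    # hypothetical "generator" policy: proposes a small step
    return candidate + 1

def agent_b(candidate):
    # hypothetical "refiner" policy: proposes a larger step,
    # giving the pair strategic diversity
    return candidate + 2

def verifier(candidate, target=10):
    # toy reward: closer to the target is better
    return -abs(target - candidate)

def multi_agent_tree_search(root=0, depth=4, beam=2):
    """Greedy beam-style tree search over alternating agent proposals."""
    frontier = [root]
    for _ in range(depth):
        children = []
        for node in frontier:
            for policy in (agent_a, agent_b):  # heterogeneous agents
                children.append(policy(node))
        # keep only the best-scoring candidates (the "refine" step)
        frontier = sorted(children, key=verifier, reverse=True)[:beam]
    return max(frontier, key=verifier)

print(multi_agent_tree_search())  # → 8
```

In the actual framework the policies are trained LLMs and the environment itself is learnable; this sketch only shows why diverse policies widen the tree's reachable set compared to a single homogeneous policy.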