MARTI-MARS²: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing single-agent systems in complex code generation and the inefficacy of many multi-agent approaches, which often suffer from prompt-driven interactions or homogeneous training that hinder effective error correction and strategic diversity. The authors propose the MARTI-MARS² framework, which models multi-agent collaboration as a learnable dynamic environment by integrating reinforcement learning with multi-agent tree search. This enables an evolution from homogeneous multi-role to heterogeneous multi-agent systems, complemented by the MARTI-MARS²-T+ inference strategy to unlock collaborative potential at test time. The study reveals, for the first time, a progressive scaling law—“single agent → homogeneous multi-role → heterogeneous multi-agent”—demonstrating that policy diversity is key to elevating performance ceilings. On a 32B-scale model, a two-agent configuration achieves 77.7% code generation accuracy, substantially outperforming strong baselines such as GPT-5.1 and confirming the advantages of heterogeneous multi-agent systems in performance, scalability, and diversity.

📝 Abstract
While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings in complex tasks such as code generation. Multi-agent collaboration offers a promising avenue to transcend these boundaries. However, existing frameworks typically rely on prompt-based test-time interactions or multi-role configurations trained with homogeneous parameters, limiting error correction capabilities and strategic diversity. In this paper, we propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS²), which integrates policy learning with multi-agent tree search by formulating the multi-agent collaborative exploration process as a dynamic and learnable environment. By allowing agents to iteratively explore and refine within the environment, the framework facilitates evolution from parameter-sharing homogeneous multi-role training to heterogeneous multi-agent training, breaking through single-agent capability limits. We also introduce an efficient inference strategy, MARTI-MARS²-T+, to fully exploit the scaling potential of multi-agent collaboration at test time. We conduct extensive experiments across varied model scales (8B, 14B, and 32B) on challenging code generation benchmarks. Utilizing two collaborating 32B models, MARTI-MARS² achieves 77.7% accuracy, outperforming strong baselines like GPT-5.1. Furthermore, MARTI-MARS² reveals a novel scaling law: shifting from single-agent to homogeneous multi-role and ultimately to heterogeneous multi-agent paradigms progressively yields higher RL performance ceilings, robust test-time scaling (TTS) capabilities, and greater policy diversity, suggesting that policy diversity is critical for scaling intelligence via multi-agent reinforcement learning.
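The abstract's central claim — that heterogeneous policies raise the ceiling of multi-agent tree search — can be illustrated with a toy, deterministic sketch. This is not the paper's MARTI-MARS² algorithm; it is a beam-style search over integer "candidates" where each agent is a distinct edit policy, chosen only to show in miniature how diverse policies can reach a solution that identical policies cannot.

```python
# Toy sketch (NOT the paper's method): beam-style tree search where each
# "agent" is a distinct refinement policy applied to candidate solutions.
def tree_search(agents, start, target, depth=6, beam=3):
    """Expand candidates with every agent, keep the `beam` best per level."""
    score = lambda x: -abs(x - target)      # task reward: closeness to target
    frontier, best = [start], start
    for _ in range(depth):
        # Each agent proposes a refinement of every frontier candidate.
        children = {agent(x) for x in frontier for agent in agents}
        frontier = sorted(children, key=score, reverse=True)[:beam]
        best = max([best, *frontier], key=score)
    return best

# Heterogeneous team: three agents with different edit strategies.
hetero = [lambda x: x + 5, lambda x: x * 2, lambda x: x - 1]
# Homogeneous baseline: every agent applies the same edit.
homo = [lambda x: x + 5] * 3

print(tree_search(hetero, start=1, target=42))  # diverse policies reach 42
print(tree_search(homo, start=1, target=42))    # identical policies stall at 31
```

Under the same search budget, the diverse team hits the target exactly while the homogeneous team cannot, mirroring (in caricature) the paper's "single agent → homogeneous multi-role → heterogeneous multi-agent" scaling observation.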
Problem

Research questions and friction points this paper is trying to address.

multi-agent collaboration
code generation
reinforcement learning
policy diversity
performance ceiling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning
Code Generation
Policy Diversity
Heterogeneous Multi-Agent Training
Scaling Law
Shijie Wang
Shanghai AI Laboratory
Pengfei Li
Harbin Institute of Technology
Deep Learning · RL
Yikun Fu
Beijing Institute of Technology
Kaifeng Liu
Harbin Institute of Technology
Fangyuan Li
Harbin Institute of Technology
Yang Liu
PhD, Institute of Automation, Chinese Academy of Sciences
Computer Vision · 3D Perception · 3D Scene Reconstruction
Xiaowei Sun
Shanghai AI Laboratory, Fudan University
Zonglin Li
Anthropic
Language Modeling · Pretraining · Information Retrieval · LLM Agents · RAG
Siyao Zhao
Institute of Automation
Jian Zhao
Tsinghua University
Kai Tian
Tsinghua University, Frontis.AI
Dong Li
Harbin Institute of Technology
Junqi Gao
Shanghai AI Lab, Harbin Institute of Technology
Deep Learning · Generative Models · Continual Learning
Yutong Zhang
High School Affiliated to Fudan University
Yiqun Chen
Renmin University of China
Information Retrieval · Retrieval-Augmented Generation · Reinforcement Learning · Multi-Agent Systems
Yuqiang Li
Central South University
Internal Combustion Engine · Combustion · Emissions · Mechanism
Zoe Li
University of Washington
Weinan Zhang
Harbin Institute of Technology
Peng Ye
Shanghai AI Laboratory
Shuyue Hu
Shanghai Artificial Intelligence Lab
Multi-Agent Systems · Large Language Models · Game Theory
Lei Bai
Shanghai AI Laboratory
Foundation Model · Science Intelligence · Multi-Agent System · Autonomous Discovery
Bowen Zhou
Chair Professor, Department of Electrical Engineering, Tsinghua University; Founder of Frontis.ai
Machine Learning · Natural Language Processing · Representation Learning and Reasoning · Conversational
Kaiyan Zhang
Tsinghua University
Foundation Model · Collective Intelligence · Scientific Intelligence
Biqing Qi
Shanghai AI Laboratory