🤖 AI Summary
Existing reinforcement learning approaches for code generation are often limited by insufficient trajectory diversity, which leads to diminishing returns. This work proposes MARS$^2$, a framework that couples multi-agent collaboration with structured tree search. Multiple heterogeneous, independently optimized agents generate and refine candidate solutions within a shared search tree, and a path-level group advantage formulation based on tree-consistent reward shaping supports credit assignment across search trajectories. Experiments on code generation benchmarks show consistent improvements across diverse model combinations and training settings, supporting the use of multi-agent tree search to enhance reinforcement learning.
📝 Abstract
Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling. Search-enhanced RL alleviates this issue by introducing structured exploration, but that exploration remains constrained by the priors of a single-agent policy. Meanwhile, leveraging multiple interacting policies can yield more diverse exploratory signals, but existing approaches are typically decoupled from structured search. We propose MARS$^2$ (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently optimized agents collaborate within a shared tree-structured search environment. MARS$^2$ models the search tree as a learnable multi-agent interaction environment, enabling heterogeneous agents to collaboratively generate and refine candidate solutions within a shared search topology. To support learning, we introduce a path-level group advantage formulation based on tree-consistent reward shaping, which facilitates credit assignment across complex search trajectories. Experiments on code generation benchmarks show that MARS$^2$ consistently improves performance across diverse model combinations and training settings, demonstrating the effectiveness of coupling multi-agent collaboration with tree search for enhancing reinforcement learning. Our code is publicly available at https://github.com/TsinghuaC3I/MARTI.
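
The abstract does not spell out the path-level group advantage, so the following is only a minimal sketch of the general idea under stated assumptions, not the MARS$^2$ implementation. It assumes that each tree node is a candidate produced by one agent with a scalar reward, that a root-to-leaf path is scored by the mean reward of its nodes, and that advantages are obtained by normalizing path scores within the group of all paths in the tree. The names `Node`, `path_return`, and `path_group_advantages` are hypothetical and do not come from the MARTI repository.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One candidate in the shared search tree (hypothetical structure)."""
    agent: str                       # which agent produced this candidate
    reward: float                    # e.g. fraction of unit tests passed
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, agent: str, reward: float) -> "Node":
        child = Node(agent=agent, reward=reward, parent=self)
        self.children.append(child)
        return child


def leaves(root: Node) -> List[Node]:
    """Collect all leaf nodes with an iterative depth-first traversal."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            out.append(node)
    return out


def path_from_root(leaf: Node) -> List[Node]:
    """Return the nodes from just below the root down to the leaf."""
    path = []
    node: Optional[Node] = leaf
    while node is not None and node.parent is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def path_return(path: List[Node]) -> float:
    """Assumption: score a path by the mean reward of its nodes."""
    return mean(n.reward for n in path) if path else 0.0


def path_group_advantages(
    root: Node, eps: float = 1e-8
) -> List[Tuple[List[Node], float]]:
    """Normalize each path's score against all paths in the same tree."""
    paths = [path_from_root(leaf) for leaf in leaves(root)]
    returns = [path_return(p) for p in paths]
    mu, sigma = mean(returns), pstdev(returns)
    return [(p, (r - mu) / (sigma + eps)) for p, r in zip(paths, returns)]


# Toy tree: the root stands for the problem prompt and carries no reward.
root = Node(agent="prompt", reward=0.0)
a = root.add_child("agent_a", 0.2)   # first attempt by agent_a
b = root.add_child("agent_b", 0.4)   # first attempt by agent_b
a.add_child("agent_b", 1.0)          # agent_b refines agent_a's attempt
a.add_child("agent_a", 0.0)          # agent_a's own refinement fails

advantages = path_group_advantages(root)
for path, adv in advantages:
    print(" -> ".join(n.agent for n in path), round(adv, 3))
```

In this toy tree the cross-agent path (agent_a's attempt refined by agent_b) receives the largest advantage, which illustrates how a shared tree can reward collaboration between policies. Other choices of path score, such as the leaf reward alone or a discounted sum, would fit the same skeleton.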