Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a training-free, test-time multi-agent reinforcement learning framework that addresses the challenges of high training costs, non-stationarity, and sparse, high-variance rewards. By incorporating structured textual experiences during inference, the approach orchestrates a team of expert agents that engage in multi-round deliberation. Efficient collaboration is achieved through test-time experience retrieval, consensus-based decision making, and a round-level credit assignment mechanism. Integrating large language model–driven coordination, experience retrieval, and credit allocation, the method significantly outperforms both traditional multi-agent and single-agent baselines, yielding average accuracy improvements of 3.67% and 8.67%, respectively, across benchmarks in medicine, mathematics, and education, while demonstrating enhanced robustness to distributional shifts.
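The pipeline described above (multi-round deliberation, test-time experience retrieval, consensus-based decision making) can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the names `ExpertAgent`, `retrieve_experiences`, and `deliberate`, the keyword-overlap retrieval, and the majority-vote consensus rule are all assumptions standing in for LLM-driven components.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: expert agents answer over several rounds, retrieved
# textual experiences are injected into each turn, and a majority vote over
# the final round gives the consensus answer.

@dataclass
class ExpertAgent:
    name: str
    # Stand-in for an LLM call: (question, experiences, transcript) -> answer.
    policy: Callable[[str, list, list], str]

def retrieve_experiences(pool: list[str], question: str, k: int = 2) -> list[str]:
    # Toy retrieval: keep experiences sharing any keyword with the question.
    return [e for e in pool if any(w in e for w in question.split())][:k]

def deliberate(agents: list[ExpertAgent], question: str,
               pool: list[str], rounds: int = 2) -> str:
    transcript: list[tuple[str, str]] = []
    for _ in range(rounds):
        experiences = retrieve_experiences(pool, question)
        for agent in agents:
            transcript.append((agent.name, agent.policy(question, experiences, transcript)))
    # Consensus: majority vote over the final round's answers.
    final_answers = [answer for _, answer in transcript[-len(agents):]]
    return Counter(final_answers).most_common(1)[0][0]

# Toy run: agent "A" switches its answer once the shared experience is retrieved.
pool = ["dosage questions: check weight-based dosing"]
agents = [
    ExpertAgent("A", lambda q, e, t: "B" if e else "A"),
    ExpertAgent("B", lambda q, e, t: "B"),
]
consensus = deliberate(agents, "dosage for pediatric patient", pool)  # "B"
```

In a real system each `policy` would be an LLM prompt that includes the retrieved experiences and the running transcript; the vote here is only the simplest possible consensus rule.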

📝 Abstract
Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective, and efficient path to distribution-shift-robust multi-agent reasoning without tuning.
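The abstract's turn-level credit assignment for building the experience pool can be sketched as follows. This is a minimal guess at the mechanism, not the paper's exact scheme: the scoring rule (agreement with a correct consensus), the threshold, and the plain-text pool format are all assumptions.

```python
# Hypothetical sketch of turn-level credit assignment: after a deliberation
# finishes, each turn is credited by whether its answer agreed with the final
# consensus (scaled by whether the consensus was actually correct), and
# high-credit turns are distilled into textual experiences for reinjection.

def assign_credit(transcript, consensus, correct_answer):
    # transcript: list of (agent, turn_text, answer) tuples for one problem.
    # Outcome reward is 1.0 if the consensus was correct, else 0.0; each
    # turn's credit scales that outcome by agreement with the consensus.
    outcome = 1.0 if consensus == correct_answer else 0.0
    return [
        (agent, turn_text, outcome * (1.0 if answer == consensus else -1.0))
        for agent, turn_text, answer in transcript
    ]

def update_experience_pool(pool, credited_turns, threshold=0.5):
    # Keep only turns whose credit clears the threshold, stored as plain-text
    # experiences ready to be retrieved and injected into future dialogues.
    for agent, turn_text, credit in credited_turns:
        if credit >= threshold:
            pool.append(f"[{agent}] {turn_text}")
    return pool

# Toy run: only the turn that agreed with the correct consensus is kept.
transcript = [("A", "apply Bayes' rule to the prior", "x"), ("B", "guess", "y")]
credited = assign_credit(transcript, consensus="x", correct_answer="x")
pool = update_experience_pool([], credited)  # ["[A] apply Bayes' rule to the prior"]
```

The design choice this illustrates is that credit is computed per turn rather than per episode, which is what lets the pool store fine-grained reusable reasoning steps instead of whole transcripts.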
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Non-stationarity
Sparse Rewards
Test-Time Adaptation
Collaborative Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Test-Time Reinforcement Learning
Test-Time Experience Injection
Credit Assignment
Consensus-Based Reasoning
Distribution Shift Robustness