Scaling Multiagent Systems with Process Rewards

📅 2026-01-30

🤖 AI Summary

This work addresses two challenges in fine-tuning multiagent systems: ambiguous credit assignment across agents and the poor sample efficiency of expensive multiagent rollouts. The authors propose MAPPA, which fine-tunes multiagent systems with per-action process rewards derived from AI feedback. By scoring individual agent actions rather than only the final outcome, MAPPA provides fine-grained supervision without ground-truth labels and extracts more training signal from each rollout. Evaluated on competition math problems and tool-augmented data analysis tasks, the approach gains 5.0–17.5 percentage points on AIME and 7.8–17.2 percentage points on AMC for unseen problems, raises the success rate on data analysis tasks by 16.7 percentage points, and improves quality metrics by up to 47%.

📝 Abstract

While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (1) credit assignment across agents, and (2) sample efficiency of expensive multiagent rollouts. In this work, we propose finetuning multiagent systems with per-action process rewards from AI feedback (MAPPA) to address both. By assigning credit to individual agent actions rather than only at task completion, MAPPA enables fine-grained supervision without ground truth labels while extracting maximal training signal from each rollout. We demonstrate our approach on competition math problems and tool-augmented data analysis tasks. On unseen math problems, MAPPA achieves +5.0–17.5pp on AIME and +7.8–17.2pp on AMC. For data analysis tasks, our method improves success rate by +16.7pp while quality metrics improve by up to 47%, validating that per-action supervision can lead to improvements across different multiagent systems in various domains. By addressing these challenges, our work takes a first step toward scaling multiagent systems for complex, long-horizon tasks with minimal human supervision.
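
To make the contrast between outcome-only and per-action credit concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Action` record, the agent names, and `toy_judge` (a keyword rule standing in for an AI judge) are all hypothetical and serve only to illustrate the idea described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    agent: str  # name of the agent that produced the action (hypothetical)
    text: str   # content of the action, e.g. a message or tool call


def outcome_only_rewards(rollout: List[Action], task_succeeded: bool) -> List[float]:
    """Baseline: one terminal reward copied to every action in the rollout."""
    reward = 1.0 if task_succeeded else 0.0
    return [reward for _ in rollout]


def per_action_rewards(
    rollout: List[Action],
    judge: Callable[[List[Action], Action], float],
) -> List[float]:
    """Score each action given the actions that preceded it."""
    return [judge(rollout[:i], action) for i, action in enumerate(rollout)]


def mean_reward_by_agent(rollout: List[Action], rewards: List[float]) -> Dict[str, float]:
    """Average the per-action rewards separately for each agent."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for action, reward in zip(rollout, rewards):
        totals[action.agent] = totals.get(action.agent, 0.0) + reward
        counts[action.agent] = counts.get(action.agent, 0) + 1
    return {agent: totals[agent] / counts[agent] for agent in totals}


def toy_judge(history: List[Action], action: Action) -> float:
    """Hypothetical stand-in for an AI judge; a real one would query a model."""
    return 0.0 if "error" in action.text.lower() else 1.0


if __name__ == "__main__":
    rollout = [
        Action("solver", "Set up the equation"),
        Action("coder", "Ran code: error, wrong variable"),
        Action("verifier", "Flagged the mistake in the code"),
    ]
    # The task failed, so outcome-only credit penalizes every agent equally.
    print(outcome_only_rewards(rollout, task_succeeded=False))
    # Per-action credit isolates the step that the judge scored poorly.
    rewards = per_action_rewards(rollout, toy_judge)
    print(rewards, mean_reward_by_agent(rollout, rewards))
```

In this toy rollout, the failed task gives every action the same zero reward under outcome-only credit, while per-action scoring yields a distinct signal for each step, so a single rollout produces one training signal per action instead of one per task.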

Problem

Research questions and friction points this paper is trying to address.

multiagent systems

credit assignment

sample efficiency

process rewards

fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

multiagent systems

process rewards

credit assignment