🤖 AI Summary
The scaling behavior of reasoning compute in multi-agent systems remains poorly understood. This paper addresses data synthesis under multi-agent collaborative sampling and proposes TOA, the first framework integrating Monte Carlo Tree Search (MCTS) into collaboration among multiple distinct language models. TOA formalizes agent coordination as a dynamic tree-search decision process, enabling input-adaptive online workflow orchestration and reward-driven exploration. It unifies MCTS, reward modeling, multi-agent sampling, and synthetic data distillation to significantly improve reasoning compute scalability. Experiments demonstrate state-of-the-art performance on WMT and a 71.8% win rate on AlpacaEval. After fine-tuning with TOA-synthesized data, the resulting models surpass strong preference-learning baselines and achieve leading results on challenging benchmarks including Arena-Hard.
📄 Abstract
Scaling laws for inference compute in multi-agent systems remain under-explored compared to single-agent scenarios. This work aims to bridge this gap by investigating the problem of data synthesis through multi-agent sampling, where synthetic responses are generated by sampling from multiple distinct language models. Effective model coordination is crucial for successful multi-agent collaboration. Unlike previous approaches that rely on fixed workflows, we treat model coordination as a multi-step decision-making process, optimizing generation structures dynamically for each input question. We introduce Tree Search-based Orchestrated Agents (TOA), where the workflow evolves iteratively during the sequential sampling process. To achieve this, we leverage Monte Carlo Tree Search (MCTS), integrating a reward model to provide real-time feedback and accelerate exploration. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales. TOA is the most compute-efficient approach, achieving SOTA performance on WMT and a 71.8% LC win rate on AlpacaEval. Moreover, fine-tuning with our synthesized alignment data surpasses strong preference learning methods on challenging benchmarks such as Arena-Hard and AlpacaEval.
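To make the MCTS-as-orchestrator idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation). It assumes each tree node is a partial workflow step, a placeholder `sample_fn` stands in for querying an agent LLM, and a stub `reward_model` stands in for the learned reward model that provides real-time feedback; `AGENTS` is an invented list of agent names.

```python
import math
import random

AGENTS = ["model_a", "model_b", "model_c"]  # placeholder agent names (assumption)

def reward_model(question, response):
    """Stub reward: length-based placeholder for a learned reward model."""
    return min(1.0, len(response) / 100.0)

class Node:
    """One node = one (agent, response) step in a partial multi-agent workflow."""
    def __init__(self, parent=None, agent=None, response=""):
        self.parent, self.agent, self.response = parent, agent, response
        self.children = []
        self.visits = 0
        self.value = 0.0

    def uct(self, c=1.4):
        # Standard UCT score: exploit average reward, explore rarely-visited nodes.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_orchestrate(question, sample_fn, iterations=20, max_depth=3):
    root = Node()
    root.visits = 1
    for _ in range(iterations):
        # Selection: descend by UCT while internal nodes have children.
        node, depth = root, 0
        while node.children and depth < max_depth:
            node = max(node.children, key=Node.uct)
            depth += 1
        # Expansion: pick an agent to extend the workflow with a new response,
        # so the generation structure adapts to this question online.
        if depth < max_depth:
            agent = random.choice(AGENTS)
            response = sample_fn(agent, question, node)
            child = Node(parent=node, agent=agent, response=response)
            node.children.append(child)
            node = child
        # Evaluation: the reward model scores the newest response immediately.
        r = reward_model(question, node.response)
        # Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Return the first-step child (i.e., which agent to sample first) with
    # the best average reward for this particular question.
    return max(root.children, key=lambda n: n.value / n.visits)
```

Usage: `mcts_orchestrate("Translate ...", my_sample_fn)` returns the highest-value first-step agent choice; repeated calls per input question give the input-adaptive workflows the abstract describes.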