WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

📅 2026-02-04
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited collaboration and parallelism of single-agent deep reasoning for broad information seeking by proposing WideSeek-R1, a lead-agent-subagent framework. WideSeek-R1 introduces width scaling into LLM systems, leveraging multi-agent reinforcement learning (MARL) to jointly optimize the lead agent's task decomposition and the parallel execution of multiple subagents. The framework combines a shared LLM, isolated contexts, and specialized tools, overcoming the constraints of hand-crafted workflows. Evaluated on the WideSearch benchmark, WideSeek-R1-4B achieves an item F1 score of 40.0%, matching the much larger single-agent DeepSeek-R1-671B, with consistent gains as the number of parallel subagents increases.

πŸ“ Abstract
Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
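The orchestration pattern the abstract describes, a lead agent that decomposes a broad task and subagents that execute the pieces in parallel with isolated contexts, can be illustrated with a minimal sketch. This is not the paper's implementation: `lead_agent_decompose` and `subagent_execute` are hypothetical stand-ins for LLM calls with search tools, and the thread pool merely mimics parallel subagent execution.

```python
from concurrent.futures import ThreadPoolExecutor

def lead_agent_decompose(task: str, n_subagents: int) -> list[str]:
    # In WideSeek-R1 the lead agent would prompt a shared LLM to split the
    # broad task; here we emit placeholder subtasks for illustration.
    return [f"{task} [shard {i}]" for i in range(n_subagents)]

def subagent_execute(subtask: str) -> str:
    # Each subagent would run its own tool-augmented reasoning loop over an
    # isolated context; here we return a placeholder result.
    return f"result for {subtask}"

def wide_seek(task: str, n_subagents: int = 4) -> list[str]:
    subtasks = lead_agent_decompose(task, n_subagents)
    # Subagents run concurrently; width scaling corresponds to raising
    # n_subagents so more of the information space is covered in parallel.
    with ThreadPoolExecutor(max_workers=n_subagents) as pool:
        return list(pool.map(subagent_execute, subtasks))

results = wide_seek("collect release dates of all Python 3.x versions")
print(len(results))  # → 4, one result per parallel subagent
```

The key property this sketch preserves is that subagents share no mutable state: each receives only its own subtask, mirroring the paper's isolated-context design.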
Problem

Research questions and friction points this paper is trying to address.

width scaling
broad information seeking
multi-agent systems
parallel execution
organizational capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

width scaling
multi-agent reinforcement learning
parallel execution
broad information seeking
lead-agent-subagent framework
🔎 Similar Papers