W&D: Scaling Parallel Tool Calling for Efficient Deep Research Agents

📅 2026-02-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Work on deep research agents has focused on scaling reasoning depth while largely overlooking width, that is, parallel tool invocation, which limits both efficiency and effectiveness. To bridge this gap, this work proposes a “wide-and-deep” research agent framework that issues parallel tool calls intrinsically within a single reasoning step. The approach studies how width and depth scale together, including different tool call schedulers, without relying on multi-agent orchestration or context management. On the BrowseComp benchmark, it reaches 62.2% accuracy with GPT-5-Medium, surpassing the 54.9% originally reported for GPT-5-High, while reducing the number of turns needed to reach correct answers.

📝 Abstract

Deep research agents have emerged as powerful tools for automating complex intellectual tasks through multi-step reasoning and web-based information seeking. While recent efforts have successfully enhanced these agents by scaling depth through increasing the number of sequential thinking steps and tool calls, the potential of scaling width via parallel tool calling remains largely unexplored. In this work, we propose the Wide and Deep research agent, a framework designed to investigate the behavior and performance of agents when scaling not only depth but also width via parallel tool calling. Unlike existing approaches that rely on complex multi-agent orchestration to parallelize workloads, our method leverages intrinsic parallel tool calling to facilitate effective coordination within a single reasoning step. We demonstrate that scaling width significantly improves performance on deep research benchmarks while reducing the number of turns required to obtain correct answers. Furthermore, we analyze the factors driving these improvements through case studies and explore various tool call schedulers to optimize the parallel tool calling strategy. Our findings suggest that optimizing the trade-off between width and depth is a critical pathway toward high-efficiency deep research agents. Notably, without context management or other tricks, we obtain 62.2% accuracy with GPT-5-Medium on BrowseComp, surpassing the original 54.9% reported for GPT-5-High.
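The difference between scaling depth and scaling width can be illustrated with a toy sketch. This is not the paper's implementation: `fake_search`, `deep_only`, `wide_and_deep`, and the `width` parameter are hypothetical names chosen for illustration, the tool is a stub that only sleeps to mimic latency, and the queries are fixed up front, whereas a real agent would choose each turn's tool calls from the results of earlier turns.

```python
import asyncio


async def fake_search(query: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a web-search tool; sleeps to mimic latency."""
    await asyncio.sleep(delay)
    return f"results for {query!r}"


async def deep_only(queries):
    """Depth only: one tool call per turn, so turns equal the number of queries."""
    results, turns = [], 0
    for query in queries:
        results.append(await fake_search(query))
        turns += 1
    return results, turns


async def wide_and_deep(queries, width):
    """Width and depth: up to `width` tool calls run concurrently in each turn."""
    results, turns = [], 0
    for start in range(0, len(queries), width):
        batch = queries[start:start + width]
        # asyncio.gather runs the batch concurrently and preserves input order.
        results.extend(await asyncio.gather(*(fake_search(q) for q in batch)))
        turns += 1
    return results, turns


def run_demo():
    """Run both strategies on the same six queries and report turn counts."""
    queries = [f"q{i}" for i in range(6)]
    seq_results, seq_turns = asyncio.run(deep_only(queries))
    par_results, par_turns = asyncio.run(wide_and_deep(queries, width=3))
    return seq_results, seq_turns, par_results, par_turns
```

With six queries and a width of 3, the wide variant gathers the same results in two turns instead of six. The sketch only shows why width can cut the number of turns; it says nothing about the accuracy gains or the scheduler designs reported in the paper.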

Problem

Research questions and friction points this paper is trying to address.

parallel tool calling

deep research agents

scaling width

agent efficiency

tool call scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

parallel tool calling

wide and deep agent

scaling width