AI Summary
Large language models (LLMs) exhibit weak long-horizon reasoning in deep browsing-style search, exacerbated by the scarcity of high-quality, difficulty-graded supervision data. Method: We propose DeepDive, a framework featuring (1) knowledge-graph-guided generation of semantically complex, multi-hop reasoning questions to construct a high-fidelity training benchmark; (2) an end-to-end multi-turn reinforcement learning algorithm that jointly optimizes tool invocation sequences and long-horizon decision-making; and (3) a tool-augmented architecture enabling dynamic tool extension and parallel sampling at inference time. Contribution/Results: On the BrowseComp benchmark, DeepDive-32B achieves state-of-the-art open-source performance, substantially outperforming baselines including WebSailor, DeepSeek-R1-Browse, and Search-o1. Ablations confirm that synergistic optimization of hard-example generation and long-horizon RL is critical for advancing deep search capabilities.
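The knowledge-graph-guided question generation in point (1) can be illustrated with a minimal sketch: random-walk a path through a graph, then verbalize the path so that answering requires resolving every hop. The toy graph, the walk policy, and the question template below are illustrative assumptions, not DeepDive's actual data or prompts.

```python
import random

# Toy knowledge graph: entity -> list of (relation, entity) edges.
# Contents are hypothetical examples, not DeepDive's source KG.
KG = {
    "Marie Curie": [("born_in", "Warsaw"), ("won", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("member_of", "European Union")],
}

def random_walk(kg, start, hops):
    """Walk up to `hops` edges from `start`, collecting (head, rel, tail) triples."""
    path, node = [], start
    for _ in range(hops):
        edges = kg.get(node)
        if not edges:
            break
        rel, nxt = random.choice(edges)
        path.append((node, rel, nxt))
        node = nxt
    return path

def to_question(path):
    """Verbalize a path into a multi-hop question whose answer is the final
    entity; chaining the relations forces multi-step resolution."""
    head = path[0][0]
    clauses = ", then ".join(f"follow '{rel}'" for _, rel, _ in path)
    question = f"Starting from {head}, {clauses}: which entity do you reach?"
    return question, path[-1][2]

random.seed(0)
question, answer = to_question(random_walk(KG, "Marie Curie", 3))
```

In the paper's setting the verbalized question would additionally be obfuscated (e.g. the head entity replaced by an indirect description) so that the answer is hard to find without browsing; the sketch keeps only the path-to-question step.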
Abstract
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new competitive result among open-source models on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
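The parallel-sampling form of test-time scaling mentioned above can be sketched as running several independent agent rollouts on the same question and aggregating their final answers. The stub agent and the majority-vote aggregation below are assumptions for illustration; a real rollout would browse and reason over many tool calls.

```python
from collections import Counter

def sample_agent(question: str, seed: int) -> str:
    """Hypothetical stand-in for one independent deep-search rollout.
    Here we just simulate stochastic answers: most seeds 'converge'
    on the correct answer, a minority return a distractor."""
    return "European Union" if seed % 3 else "United Nations"

def parallel_sample(question: str, n: int):
    """Run n independent rollouts and aggregate by majority vote.
    Returns the winning answer and its vote share."""
    answers = [sample_agent(question, seed) for seed in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n

best, vote_share = parallel_sample("Which union is Poland a member of?", 9)
# Seeds 0, 3, 6 return the distractor; the other six agree, so the
# majority answer wins with a 2/3 vote share.
```

Majority voting is one simple aggregation choice; the same scaffold works with any selection rule over the sampled answers (e.g. picking the answer supported by the most cited evidence).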