DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

πŸ“… 2025-09-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models (LLMs) exhibit weak long-horizon reasoning in deep, browsing-style search, a weakness exacerbated by the scarcity of high-quality, difficulty-graded supervision data. Method: We propose DeepDive, a framework featuring (1) knowledge-graph-guided generation of semantically complex, multi-hop questions to construct difficult, high-fidelity training data; (2) end-to-end multi-turn reinforcement learning (RL) that jointly optimizes tool-invocation sequences and long-horizon decision-making; and (3) test-time scaling through additional tool calls and parallel sampling. Contribution/Results: On the BrowseComp benchmark, DeepDive-32B achieves state-of-the-art open-source performance, substantially outperforming baselines including WebSailor, DeepSeek-R1-Browse, and Search-o1. Ablations confirm that jointly optimizing hard-example generation and long-horizon RL is critical for advancing deep search capabilities.
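The knowledge-graph-guided generation in (1) can be pictured as a random walk over a KG whose intermediate entities are then hidden, so answering requires recovering every hop. Below is a minimal sketch under assumed data structures; the toy `KG`, `sample_multi_hop_path`, and `path_to_question` are illustrative stand-ins, not the paper's pipeline, which additionally uses an LLM to paraphrase and obfuscate the clues:

```python
import random

# Toy knowledge graph: entity -> list of (relation, entity) edges.
# Entities and relations are illustrative, not the paper's actual KG.
KG = {
    "Marie Curie": [("born_in", "Warsaw"), ("awarded", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("member_of", "European Union")],
}

def sample_multi_hop_path(kg, start, hops, rng=random):
    """Random-walk `hops` edges from `start`; return None if the walk dead-ends."""
    path, node = [], start
    for _ in range(hops):
        edges = kg.get(node)
        if not edges:
            return None
        relation, nxt = rng.choice(edges)
        path.append((node, relation, nxt))
        node = nxt
    return path

def path_to_question(path):
    """Verbalize the path with intermediate entities hidden, so solving it
    requires chaining every hop (a real pipeline would paraphrase with an LLM)."""
    hops = ", then ".join(rel for _, rel, _ in path)
    question = f"Starting from {path[0][0]}, follow {hops}. Which entity do you reach?"
    return question, path[-1][2]  # (question, gold answer)
```

Rejection of dead-ended walks (the `None` return) is one simple way to keep only questions whose full hop count is actually reachable.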

πŸ“ Abstract
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
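The multi-turn RL described above trains over episodes in which the agent interleaves reasoning and browsing-tool calls and is rewarded only on its final answer. A minimal rollout-and-reward sketch, assuming a simplified interface; `policy`, `search`, and the sparse binary reward here are hypothetical stand-ins for the paper's actual components:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Rollout:
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (query, observation)
    answer: str = ""

def run_episode(policy: Callable, search: Callable, question: str,
                max_turns: int = 8) -> Rollout:
    """One multi-turn deep-search episode: at each turn the policy emits either
    a search query or a final answer; observations are appended to the context."""
    rollout, context = Rollout(), question
    for _ in range(max_turns):
        action, arg = policy(context)      # ("search", query) or ("answer", text)
        if action == "answer":
            rollout.answer = arg
            break
        observation = search(arg)          # tool call: web or KG search
        rollout.steps.append((arg, observation))
        context += f"\nQuery: {arg}\nResult: {observation}"
    return rollout

def terminal_reward(rollout: Rollout, gold: str) -> float:
    """Sparse outcome reward: 1 only if the final answer matches the gold label."""
    return 1.0 if rollout.answer.strip().lower() == gold.strip().lower() else 0.0
```

An RL algorithm would then update the policy from batches of such rollouts; the sparse terminal reward is what makes long-horizon credit assignment the central difficulty.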
Problem

Research questions and friction points this paper is trying to address.

Open LLMs perform poorly as deep search agents due to limited long-horizon reasoning with browsing tools
Sufficiently difficult, high-quality supervised data for deep search is scarce
Complex, hard-to-find questions are expensive to author manually at scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically synthesizes complex, hard-to-find questions from open knowledge graphs
Applies end-to-end multi-turn reinforcement learning to strengthen long-horizon reasoning
Enables test-time scaling of tool calls and parallel sampling
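The test-time scaling above can be illustrated by drawing several independent agent trajectories and aggregating their final answers. Majority voting is one simple aggregation, sketched below; the paper's exact aggregation strategy may differ, and `answer_fn` is a hypothetical single-trajectory runner:

```python
from collections import Counter

def parallel_sample_vote(answer_fn, question, n_samples=8):
    """Run `n_samples` independent trajectories and majority-vote the final
    answers; returns the winning answer and its vote fraction."""
    answers = [answer_fn(question) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples
```

The vote fraction doubles as a rough confidence signal, which is one reason parallel sampling tends to help on hard, verifiable questions.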
πŸ”Ž Similar Papers
No similar papers found.
👥 Authors
Rui Lu (Tsinghua University)
Zhenyu Hou (Tsinghua University)
Zihan Wang (Northeastern University)
Hanchen Zhang (Tsinghua University)
Xiao Liu (Tsinghua University)
Yujiang Li (Tsinghua University)
Shi Feng (Northeastern University)
Jie Tang (UW Madison)
Yuxiao Dong (CS, Tsinghua University)