DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

πŸ“… 2025-09-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models (LLMs) exhibit weak long-horizon reasoning in deep, browsing-style search, a weakness exacerbated by the scarcity of high-quality, difficulty-graded supervision data. Method: We propose DeepDive, a framework featuring (1) knowledge-graph-guided generation of semantically complex, multi-hop questions to construct difficult, high-fidelity training data; (2) end-to-end multi-turn reinforcement learning (RL) that jointly optimizes tool-invocation sequences and long-horizon decision-making; and (3) test-time scaling through additional tool calls and parallel sampling. Contribution/Results: On the BrowseComp benchmark, DeepDive-32B achieves state-of-the-art open-source performance, substantially outperforming baselines including WebSailor, DeepSeek-R1-Browse, and Search-o1. Ablations confirm that jointly optimizing hard-example generation and long-horizon RL is critical for advancing deep search capabilities.
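The knowledge-graph-guided generation in (1) can be pictured as a random walk over a KG whose intermediate entities are then hidden, so answering requires recovering every hop. Below is a minimal sketch under assumed data structures; the toy `KG`, `sample_multi_hop_path`, and `path_to_question` are illustrative stand-ins, not the paper's pipeline, which additionally uses an LLM to paraphrase and obfuscate the clues:

```python
import random

# Toy knowledge graph: entity -> list of (relation, entity) edges.
# Entities and relations are illustrative, not the paper's actual KG.
KG = {
    "Marie Curie": [("born_in", "Warsaw"), ("awarded", "Nobel Prize in Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("member_of", "European Union")],
}

def sample_multi_hop_path(kg, start, hops, rng=random):
    """Random-walk `hops` edges from `start`; return None if the walk dead-ends."""
    path, node = [], start
    for _ in range(hops):
        edges = kg.get(node)
        if not edges:
            return None
        relation, nxt = rng.choice(edges)
        path.append((node, relation, nxt))
        node = nxt
    return path

def path_to_question(path):
    """Verbalize the path with intermediate entities hidden, so solving it
    requires chaining every hop (a real pipeline would paraphrase with an LLM)."""
    hops = ", then ".join(rel for _, rel, _ in path)
    question = f"Starting from {path[0][0]}, follow {hops}. Which entity do you reach?"
    return question, path[-1][2]  # (question, gold answer)
```

Rejection of dead-ended walks (the `None` return) is one simple way to keep only questions whose full hop count is actually reachable.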

πŸ“ Abstract
Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
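The multi-turn RL described above trains over episodes in which the agent interleaves reasoning and browsing-tool calls and is rewarded only on its final answer. A minimal rollout-and-reward sketch, assuming a simplified interface; `policy`, `search`, and the sparse binary reward here are hypothetical stand-ins for the paper's actual components:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Rollout:
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (query, observation)
    answer: str = ""

def run_episode(policy: Callable, search: Callable, question: str,
                max_turns: int = 8) -> Rollout:
    """One multi-turn deep-search episode: at each turn the policy emits either
    a search query or a final answer; observations are appended to the context."""
    rollout, context = Rollout(), question
    for _ in range(max_turns):
        action, arg = policy(context)      # ("search", query) or ("answer", text)
        if action == "answer":
            rollout.answer = arg
            break
        observation = search(arg)          # tool call: web or KG search
        rollout.steps.append((arg, observation))
        context += f"\nQuery: {arg}\nResult: {observation}"
    return rollout

def terminal_reward(rollout: Rollout, gold: str) -> float:
    """Sparse outcome reward: 1 only if the final answer matches the gold label."""
    return 1.0 if rollout.answer.strip().lower() == gold.strip().lower() else 0.0
```

An RL algorithm would then update the policy from batches of such rollouts; the sparse terminal reward is what makes long-horizon credit assignment the central difficulty.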
Problem

Research questions and friction points this paper is trying to address.

Open LLMs perform poorly as deep search agents due to limited long-horizon reasoning with browsing tools
Sufficiently difficult, high-quality supervised data for deep search is scarce
Complex, hard-to-find questions are expensive to author manually at scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically synthesizes complex, hard-to-find questions from open knowledge graphs
Applies end-to-end multi-turn reinforcement learning to strengthen long-horizon reasoning
Enables test-time scaling of tool calls and parallel sampling
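The test-time scaling above can be illustrated by drawing several independent agent trajectories and aggregating their final answers. Majority voting is one simple aggregation, sketched below; the paper's exact aggregation strategy may differ, and `answer_fn` is a hypothetical single-trajectory runner:

```python
from collections import Counter

def parallel_sample_vote(answer_fn, question, n_samples=8):
    """Run `n_samples` independent trajectories and majority-vote the final
    answers; returns the winning answer and its vote fraction."""
    answers = [answer_fn(question) for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples
```

The vote fraction doubles as a rough confidence signal, which is one reason parallel sampling tends to help on hard, verifiable questions.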
πŸ”Ž Similar Papers
No similar papers found.
👥 Authors
Rui Lu (Tsinghua University)
Zhenyu Hou (Tsinghua University)
Zihan Wang (Northeastern University)
Hanchen Zhang (Tsinghua University)
Xiao Liu (Tsinghua University)
Yujiang Li (Tsinghua University)
Shi Feng (Northeastern University)
Jie Tang (UW Madison)
Yuxiao Dong (CS, Tsinghua University)