🤖 AI Summary
Current large language models (LLMs) struggle with deep research tasks in real-world web environments because they rely on hand-crafted prompts or constrained retrieval-augmented generation (RAG) setups, which fail to handle the web's openness, dynamism, and noise. To address this, we propose the first end-to-end reinforcement learning (RL) framework that enables LLM agents to interact directly with web pages via browser APIs, performing autonomous information retrieval, multi-source cross-verification, emergent planning, self-reflection, and honest refusal to answer. Our method integrates a multi-agent architecture, adaptive webpage structure extraction, and RL training guided by reward modeling. Experiments on open-domain research tasks show that our approach outperforms prompt-engineering baselines by up to 28.9 points and RAG-based RL baselines by up to 7.2 points, significantly improving factual consistency and research robustness under realistic web conditions.
📝 Abstract
Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually engineered prompts (prompt engineering-based) with brittle performance or reinforcement learning within controlled Retrieval-Augmented Generation (RAG) environments (RAG-based) that fail to capture the complexities of real-world interaction. In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. We implement a specialized multi-agent architecture in which browsing agents extract relevant information from various webpage structures, overcoming significant technical challenges. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications. We release DeepResearcher at https://github.com/GAIR-NLP/DeepResearcher.
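To make the described behaviors concrete, here is a minimal, self-contained sketch of the two that are easiest to illustrate without a live browser: cross-validating an answer across multiple sources and refusing honestly when no definitive answer emerges. Everything here is hypothetical (the `FAKE_WEB` stub, the `search` and `research` helpers, and the agreement threshold are illustrative inventions, not DeepResearcher's actual API or trained policy); the real system learns these behaviors through RL over authentic web search interactions rather than hard-coding them.

```python
from collections import Counter

# Hypothetical toy "web": query -> list of (source, snippet) pairs.
# The noisy third source for the first query mimics the unreliable,
# contradictory pages an agent encounters on the open web.
FAKE_WEB = {
    "capital of australia": [
        ("site-a", "Canberra"),
        ("site-b", "Canberra"),
        ("site-c", "Sydney"),  # noisy, incorrect source
    ],
    "obscure question": [
        ("site-a", "unclear"),
    ],
}

def search(query):
    """Stand-in for a real browser/search API call."""
    return FAKE_WEB.get(query, [])

def research(query, min_agreement=2):
    """Answer only when at least `min_agreement` independent sources
    agree; otherwise refuse honestly instead of guessing."""
    snippets = [snippet for _, snippet in search(query)]
    if not snippets:
        return "I could not find an answer."
    answer, count = Counter(snippets).most_common(1)[0]
    if count >= min_agreement:
        return answer
    return "I could not find a definitive answer."
```

In this sketch the refusal is a fixed rule; in the paper's framework, by contrast, honesty and cross-verification are reported as emergent behaviors shaped by end-to-end RL reward signals, not hand-written heuristics.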