Deep Research: A Survey of Autonomous Research Agents

📅 2025-08-18
🤖 AI Summary
Large language models (LLMs) are limited by static internal knowledge, which restricts their ability to produce credible, evidence-based analytical reports grounded in current web data. This survey reviews the deep research paradigm, in which agents plan, retrieve, and synthesize to address that limitation, and organizes it as a four-stage pipeline: planning, question developing, web exploration, and report generation. For each stage, it analyzes the key technical challenges and categorizes representative methods. It also summarizes optimization techniques and benchmarks tailored to deep research, and discusses open challenges and research directions toward more capable and trustworthy deep research agents.

📝 Abstract
The rapid advancement of large language models (LLMs) has driven the development of agentic systems capable of autonomously performing complex tasks. Despite their impressive capabilities, LLMs remain constrained by their internal knowledge boundaries. To overcome these limitations, the paradigm of deep research has been proposed, wherein agents actively engage in planning, retrieval, and synthesis to generate comprehensive and faithful analytical reports grounded in web-based evidence. In this survey, we provide a systematic overview of the deep research pipeline, which comprises four core stages: planning, question developing, web exploration, and report generation. For each stage, we analyze the key technical challenges and categorize representative methods developed to address them. Furthermore, we summarize recent advances in optimization techniques and benchmarks tailored for deep research. Finally, we discuss open challenges and promising research directions, aiming to chart a roadmap toward building more capable and trustworthy deep research agents.
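The four stages named in the abstract can be pictured as a simple chain of functions. The following is only a toy sketch under stated assumptions, not a method taken from the survey: the function names (`plan`, `develop_queries`, `explore`, `generate_report`), the `Evidence` type, and the in-memory `CORPUS` standing in for the web are all hypothetical, and each stage is reduced to a trivial heuristic where a real agent would call an LLM or a search tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str  # identifier of the document the snippet came from
    text: str    # the snippet itself


# Hypothetical stand-in for the web; a real agent would query a search engine.
CORPUS = {
    "doc-a": "Deep research agents plan before they search.",
    "doc-b": "Report generation should cite retrieved evidence.",
}


def plan(question: str) -> list[str]:
    """Stage 1, planning: split the question into sub-goals (here, on ' and ')."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def develop_queries(subgoal: str) -> list[str]:
    """Stage 2, question developing: derive search queries (here, longer words)."""
    return [w.lower().strip("?.,") for w in subgoal.split() if len(w) > 4]


def explore(query: str) -> list[Evidence]:
    """Stage 3, web exploration: collect snippets that mention the query."""
    return [Evidence(src, txt) for src, txt in CORPUS.items() if query in txt.lower()]


def generate_report(question: str, evidence: list[Evidence]) -> str:
    """Stage 4, report generation: list each snippet with its source tag."""
    lines = [f"Question: {question}"]
    lines += [f"- {e.text} [{e.source}]" for e in evidence]
    return "\n".join(lines)


def deep_research(question: str) -> str:
    """Run the four stages in order, keeping each source only once."""
    seen: set[str] = set()
    evidence: list[Evidence] = []
    for subgoal in plan(question):
        for query in develop_queries(subgoal):
            for ev in explore(query):
                if ev.source not in seen:
                    seen.add(ev.source)
                    evidence.append(ev)
    return generate_report(question, evidence)


if __name__ == "__main__":
    print(deep_research("How do agents plan and what should a report cite?"))
```

Keeping the source identifier attached to every snippet is what lets the final report stay traceable to its evidence, which is the property the abstract describes as being faithful and grounded in web-based evidence.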
Problem

Research questions and friction points this paper is trying to address.

Overcoming LLM knowledge boundaries via autonomous research agents
Surveying deep research pipeline stages and technical challenges
Advancing trustworthy autonomous research agent development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous agents perform web-based evidence retrieval
Pipeline comprises four stages: planning, question developing, web exploration, and report generation
Optimization techniques enhance research agent capabilities
Authors

Wenlin Zhang
City University of Hong Kong, China

Xiaopeng Li
City University of Hong Kong, China

Yingyi Zhang
Bytedance
Content Understanding, MLLM, Computer Vision, Palmprint Recognition, Pose Estimation

Pengyue Jia
PhD candidate of Data Science, City University of Hong Kong
Information Retrieval, Large Language Models, GeoAI

Yichao Wang
Huawei Noah’s Ark Lab, China

Huifeng Guo
Huawei, Harbin Institute of Technology
Recommender System, Deep Learning, Data Mining

Yong Liu
Huawei Noah’s Ark Lab, China

Xiangyu Zhao
City University of Hong Kong, China