🤖 AI Summary
Existing AI agents face two key bottlenecks in Open-Ended Deep Research (OEDR): (1) static, decoupled pipelines that separate planning from evidence acquisition, and (2) monolithic long-text generation prone to “lost in the middle” failures and hallucination. This paper proposes a dynamic dual-agent framework that tightly couples a Planner and a Writer to enable closed-loop, iterative coordination among evidence retrieval, hierarchical outline evolution, and content generation. Its core contributions are: (1) memory-augmented dynamic evidence management; (2) iterative, hierarchical outline optimization; and (3) chunked retrieval-augmented generation with source attribution for faithful content synthesis. By departing from rigid pipeline and single-pass generation paradigms, the framework mitigates context drift and factual inconsistency. It achieves state-of-the-art performance on DeepResearch Bench, DeepConsult, and DeepResearchGym, with reported improvements in report quality, factual accuracy, and structural coherence.
📝 Abstract
This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. Current approaches suffer from two limitations: static research pipelines that decouple planning from evidence acquisition, and one-shot generation paradigms that are prone to long-context failures such as "lost in the middle" and hallucination. To address these challenges, we introduce WebWeaver, a novel dual-agent framework that emulates the human research process. The planner operates in a dynamic cycle, iteratively interleaving evidence acquisition with outline optimization to produce a comprehensive, source-grounded outline linked to a memory bank of evidence. The writer then executes a hierarchical retrieval and writing process, composing the report section by section. By retrieving from the memory bank only the evidence needed for each part, it effectively mitigates long-context issues. Our framework establishes a new state-of-the-art across major OEDR benchmarks, including DeepResearch Bench, DeepConsult, and DeepResearchGym. These results validate our human-centric, iterative methodology, demonstrating that adaptive planning and focused synthesis are crucial for producing high-quality, reliable, and well-structured reports.
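The planner and writer loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `MemoryBank`, `Section`, `run_planner`, `run_writer`, `toy_search`, and `toy_compose` are all hypothetical, and the search and composition steps are deterministic stand-ins for what would be web-search and LLM calls in a real system. It only shows the data flow: evidence is stored once under a citation id, the outline links to those ids, and the writer loads just the evidence cited by the section it is currently composing.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Evidence store; each snippet gets an integer citation id (assumed design)."""
    snippets: dict = field(default_factory=dict)

    def add(self, text):
        cid = len(self.snippets) + 1
        self.snippets[cid] = text
        return cid

    def fetch(self, ids):
        return {cid: self.snippets[cid] for cid in ids}


@dataclass
class Section:
    title: str
    cites: list = field(default_factory=list)  # ids into the memory bank


def run_planner(topics, search, bank, max_rounds=3):
    """Alternate evidence acquisition and outline updates, one topic per round.

    `topics` maps a section title to its search queries. In a real system an
    LLM would propose and revise both between rounds; here they are fixed.
    """
    outline = {}
    pending = list(topics.items())
    for _ in range(max_rounds):
        if not pending:
            break
        title, queries = pending.pop(0)
        section = outline.setdefault(title, Section(title))
        for query in queries:
            for snippet in search(query):
                # Store the evidence once and link it from the outline.
                section.cites.append(bank.add(snippet))
    return list(outline.values())


def run_writer(outline, bank, compose):
    """Write section by section, loading only that section's evidence."""
    parts = []
    for section in outline:
        evidence = bank.fetch(section.cites)  # targeted retrieval
        parts.append(compose(section.title, evidence))
    return "\n\n".join(parts)


def toy_search(query):
    # Stand-in for a web search tool.
    return [f"finding about {query}"]


def toy_compose(title, evidence):
    # Stand-in for an LLM call; cites each snippet by its id.
    body = " ".join(f"{text} [{cid}]" for cid, text in evidence.items())
    return f"{title}: {body}"


bank = MemoryBank()
outline = run_planner(
    {"Background": ["history"], "Methods": ["planning", "retrieval"]},
    toy_search,
    bank,
)
report = run_writer(outline, bank, toy_compose)
print(report)
```

Because each call to `compose` sees only the snippets cited by its own section, the context passed to the generator stays small regardless of how large the memory bank grows, which is the property the abstract credits with mitigating long-context issues.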