🤖 AI Summary
Existing Open-Ended Deep Research (OEDR) agents often explore insufficiently when integrating knowledge over long horizons, either because evidence is lost as it accumulates or because they rely on implicit, outline-based reasoning to decide what is missing. This work proposes DualGraph, a novel memory architecture that, for the first time, decouples knowledge exploration from writing structure by jointly maintaining an Outline Graph and a Knowledge Graph. This design explicitly models knowledge relationships and gaps, enabling goal-directed, iterative exploration and report generation. By combining topological analysis of the Knowledge Graph with structural signals from the Outline Graph to drive LLM-generated search queries, DualGraph significantly outperforms existing methods on benchmarks such as DeepResearch Bench and produces reports with markedly improved depth, breadth, and factual accuracy; with GPT-5, it reaches a RACE score of 53.08 on DeepResearch Bench.
📝 Abstract
Open-Ended Deep Research (OEDR) pushes LLM agents beyond short-form QA toward long-horizon workflows that iteratively search, connect, and synthesize evidence into structured reports. However, existing OEDR agents largely follow either linear "search-then-generate" accumulation or outline-centric planning. The former suffers from lost-in-the-middle failures as evidence grows, while the latter relies on the LLM to infer knowledge gaps implicitly from the outline alone, which provides weak supervision for identifying missing relations and triggering targeted exploration. We present DualGraph memory, an architecture that separates what the agent knows from how it writes. DualGraph maintains two co-evolving graphs: an Outline Graph (OG) that captures the report's writing structure, and a Knowledge Graph (KG), a semantic memory that stores fine-grained knowledge units, including core entities, concepts, and their relations. By analyzing the KG topology together with structural signals from the OG, DualGraph generates targeted search queries, enabling more efficient and comprehensive knowledge-driven exploration and refinement across iterations. Across DeepResearch Bench, DeepResearchGym, and DeepConsult, DualGraph consistently outperforms state-of-the-art baselines in report depth, breadth, and factual grounding; for example, it reaches a RACE score of 53.08 on DeepResearch Bench with GPT-5. Ablation studies further confirm the central role of the dual-graph design.
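The abstract describes the mechanism only at a high level. As a rough illustration under stated assumptions, the Python sketch below keeps an outline graph and a knowledge graph side by side and turns two simple signals into search queries. The first is leaf sections with too few supporting entities, an OG structural signal. The second is entities with no relations, a KG topology signal. The class `DualGraphMemory`, its methods, the `min_support` threshold, and both heuristics are hypothetical choices for this example, not details from the paper. The templated query strings stand in for the LLM-driven query generation the authors describe.

```python
from collections import defaultdict


class DualGraphMemory:
    """Toy dual-graph memory: a hypothetical sketch, not the DualGraph authors' code.

    Outline Graph (OG): how the report is written (sections -> sub-sections).
    Knowledge Graph (KG): what the agent knows (entities and their relations).
    """

    def __init__(self):
        self.outline = {}                # OG: section -> list of child sections
        self.kg = defaultdict(set)       # KG: entity -> set of related entities
        self.support = defaultdict(set)  # OG section -> KG entities backing it

    def add_section(self, section, parent=None):
        self.outline.setdefault(section, [])
        if parent is not None:
            self.outline.setdefault(parent, []).append(section)

    def add_relation(self, head, tail):
        # Undirected, unlabeled edge; real relation types are omitted for brevity.
        self.kg[head].add(tail)
        self.kg[tail].add(head)

    def attach(self, section, entity):
        # Link a KG entity to the outline section it supports.
        self.support[section].add(entity)
        self.kg.setdefault(entity, set())  # register the entity as a KG node

    def gap_queries(self, min_support=2):
        """Turn two assumed gap signals into templated search queries.

        OG signal: a leaf section backed by fewer than `min_support` entities.
        KG signal: an entity with no relations (an isolated node).
        The templated strings stand in for LLM-generated queries.
        """
        queries = []
        for section, children in self.outline.items():
            if not children and len(self.support.get(section, ())) < min_support:
                queries.append(f"evidence for section: {section}")
        for entity in sorted(self.kg):
            if not self.kg[entity]:
                queries.append(f"relations involving {entity}")
        return queries


# Usage: one well-supported section, one thin section, one isolated entity.
mem = DualGraphMemory()
mem.add_section("Report")
mem.add_section("Background", parent="Report")
mem.add_section("Methods", parent="Report")
mem.attach("Background", "LLM agents")
mem.attach("Background", "deep research")
mem.add_relation("LLM agents", "deep research")
mem.attach("Methods", "knowledge graph")
print(mem.gap_queries())
# → ['evidence for section: Methods', 'relations involving knowledge graph']
```

Keeping the two graphs separate means a section can be flagged as under-supported even when the knowledge graph as a whole is large, which is the kind of explicit gap signal the abstract contrasts with inferring gaps from the outline alone. A fuller agent would presumably pass such gaps to an LLM, run the resulting searches, and update both graphs before the next iteration.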