🤖 AI Summary
Enterprise-grade RAG systems in high-stakes decision-making are often hindered by shallow retrieval, lack of traceability, and fragility to ambiguous queries. To address these limitations, this work proposes ADORE, a framework that orchestrates multiple specialized agents through a central coordinator to perform user-guided, iterative deep retrieval and synthesis. Key innovations include a structured memory repository based on Claim-Evidence Graphs, a memory-locking synthesis mechanism, an evidence-coverage-guided execution pipeline, segmented packing and compression for long-context handling, and an evidence-driven termination criterion, which together enable the generation of verifiable, fully traceable reports. ADORE achieves state-of-the-art performance with a score of 52.65 on DeepResearch Bench and a 77.2% preference win rate over existing commercial systems in the DeepConsult evaluation.
📝 Abstract
Retrieval-Augmented Generation (RAG) shows promise for enterprise knowledge work, yet it often underperforms in high-stakes decision settings that require deep synthesis, strict traceability, and recovery from underspecified prompts. One-pass retrieval-and-write pipelines frequently yield shallow summaries, inconsistent grounding, and weak mechanisms for completeness verification. We introduce ADORE (Adaptive Deep Orchestration for Research in Enterprise), an agentic framework that replaces linear retrieval with iterative, user-steered investigation coordinated by a central orchestrator and a set of specialized agents. ADORE's key insight is that a structured Memory Bank (a curated evidence store with explicit claim-evidence linkage and section-level admissible evidence) enables traceable report generation and systematic checks for evidence completeness. Our contributions are threefold: (1) Memory-locked synthesis - report generation is constrained to a structured Memory Bank (Claim-Evidence Graph) with section-level admissible evidence, enabling traceable claims and grounded citations; (2) Evidence-coverage-guided execution - a retrieval-reflection loop audits section-level evidence coverage to trigger targeted follow-up retrieval and terminates via an evidence-driven stopping criterion; (3) Section-packed long-context grounding - section-level packing, pruning, and citation-preserving compression make long-form synthesis feasible under context limits. Across our evaluation suite, ADORE ranks first on DeepResearch Bench (52.65) and achieves the highest head-to-head preference win rate on DeepConsult (77.2%) against commercial systems.
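To make the first contribution concrete, here is a minimal sketch of what a structured Memory Bank with explicit claim-evidence linkage and section-level admissible evidence could look like. This is an illustration under our own assumptions, not the paper's implementation: all class, method, and field names (`MemoryBank`, `add_claim`, `admissible`, etc.) are hypothetical. The key idea shown is memory-locking, where a claim is only admitted if every piece of evidence it cites already exists in the bank, so every claim in the final report remains traceable.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    id: str
    source: str   # e.g., document URL or chunk identifier
    text: str


@dataclass
class Claim:
    id: str
    text: str
    evidence_ids: list  # edges of the Claim-Evidence Graph


class MemoryBank:
    """Curated evidence store with claim-evidence linkage (hypothetical sketch)."""

    def __init__(self):
        self.evidence = {}          # evidence id -> Evidence
        self.claims = {}            # claim id -> Claim
        self.section_evidence = {}  # section name -> set of admissible evidence ids

    def add_evidence(self, ev, section):
        self.evidence[ev.id] = ev
        self.section_evidence.setdefault(section, set()).add(ev.id)

    def add_claim(self, claim):
        # Memory-locking: reject any claim citing evidence outside the bank,
        # so every emitted claim is groundable to stored evidence.
        if not all(eid in self.evidence for eid in claim.evidence_ids):
            raise ValueError(f"claim {claim.id} cites unknown evidence")
        self.claims[claim.id] = claim

    def admissible(self, section):
        """Evidence the synthesis step is allowed to cite for this section."""
        return [self.evidence[i] for i in sorted(self.section_evidence.get(section, set()))]
```

In this reading, report generation would query `admissible(section)` for each section and only emit claims accepted by `add_claim`, which is what makes citations checkable after the fact.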
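The second contribution, evidence-coverage-guided execution, can likewise be sketched as a retrieval-reflection loop. This is a speculative simplification: the `coverage` heuristic, the `threshold`, and the `retrieve` callback are assumptions introduced for illustration, not details from the paper. What it does show is the stated control flow: audit per-section evidence coverage, issue targeted follow-up retrieval only for under-covered sections, and stop when coverage is sufficient (the evidence-driven termination criterion) rather than after a fixed number of passes.

```python
def coverage(section_questions, section_memory):
    """Fraction of a section's guiding questions with at least one evidence hit."""
    if not section_questions:
        return 1.0
    answered = sum(1 for q in section_questions if section_memory.get(q))
    return answered / len(section_questions)


def research_loop(sections, retrieve, threshold=0.8, max_rounds=5):
    """Retrieval-reflection loop (illustrative sketch).

    sections: mapping of section name -> list of guiding questions.
    retrieve: callback taking a question and returning a list of evidence snippets.
    Stops early once every section's coverage clears the threshold.
    """
    memory = {s: {} for s in sections}
    for _ in range(max_rounds):
        # Reflection step: audit section-level evidence coverage.
        gaps = [s for s, qs in sections.items()
                if coverage(qs, memory[s]) < threshold]
        if not gaps:
            break  # evidence-driven termination: all sections sufficiently covered
        # Targeted follow-up retrieval, restricted to under-covered sections.
        for s in gaps:
            for q in sections[s]:
                if not memory[s].get(q):
                    memory[s][q] = retrieve(q)
    return memory
```

The design choice worth noting is that termination depends on the state of the evidence, not on a retrieval budget alone; `max_rounds` acts only as a safety cap for questions the retriever cannot answer.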