🤖 AI Summary
This work addresses the vulnerability of agentic Retrieval-Augmented Generation (RAG) systems to early-stage errors that propagate through long reasoning trajectories, often leading to complete failure. Existing repair strategies typically rely on full-chain re-execution, incurring substantial computational overhead. To overcome this limitation, the authors propose Doctor-RAG, which they present as the first framework to decouple error attribution from repair. Doctor-RAG employs trajectory-level diagnosis to pinpoint the earliest failure point and leverages a coverage-gated error classification scheme to trigger tool-conditioned local repairs with minimal intervention, while reusing verified reasoning prefixes and retrieved evidence. Evaluated on three multi-hop question answering benchmarks, the method significantly improves answer accuracy while substantially reducing token consumption compared to conventional re-execution strategies.
📝 Abstract
Agentic Retrieval-Augmented Generation (Agentic RAG) has become a widely adopted paradigm for multi-hop question answering and complex knowledge reasoning, where retrieval and reasoning are interleaved at inference time. As reasoning trajectories grow longer, failures become increasingly common. Existing approaches typically address such failures by either stopping at diagnostic analysis or rerunning the entire retrieval-reasoning pipeline, which leads to substantial computational overhead and redundant reasoning. In this paper, we propose Doctor-RAG (DR-RAG), a unified diagnose-and-repair framework that corrects failures in Agentic RAG through explicit error localization and prefix reuse, enabling minimal-cost intervention. DR-RAG decomposes failure handling into two consecutive stages: (i) trajectory-level failure diagnosis and localization, which attributes errors to a coverage-gated taxonomy and identifies the earliest failure point in the reasoning trajectory; and (ii) tool-conditioned local repair, which intervenes only at the diagnosed failure point while maximally reusing validated reasoning prefixes and retrieved evidence. By explicitly separating error attribution from correction, DR-RAG enables precise error localization, thereby avoiding expensive full-pipeline reruns and enabling targeted, efficient repair. We evaluate DR-RAG across three multi-hop question answering benchmarks, multiple agentic RAG baselines, and different backbone models. Experimental results demonstrate that DR-RAG substantially improves answer accuracy while significantly reducing reasoning token consumption compared to rerun-based repair strategies.
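The two-stage pipeline described above, diagnose the earliest failure point, then repair only that step while reusing the verified prefix, can be sketched as follows. This is a minimal illustration under assumed names (`Step`, `diagnose`, `repair`, the substring-based coverage gate); it is not the paper's actual implementation or error taxonomy.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One retrieval-reasoning step in an agentic RAG trajectory (illustrative)."""
    subquestion: str
    evidence: list = field(default_factory=list)  # retrieved passages
    answer: str = ""
    verified: bool = False                        # passed a per-step checker

def diagnose(trajectory):
    """Return (index, error_type) of the earliest unverified step, or (None, None).

    The coverage gate here is a stand-in: if the retrieved evidence mentions the
    sub-question, we blame reasoning; otherwise we blame retrieval.
    """
    for i, step in enumerate(trajectory):
        if step.verified:
            continue
        covered = any(step.subquestion.lower() in p.lower() for p in step.evidence)
        return i, ("reasoning_error" if covered else "retrieval_gap")
    return None, None

def repair(trajectory, idx, error_type, rerun_step):
    """Intervene only at the diagnosed step; the validated prefix is reused as-is."""
    prefix = trajectory[:idx]                     # verified steps, kept unchanged
    fixed = rerun_step(trajectory[idx], error_type)
    return prefix + [fixed] + trajectory[idx + 1:]
```

A rerun-based strategy would instead regenerate the whole trajectory; here only `trajectory[idx]` is recomputed, which is what saves reasoning tokens.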