🤖 AI Summary
Retrieval-augmented generation (RAG) agents lack process-level supervision for task decomposition, tool invocation, and stepwise reasoning, which limits their complex-reasoning and tool-use capabilities. To address this, we propose EviPath, a novel framework that, for the first time, introduces abductive reasoning into subtask planning, enabling evidence-anchored, interpretable reasoning paths. EviPath synthesizes high-quality, end-to-end supervision data via simulated agent-environment interaction, evidence-grounded faithful question answering, and conversational fine-tuning, enabling fine-grained, traceable modeling of the agent's behavior chain. Evaluated on multiple open-domain question-answering benchmarks, an 8B-parameter model trained with EviPath surpasses state-of-the-art methods, achieving an absolute 14.7% improvement in Exact Match (EM) and demonstrating both significant performance gains and robustness.
📝 Abstract
The development of retrieval-augmented generation (RAG) agents is hindered by the lack of process-level supervision to effectively guide agentic capabilities such as task decomposition, retriever invocation, and stepwise decision-making. While reinforcement learning offers a potential solution, it suffers from sparse rewards and the limited reasoning capabilities of large language models (LLMs). Meanwhile, existing data synthesis methods produce only chain-of-thought rationales and fail to model environmental interactions. In this paper, we propose EviPath, an evidence-anchored reasoning path synthesis paradigm for RAG agent development. EviPath comprises: (i) Abductive Subtask Planning, which decomposes the problem into sub-questions and iteratively plans an optimal solution path based on the dependencies between them; (ii) Faithful Sub-question Answering, which uses supporting evidence to construct a proxy environment that generates reasoning thoughts and answers for each sub-question; and (iii) Conversational Fine-Tuning, which formats the complete agent-environment interaction trajectory into a dialogue format suitable for supervised fine-tuning (SFT). EviPath allows LLMs to learn complex reasoning and tool-use capabilities directly from synthesized data. Extensive experiments on widely-used question-answering benchmarks show that an 8B-parameter model trained on EviPath-synthesized data significantly and consistently outperforms state-of-the-art baselines, with a double-digit absolute EM gain of 14.7% in open-domain question answering.
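To make the three-stage pipeline concrete, here is a minimal sketch of how a synthesized trajectory might be assembled and flattened into a chat-format SFT example. All function names, the toy evidence store, and the message schema are illustrative assumptions, not the paper's actual implementation (the real planner and answerer would be LLM-driven).

```python
# Hypothetical sketch of EviPath's three stages, based only on the abstract.

def abductive_subtask_planning(question):
    """Stage (i): decompose the question into dependent sub-questions.
    A fixed toy decomposition stands in for the LLM planner here."""
    return [
        {"sub_question": "Who directed the film?", "needs": []},
        {"sub_question": "When was that director born?", "needs": [0]},
    ]

def faithful_sub_answering(sub_questions, evidence):
    """Stage (ii): answer each sub-question against a proxy environment
    built from supporting evidence (a toy dict lookup here)."""
    steps = []
    for sq in sub_questions:
        observation = evidence.get(sq["sub_question"], "no evidence found")
        steps.append({
            "thought": f"I need to resolve: {sq['sub_question']}",
            "action": f"search({sq['sub_question']!r})",
            "observation": observation,
        })
    return steps

def conversational_format(question, steps, final_answer):
    """Stage (iii): flatten the full agent-environment trajectory into a
    dialogue suitable for supervised fine-tuning."""
    messages = [{"role": "user", "content": question}]
    for s in steps:
        messages.append({
            "role": "assistant",
            "content": f"Thought: {s['thought']}\nAction: {s['action']}",
        })
        messages.append({"role": "tool", "content": s["observation"]})
    messages.append({"role": "assistant",
                     "content": f"Final answer: {final_answer}"})
    return messages

evidence = {
    "Who directed the film?": "The film was directed by Jane Doe.",
    "When was that director born?": "Jane Doe was born in 1970.",
}
question = "When was the film's director born?"
subs = abductive_subtask_planning(question)
steps = faithful_sub_answering(subs, evidence)
sft_example = conversational_format(question, steps, "1970")
# 1 user + 2 x (assistant thought/action + tool observation) + 1 final answer
print(len(sft_example))  # → 6
```

The key property the sketch illustrates is that every assistant turn is paired with an evidence-grounded observation, so the resulting SFT data supervises the full behavior chain (decompose, invoke the retriever, reason, answer) rather than only a final chain-of-thought rationale.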