🤖 AI Summary
Retrieval-augmented generation (RAG) agents lack process-level supervision for task decomposition, tool invocation, and stepwise reasoning, which limits their complex-reasoning and tool-use capabilities. To address this, we propose EviPath, a novel framework that, for the first time, introduces abductive reasoning into subtask planning, enabling evidence-anchored, interpretable reasoning paths. EviPath synthesizes high-quality, end-to-end supervision data via simulated agent-environment interaction, evidence-grounded faithful question answering, and conversational fine-tuning, enabling fine-grained, traceable modeling of the agent's behavior chain. Evaluated on multiple open-domain question-answering benchmarks, an 8B-parameter model trained with EviPath surpasses state-of-the-art methods, achieving an absolute 14.7% improvement in Exact Match (EM) and demonstrating both significant performance gains and robustness.
📝 Abstract
The development of retrieval-augmented generation (RAG) agents is hindered by the lack of process-level supervision to effectively guide agentic capabilities such as task decomposition, retriever invocation, and stepwise decision-making. While reinforcement learning offers a potential solution, it suffers from sparse rewards and the limited reasoning capabilities of large language models (LLMs). Meanwhile, existing data synthesis methods produce only chain-of-thought rationales and fail to model environmental interactions. In this paper, we propose EviPath, an evidence-anchored reasoning path synthesis paradigm for RAG agent development. EviPath comprises: (i) Abductive Subtask Planning, which decomposes the problem into sub-questions and iteratively plans an optimal solution path based on the dependencies between them; (ii) Faithful Sub-question Answering, which uses supporting evidence to construct a proxy environment that generates reasoning thoughts and answers for each sub-question; and (iii) Conversational Fine-Tuning, which formats the complete agent-environment interaction trajectory into a dialogue format suitable for supervised fine-tuning (SFT). EviPath allows LLMs to learn complex reasoning and tool-use capabilities directly from synthesized data. Extensive experiments on widely-used question-answering benchmarks show that an 8B-parameter model trained on EviPath-synthesized data significantly and consistently outperforms state-of-the-art baselines, with a double-digit absolute EM gain of 14.7% in open-domain question answering.
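To make the three-stage pipeline concrete, here is a minimal sketch of how a synthesized trajectory might be assembled and flattened into a chat-format SFT example. All function names, the toy evidence store, and the message schema are illustrative assumptions, not the paper's actual implementation (the real planner and answerer would be LLM-driven).

```python
# Hypothetical sketch of EviPath's three stages, based only on the abstract.

def abductive_subtask_planning(question):
    """Stage (i): decompose the question into dependent sub-questions.
    A fixed toy decomposition stands in for the LLM planner here."""
    return [
        {"sub_question": "Who directed the film?", "needs": []},
        {"sub_question": "When was that director born?", "needs": [0]},
    ]

def faithful_sub_answering(sub_questions, evidence):
    """Stage (ii): answer each sub-question against a proxy environment
    built from supporting evidence (a toy dict lookup here)."""
    steps = []
    for sq in sub_questions:
        observation = evidence.get(sq["sub_question"], "no evidence found")
        steps.append({
            "thought": f"I need to resolve: {sq['sub_question']}",
            "action": f"search({sq['sub_question']!r})",
            "observation": observation,
        })
    return steps

def conversational_format(question, steps, final_answer):
    """Stage (iii): flatten the full agent-environment trajectory into a
    dialogue suitable for supervised fine-tuning."""
    messages = [{"role": "user", "content": question}]
    for s in steps:
        messages.append({
            "role": "assistant",
            "content": f"Thought: {s['thought']}\nAction: {s['action']}",
        })
        messages.append({"role": "tool", "content": s["observation"]})
    messages.append({"role": "assistant",
                     "content": f"Final answer: {final_answer}"})
    return messages

evidence = {
    "Who directed the film?": "The film was directed by Jane Doe.",
    "When was that director born?": "Jane Doe was born in 1970.",
}
question = "When was the film's director born?"
subs = abductive_subtask_planning(question)
steps = faithful_sub_answering(subs, evidence)
sft_example = conversational_format(question, steps, "1970")
# 1 user + 2 x (assistant thought/action + tool observation) + 1 final answer
print(len(sft_example))  # → 6
```

The key property the sketch illustrates is that every assistant turn is paired with an evidence-grounded observation, so the resulting SFT data supervises the full behavior chain (decompose, invoke the retriever, reason, answer) rather than only a final chain-of-thought rationale.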