Curriculum-Guided Reinforcement Learning for Efficient Multi-Hop Retrieval-Augmented Generation

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-hop RAG approaches suffer from redundant subquery generation, insufficient exploration depth, and excessively long search chains, leading to inefficient retrieval and inaccurate answers. This paper proposes EVO-RAG, the first curriculum-guided reinforcement learning framework for query rewriting, enabling agents to progressively evolve from broad exploration to precise refinement. Key contributions include: (1) a seven-dimensional step-wise reward vector coupled with a time-aware dynamic scheduling mechanism; and (2) a joint decision-making architecture integrating multi-head reward modeling and direct preference optimization (DPO), empowering agents to autonomously select among searching, backtracking, answering, or abstaining. Evaluated on four multi-hop QA benchmarks—including HotpotQA—EVO-RAG achieves up to a 4.6-point improvement in Exact Match (EM) and reduces average retrieval depth by 15%, significantly enhancing both answer accuracy and reasoning efficiency.
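The time-aware scheduling idea can be sketched as a weight interpolation over the step-wise reward vector. The summary names only four of the seven factors (relevance, redundancy, efficiency, answer correctness), so the remaining factor names, the specific weight values, and the linear interpolation schedule below are all illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical factor names: only relevance, redundancy, efficiency, and
# correctness are stated in the summary; the last three are placeholders.
FACTORS = [
    "relevance", "redundancy", "efficiency", "correctness",
    "exploration_bonus", "backtrack_penalty", "refusal_calibration",
]

def stage_weights(step: int, horizon: int) -> np.ndarray:
    """Interpolate factor weights from exploration-heavy (early in the
    episode) to precision/efficiency-heavy (late). Values are made up."""
    early = np.array([0.30, 0.05, 0.05, 0.20, 0.25, 0.05, 0.10])
    late  = np.array([0.15, 0.25, 0.25, 0.25, 0.00, 0.05, 0.05])
    t = min(step / max(horizon - 1, 1), 1.0)   # episode progress in [0, 1]
    return (1 - t) * early + t * late

def step_reward(reward_vector: np.ndarray, step: int, horizon: int) -> float:
    """Scalarize the seven-dimensional step reward with time-aware weights."""
    return float(stage_weights(step, horizon) @ reward_vector)
```

Under this scheme the same raw reward vector is scored differently at step 0 than at the final step, which is what lets the agent evolve from broad exploration to concise refinement within a single reward formulation.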

📝 Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) in up-to-date external evidence, yet existing multi-hop RAG pipelines still issue redundant subqueries, explore too shallowly, or wander through overly long search chains. We introduce EVO-RAG, a curriculum-guided reinforcement learning framework that evolves a query-rewriting agent from broad early-stage exploration to concise late-stage refinement. EVO-RAG couples a seven-factor, step-level reward vector (covering relevance, redundancy, efficiency, and answer correctness) with a time-varying scheduler that reweights these signals as the episode unfolds. The agent is trained with Direct Preference Optimization over a multi-head reward model, enabling it to learn when to search, backtrack, answer, or refuse. Across four multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle), EVO-RAG boosts Exact Match by up to 4.6 points over strong RAG baselines while trimming average retrieval depth by 15%. Ablation studies confirm the complementary roles of curriculum staging and dynamic reward scheduling. EVO-RAG thus offers a general recipe for building reliable, cost-effective multi-hop RAG systems.
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant subqueries in multi-hop RAG pipelines
Shortens search chains and reduces average retrieval depth
Enhances answer correctness and relevance in RAG systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum-guided RL for query-rewriting agent evolution
Seven-factor step-level reward with dynamic scheduling
Direct Preference Optimization over multi-head reward model
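The DPO component listed above follows the standard direct-preference-optimization objective: given a preferred and a rejected trajectory (here, ranked by the multi-head reward model), the policy's log-probability ratio is pushed apart relative to a frozen reference policy. The single-pair loss below is the textbook DPO formula; how EVO-RAG constructs pairs and aggregates the reward heads is not specified in this summary:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid of the
    beta-scaled margin between policy and reference log-ratios."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference (zero margin), the loss is log 2; raising the chosen trajectory's log-probability relative to the rejected one drives the loss toward zero, without needing an explicit RL rollout loop.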