🤖 AI Summary
Existing multi-hop RAG approaches suffer from redundant subquery generation, insufficient exploration depth, and excessively long search chains, leading to inefficient retrieval and inaccurate answers. This paper proposes EVO-RAG, the first curriculum-guided reinforcement learning framework for query rewriting, enabling agents to progressively evolve from broad exploration to precise refinement. Key contributions include: (1) a seven-dimensional step-wise reward vector coupled with a time-aware dynamic scheduling mechanism; and (2) a joint decision-making architecture integrating multi-head reward modeling and direct preference optimization (DPO), empowering agents to autonomously select among searching, backtracking, answering, or abstaining. Evaluated on four multi-hop QA benchmarks—including HotpotQA—EVO-RAG achieves up to a 4.6-point improvement in Exact Match (EM) and reduces average retrieval depth by 15%, significantly enhancing both answer accuracy and reasoning efficiency.
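The summary's core mechanism, a seven-dimensional step-wise reward vector whose weights are rescheduled over the episode, can be sketched as follows. This is a minimal illustration only: the factor names, weight values, and linear interpolation schedule are assumptions for exposition, not the paper's actual configuration.

```python
# Hypothetical sketch of a time-aware reward scheduler: a seven-factor
# step-level reward vector is collapsed to a scalar using weights that
# shift from exploration-oriented factors early in the episode to
# refinement-oriented factors late. All names and values are illustrative.

FACTORS = ["relevance", "novelty", "redundancy_penalty", "efficiency",
           "backtrack_quality", "answer_correctness", "refusal_calibration"]

# Early-stage weights favor broad exploration (relevance, novelty);
# late-stage weights favor concise refinement (efficiency, correctness).
W_EARLY = [0.30, 0.25, 0.10, 0.05, 0.10, 0.15, 0.05]
W_LATE  = [0.15, 0.05, 0.20, 0.20, 0.05, 0.30, 0.05]

def scheduled_weights(step: int, max_steps: int) -> list[float]:
    """Linearly interpolate from early- to late-stage weights."""
    t = min(step / max(max_steps - 1, 1), 1.0)
    return [(1.0 - t) * e + t * l for e, l in zip(W_EARLY, W_LATE)]

def scalar_reward(reward_vec: list[float], step: int, max_steps: int) -> float:
    """Collapse the seven-factor step reward into one scheduled scalar."""
    return sum(w * r for w, r in zip(scheduled_weights(step, max_steps), reward_vec))
```

Because both weight vectors sum to 1, the interpolated weights do as well, so the scalar reward stays on the same scale throughout the episode.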
📝 Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) in up-to-date external evidence, yet existing multi-hop RAG pipelines still issue redundant subqueries, explore too shallowly, or wander through overly long search chains. We introduce EVO-RAG, a curriculum-guided reinforcement learning framework that evolves a query-rewriting agent from broad early-stage exploration to concise late-stage refinement. EVO-RAG couples a seven-factor, step-level reward vector (covering relevance, redundancy, efficiency, and answer correctness) with a time-varying scheduler that reweights these signals as the episode unfolds. The agent is trained with Direct Preference Optimization over a multi-head reward model, enabling it to learn when to search, backtrack, answer, or refuse. Across four multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle), EVO-RAG boosts Exact Match by up to 4.6 points over strong RAG baselines while trimming average retrieval depth by 15%. Ablation studies confirm the complementary roles of curriculum staging and dynamic reward scheduling. EVO-RAG thus offers a general recipe for building reliable, cost-effective multi-hop RAG systems.
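The training signal described above, Direct Preference Optimization over preferred versus dispreferred agent trajectories, has a standard loss whose core can be sketched in a few lines. This is the generic DPO objective, not EVO-RAG's exact multi-head formulation; the `beta` value and argument names are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Pushes the policy to widen its log-probability margin between the
    preferred (chosen) and dispreferred (rejected) trajectory relative
    to a frozen reference policy: -log sigmoid(beta * margin_difference).
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # Numerically stable -log sigmoid(x) = log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))
```

When the policy's margin matches the reference's, the loss sits at log 2; it falls below that as the policy learns to prefer the chosen trajectory, which here would be the better sequence of search, backtrack, answer, or refuse decisions.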