🤖 AI Summary
Existing multi-hop RAG approaches suffer from redundant subquery generation, insufficient exploration depth, and excessively long search chains, leading to inefficient retrieval and inaccurate answers. This paper proposes EVO-RAG, the first curriculum-guided reinforcement learning framework for query rewriting, enabling agents to progressively evolve from broad exploration to precise refinement. Key contributions include: (1) a seven-dimensional step-wise reward vector coupled with a time-aware dynamic scheduling mechanism; and (2) a joint decision-making architecture integrating multi-head reward modeling and direct preference optimization (DPO), empowering agents to autonomously select among searching, backtracking, answering, or abstaining. Evaluated on four multi-hop QA benchmarks—including HotpotQA—EVO-RAG achieves up to a 4.6-point improvement in Exact Match (EM) and reduces average retrieval depth by 15%, significantly enhancing both answer accuracy and reasoning efficiency.
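The summary's core mechanism, a seven-dimensional step-wise reward vector whose weights are rescheduled over the episode, can be sketched as follows. This is a minimal illustration only: the factor names, weight values, and linear interpolation schedule are assumptions for exposition, not the paper's actual configuration.

```python
# Hypothetical sketch of a time-aware reward scheduler: a seven-factor
# step-level reward vector is collapsed to a scalar using weights that
# shift from exploration-oriented factors early in the episode to
# refinement-oriented factors late. All names and values are illustrative.

FACTORS = ["relevance", "novelty", "redundancy_penalty", "efficiency",
           "backtrack_quality", "answer_correctness", "refusal_calibration"]

# Early-stage weights favor broad exploration (relevance, novelty);
# late-stage weights favor concise refinement (efficiency, correctness).
W_EARLY = [0.30, 0.25, 0.10, 0.05, 0.10, 0.15, 0.05]
W_LATE  = [0.15, 0.05, 0.20, 0.20, 0.05, 0.30, 0.05]

def scheduled_weights(step: int, max_steps: int) -> list[float]:
    """Linearly interpolate from early- to late-stage weights."""
    t = min(step / max(max_steps - 1, 1), 1.0)
    return [(1.0 - t) * e + t * l for e, l in zip(W_EARLY, W_LATE)]

def scalar_reward(reward_vec: list[float], step: int, max_steps: int) -> float:
    """Collapse the seven-factor step reward into one scheduled scalar."""
    return sum(w * r for w, r in zip(scheduled_weights(step, max_steps), reward_vec))
```

Because both weight vectors sum to 1, the interpolated weights do as well, so the scalar reward stays on the same scale throughout the episode.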
📝 Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) in up-to-date external evidence, yet existing multi-hop RAG pipelines still issue redundant subqueries, explore too shallowly, or wander through overly long search chains. We introduce EVO-RAG, a curriculum-guided reinforcement learning framework that evolves a query-rewriting agent from broad early-stage exploration to concise late-stage refinement. EVO-RAG couples a seven-factor, step-level reward vector (covering relevance, redundancy, efficiency, and answer correctness) with a time-varying scheduler that reweights these signals as the episode unfolds. The agent is trained with Direct Preference Optimization over a multi-head reward model, enabling it to learn when to search, backtrack, answer, or refuse. Across four multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle), EVO-RAG boosts Exact Match by up to 4.6 points over strong RAG baselines while trimming average retrieval depth by 15%. Ablation studies confirm the complementary roles of curriculum staging and dynamic reward scheduling. EVO-RAG thus offers a general recipe for building reliable, cost-effective multi-hop RAG systems.
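The training signal described above, Direct Preference Optimization over preferred versus dispreferred agent trajectories, has a standard loss whose core can be sketched in a few lines. This is the generic DPO objective, not EVO-RAG's exact multi-head formulation; the `beta` value and argument names are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Pushes the policy to widen its log-probability margin between the
    preferred (chosen) and dispreferred (rejected) trajectory relative
    to a frozen reference policy: -log sigmoid(beta * margin_difference).
    """
    logits = beta * ((policy_chosen_logp - policy_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # Numerically stable -log sigmoid(x) = log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))
```

When the policy's margin matches the reference's, the loss sits at log 2; it falls below that as the policy learns to prefer the chosen trajectory, which here would be the better sequence of search, backtrack, answer, or refuse decisions.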