Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RAG methods predominantly rely on single-step retrieval, limiting their capability to handle complex, multi-hop question answering. Meanwhile, mainstream multi-step retrieval approaches typically require fine-tuning small language models (SLMs), incurring substantial computational overhead and lacking compatibility with large language models (LLMs). This paper proposes a lightweight, reinforcement learning–based framework for fine-tuning embedders—without modifying the LLM itself. It jointly optimizes a trainable embedder and a value-function-guided multi-step retrieval policy to enable efficient context expansion. The method supports ultra-long contexts (up to 10M tokens) and achieves state-of-the-art performance on long-context benchmarks including Babilong and RULER. It significantly reduces inference cost, mitigates hallucination, and demonstrates strong generalization and deployment efficiency.
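The value-guided multi-step retrieval idea in the summary can be sketched roughly as follows. This is an illustrative toy, not the paper's actual algorithm: the function names, the cosine stand-in for the learned value function, and the mean-pooled state update are all assumptions.

```python
import numpy as np

def cosine_q(state, chunk):
    # Stand-in value function: cosine similarity between the current
    # retrieval state and a candidate chunk embedding. In Q-RAG this
    # role is played by a trained, value-guided embedder.
    return float(state @ chunk /
                 (np.linalg.norm(state) * np.linalg.norm(chunk) + 1e-9))

def multi_step_retrieve(query_vec, chunk_vecs, q_value=cosine_q, steps=3):
    # Greedy multi-step retrieval: each step picks the chunk the value
    # function rates highest given what was retrieved so far, then folds
    # that chunk into the state before the next step.
    state = np.asarray(query_vec, dtype=float)
    selected, remaining = [], list(range(len(chunk_vecs)))
    for _ in range(min(steps, len(remaining))):
        scores = [q_value(state, chunk_vecs[i]) for i in remaining]
        best = remaining.pop(int(np.argmax(scores)))
        selected.append(best)
        state = (state + chunk_vecs[best]) / 2.0  # simple state update
    return selected
```

Because only embeddings are scored at each step, the LLM itself is never modified, which is the resource-efficiency argument the summary makes.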

📝 Abstract
Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Recently, multi-step retrieval approaches have emerged, typically involving the fine-tuning of small LLMs to perform multi-step retrieval. This type of fine-tuning is highly resource-intensive and does not enable the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks Babilong and RULER for contexts up to 10M tokens.
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-step retrieval for complex question answering
Reduces resource-intensive fine-tuning in retrieval-augmented generation
Enables efficient long-context processing up to 10M tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes Embedder model using reinforcement learning
Enables multi-step retrieval for long contexts
Achieves state-of-the-art results efficiently
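A minimal sketch of what "fine-tuning the embedder with RL" could look like: treat the embedded dot-product score as a Q-value and nudge the embedder weights with a TD(0)-style update. The linear embedder `W`, the reward signal, and the hand-derived gradient are illustrative assumptions under a simplified setting, not the paper's method.

```python
import numpy as np

def td_update_embedder(W, query, chunk, reward, next_best_q,
                       gamma=0.9, lr=0.05):
    # Model Q(state, chunk) as (W @ query) . (W @ chunk) and take one
    # TD(0) step: move the score toward reward + gamma * next_best_q.
    q = (W @ query) @ (W @ chunk)
    td_error = reward + gamma * next_best_q - q
    # dQ/dW for the bilinear score: outer-product terms from both sides.
    grad = np.outer(W @ chunk, query) + np.outer(W @ query, chunk)
    return W + lr * td_error * grad
```

A positive TD error (the retrieved chunk led toward the answer) raises the score the embedder assigns to that query-chunk pair on the next pass.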
Artyom Y. Sorokin
Applied AI, Moscow, Russia
N. Buzun
CILAB.AI, Moscow, Russia
Alexander Anokhin
Applied AI, Moscow, Russia
Oleg Inozemcev
Applied AI, Moscow, Russia
Egor Vedernikov
Applied AI, Moscow, Russia
Petr Anokhin
Lomonosov Moscow State University; Federal Medical Research Center
M. Burtsev
London Institute for Mathematical Sciences, London, UK
Trushkov Alexey
Independent Researcher
Wenshuai Yin
Higher School of Economics, Moscow, Russia
Evgeny Burnaev
Skoltech, Full Professor, Head of AI Center, Head of Research Group; AIRI
Generative Modeling · Manifold Learning · Surrogate Modeling · 3D Deep Learning