Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing selective-retrieval RAG methods neglect LLMs' intrinsic knowledge, so low-quality retrievals interfere with answer generation and knowledge-source selection remains rigid. To address this, the paper proposes *Self-Routing*, a mechanism that lets the model autonomously decide between external retrieval and its internal parametric knowledge for answer generation, jointly optimizing knowledge-source selection and natural, verbalized response generation. A dynamic nearest-neighbor search further mitigates domain shift in source decisions. Through multi-task fine-tuning—jointly modeling knowledge-source classification, knowledge verbalization, and response generation—plus nearest-neighbor inference, the method improves average response accuracy by 5.1% across three mainstream LLMs, reduces inference latency, and cuts redundant retrieval calls by 29%. The stated core contribution is the first integration of LLM knowledge-confidence modeling into routing decisions, enabling end-to-end differentiable, adaptive retrieval augmentation.

📝 Abstract
Selective retrieval improves retrieval-augmented generation (RAG) by reducing distractions from low-quality retrievals and improving efficiency. However, existing approaches under-utilize the inherent knowledge of large language models (LLMs), leading to suboptimal retrieval decisions and degraded generation performance. To bridge this gap, we propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge. To this end, we design a multi-task objective that jointly optimizes an LLM on knowledge source selection, knowledge verbalization, and response generation. We further introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decisions under domain shift. Fine-tuning three LLMs with SR-RAG significantly improves both their response accuracy and inference latency. Compared to the strongest selective retrieval baseline, SR-RAG reduces retrievals by 29% while improving performance by 5.1%.
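The multi-task objective described in the abstract combines three training signals. A minimal sketch, assuming the objective is a weighted sum of per-task losses (the function name and weights are illustrative assumptions, not the paper's actual implementation):

```python
def multi_task_loss(source_loss: float,
                    verbalization_loss: float,
                    generation_loss: float,
                    weights=(1.0, 1.0, 1.0)) -> float:
    """Illustrative weighted sum of the three SR-RAG-style task losses:
    knowledge source selection, knowledge verbalization, and response
    generation. The weighting scheme is an assumption for illustration."""
    w_src, w_verb, w_gen = weights
    return (w_src * source_loss
            + w_verb * verbalization_loss
            + w_gen * generation_loss)

print(multi_task_loss(0.5, 1.0, 2.0))  # 3.5 with default unit weights
```

In practice each term would be a token-level cross-entropy computed on the corresponding supervision signal; the joint objective lets one set of model weights serve all three roles.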
Problem

Research questions and friction points this paper is trying to address.

Optimizing retrieval-augmented generation by reducing low-quality retrievals
Enhancing LLM knowledge utilization for better retrieval decisions
Improving response accuracy and latency via dynamic knowledge selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic decision between retrieval and verbalization
Multi-task optimization for joint learning
Nearest neighbor search for domain adaptation
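The nearest-neighbor source decision listed above can be sketched as a simple k-NN vote over embedded queries: embed the incoming query, find its k most similar labeled training queries, and follow the majority's retrieve-vs-verbalize choice. The embedding scheme, labels, and threshold here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def decide_source(query_emb: np.ndarray,
                  train_embs: np.ndarray,    # (N, d) embeddings of labeled training queries
                  train_labels: np.ndarray,  # (N,) 1 = retrieve, 0 = verbalize
                  k: int = 5) -> str:
    """Decide between external retrieval and verbalizing parametric
    knowledge via a nearest-neighbor vote (illustrative sketch)."""
    # Cosine similarity between the query and all stored embeddings.
    sims = train_embs @ query_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    topk = np.argsort(-sims)[:k]          # indices of the k most similar neighbors
    vote = train_labels[topk].mean()      # fraction of neighbors labeled "retrieve"
    return "retrieve" if vote >= 0.5 else "verbalize"
```

Because the decision rule depends only on the local neighborhood rather than a fixed classifier, it can adapt when the test-time query distribution shifts away from the training domain.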
Di Wu
University of California, Los Angeles
Jia-Chen Gu
University of California, Los Angeles
Natural Language Processing, Machine Learning
Kai-Wei Chang
University of California, Los Angeles
Nanyun Peng
University of California, Los Angeles