🤖 AI Summary
Existing selective-retrieval methods for RAG neglect LLMs' intrinsic knowledge, leading to rigid knowledge-source selection and low-quality retrievals that interfere with answer generation. To address this, we propose *Self-Routing RAG (SR-RAG)*, a mechanism that lets the model autonomously decide between external retrieval and its own internal parametric knowledge, jointly optimizing knowledge-source selection and natural, verbalized response generation. We further introduce dynamic nearest-neighbor knowledge-source inference to mitigate domain shift. Through multi-task fine-tuning that jointly models knowledge-source selection, knowledge verbalization, and response generation, combined with nearest-neighbor inference, our method improves average response accuracy by 5.1% across three mainstream LLMs, reduces inference latency, and cuts redundant retrieval calls by 29%. Our core contribution is integrating the LLM's confidence in its own knowledge into routing decisions, enabling adaptive retrieval augmentation.
📝 Abstract
Selective retrieval improves retrieval-augmented generation (RAG) by reducing distractions from low-quality retrievals and improving efficiency. However, existing approaches under-utilize the inherent knowledge of large language models (LLMs), leading to suboptimal retrieval decisions and degraded generation performance. To bridge this gap, we propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge. To this end, we design a multi-task objective that jointly optimizes an LLM on knowledge source selection, knowledge verbalization, and response generation. We further introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decisions under domain shifts. Fine-tuning three LLMs with SR-RAG significantly improves both their response accuracy and inference latency. Compared to the strongest selective retrieval baseline, SR-RAG reduces retrievals by 29% while improving performance by 5.1%.
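To make the routing step concrete, here is a minimal sketch of what "dynamic knowledge source inference via nearest neighbor search" could look like. This is an illustrative assumption, not the paper's implementation: function names, the two-source labels (`"retrieve"` vs. `"verbalize"`), the cosine-similarity metric, and the majority vote are all hypothetical choices, and the embeddings here are toy 2-D vectors standing in for real query representations.

```python
import numpy as np

def decide_knowledge_source(query_emb, neighbor_embs, neighbor_labels, k=5):
    """Pick a knowledge source by majority vote among the k nearest
    labeled training queries in embedding space (cosine similarity).

    Hypothetical sketch: each neighbor is a past query whose best source
    ("retrieve" = use external retrieval, "verbalize" = rely on the
    model's parametric knowledge) is already known.
    """
    # Normalize so that dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    n = neighbor_embs / np.linalg.norm(neighbor_embs, axis=1, keepdims=True)
    sims = n @ q
    top_k = np.argsort(sims)[-k:]          # indices of the k most similar queries
    votes = [neighbor_labels[i] for i in top_k]
    # Route to the source that the most similar past queries needed.
    return max(set(votes), key=votes.count)

# Toy data: queries clustered near (1, 0) needed retrieval,
# queries near (0, 1) were answerable from parametric knowledge.
rng = np.random.default_rng(0)
embs = np.vstack([rng.normal([1.0, 0.0], 0.1, (10, 2)),
                  rng.normal([0.0, 1.0], 0.1, (10, 2))])
labels = ["retrieve"] * 10 + ["verbalize"] * 10

print(decide_knowledge_source(np.array([0.9, 0.1]), embs, labels))  # → retrieve
```

Because the neighbor index can be rebuilt from whatever labeled queries are available at deployment time, a lookup like this adapts the routing decision to a new domain without retraining the underlying model, which is the intuition behind handling domain shift this way.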