Query Routing for Retrieval-Augmented Language Models

📅 2025-05-29
🤖 AI Summary
Problem: In retrieval-augmented generation (RAG), routing each query to the most suitable of several large language models (LLMs) remains challenging, because existing routers ignore how retrieved documents change what each model can answer. Method: This paper formally defines the RAG-aware query routing problem and proposes RAGRouter, a framework that combines document embeddings with RAG capability embeddings and trains them with contrastive learning to capture the knowledge representation shifts induced by retrieved content; an extended score-threshold mechanism trades accuracy against latency. Contribution/Results: Across diverse knowledge-intensive tasks and retrieval settings, RAGRouter outperforms the best individual LLM by 3.61% on average and existing routing methods by 3.29%-9.33%. With the score-threshold mechanism, it also achieves strong performance-efficiency trade-offs under low-latency constraints.

📝 Abstract
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. However, varying response quality across LLMs under RAG necessitates intelligent routing mechanisms, which select the most suitable model for each query from multiple retrieval-augmented LLMs via a dedicated router model. We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods, which rely on static parametric knowledge representations, exhibit suboptimal performance in RAG scenarios. To address this, we formally define the new retrieval-augmented LLM routing problem, incorporating the influence of retrieved documents into the routing framework. We propose RAGRouter, a RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts and enable informed routing decisions. Extensive experiments on diverse knowledge-intensive tasks and retrieval settings show that RAGRouter outperforms the best individual LLM by 3.61% on average and existing routing methods by 3.29%-9.33%. With an extended score-threshold-based mechanism, it also achieves strong performance-efficiency trade-offs under low-latency constraints.
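To make the routing idea concrete, here is a minimal sketch of RAG-aware routing with an optional score threshold. It is not the paper's implementation: the `Candidate` fields, the elementwise fusion in `rag_aware_score`, the cosine scoring, and the per-model `cost` are all assumptions chosen for illustration. The abstract only states that document embeddings and RAG capability embeddings are used to capture knowledge representation shifts.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of one retrieval-augmented LLM."""
    name: str
    knowledge: list[float]       # assumed static "parametric knowledge" embedding
    rag_capability: list[float]  # assumed embedding of how well the model uses documents
    cost: float                  # assumed relative inference cost (lower is cheaper)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rag_aware_score(query: list[float], doc: list[float], cand: Candidate) -> float:
    """Score a model for a query given the retrieved document.

    Assumed fusion rule: the static knowledge vector is shifted by the
    document embedding, gated elementwise by the RAG capability embedding.
    """
    shifted = [k + r * d for k, r, d in zip(cand.knowledge, cand.rag_capability, doc)]
    return cosine(query, shifted)


def route(query, doc, candidates, threshold=None):
    """Pick a model name.

    Without a threshold: the highest-scoring model.
    With a threshold: the cheapest model whose score clears it,
    falling back to the highest-scoring model if none does.
    """
    scored = [(rag_aware_score(query, doc, c), c) for c in candidates]
    if threshold is not None:
        good_enough = [(s, c) for s, c in scored if s >= threshold]
        if good_enough:
            return min(good_enough, key=lambda sc: sc[1].cost)[1].name
    return max(scored, key=lambda sc: sc[0])[1].name


small = Candidate("small", knowledge=[0.0, 1.0], rag_capability=[1.0, 1.0], cost=1.0)
large = Candidate("large", knowledge=[1.0, 0.0], rag_capability=[0.0, 0.0], cost=5.0)
query, doc = [1.0, 0.0], [1.0, 0.0]

best = route(query, doc, [small, large])                    # highest score wins
cheap = route(query, doc, [small, large], threshold=0.7)    # cheapest model above 0.7
strict = route(query, doc, [small, large], threshold=0.9)   # only "large" clears 0.9
```

In this toy setup the small model lacks the relevant parametric knowledge but uses the retrieved document well, so a moderate threshold lets the router pick it over the more expensive model. A router that looked only at static knowledge would never select it.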
Problem

Research questions and friction points this paper is trying to address.

Improving response quality in retrieval-augmented LLMs via routing
Addressing static knowledge limitations in existing routing methods
Optimizing model selection with dynamic document influence in RAG
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic routing for retrieval-augmented LLMs
Leverages document and capability embeddings
Contrastive learning for informed routing decisions
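As an illustration of how contrastive learning could shape routing scores, the sketch below computes a generic InfoNCE-style loss for one query. This is a standard contrastive objective, not necessarily the loss used in the paper; the function name, the `temperature` value, and the pairing of one model that answered correctly (positive) against models that answered incorrectly (negatives) are assumptions.

```python
import math


def info_nce(pos_score: float, neg_scores: list[float], temperature: float = 0.1) -> float:
    """Generic InfoNCE-style loss for a single query.

    pos_score:  router score for a model that answered the query correctly.
    neg_scores: router scores for models that answered incorrectly.
    Returns -log(softmax probability of the positive), computed stably.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    peak = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


well_separated = info_nce(0.9, [0.1])   # positive scored well above the negative
badly_ordered = info_nce(0.1, [0.9])    # positive scored below the negative
```

Minimizing a loss of this kind pushes scores for models that can answer a query above scores for models that cannot, which is the property a router needs when it selects by score.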
Authors
Jiarui Zhang
Shanghai Jiao Tong University
Xiangyu Liu
WeChat, Tencent Inc
Yong Hu
WeChat, Tencent Inc
Chaoyue Niu
Shanghai Jiao Tong University
Device-Cloud ML, On-Device Intelligence
Fan Wu
Shanghai Jiao Tong University
Guihai Chen
Professor of Computer Science
Computer Science and Technology