RAGRouter-Bench: A Dataset and Benchmark for Adaptive RAG Routing

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing RAG research lacks a systematic understanding of which methods suit which query-corpus contexts and of how they trade effectiveness against efficiency, a gap that hinders the development of adaptive routing strategies. To bridge this gap, the paper introduces RAGRouter-Bench, presented as the first dataset and benchmark designed specifically for adaptive RAG routing. Taking a query-corpus compatibility perspective, it standardizes five representative RAG paradigms and evaluates them at fine granularity across 7,727 queries and 21,460 multi-domain documents, measuring both generation quality and resource consumption under a unified protocol. The experiments show that RAG performance depends strongly on query-corpus interactions and that no single paradigm is consistently optimal; moreover, more complex mechanisms do not necessarily yield better effectiveness-efficiency trade-offs. The work provides a data and evaluation foundation for interpretable and generalizable adaptive RAG systems.

📝 Abstract

Retrieval-Augmented Generation (RAG) has become a core paradigm for grounding large language models with external knowledge. Despite extensive efforts exploring diverse retrieval strategies, existing studies predominantly focus on query-side complexity or isolated method improvements, lacking a systematic understanding of how RAG paradigms behave across different query-corpus contexts and effectiveness-efficiency trade-offs. In this work, we introduce RAGRouter-Bench, the first dataset and benchmark designed for adaptive RAG routing. RAGRouter-Bench revisits retrieval from a query-corpus compatibility perspective and standardizes five representative RAG paradigms for systematic evaluation across 7,727 queries and 21,460 documents spanning diverse domains. The benchmark incorporates three canonical query types together with fine-grained semantic and structural corpus metrics, as well as a unified evaluation for both generation quality and resource consumption. Experiments with DeepSeek-V3 and LLaMA-3.1-8B demonstrate that no single RAG paradigm is universally optimal, that paradigm applicability is strongly shaped by query-corpus interactions, and that more advanced mechanisms do not necessarily yield better effectiveness-efficiency trade-offs. These findings underscore the necessity of routing-aware evaluation and establish a foundation for adaptive, interpretable, and generalizable next-generation RAG systems.
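
To make the effectiveness-efficiency trade-off concrete, the following is a minimal Python sketch of how per-paradigm benchmark records might be compared and used for routing. It is not the benchmark's actual code or schema: the `Result` fields, the `pareto_front` and `route` helpers, the placeholder paradigm and query-type labels, and all numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    paradigm: str    # placeholder paradigm label (hypothetical)
    query_type: str  # placeholder query-type label (hypothetical)
    quality: float   # generation-quality score, higher is better
    cost: float      # resource consumption, lower is better


def pareto_front(results):
    """Return the results that no other result beats on both quality and cost."""
    front = []
    for r in results:
        dominated = any(
            o.quality >= r.quality
            and o.cost <= r.cost
            and (o.quality > r.quality or o.cost < r.cost)
            for o in results
        )
        if not dominated:
            front.append(r)
    return front


def route(results, query_type, max_cost):
    """Pick the highest-quality paradigm for a query type within a cost budget."""
    candidates = [
        r for r in results if r.query_type == query_type and r.cost <= max_cost
    ]
    if not candidates:
        return None
    # Prefer higher quality; break ties in favor of lower cost.
    return max(candidates, key=lambda r: (r.quality, -r.cost)).paradigm


# Made-up numbers for illustration only; these are not results from the paper.
toy = [
    Result("paradigm_a", "type_1", 0.60, 100.0),
    Result("paradigm_b", "type_1", 0.70, 400.0),
    Result("paradigm_c", "type_1", 0.65, 900.0),
    Result("paradigm_a", "type_2", 0.40, 100.0),
    Result("paradigm_c", "type_2", 0.75, 900.0),
]

type_1_front = pareto_front([r for r in toy if r.query_type == "type_1"])
print([r.paradigm for r in type_1_front])
print(route(toy, "type_1", max_cost=500.0))
print(route(toy, "type_2", max_cost=1000.0))
```

In this toy data the most expensive paradigm is dominated for one query type yet is the best choice for another, which mirrors the abstract's qualitative claim that no single paradigm is universally optimal. A real router would presumably also condition on the corpus metrics the benchmark defines.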

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation

adaptive routing

query-corpus compatibility

effectiveness-efficiency trade-off

RAG benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive RAG routing

query-corpus compatibility

RAG benchmark