Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

📅 2025-05-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current multimodal retrieval-augmented generation (MRAG) methods rely on static retrieval pipelines and fail to exploit the dynamic reasoning capabilities of multimodal large language models (MLLMs) or their potential to interact with knowledge bases. To address this, we propose R1-Router, a dynamic step-wise retrieval-augmented reasoning framework. R1-Router introduces an adaptive knowledge-base routing mechanism conditioned on the current reasoning state: the model autonomously decides *when* to retrieve, *which* knowledge source to consult, and *what* to query, generating intermediate queries that draw on multiple knowledge sources. We further design Step-wise GRPO (Step-GRPO), a reinforcement learning algorithm that assigns rewards at the granularity of individual reasoning steps. On open-domain question answering benchmarks spanning multiple modalities, R1-Router outperforms baseline models by over 7% while reducing unnecessary retrievals and improving both efficiency and accuracy.
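The step-wise loop described above (decide whether to retrieve, pick a knowledge base, issue an intermediate query, fold the evidence back into the trajectory) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `route_and_reason`, `Step`, the policy and retriever interfaces, and the KB names `"image"` and `"text"` are all hypothetical, and a scripted policy stands in for the trained MLLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical retriever interface: a sub-query in, a list of passages out.
Retriever = Callable[[str], List[str]]


@dataclass
class Step:
    kb: Optional[str]  # KB consulted at this step; None means this step is the final answer
    text: str          # the intermediate query, or the final answer when kb is None
    evidence: List[str] = field(default_factory=list)


# Hypothetical policy interface (stands in for the MLLM): given the question and
# the trajectory so far, return (kb_name, sub_query) to retrieve, or (None, answer).
Policy = Callable[[str, List[Step]], Tuple[Optional[str], str]]


def route_and_reason(question: str, policy: Policy, kbs: Dict[str, Retriever],
                     max_steps: int = 5) -> Tuple[Optional[str], List[Step]]:
    trajectory: List[Step] = []
    for _ in range(max_steps):
        kb, text = policy(question, trajectory)
        if kb is None:  # the policy judges that no further retrieval is needed
            trajectory.append(Step(kb=None, text=text))
            return text, trajectory
        if kb not in kbs:
            raise KeyError(f"unknown knowledge base: {kb}")
        # Route the intermediate query to the chosen KB and keep the evidence
        # in the trajectory so later steps can condition on it.
        trajectory.append(Step(kb=kb, text=text, evidence=kbs[kb](text)))
    return None, trajectory  # step budget exhausted without an answer


# Toy usage: two stub KBs and a scripted policy in place of a trained model.
toy_kbs: Dict[str, Retriever] = {
    "image": lambda q: [f"image hit for: {q}"],
    "text": lambda q: [f"text hit for: {q}"],
}


def scripted_policy(question: str, steps: List[Step]) -> Tuple[Optional[str], str]:
    if len(steps) == 0:
        return "image", "Which landmark is shown in the photo?"
    if len(steps) == 1:
        return "text", "When was that landmark completed?"
    return None, "toy answer"


answer, steps = route_and_reason("When was the landmark in this photo completed?",
                                 scripted_policy, toy_kbs)
print(answer, [s.kb for s in steps])
```

Keeping each step's evidence in the trajectory is what lets the policy condition its next decision on the evolving reasoning state; the `max_steps` cap is an assumed safeguard against unbounded retrieval, not something the summary specifies.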


πŸ“ Abstract
Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook the reasoning and planning capabilities of MLLMs, which could be used to dynamically determine how to interact with different KBs during the reasoning process. To address this limitation, we propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. Specifically, R1-Router generates follow-up queries according to the current reasoning step, routes these intermediate queries to the most suitable KB, and integrates the external knowledge into a coherent reasoning trajectory to answer the original query. Furthermore, we introduce Step-wise Group Relative Policy Optimization (Step-GRPO), a tailored reinforcement learning algorithm that assigns step-specific rewards to optimize the reasoning behavior of MLLMs. Experimental results on various open-domain QA benchmarks across multiple modalities demonstrate that R1-Router outperforms baseline models by over 7%. Further analysis shows that R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
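Step-GRPO builds on GRPO, whose core idea is to score each sampled output relative to the other outputs sampled for the same input rather than against a learned value function. The sketch below shows that group-relative normalization and one plausible step-wise reading of it, in which each reasoning step has its own group of sampled candidates and rewards. The abstract does not say how step rewards are defined or grouped, so `stepwise_advantages` and its input layout are assumptions for illustration, not the paper's algorithm.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r_i - mean(r)) / (std(r) + eps) within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


def stepwise_advantages(step_groups: List[List[float]], eps: float = 1e-6) -> List[List[float]]:
    """Hypothetical step-wise variant: step_groups[t] holds the rewards of the G
    candidates sampled at reasoning step t from the same state. Each step is
    normalized on its own, so credit is assigned per step rather than only from
    a single end-of-trajectory reward."""
    return [group_relative_advantages(group, eps) for group in step_groups]


# Step 0: two of four candidates earned reward 1; step 1: all candidates tied.
adv = stepwise_advantages([[1.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]])
print([[round(a, 3) for a in step] for step in adv])
```

This prints `[[1.0, -1.0, -1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]`. A fully tied group yields zero advantage at that step and so contributes no gradient signal, which is the intended GRPO behavior: no candidate did better than its peers.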
Problem

Research questions and friction points this paper is trying to address.

Dynamic routing of queries across knowledge bases for reasoning
Optimizing retrieval-augmented generation with step-specific knowledge integration
Reducing unnecessary retrievals while improving efficiency and accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic query routing across knowledge bases
Step-specific reinforcement learning optimization
Adaptive retrieval reducing unnecessary queries
Chunyi Peng
Computer Science, Purdue University
wireless networking, 5G, mobile systems, mobile computing, network security
Zhipeng Xu
Northeastern University
NLP, Information Retrieval
Zhenghao Liu
Northeastern University
NLP, Information Retrieval
Yishan Li
OpenBMB
Natural Language Processing, Large Language Model, Information Retrieval
Yukun Yan
Tsinghua University
Large Language Model
Shuo Wang
Department of Computer Science and Technology, Institute for AI, Tsinghua University, China
Zhiyuan Liu
Department of Computer Science and Technology, Institute for AI, Tsinghua University, China
Yu Gu
School of Computer Science and Engineering, Northeastern University, China
Minghe Yu
School of Computer Science and Engineering, Northeastern University, China
Ge Yu
School of Computer Science and Engineering, Northeastern University, China
Maosong Sun
Professor of Computer Science and Technology, Tsinghua University
Natural Language Processing, Artificial Intelligence, Social Computing