🤖 AI Summary
Existing RAG methods employ rigid retrieval strategies ill-suited to queries of varying complexity, leading to suboptimal performance on knowledge-intensive tasks such as multi-hop reasoning. To address this, we propose the first Multi-Armed Bandit (MAB)-based adaptive Retrieval-Augmented Generation framework. Our method jointly models query complexity and employs reinforcement learning to dynamically balance retrieval precision and efficiency via a novel reward function—incorporating explicit step-wise penalties for excessive retrieval operations—to enable online, adaptive policy optimization. Evaluated across multiple single-hop and multi-hop benchmarks, our approach achieves state-of-the-art performance, significantly reducing average retrieval overhead while improving generation accuracy. The core contribution lies in the tight integration of query-complexity-aware modeling with MAB-driven dynamic decision-making, thereby transcending conventional static or heuristic retrieval paradigms.
📝 Abstract
Retrieval-Augmented Generation (RAG) has proven highly effective in boosting the generative performance of language models on knowledge-intensive tasks. However, existing RAG frameworks either indiscriminately perform retrieval or rely on rigid single-class classifiers to select retrieval methods, leading to inefficiencies and suboptimal performance across queries of varying complexity. To address these challenges, we propose a reinforcement learning-based framework that dynamically selects the most suitable retrieval strategy based on query complexity. Our approach leverages a multi-armed bandit algorithm, which treats each retrieval method as a distinct "arm" and adapts the selection process by balancing exploration and exploitation. Additionally, we introduce a dynamic reward function that balances accuracy and efficiency, penalizing methods that require more retrieval steps even when they yield a correct result. Our method achieves new state-of-the-art results on multiple single-hop and multi-hop datasets while reducing retrieval costs. Our code is available at https://github.com/FUTUREEEEEE/MBA.
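The bandit-over-retrieval-strategies idea can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the strategy names, the epsilon-greedy selection rule, the penalty weight, and the exact reward shape (1 for a correct answer minus a per-step cost) are all assumptions made for the example.

```python
import random

# Hypothetical retrieval strategies — the bandit's "arms".
STRATEGIES = ["no_retrieval", "single_step", "multi_step"]
STEP_PENALTY = 0.1   # assumed per-retrieval-step cost
EPSILON = 0.1        # exploration rate (epsilon-greedy)

class RetrievalBandit:
    """Epsilon-greedy bandit that picks a retrieval strategy per query."""

    def __init__(self):
        self.counts = {s: 0 for s in STRATEGIES}    # pulls per arm
        self.values = {s: 0.0 for s in STRATEGIES}  # running mean reward per arm

    def select(self):
        # Explore with probability EPSILON, otherwise exploit the best arm so far.
        if random.random() < EPSILON:
            return random.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda s: self.values[s])

    def update(self, strategy, correct, steps):
        # Reward trades accuracy off against efficiency: a correct answer
        # earns 1.0, and every retrieval step taken is penalized — so a
        # cheaper strategy that is equally accurate earns a higher reward.
        reward = (1.0 if correct else 0.0) - STEP_PENALTY * steps
        self.counts[strategy] += 1
        n = self.counts[strategy]
        # Incremental update of the running mean reward for this arm.
        self.values[strategy] += (reward - self.values[strategy]) / n
        return reward

bandit = RetrievalBandit()
# A correct multi-hop answer that needed 3 retrieval steps is rewarded
# less than a correct single-step answer would be.
bandit.update("multi_step", correct=True, steps=3)
bandit.update("single_step", correct=True, steps=1)
print(bandit.select())
```

Under this toy reward, `single_step` (mean reward 0.9) dominates `multi_step` (0.7) once both answers are correct, so exploitation favors the cheaper strategy — the step penalty is what keeps the policy from over-retrieving.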