🤖 AI Summary
Dense retrievers underperform on complex, reasoning-intensive queries, while invoking a large language model (LLM) on every query incurs prohibitive computational overhead. To address this trade-off, we propose AdaQR, an adaptive query reasoning framework that implicitly embeds LLM-based reasoning capabilities into the vector space. AdaQR introduces a lightweight routing module that dynamically selects between efficient dense retrieval and deep LLM-based query rewriting, enabling on-demand reasoning enhancement in the embedding space without universal LLM invocation. This design eliminates unnecessary LLM calls while preserving semantic fidelity and reasoning depth. Evaluated on the large-scale BRIGHT benchmark, AdaQR achieves a 7% improvement in retrieval effectiveness while reducing inference cost by 28%, demonstrating a significant advance in balancing accuracy and efficiency for open-domain retrieval.
📝 Abstract
Dense retrievers enhance retrieval by encoding queries and documents into continuous vectors, but they often struggle with reasoning-intensive queries. Although Large Language Models (LLMs) can reformulate queries to capture complex reasoning, applying them universally incurs significant computational cost. In this work, we propose Adaptive Query Reasoning (AdaQR), a hybrid query rewriting framework. Within this framework, a Reasoner Router dynamically directs each query to either fast dense reasoning or deep LLM reasoning. The dense reasoning is achieved by the Dense Reasoner, which performs LLM-style reasoning directly in the embedding space, enabling a controllable trade-off between efficiency and accuracy. Experiments on the large-scale retrieval benchmark BRIGHT show that AdaQR reduces reasoning cost by 28% while preserving, or even improving, retrieval performance by 7%.
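The routing idea described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the authors' implementation): a lightweight scorer estimates query difficulty, easy queries go straight to a fast dense encoder, and only hard queries pay for an LLM rewrite before encoding. All function names, the toy scorer, and the threshold are assumptions for illustration.

```python
# Hypothetical sketch of a reasoner-router pattern like AdaQR's.
# The scorer, encoder, rewriter, and threshold are stand-ins, not the paper's components.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RoutedQuery:
    text: str              # possibly rewritten query text
    route: str             # "dense" (fast path) or "llm" (deep path)
    embedding: List[float] # vector used for retrieval

def route_query(
    query: str,
    complexity_score: Callable[[str], float],   # lightweight difficulty estimate in [0, 1]
    dense_encode: Callable[[str], List[float]], # cheap embedding path
    llm_rewrite: Callable[[str], str],          # expensive reasoning/rewrite path
    threshold: float = 0.5,
) -> RoutedQuery:
    """Send low-complexity queries to the dense encoder directly;
    rewrite high-complexity queries with an LLM first, then encode."""
    if complexity_score(query) < threshold:
        return RoutedQuery(query, "dense", dense_encode(query))
    rewritten = llm_rewrite(query)
    return RoutedQuery(rewritten, "llm", dense_encode(rewritten))

# Toy stand-ins so the sketch runs end to end:
score = lambda q: min(len(q.split()) / 20.0, 1.0)    # longer query = "harder"
encode = lambda q: [float(len(q))]                   # dummy 1-d embedding
rewrite = lambda q: q + " (step-by-step reasoning)"  # dummy LLM rewrite

easy = route_query("python sort list", score, encode, rewrite)
hard = route_query(" ".join(["why"] * 15), score, encode, rewrite)
print(easy.route)  # dense
print(hard.route)  # llm
```

In this toy version only routed-to-LLM queries incur the rewrite cost, which is the source of the cost savings the abstract reports; a real router would replace the word-count heuristic with a learned classifier over the query embedding.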