🤖 AI Summary
To address performance bottlenecks in traditional information retrieval caused by multi-hop reasoning and complex semantic alignment, this paper proposes a lightweight, production-deployable query reasoning enhancement framework. Built upon Qwen2.5-1.5B/7B-Instruct, it employs Proximal Policy Optimization (PPO) reinforcement learning coupled with a novel sparse-semantic joint reward mechanism, featuring the first semi-rule-based RL reward function designed for end-to-end joint optimization of query rewriting and reasoning capabilities. Evaluated on the BRIGHT benchmark with BM25 as the retriever, the method significantly outperforms both prompt-based query reasoners and state-of-the-art dense retrievers trained for reasoning-intensive retrieval. Notably, TongSearch QR-1.5B/7B achieves retrieval accuracy comparable to GPT-4 and LLaMA3-70B while reducing inference latency by over 90%, thereby enabling high-accuracy retrieval with practical low-overhead deployment.
📝 Abstract
Traditional information retrieval (IR) methods excel at textual and semantic matching but struggle in reasoning-intensive retrieval tasks that require multi-hop inference or complex semantic understanding between queries and documents. One promising solution is to explicitly rewrite or augment queries using large language models (LLMs) to elicit reasoning-relevant content prior to retrieval. However, the widespread use of large-scale language models like GPT-4 or LLaMA3-70B remains impractical due to their high inference cost and limited deployability in real-world systems. In this work, we introduce TongSearch QR (previously known as "TongSearch Reasoner"), a family of small-scale language models for query reasoning and rewriting in reasoning-intensive retrieval. With a novel semi-rule-based reward function, we employ reinforcement learning approaches that enable smaller language models, e.g., Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct, to achieve query reasoning performance rivaling that of large-scale language models without their prohibitive inference costs. Experimental results on the BRIGHT benchmark show that, with BM25 as the retriever, both the TongSearch QR-7B and TongSearch QR-1.5B models significantly outperform existing baselines, including prompt-based query reasoners and some recent dense retrievers trained for reasoning-intensive retrieval tasks, offering superior adaptability for real-world deployment.
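
The idea of augmenting a query with reasoning text before sparse retrieval can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-document corpus, the hand-written "reasoning" string (standing in for what a query-reasoning model might generate), and the small `BM25` class, which is a plain Okapi BM25 variant and not the retriever implementation or reward function used in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems would use a proper analyzer.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over a small in-memory corpus (illustrative)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF that stays positive for every term.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        # Document indices sorted from highest to lowest BM25 score.
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


# Hypothetical toy corpus: document 0 is the relevant one, but it shares
# almost no content words with the user's question.
docs = [
    "nyctinasty is the circadian movement of plant organs driven by turgor changes in the pulvinus",
    "why do cats sleep so much during the day",
    "leaves of some houseplants droop when the soil is dry",
]
bm25 = BM25(docs)

original = "why do leaves of houseplants fold up in the evening"
# Hand-written stand-in for model-generated reasoning appended to the query.
augmented = original + " this is nyctinasty a circadian movement driven by turgor changes in the pulvinus"

# Score gain of the relevant document after augmentation.
gain = bm25.score(augmented, 0) - bm25.score(original, 0)

print("original ranking: ", bm25.rank(original))
print("augmented ranking:", bm25.rank(augmented))
print("score gain for relevant doc:", round(gain, 3))
```

In this toy setup the original query surfaces a lexically similar but irrelevant document, while the augmented query brings the relevant document to the top because the reasoning text supplies the vocabulary BM25 needs. The `gain` value is only meant to show how a retrieval-score difference could be measured; whether and how such a quantity enters the paper's semi-rule-based reward is not stated in the abstract.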