TongSearch-QR: Reinforced Query Reasoning for Retrieval

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address performance bottlenecks in traditional information retrieval caused by multi-hop reasoning and complex semantic alignment, this paper proposes a lightweight, production-deployable query reasoning framework. Built on Qwen2.5-1.5B/7B-Instruct, it applies Proximal Policy Optimization (PPO) reinforcement learning with a novel semi-rule-based reward function that jointly optimizes query rewriting and reasoning end to end. Evaluated on the BRIGHT benchmark with BM25 as the retriever, the method significantly outperforms both prompt-based query reasoners and state-of-the-art dense retrievers trained for reasoning-intensive retrieval. Notably, TongSearch QR-1.5B/7B achieves retrieval accuracy comparable to GPT-4 and LLaMA3-70B while reducing inference latency by over 90%, enabling high-accuracy retrieval with practical low-overhead deployment.
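The summary does not spell out the reward function, but the core idea of a semi-rule-based retrieval reward can be sketched as scoring a rewritten query by how much it improves BM25 retrieval of a known-relevant document. The sketch below is illustrative only: the BM25 implementation, the reciprocal-rank reward, and all function names are assumptions, not the paper's actual design.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with standard BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # document frequency of each term
    df = Counter()
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def reciprocal_rank(scores, gold_idx):
    """1 / rank of the gold document when docs are sorted by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 / (order.index(gold_idx) + 1)

def query_reasoning_reward(original_q, rewritten_q, docs, gold_idx):
    """Rule-based part of the reward: retrieval gain of the rewritten
    query over the original, as the change in the gold doc's reciprocal rank."""
    docs_tokens = [d.lower().split() for d in docs]
    rr_orig = reciprocal_rank(
        bm25_scores(original_q.lower().split(), docs_tokens), gold_idx)
    rr_new = reciprocal_rank(
        bm25_scores(rewritten_q.lower().split(), docs_tokens), gold_idx)
    return rr_new - rr_orig
```

In PPO training such a scalar would be combined with other (e.g. format or semantic) reward terms; a rewrite that surfaces reasoning-relevant vocabulary earns a positive reward, while one that hurts retrieval is penalized.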

📝 Abstract
Traditional information retrieval (IR) methods excel at textual and semantic matching but struggle in reasoning-intensive retrieval tasks that require multi-hop inference or complex semantic understanding between queries and documents. One promising solution is to explicitly rewrite or augment queries using large language models (LLMs) to elicit reasoning-relevant content prior to retrieval. However, the widespread use of large-scale language models like GPT-4 or LLaMA3-70B remains impractical due to their high inference cost and limited deployability in real-world systems. In this work, we introduce TongSearch QR (previously known as "TongSearch Reasoner"), a family of small-scale language models for query reasoning and rewriting in reasoning-intensive retrieval. With a novel semi-rule-based reward function, we employ reinforcement learning approaches enabling smaller language models, e.g., Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct, to achieve query reasoning performance rivaling large-scale language models without their prohibitive inference costs. Experimental results on the BRIGHT benchmark show that with BM25 as the retriever, both TongSearch QR-7B and TongSearch QR-1.5B significantly outperform existing baselines, including prompt-based query reasoners and recent dense retrievers trained for reasoning-intensive retrieval tasks, offering superior adaptability for real-world deployment.
Problem

Research questions and friction points this paper is trying to address.

Improve reasoning-intensive retrieval with small-scale models
Reduce high inference costs of large language models
Enhance query reasoning performance for real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Small-scale language models for query reasoning
Semi-rule-based reward function for reinforcement learning
Achieves performance rivaling large models cost-effectively
Xubo Qin
NLCo Lab, Beijing Institute for General Artificial Intelligence (BIGAI)
Jun Bai
Assistant professor
Computer-aided drug discovery, medical image analysis, AI therapeutic target identification
Jiaqi Li
NLCo Lab, Beijing Institute for General Artificial Intelligence (BIGAI)
Zixia Jia
BigAI
NLP
Zilong Zheng
NLCo Lab, Beijing Institute for General Artificial Intelligence (BIGAI)