TongSearch-QR: Reinforced Query Reasoning for Retrieval

📅 2025-06-13

🤖 AI Summary

To address performance bottlenecks in traditional information retrieval caused by the difficulty of multi-hop reasoning and complex semantic alignment, this paper proposes a lightweight, production-deployable query reasoning framework. Built on Qwen2.5-1.5B/7B-Instruct, it applies Proximal Policy Optimization (PPO) reinforcement learning with a sparse-semantic joint reward mechanism: a novel semi-rule-based reward function that jointly optimizes query rewriting and reasoning end to end. Evaluated on the BRIGHT benchmark with BM25 as the retriever, the method significantly outperforms both prompt-based query reasoners and recent dense retrievers trained for reasoning-intensive retrieval. Notably, TongSearch QR-1.5B and QR-7B achieve retrieval accuracy comparable to GPT-4 and LLaMA3-70B while reducing inference latency by over 90%, enabling high-accuracy retrieval with low deployment overhead.
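
The summary names PPO as the training algorithm. As a generic reference point only, the sketch below computes the standard PPO clipped surrogate objective over a batch of sampled outputs, using one log-probability per sampled rewrite for simplicity (LLM training usually applies the ratio per token). It is not the paper's training code; the function name and the default clip range eps=0.2 are assumptions.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate over a batch (an objective to maximize).

    Each entry is one sampled output: its log-probability under the current
    policy, under the policy that generated it, and its advantage estimate.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1 - eps, 1 + eps]
        # Taking the minimum keeps the pessimistic value, so moving the ratio
        # outside the clip range never increases the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A ratio of 1.5 with a positive advantage is capped at 1 + eps.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

With a negative advantage the unclipped term is kept instead, so the same ratio of 1.5 yields -1.5 rather than -1.2.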

📝 Abstract

Traditional information retrieval (IR) methods excel at textual and semantic matching but struggle in reasoning-intensive retrieval tasks that require multi-hop inference or complex semantic understanding between queries and documents. One promising solution is to explicitly rewrite or augment queries using large language models (LLMs) to elicit reasoning-relevant content prior to retrieval. However, the widespread use of large-scale language models such as GPT-4 or LLaMA3-70B remains impractical because of their high inference cost and limited deployability in real-world systems. In this work, we introduce TongSearch QR (previously known as "TongSearch Reasoner"), a family of small-scale language models for query reasoning and rewriting in reasoning-intensive retrieval. With a novel semi-rule-based reward function, we employ reinforcement learning to enable smaller language models, e.g., Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct, to achieve query reasoning performance rivaling that of large-scale language models without their prohibitive inference costs. Experimental results on the BRIGHT benchmark show that, with BM25 as the retriever, both TongSearch QR-7B and TongSearch QR-1.5B significantly outperform existing baselines, including prompt-based query reasoners and some of the latest dense retrievers trained for reasoning-intensive retrieval tasks, offering superior adaptability for real-world deployment.
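
The retrieval setup described above, where a model first writes reasoning-relevant text and the augmented query is then scored by BM25, can be sketched with a minimal Okapi BM25 implementation. The documents, the query, and the `reasoning` string (a hand-written, hypothetical stand-in for a TongSearch QR output) are invented for illustration, and k1=1.2, b=0.75 are common defaults rather than values taken from the paper.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokens with surrounding punctuation stripped;
    # production BM25 analyzers usually add stemming and stopword removal.
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t]


class BM25:
    """Minimal Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        n = len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        # Non-negative IDF variant (as used by Lucene): log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        s = 0.0
        for term in tokenize(query):  # repeated query terms contribute repeatedly
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def rank(self, query: str) -> list[int]:
        # Document indices sorted by descending BM25 score.
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


docs = [
    "Sourdough bread recipes from around the world.",
    "Yeast activity slows at low temperature, so fermentation produces less carbon dioxide.",
    "Choosing a stand mixer for bread dough.",
]
query = "Why is my sourdough bread flat?"
# Hypothetical stand-in for text a query-reasoning model might generate.
reasoning = "Yeast fermentation produces carbon dioxide; low temperature slows fermentation."

bm25 = BM25(docs)
print(bm25.rank(query))                    # [0, 2, 1]
print(bm25.rank(query + " " + reasoning))  # [1, 0, 2]
```

With the original query, the recipe document ranks first because it shares "sourdough" and "bread" with the query, while the relevant document shares no terms at all. Appending the reasoning text adds vocabulary ("yeast", "fermentation", "carbon dioxide") that only the relevant document contains, which moves it to the top without changing the retriever.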

Problem

Research questions and friction points this paper is trying to address.

Improve reasoning-intensive retrieval with small-scale models

Reduce high inference costs of large language models

Enhance query reasoning performance for real-world deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Small-scale language models for query reasoning

Semi-rule-based reward function for reinforcement learning

Achieves performance rivaling large models cost-effectively
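
Neither the summary nor the abstract defines the semi-rule-based reward. Purely as a hypothetical illustration of combining a hard rule with a retrieval-derived signal, the function below rejects malformed rewrites outright and otherwise rewards the change in the gold document's reciprocal rank under BM25. Its name, the word limit, the penalty value, and the reciprocal-rank formula are all assumptions, not the paper's definition.

```python
def semi_rule_based_reward(rewrite: str, gold_rank_before: int, gold_rank_after: int,
                           max_words: int = 256) -> float:
    """Hypothetical reward: a hard format rule plus a retrieval-based score.

    gold_rank_before / gold_rank_after are the 1-indexed BM25 ranks of the gold
    document for the original query and for the rewritten query.
    """
    if gold_rank_before < 1 or gold_rank_after < 1:
        raise ValueError("ranks are 1-indexed")
    words = rewrite.split()
    # Rule part (assumed): empty or overlong rewrites get a fixed penalty.
    if not words or len(words) > max_words:
        return -1.0
    # Retrieval part (assumed): change in the gold document's reciprocal rank,
    # which always lies strictly between -1 and 1.
    return 1.0 / gold_rank_after - 1.0 / gold_rank_before


print(semi_rule_based_reward("yeast fermentation carbon dioxide", 3, 1))  # gold doc moved from rank 3 to 1
print(semi_rule_based_reward("", 3, 1))                                    # malformed: -1.0
```

The fixed penalty of -1.0 is chosen so that a malformed output always scores below any well-formed one, since the reciprocal-rank difference is strictly greater than -1.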
