LightSearcher: Efficient DeepSearch via Experiential Memory

📅 2025-12-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency when deep reasoning models invoke external search tools, this paper proposes LightSearcher, a reinforcement learning framework grounded in experiential memory. Methodologically, it introduces (1) a textual experiential memory mechanism that generates interpretable summaries of successful multi-hop reasoning paths by contrasting reasoning trajectories, and (2) an adaptive reward shaping strategy that penalizes redundant tool invocations only when the answer is correct. By integrating reinforcement learning, contrastive trajectory learning, and multi-hop reasoning, LightSearcher achieves accuracy comparable to the state-of-the-art baseline ReSearch across four multi-hop question answering benchmarks. Crucially, it reduces tool calls by 39.6%, inference latency by 48.6%, and token consumption by 21.2%, thereby enabling efficient, autonomous control over knowledge retrieval.

📝 Abstract
DeepSearch paradigms have become a core enabler for deep reasoning models, allowing them to invoke external search tools to access up-to-date, domain-specific knowledge beyond parametric boundaries, thereby enhancing the depth and factual reliability of reasoning. Building upon this foundation, recent advances in reinforcement learning (RL) have further empowered models to autonomously and strategically control search tool usage, optimizing when and how to query external knowledge sources. Yet these RL-driven DeepSearch systems often reveal a see-saw trade-off between accuracy and efficiency: frequent tool invocations can improve factual correctness but lead to unnecessary computational overhead and diminished efficiency. To address this challenge, we propose LightSearcher, an efficient RL framework that incorporates textual experiential memory, learning from contrastive reasoning trajectories to generate interpretable summaries of successful reasoning patterns. In addition, it employs an adaptive reward shaping mechanism that penalizes redundant tool calls only in correct-answer scenarios. This design effectively balances the inherent accuracy-efficiency trade-off in DeepSearch paradigms. Experiments on four multi-hop QA benchmarks show that LightSearcher maintains accuracy comparable to the SOTA baseline ReSearch, while reducing search tool invocations by 39.6%, inference time by 48.6%, and token consumption by 21.2%, demonstrating its superior efficiency.
Problem

Research questions and friction points this paper is trying to address.

Balancing the accuracy-efficiency trade-off in DeepSearch systems
Reducing the computational overhead caused by frequent tool invocations
Controlling when and how external search tools are invoked
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses experiential memory with contrastive reasoning trajectories
Employs adaptive reward shaping to penalize redundant calls
Balances accuracy-efficiency trade-off in DeepSearch paradigms
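The adaptive reward shaping idea above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact formula: the function name, penalty coefficient, and free-call budget are all assumptions chosen to show the asymmetry (only correct answers are penalized for extra tool calls).

```python
def shaped_reward(correct: bool, num_tool_calls: int,
                  base_reward: float = 1.0,
                  penalty: float = 0.1,
                  free_calls: int = 1) -> float:
    """Hypothetical adaptive reward in the spirit of LightSearcher.

    Wrong answers receive no reward and, crucially, no efficiency
    penalty, so the agent is never pushed to skip a search it needs.
    Correct answers are docked a small amount per tool call beyond a
    free budget, discouraging redundant retrieval.
    """
    if not correct:
        return 0.0  # no penalty path: accuracy is never traded away
    redundant = max(0, num_tool_calls - free_calls)
    return max(0.0, base_reward - penalty * redundant)
```

Under this sketch, a correct answer found with one search keeps the full reward, while a correct answer that issued four searches is mildly penalized; an incorrect trajectory scores zero regardless of how many tools it called.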
Hengzhi Lan
Beijing University of Posts and Telecommunications, Beijing, China
Yue Yu
Beijing University of Posts and Telecommunications, Beijing, China
Li Qian
University of Michigan
Li Peng
Nanjing University of Posts and Telecommunications
Jie Wu
Researcher, China
Wei Liu
Researcher, China
Jian Luan
Toshiba, Microsoft, Xiaomi
Ting Bai
Beijing University of Posts and Telecommunications, Beijing, China