LightSearcher: Efficient DeepSearch via Experiential Memory

📅 2025-12-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency when deep reasoning models invoke external search tools, this paper proposes LightSearcher, a reinforcement learning framework grounded in experiential memory. Methodologically, it introduces (1) a textual experiential memory mechanism that generates interpretable summaries of successful multi-hop reasoning paths by contrasting reasoning trajectories, and (2) an adaptive reward shaping strategy that penalizes redundant tool invocations only when the answer is correct. By integrating reinforcement learning, contrastive trajectory learning, and multi-hop reasoning, LightSearcher achieves accuracy comparable to the state-of-the-art baseline ReSearch across four multi-hop question answering benchmarks. Crucially, it reduces tool calls by 39.6%, inference latency by 48.6%, and token consumption by 21.2%, thereby enabling efficient, autonomous control over knowledge retrieval.

📝 Abstract
DeepSearch paradigms have become a core enabler for deep reasoning models, allowing them to invoke external search tools to access up-to-date, domain-specific knowledge beyond parametric boundaries, thereby enhancing the depth and factual reliability of reasoning. Building upon this foundation, recent advances in reinforcement learning (RL) have further empowered models to autonomously and strategically control search tool usage, optimizing when and how to query external knowledge sources. Yet these RL-driven DeepSearch systems often reveal a see-saw trade-off between accuracy and efficiency: frequent tool invocations can improve factual correctness but lead to unnecessary computational overhead and diminished efficiency. To address this challenge, we propose LightSearcher, an efficient RL framework that incorporates textual experiential memory, learning from contrastive reasoning trajectories to generate interpretable summaries of successful reasoning patterns. In addition, it employs an adaptive reward shaping mechanism that penalizes redundant tool calls only in correct-answer scenarios. This design effectively balances the inherent accuracy-efficiency trade-off in DeepSearch paradigms. Experiments on four multi-hop QA benchmarks show that LightSearcher maintains accuracy comparable to the SOTA baseline ReSearch, while reducing search tool invocations by 39.6%, inference time by 48.6%, and token consumption by 21.2%, demonstrating its superior efficiency.
Problem

Research questions and friction points this paper is trying to address.

Balancing the accuracy-efficiency trade-off in DeepSearch systems
Reducing the computational overhead caused by frequent tool invocations
Controlling when and how external search tools are invoked
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses experiential memory with contrastive reasoning trajectories
Employs adaptive reward shaping to penalize redundant calls
Balances accuracy-efficiency trade-off in DeepSearch paradigms
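The adaptive reward shaping idea above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact formula: the function name, penalty coefficient, and free-call budget are all assumptions chosen to show the asymmetry (only correct answers are penalized for extra tool calls).

```python
def shaped_reward(correct: bool, num_tool_calls: int,
                  base_reward: float = 1.0,
                  penalty: float = 0.1,
                  free_calls: int = 1) -> float:
    """Hypothetical adaptive reward in the spirit of LightSearcher.

    Wrong answers receive no reward and, crucially, no efficiency
    penalty, so the agent is never pushed to skip a search it needs.
    Correct answers are docked a small amount per tool call beyond a
    free budget, discouraging redundant retrieval.
    """
    if not correct:
        return 0.0  # no penalty path: accuracy is never traded away
    redundant = max(0, num_tool_calls - free_calls)
    return max(0.0, base_reward - penalty * redundant)
```

Under this sketch, a correct answer found with one search keeps the full reward, while a correct answer that issued four searches is mildly penalized; an incorrect trajectory scores zero regardless of how many tools it called.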
Hengzhi Lan
Beijing University of Posts and Telecommunications, Beijing, China
Yue Yu
Beijing University of Posts and Telecommunications, Beijing, China
Li Qian
University of Michigan
Li Peng
Nanjing University of Posts and Telecommunications
Jie Wu
Researcher, China
Wei Liu
Researcher, China
Jian Luan
Toshiba, Microsoft, Xiaomi
Ting Bai
Beijing University of Posts and Telecommunications, Beijing, China