Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

πŸ“… 2026-02-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the limitations of existing deep research agents in search-intensive tasks, which suffer from high latency, high cost, and poor cross-task generalization due to their reliance on serial reasoning. To overcome these challenges, we propose the β€œSearch More, Think Less” (SMTL) framework, which shifts long-horizon reasoning from deep sequential processing toward parallel evidence acquisition. SMTL enables efficient context management under constrained context budgets and supports joint training on both factoid question answering and open-ended research tasks through a unified data synthesis pipeline. Optimized end-to-end via supervised fine-tuning and reinforcement learning, SMTL achieves state-of-the-art performance on BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, it reduces reasoning steps by 70.7% on BrowseComp while simultaneously improving accuracy.


πŸ“ Abstract
Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose \emph{Search More, Think Less} (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6\%), GAIA (75.7\%), Xbench (82.0\%), and DeepResearch Bench (45.9\%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7\%, while improving accuracy.
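The core idea of replacing serial reasoning with parallel evidence acquisition under a context budget can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `search` stub, the character-based budget, and all function names are hypothetical stand-ins for the agent's real tool calls and token accounting.

```python
from concurrent.futures import ThreadPoolExecutor


def search(query: str) -> str:
    # Hypothetical stand-in for a real web-search tool call.
    return f"evidence for: {query}"


def parallel_evidence_acquisition(queries, context_budget_chars=2000, max_workers=8):
    """Issue all sub-queries concurrently, then pack results under a context budget.

    Contrast with a serial agent, which would interleave one search per
    reasoning step; here all evidence is fetched in one parallel round.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(search, queries))

    # Context management: keep only as much evidence as the budget allows.
    packed, used = [], 0
    for r in results:
        if used + len(r) > context_budget_chars:
            break
        packed.append(r)
        used += len(r)
    return packed
```

In this sketch, broadening the search (more queries per round) trades off against the budget-driven packing step, which mirrors the paper's framing of shifting cost from reasoning depth to evidence breadth.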
Problem

Research questions and friction points this paper is trying to address.

long-horizon agentic search
reasoning efficiency
generalization
inference cost
search-intensive scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic search
parallel evidence acquisition
context efficiency
generalization
data synthesis