REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in long-horizon search tasks for large language models, including scarcity of high-quality trajectories, sparse reward signals, limited scalability of task construction, and high tool invocation costs. To overcome these issues, the authors propose REDSearcher, a framework that systematically enhances agent planning and tool-use capabilities through co-designed complex task synthesis and staged mid-to-late training phases. Key innovations include formulating task synthesis as a dual-constrained optimization problem, introducing a tool-augmented querying mechanism, leveraging graph topology and evidence dispersion to modulate task difficulty, and constructing a local simulation environment to enable cost-effective reinforcement learning. The approach achieves state-of-the-art performance on both textual and multimodal search benchmarks and releases 10K high-quality textual trajectories, 5K multimodal trajectories, and a 1K textual reinforcement learning query set.
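The summary's idea of modulating task difficulty through graph topology and evidence dispersion might be sketched as below. This is a minimal illustrative sketch, not the paper's actual formulation: the function names, the hop-count proxy for topology, the entropy-based dispersion measure, and the blending weights are all assumptions.

```python
import math
from collections import Counter

def evidence_dispersion(doc_ids):
    """Normalized entropy of where the supporting evidence lives.

    Returns 0.0 when all evidence sits in a single document and 1.0
    when it is spread evenly across distinct documents.
    """
    counts = Counter(doc_ids)
    n = sum(counts.values())
    if len(counts) <= 1:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))  # normalize to [0, 1]

def task_difficulty(hop_count, doc_ids, w_topo=0.5, w_disp=0.5, max_hops=8):
    """Blend a graph-topology signal (reasoning hops between entities,
    capped and normalized) with evidence dispersion into one score."""
    topo = min(hop_count, max_hops) / max_hops
    return w_topo * topo + w_disp * evidence_dispersion(doc_ids)
```

A synthesis pipeline could then sample or filter candidate tasks by thresholding this score to target a desired difficulty band.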

📝 Abstract
Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.
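The local simulated environment in point (4) could be approximated by a tool layer that answers search calls from a fixed local corpus instead of live APIs, making RL rollouts cheap and deterministic. The class name, corpus format, and call-counting are hypothetical assumptions for illustration; the paper's actual environment is not specified here.

```python
class SimulatedSearchEnv:
    """Serve agent tool calls from a local corpus rather than a live
    search API, so reinforcement-learning rollouts cost nothing per call."""

    def __init__(self, corpus):
        # corpus: dict mapping a query string to a list of result snippets
        self.corpus = corpus
        self.calls = 0  # track tool-invocation volume across a rollout

    def search(self, query):
        """Return cached snippets for the query (empty list on a miss)."""
        self.calls += 1
        return self.corpus.get(query, [])
```

During training, the same agent code can be pointed at this stub or at the real tool backend, so algorithmic iteration happens locally and only final evaluations pay for live tool calls.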
Problem

Research questions and friction points this paper is trying to address.

long-horizon search
sparse reward
task synthesis
tool-augmented reasoning
scalable agent training
Innovation

Methods, ideas, or system contributions that make the work stand out.

long-horizon search
task synthesis
tool-augmented queries
mid-training optimization
simulated environment