OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

📅 2026-05-05

🤖 AI Summary
This work addresses the challenge of training state-of-the-art deep search agents in resource-constrained academic settings, without the costly industrial pipeline of pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). The authors train a 30B-parameter ReAct agent with SFT alone, on 10.6k trajectories produced by three simple data synthesis modifications: scaling the knowledge graph for richer exploration, expanding the toolset for broader functionality, and strictly filtering out low-step trajectories. The resulting agent, OpenSeeker-v2, surpasses Tongyi DeepResearch, which uses a complex CPT+SFT+RL pipeline, on four benchmarks: BrowseComp (46.0%), BrowseComp-ZH (58.1%), Humanity's Last Exam (34.6%), and xbench (78.0%). According to the authors, it is the first state-of-the-art search agent at its model scale and paradigm to be developed by a purely academic team using only SFT. The model weights are publicly released.
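The low-step filter can be pictured as a simple predicate over ReAct trajectories. The sketch below is a minimal illustration, assuming each trajectory is stored as a list of (thought, action, observation) steps; the `Step`/`Trajectory` types, the `filter_low_step` helper, and the `min_steps` threshold are all hypothetical, since the summary does not give the authors' data format or cutoff value.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One ReAct step: a thought, the tool action taken, and the observation returned."""
    thought: str
    action: str
    observation: str


@dataclass
class Trajectory:
    """A hypothetical container for one synthesized search trajectory."""
    question: str
    steps: list[Step] = field(default_factory=list)
    answer: str = ""


def filter_low_step(trajectories: list[Trajectory], min_steps: int) -> list[Trajectory]:
    """Keep only trajectories that used at least `min_steps` tool calls.

    `min_steps` is a hypothetical threshold chosen for illustration;
    the idea is that short trajectories are likely too easy to be informative.
    """
    return [t for t in trajectories if len(t.steps) >= min_steps]


if __name__ == "__main__":
    easy = Trajectory("q1", [Step("think", "search('x')", "obs")], "a1")
    hard = Trajectory("q2", [Step("think", f"search('x{i}')", "obs") for i in range(12)], "a2")
    kept = filter_low_step([easy, hard], min_steps=10)
    print([t.question for t in kept])  # ['q2']
```

Counting tool calls is only one plausible proxy for "step"; a real pipeline might also count reasoning turns or combine the filter with answer-correctness checks.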
📝 Abstract
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach can be surprisingly powerful for training frontier search agents. By introducing three simple data synthesis modifications (scaling knowledge graph size for richer exploration, expanding the tool set for broader functionality, and strict low-step filtering), we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance among 30B-sized agents using the ReAct paradigm across 4 benchmarks: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench. It surpasses even Tongyi DeepResearch, trained with a heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 is the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
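The size of the reported gains is easier to see as percentage-point margins. The short sketch below uses only the scores quoted in the abstract; the dictionary names and the `margins` helper are illustrative, not part of the released model or code.

```python
# Scores (%) quoted in the abstract for 30B-sized ReAct agents.
OPENSEEKER_V2: dict[str, float] = {
    "BrowseComp": 46.0,
    "BrowseComp-ZH": 58.1,
    "Humanity's Last Exam": 34.6,
    "xbench": 78.0,
}
TONGYI_DEEPRESEARCH: dict[str, float] = {
    "BrowseComp": 43.4,
    "BrowseComp-ZH": 46.7,
    "Humanity's Last Exam": 32.9,
    "xbench": 75.0,
}


def margins(ours: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Absolute percentage-point gap per benchmark, rounded to one decimal place."""
    return {name: round(ours[name] - baseline[name], 1) for name in ours}


if __name__ == "__main__":
    for name, gap in margins(OPENSEEKER_V2, TONGYI_DEEPRESEARCH).items():
        print(f"{name}: +{gap:.1f} pts")
    # BrowseComp: +2.6 pts
    # BrowseComp-ZH: +11.4 pts
    # Humanity's Last Exam: +1.7 pts
    # xbench: +3.0 pts
```

The largest margin is on BrowseComp-ZH (+11.4 points); on the other three benchmarks the gap is between roughly 2 and 3 points.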
Problem

Research questions and friction points this paper is trying to address.

search agents
supervised fine-tuning
resource-constrained training
frontier LLM agents
academic research
Innovation

Methods, ideas, or system contributions that make the work stand out.

supervised fine-tuning (SFT)
high-difficulty trajectories
search agents
data synthesis
ReAct paradigm