🤖 AI Summary
Existing LLM-based agents struggle to balance search breadth and reasoning depth in large-scale web retrieval: slow, sequential querying limits coverage of relevant sources, while noisy raw inputs disrupt the coherence of multi-step reasoning. This paper proposes BrowseMaster, a planner-executor dual-agent framework in which programmatic tool augmentation enforces role specialization: the planner generates and adapts search strategies under task constraints, and the executor invokes tools for targeted retrieval and condenses the results into concise, structured evidence. Evaluated on English and Chinese benchmarks, the approach scores 30.0 on BrowseComp-en and 46.5 on BrowseComp-zh, consistently outperforming both open-source and closed-source baselines. By keeping the planner's reasoning coherent while the executor sustains broad, systematic exploration, the framework overcomes the breadth-versus-depth trade-off that limits existing agents and offers a scalable approach to complex, reasoning-intensive information seeking.
📝 Abstract
Effective information seeking in the vast and ever-growing digital landscape requires balancing expansive search with strategic reasoning. Current large language model (LLM)-based agents struggle to achieve this balance due to limitations in search breadth and reasoning depth, where slow, serial querying restricts coverage of relevant sources and noisy raw inputs disrupt the continuity of multi-step reasoning. To address these challenges, we propose BrowseMaster, a scalable framework built around a programmatically augmented planner-executor agent pair. The planner formulates and adapts search strategies based on task constraints, while the executor conducts efficient, targeted retrieval to supply the planner with concise, relevant evidence. This division of labor preserves coherent, long-horizon reasoning while sustaining broad and systematic exploration, overcoming the trade-off that limits existing agents. Extensive experiments on challenging English and Chinese benchmarks show that BrowseMaster consistently outperforms open-source and proprietary baselines, achieving scores of 30.0 on BrowseComp-en and 46.5 on BrowseComp-zh, which demonstrates its strong capability in complex, reasoning-heavy information-seeking tasks at scale.
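The division of labor described above can be illustrated with a minimal sketch. This is not BrowseMaster's implementation: the in-memory `CORPUS`, the `search` tool, and the `Planner` and `executor` names are all hypothetical stand-ins, and keyword matching stands in for the LLM reasoning and real browsing tools the paper relies on. The sketch only shows the shape of the loop: the planner tracks unmet task constraints and issues queries, while the executor runs those queries concurrently and hands back short snippets instead of raw pages.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical in-memory "web" standing in for real search and browse tools.
CORPUS = {
    "page_a": "Ada Lovelace wrote the first published algorithm in 1843. More text.",
    "page_b": "Charles Babbage designed the Analytical Engine. More text.",
    "page_c": "Noise noise unrelated content about gardening.",
}


def search(query: str) -> list[str]:
    """Toy search tool: return raw page texts that contain every query term."""
    terms = query.lower().split()
    return [t for t in CORPUS.values() if all(term in t.lower() for term in terms)]


def executor(queries: list[str]) -> list[str]:
    """Run all queries concurrently and return concise, deduplicated evidence."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(search, queries))  # order follows `queries`
    evidence: list[str] = []
    for pages in results:
        for text in pages:
            # Keep only the first sentence so the planner never sees raw pages.
            snippet = text.split(".")[0].strip()
            if snippet and snippet not in evidence:
                evidence.append(snippet)
    return evidence


@dataclass
class Planner:
    """Tracks task constraints and the distilled evidence gathered so far."""
    constraints: list[str]
    evidence: list[str] = field(default_factory=list)

    def next_queries(self) -> list[str]:
        # A constraint counts as unmet until some evidence snippet mentions it.
        return [
            c for c in self.constraints
            if not any(c.lower() in e.lower() for e in self.evidence)
        ]

    def run(self, max_rounds: int = 3) -> list[str]:
        for _ in range(max_rounds):
            queries = self.next_queries()
            if not queries:
                break  # every constraint is covered
            new_evidence = executor(queries)
            if not new_evidence:
                break  # nothing retrievable; stop instead of looping
            self.evidence.extend(e for e in new_evidence if e not in self.evidence)
        return self.evidence
```

Calling `Planner(constraints=["lovelace", "analytical engine"]).run()` in this toy setting returns one snippet per constraint. The design point the sketch tries to capture is that the planner's state holds only distilled evidence, so its context stays small even when the executor fans out over many queries.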