WebDancer: Towards Autonomous Information Seeking Agency

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of in-depth information seeking and multi-step reasoning in complex real-world problems by proposing an end-to-end, data-centric paradigm for building autonomous web agents. Methodologically, it introduces a four-stage training pipeline: browsing data construction, trajectory sampling, supervised fine-tuning (SFT) for cold start, and reinforcement learning (RL). Browsing behavior is first learned through SFT, and reasoning and generalization are then strengthened through RL. The framework is instantiated in WebDancer, an agent built on the ReAct framework, which interleaves reasoning steps, tool actions, and observations, and is trained on synthesized browsing data and sampled trajectories. Evaluated on the GAIA and WebWalkerQA information-seeking benchmarks, the approach shows strong performance. The authors plan to release the code and an interactive demo, providing a reproducible technical foundation for research on autonomous information-seeking agents.
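To make the data flow between the four stages concrete, here is a hypothetical Python sketch. It is not the authors' implementation: the `QA` record, the stage function names, exact-match answer checking, rejection sampling in stage 2, and the stage 4 rule that keeps only partially solved questions are all assumptions chosen for illustration. The stage 4 rule reflects a general property of group-relative RL objectives such as GRPO, where a group of samples that are all correct or all wrong yields zero advantage; the paper's own RL configuration may differ.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List, Tuple


# Hypothetical record for one synthesized question with its reference answer.
@dataclass
class QA:
    question: str
    answer: str


# Hypothetical agent interface: question -> (trajectory text, final answer).
Agent = Callable[[str], Tuple[str, str]]


def stage1_build_data(raw_pairs: List[Tuple[str, str]]) -> List[QA]:
    """Stage 1 (sketch): wrap raw (question, answer) pairs as QA records."""
    return [QA(q, a) for q, a in raw_pairs]


def stage2_sample(data: List[QA], agent: Agent, k: int = 4):
    """Stage 2 (sketch): sample k trajectories per question, keep those whose
    final answer exactly matches the reference (rejection sampling, assumed),
    and record each question's pass rate."""
    kept: List[Tuple[str, str]] = []
    pass_rate: Dict[str, float] = {}
    for item in data:
        hits = 0
        for _ in range(k):
            trajectory, prediction = agent(item.question)
            if prediction.strip().lower() == item.answer.strip().lower():
                kept.append((item.question, trajectory))
                hits += 1
        pass_rate[item.question] = hits / k
    return kept, pass_rate


def stage3_sft_examples(kept: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Stage 3 (sketch): turn kept trajectories into prompt/target pairs
    for the supervised fine-tuning cold start."""
    return [{"prompt": q, "target": t} for q, t in kept]


def stage4_rl_pool(data: List[QA], pass_rate: Dict[str, float]) -> List[QA]:
    """Stage 4 (sketch): keep only partially solved questions, since all-correct
    or all-wrong groups give zero group-relative advantage (assumed heuristic)."""
    return [d for d in data if 0.0 < pass_rate[d.question] < 1.0]


# Toy deterministic agent: always right on "easy", alternates on "medium",
# never right on "hard".
_toggle = cycle([True, False])


def toy_agent(question: str) -> Tuple[str, str]:
    if "easy" in question:
        return ("search -> answer", "42")
    if "medium" in question:
        return ("search -> visit -> answer", "42" if next(_toggle) else "7")
    return ("search -> give up", "unknown")


data = stage1_build_data([("easy q", "42"), ("medium q", "42"), ("hard q", "42")])
kept, rates = stage2_sample(data, toy_agent, k=4)
sft_set = stage3_sft_examples(kept)
rl_pool = stage4_rl_pool(data, rates)
print(len(sft_set), [d.question for d in rl_pool])  # prints: 6 ['medium q']
```

With this toy agent, the easy question contributes four kept trajectories, the medium one two, and the hard one none, so only the medium question enters the RL pool. A real pipeline would replace `toy_agent` with a sampled LLM agent and stages 3 and 4 with actual training runs.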

📝 Abstract
Addressing intricate real-world problems necessitates in-depth information seeking and multi-step reasoning. Recent progress in agentic systems, exemplified by Deep Research, underscores the potential for autonomous multi-step research. In this work, we present a cohesive paradigm for building end-to-end agentic information-seeking agents from a data-centric and training-stage perspective. Our approach consists of four key stages: (1) browsing data construction, (2) trajectory sampling, (3) supervised fine-tuning for an effective cold start, and (4) reinforcement learning for enhanced generalisation. We instantiate this framework in WebDancer, a web agent based on the ReAct framework. Empirical evaluations on the challenging information-seeking benchmarks GAIA and WebWalkerQA demonstrate the strong performance of WebDancer, highlighting the efficacy of our training paradigm. Further analysis of agent training provides valuable insights and systematic, actionable pathways for developing more capable agentic models. The code and demo will be released at https://github.com/Alibaba-NLP/WebAgent.
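The ReAct framework that WebDancer builds on alternates a reasoning step (thought), a tool call (action), and the tool's result (observation) until the agent commits to an answer. The minimal Python sketch below illustrates that loop under stated assumptions: the `search` tool, the `policy` callable, and the `Step` and `Trajectory` types are hypothetical stand-ins, not WebDancer's actual interface, and a real policy would be an LLM call rather than a scripted function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool interface: a tool takes a string argument and returns text.
Tool = Callable[[str], str]


@dataclass
class Step:
    thought: str      # the model's reasoning before acting
    action: str       # name of the tool it chose
    argument: str     # input passed to that tool
    observation: str  # text the tool returned


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


# Hypothetical policy interface: given the trajectory so far, return
# (thought, action, argument); the action "answer" ends the episode.
Policy = Callable[[Trajectory], Tuple[str, str, str]]


def react_loop(question: str, policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 8) -> Trajectory:
    """Run thought -> action -> observation cycles until the policy answers
    or the step budget is exhausted (answer then stays None)."""
    traj = Trajectory(question)
    for _ in range(max_steps):
        thought, action, argument = policy(traj)
        if action == "answer":
            traj.answer = argument
            break
        if action in tools:
            observation = tools[action](argument)
        else:
            observation = f"error: unknown tool '{action}'"
        traj.steps.append(Step(thought, action, argument, observation))
    return traj


# Scripted stand-in for an LLM policy: search once, then answer with
# the last observation.
def toy_policy(traj: Trajectory) -> Tuple[str, str, str]:
    if not traj.steps:
        return ("I need to look this up.", "search", traj.question)
    return ("The observation answers the question.", "answer",
            traj.steps[-1].observation)


toy_tools: Dict[str, Tool] = {
    "search": lambda q: "Paris" if "France" in q else "no result",
}

result = react_loop("What is the capital of France?", toy_policy, toy_tools)
print(result.answer, len(result.steps))  # prints: Paris 1
```

Feeding tool errors back as observations, rather than raising, lets the policy recover on the next step; the `max_steps` budget guards against a policy that never answers.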

Problem

Research questions and friction points this paper is trying to address.

Autonomous multi-step information seeking for complex problems
Developing an end-to-end training paradigm for agentic systems
Enhancing generalization in web-based research agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end agentic information seeking paradigm
Four-stage training pipeline: data construction, trajectory sampling, SFT cold start, and reinforcement learning
WebDancer, a web agent built on the ReAct framework