🤖 AI Summary
Large language models (LLMs) deployed as web agents exhibit limited reasoning capabilities and poor robustness in dynamic web environments. Method: This paper proposes a reasoning-skill-customized enhancement framework. Its core innovation is the first explicit reformulation of web-interaction reasoning as a chain-of-thought (CoT) paradigm, distilled from real-world interaction trajectories into three key reasoning patterns: reflection & lookahead, branching, and rollback. The authors construct CoT-structured training data and employ supervised fine-tuning (SFT) coupled with a self-improvement mechanism for targeted capability enhancement. Results: The method achieves significant improvements over existing state-of-the-art approaches on WebVoyager, Mind2Web-Live, and SimpleQA (web search), demonstrating the effectiveness and generalizability of explicit CoT structural modeling and skill-specific optimization.
📝 Abstract
Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improvement benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2Web-Live, and SimpleQA (web search), highlighting the potential of targeted reasoning-skill enhancement for web agents.
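To make the data-curation idea concrete, here is a minimal, hypothetical sketch of how an interaction trajectory might be wrapped into CoT-structured SFT examples tagged with the three reasoning patterns. All names (`Step`, `build_cot_example`, the pattern labels and rationale templates) are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch: turning a raw web-interaction trajectory into
# CoT-structured (prompt, rationale, action) training triples.
# Names and templates are illustrative, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Step:
    observation: str   # summary of the current page state
    action: str        # e.g. "click(first_link)"
    outcome: str       # what the action led to

# The three reasoning patterns identified in the paper.
PATTERNS = ("reflection_lookahead", "branching", "rollback")

def build_cot_example(steps: list[Step], pattern: str) -> dict:
    """Wrap a trajectory slice into one SFT example for the given pattern."""
    assert pattern in PATTERNS
    # Prior steps become the prompt context.
    context = "\n".join(
        f"Obs: {s.observation} -> Act: {s.action}" for s in steps[:-1]
    )
    last = steps[-1]
    # A pattern-specific rationale template (assumed for illustration).
    if pattern == "reflection_lookahead":
        rationale = (f"Reflecting on prior steps and looking ahead: "
                     f"'{last.action}' should lead to '{last.outcome}'.")
    elif pattern == "branching":
        rationale = (f"Several actions are plausible here; exploring "
                     f"'{last.action}' as one branch.")
    else:  # rollback
        rationale = (f"The previous attempt failed; rolling back and "
                     f"retrying with '{last.action}'.")
    return {"prompt": context, "rationale": rationale, "action": last.action}

# Example: a two-step search trajectory distilled into one training triple.
traj = [
    Step("search page", "type('weather')", "results shown"),
    Step("results page", "click(first_link)", "forecast page"),
]
example = build_cot_example(traj, "reflection_lookahead")
```

The resulting triples could then be formatted into standard SFT conversations, with the rationale preceding the action so the backbone LLM learns to emit the reasoning pattern before acting.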