🤖 AI Summary
Web agents often get lost on unfamiliar websites because they lack an understanding of their environment and cannot tell which pages they must visit to reach a goal. To address this, we propose Go-Browse, a structured-exploration method that collects diverse, realistic web-agent data at scale by framing data collection as a graph search, so information gathered in one exploration episode can be reused in later ones. Instantiated on the WebArena benchmark, the method yields a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. A 7B-parameter language model fine-tuned on this dataset achieves a 21.7% task success rate on WebArena, outperforming GPT-4o mini by 2.4 percentage points and exceeding the previous state of the art for sub-10B-parameter models by 2.9 percentage points.
📝 Abstract
One of the fundamental problems with digital agents is their lack of understanding of their environment. For instance, a web browsing agent may get lost in unfamiliar websites, uncertain what pages must be visited to achieve its goals. To address this, we propose Go-Browse, a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. Go-Browse achieves efficient exploration by framing data collection as a graph search, enabling reuse of information across exploration episodes. We instantiate our method on the WebArena benchmark, collecting a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. Fine-tuning a 7B-parameter language model on this dataset achieves a success rate of 21.7% on the WebArena benchmark, beating GPT-4o mini by 2.4 percentage points and exceeding current state-of-the-art results for sub-10B-parameter models by 2.9 percentage points.
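The abstract frames data collection as a graph search in which information is reused across exploration episodes. The Python sketch below illustrates that general idea under stated assumptions; it is not Go-Browse's implementation. Pages are treated as nodes keyed by URL, a persistent `ExplorationGraph` (a hypothetical name) stores visited pages and an unexplored frontier so a later session resumes where an earlier one stopped, and `explore_page` is a hypothetical stand-in for an agent episode that would propose and solve tasks at a page. The toy site and the breadth-first frontier order are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExplorationGraph:
    """Hypothetical shared state that persists across exploration episodes."""
    visited: set[str] = field(default_factory=set)
    frontier: deque[str] = field(default_factory=deque)
    trajectories: list[tuple[str, str]] = field(default_factory=list)

    def add_url(self, url: str) -> None:
        # Enqueue a URL only the first time it is discovered.
        if url not in self.visited and url not in self.frontier:
            self.frontier.append(url)


# Toy stand-in for a website: URL -> outgoing links (hypothetical data).
TOY_SITE = {
    "/": ["/products", "/account"],
    "/products": ["/products/1", "/"],
    "/account": ["/account/orders"],
    "/products/1": [],
    "/account/orders": ["/products/1"],
}


def explore_page(url: str) -> tuple[list[str], list[str]]:
    """Hypothetical episode at one page.

    Returns (successful task descriptions, links discovered on the page).
    A real system would run an LLM agent in a browser here.
    """
    tasks = [f"navigate to {url}"]
    return tasks, TOY_SITE.get(url, [])


def run_episodes(graph: ExplorationGraph, max_episodes: int) -> None:
    for _ in range(max_episodes):
        if not graph.frontier:
            break
        url = graph.frontier.popleft()  # FIFO frontier gives breadth-first order
        graph.visited.add(url)
        tasks, links = explore_page(url)
        graph.trajectories.extend((url, t) for t in tasks)
        for link in links:
            graph.add_url(link)


graph = ExplorationGraph()
graph.add_url("/")
run_episodes(graph, max_episodes=3)   # first session stops early
run_episodes(graph, max_episodes=10)  # later session resumes from the saved frontier
print(sorted(graph.visited))
# → ['/', '/account', '/account/orders', '/products', '/products/1']
```

Because `visited` and `frontier` live outside any single episode, the second call to `run_episodes` continues from the pages the first call discovered instead of rediscovering them. Replacing the deque with a priority queue would turn this breadth-first order into a best-first search.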