Go-Browse: Training Web Agents with Structured Exploration

📅 2025-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Web agents often fail on unfamiliar websites because they lack an understanding of their environment and do not know which pages they must visit to reach a goal. Go-Browse addresses this with structured exploration: it frames data collection as a graph search over the web environment, so information discovered in one exploration episode can be reused in later ones, which makes it practical to collect diverse, realistic navigation trajectories at scale. The method is instantiated on the WebArena benchmark, producing a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. A 7B-parameter language model fine-tuned on this dataset reaches a 21.7% task success rate on WebArena, 2.4 percentage points above GPT-4o mini and 2.9 percentage points above the previous best result for sub-10B-parameter models.

📝 Abstract

One of the fundamental problems in digital agents is their lack of understanding of their environment. For instance, a web browsing agent may get lost in unfamiliar websites, uncertain what pages must be visited to achieve its goals. To address this, we propose Go-Browse, a method for automatically collecting diverse and realistic web agent data at scale through structured exploration of web environments. Go-Browse achieves efficient exploration by framing data collection as a graph search, enabling reuse of information across exploration episodes. We instantiate our method on the WebArena benchmark, collecting a dataset of 10K successful task-solving trajectories and 40K interaction steps across 100 URLs. Fine-tuning a 7B parameter language model on this dataset achieves a success rate of 21.7% on the WebArena benchmark, beating GPT-4o mini by 2.4% and exceeding current state-of-the-art results for sub-10B parameter models by 2.9%.
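
To make the "data collection as a graph search" framing concrete, here is a minimal sketch, assuming pages are identified by URL and that a hypothetical `get_links` callback returns the URLs reachable from a page. It is not the authors' implementation: the breadth-first order, the `explore` function, and the toy site are illustrative assumptions. The point it shows is that a shared `seen` set and `graph` let each newly discovered page be recorded once and reused, rather than rediscovered in every episode.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def explore(start_url: str,
            get_links: Callable[[str], List[str]],
            max_pages: int = 100) -> Dict[str, List[str]]:
    """Breadth-first exploration that builds a graph of discovered pages.

    `get_links` is a hypothetical stand-in for whatever an agent does on a
    page to discover new URLs (clicking links, solving tasks, and so on).
    """
    graph: Dict[str, List[str]] = {}      # url -> outgoing urls
    frontier = deque([start_url])         # pages waiting to be explored
    seen: Set[str] = {start_url}          # pages already queued or explored

    while frontier and len(graph) < max_pages:
        url = frontier.popleft()
        neighbors = get_links(url)
        graph[url] = neighbors
        for nxt in neighbors:
            if nxt not in seen:           # reuse: never re-queue a known page
                seen.add(nxt)
                frontier.append(nxt)
    return graph


# Toy site used only for illustration (not from the paper).
TOY_SITE = {
    "/": ["/shop", "/account"],
    "/shop": ["/shop/item-1", "/"],
    "/account": ["/account/orders"],
    "/shop/item-1": [],
    "/account/orders": ["/"],
}

graph = explore("/", lambda url: TOY_SITE.get(url, []))
small = explore("/", lambda url: TOY_SITE.get(url, []), max_pages=2)
```

In a real setting the callback would drive a browser, and each explored page would also be the starting point for proposing and attempting tasks; the sketch only covers the frontier bookkeeping.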

Problem

Research questions and friction points this paper is trying to address.

Web agents' limited understanding of their environment

Difficulty in navigating unfamiliar websites efficiently

Need for scalable data collection for web agent training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured exploration for web agent training

Graph search for efficient data collection

Fine-tuning language models with collected trajectories
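
As a rough illustration of the last item, the sketch below turns collected trajectories into step-level training examples, keeping only successful trajectories. The source states only that the dataset contains successful trajectories and interaction steps; the `Step` and `Trajectory` classes, the prompt layout, and the action strings are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    observation: str   # what the agent saw (hypothetical text rendering)
    action: str        # what the agent did (hypothetical action string)


@dataclass
class Trajectory:
    goal: str
    start_url: str
    steps: List[Step]
    success: bool


def to_training_examples(trajectories: List[Trajectory]) -> List[Dict[str, str]]:
    """Flatten successful trajectories into one example per interaction step."""
    examples: List[Dict[str, str]] = []
    for traj in trajectories:
        if not traj.success:
            continue  # only successful trajectories are kept
        for step in traj.steps:
            prompt = (f"Goal: {traj.goal}\n"
                      f"Observation: {step.observation}\n"
                      f"Action:")
            examples.append({"prompt": prompt, "completion": " " + step.action})
    return examples


# Toy data used only for illustration (not from the paper's dataset).
demo = [
    Trajectory("Open my orders", "/account",
               [Step("Account page", "click('Orders')"),
                Step("Orders page", "stop()")],
               success=True),
    Trajectory("Find item 2", "/shop",
               [Step("Shop page", "click('Item 1')")],
               success=False),
]
examples = to_training_examples(demo)
```

Under this assumed layout, a dataset of 10K trajectories and 40K steps would yield about four training examples per trajectory on average.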

🔎 Similar Papers

NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild