Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

📅 2024-11-10
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
🤖 AI Summary
Problem: On real-world websites, many actions are irreversible, which undermines the backtracking that tree search relies on, and heavy test-time search also hurts efficiency. Method: The paper advocates model-based planning and proposes WebDreamer, a framework in which LLMs serve as both world model and value function: the agent simulates the outcome of each candidate action and evaluates it before committing to one. A scalable data synthesis pipeline is used to train specialized world models such as Dreamer-7B. Contributions/Results: On VisualWebArena, WebDreamer is competitive with tree search while being 4-5 times more efficient. It improves substantially over reactive baselines and also works effectively on the real-world benchmarks Online-Mind2Web and Mind2Web-Live. Dreamer-7B performs comparably to GPT-4o, which suggests that specialized world models can support efficient and effective planning for web navigation.

📝 Abstract
Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g., tree search, is advantageous over reactive planning for web agents. However, unlike simulated sandbox environments, real-world environments such as the web are rife with irreversible actions. This undermines the feasibility of backtracking, a cornerstone of (tree) search. Over-reliance on test-time search also hurts efficiency. We advocate model-based planning for web agents that employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one. We systematically explore this paradigm by (1) proposing a model-based planning framework, WebDreamer, which employs LLMs as both world models and value functions; and (2) training specialized LLMs as world models with a scalable data synthesis pipeline. Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines. In sandbox environments (VisualWebArena), it is competitive with tree search while being 4-5 times more efficient, and it also works effectively on real-world websites (Online-Mind2Web and Mind2Web-Live). Furthermore, our trained world model, Dreamer-7B, performs comparably to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments.
Problem

Research questions and friction points this paper is trying to address.

Handle irreversible actions when planning in real-world web environments
Improve efficiency by reducing reliance on test-time search methods
Develop specialized world models for effective web agent planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based planning with world models
LLMs as world models and value functions
Scalable data synthesis for training world models
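
The planning idea described in the abstract (simulate the outcome of each candidate action with a world model, score it with a value function, then commit to one action) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, type aliases, and toy stand-ins are all hypothetical, and in the paper's setting both callables would be backed by LLMs rather than string checks.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from the paper.
WorldModel = Callable[[str, str], str]       # (observation, action) -> predicted outcome
ValueFunction = Callable[[str, str], float]  # (task, predicted outcome) -> score


def plan_one_step(
    task: str,
    observation: str,
    candidates: List[str],
    world_model: WorldModel,
    value_fn: ValueFunction,
) -> Tuple[str, float]:
    """Pick the candidate whose simulated outcome scores highest.

    Nothing is executed on the real website here, so no irreversible
    action is taken while the candidates are being compared.
    """
    if not candidates:
        raise ValueError("need at least one candidate action")
    scored = []
    for action in candidates:
        predicted = world_model(observation, action)  # simulate, do not execute
        scored.append((action, value_fn(task, predicted)))
    # Ties resolve to the earliest candidate, since max keeps the first maximum.
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (assumptions for illustration only).
def toy_world_model(observation: str, action: str) -> str:
    return f"{observation} -> after {action}"


def toy_value_fn(task: str, predicted: str) -> float:
    return 1.0 if "search" in predicted else 0.0


best_action, best_score = plan_one_step(
    task="find a red jacket",
    observation="homepage",
    candidates=["click cart", "type into search box", "click logout"],
    world_model=toy_world_model,
    value_fn=toy_value_fn,
)
print(best_action, best_score)
```

Only the selected action would then be executed in the real environment, which is what distinguishes this approach from tree search with backtracking.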