WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance

📅 2025-11-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing web navigation agents repeat the same errors across sessions because they lack long-term memory and continual learning, which hurts robustness and sample efficiency. To address this, the authors propose WebCoach, a model-agnostic self-evolving framework built around a persistent external memory bank that combines navigation log compression, episodic memory storage, and retrieval by similarity and recency. Runtime hooks inject task-specific advice without retraining the model, so the agent can avoid known errors during a run and stay robust on complex browsing tasks. On the WebVoyager benchmark, the method raises the task success rate from 47% to 61% for a 38B-parameter model while reducing or maintaining the average number of navigation steps; smaller base models with WebCoach reach performance comparable to the same web agent using GPT-4o.

📝 Abstract

Multimodal LLM-powered agents have recently demonstrated impressive capabilities in web navigation, enabling agents to complete complex browsing tasks across diverse domains. However, current agents struggle with repetitive errors and lack the ability to learn from past experiences across sessions, limiting their long-term robustness and sample efficiency. We introduce WebCoach, a model-agnostic self-evolving framework that equips web browsing agents with persistent cross-session memory, enabling improved long-term planning, reflection, and continual learning without retraining. WebCoach consists of three key components: (1) a WebCondenser, which standardizes raw navigation logs into concise summaries; (2) an External Memory Store, which organizes complete trajectories as episodic experiences; and (3) a Coach, which retrieves relevant experiences based on similarity and recency, and decides whether to inject task-specific advice into the agent via runtime hooks. This design empowers web agents to access long-term memory beyond their native context window, improving robustness in complex browsing tasks. Moreover, WebCoach achieves self-evolution by continuously curating episodic memory from new navigation trajectories, enabling agents to improve over time without retraining. Evaluations on the WebVoyager benchmark demonstrate that WebCoach consistently improves the performance of browser-use agents across three different LLM backbones. With a 38B model, it increases task success rates from 47% to 61% while reducing or maintaining the average number of steps. Notably, smaller base models with WebCoach achieve performance comparable to the same web agent using GPT-4o.
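
The retrieval step described above (ranking stored experiences by similarity and recency) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `Episode` fields, the bag-of-words cosine similarity (a stand-in for whatever embedding the authors use), the exponential recency decay, and the `half_life` and `recency_weight` parameters are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    """One stored experience (hypothetical schema)."""
    task: str         # condensed task description
    summary: str      # condensed trajectory summary
    success: bool     # whether the past attempt succeeded
    timestamp: float  # when it was stored, in seconds


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a learned embedding)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(store, query, now, k=2, half_life=86_400.0, recency_weight=0.3):
    """Return the k episodes with the best blend of similarity and recency."""
    def score(ep: Episode) -> float:
        # Recency halves every `half_life` seconds (assumed decay shape).
        recency = 0.5 ** ((now - ep.timestamp) / half_life)
        similarity = cosine(query, ep.task)
        return (1 - recency_weight) * similarity + recency_weight * recency

    return sorted(store, key=score, reverse=True)[:k]


store = [
    Episode("find cheapest flight to Paris", "Date picker needs two clicks.", False, 0.0),
    Episode("search arxiv for agent papers", "Use the advanced search form.", True, 50_000.0),
    Episode("find cheapest flight to Rome", "Sort by price before filtering.", True, 80_000.0),
]
top = retrieve(store, "find cheapest flight to Berlin", now=86_400.0)
for ep in top:
    print(ep.task, "->", ep.summary)
```

In this toy store the two flight-search episodes are equally similar to the query, so the more recent one ranks first; the unrelated arXiv episode is excluded despite being fairly recent.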

Problem

Research questions and friction points this paper is trying to address.

Web agents struggle with repetitive errors across browsing sessions

Current agents lack the ability to learn from past experiences across sessions

These gaps limit long-term robustness and sample efficiency in web navigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving framework with persistent cross-session memory

Standardizes navigation logs into concise trajectory summaries

Retrieves episodic experiences for runtime advice injection
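
As a rough illustration of runtime advice injection, the sketch below shows a hook that is consulted before each agent step and adds guidance only when a stored experience looks relevant enough. The names `make_coach_hook` and `run_step`, the Jaccard word-overlap measure, and the `threshold` value are hypothetical choices for this example; the source does not specify the hook interface or the decision rule.

```python
from typing import Callable, Dict, List, Optional

# A hook receives the task and the step history and may return advice text.
Hook = Callable[[str, List[str]], Optional[str]]


def make_coach_hook(memory: Dict[str, str], threshold: float = 0.5) -> Hook:
    """Build a hook from `memory`, which maps a past task to its lesson."""

    def jaccard(a: str, b: str) -> float:
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def hook(task: str, history: List[str]) -> Optional[str]:
        # `history` is unused in this sketch; a fuller coach could inspect it.
        if not memory:
            return None
        best = max(memory, key=lambda past: jaccard(task, past))
        if jaccard(task, best) < threshold:
            return None  # nothing relevant enough, so stay silent
        return f"Advice from a similar past task: {memory[best]}"

    return hook


def run_step(task: str, history: List[str], hooks: List[Hook]) -> str:
    """Assemble the prompt for one agent step, letting hooks add guidance."""
    prompt = [f"Task: {task}"] + history
    for hook in hooks:
        advice = hook(task, history)
        if advice:
            prompt.append(advice)
    return "\n".join(prompt)


memory = {"book a hotel in Rome": "Dismiss the cookie banner first."}
coach = make_coach_hook(memory)
print(run_step("book a hotel in Paris", ["Step 1: opened homepage"], [coach]))
```

Because the advice is appended to the prompt at run time, the underlying model is left unchanged, which matches the retraining-free design the summary describes.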

🔎 Similar Papers

MMInA: Benchmarking Multihop Multimodal Internet Agents