🤖 AI Summary
Web agents are often not robust, and their policies adapt poorly, when navigating and interacting with complex web pages. To address this, the paper proposes the R2D2 framework, which introduces, for the first time, a synergistic dual-paradigm mechanism. *Remember* stores trajectories in a replay buffer and uses them to dynamically construct a map of the webpage topology. *Reflect* attributes errors in past trajectories and uses them to iteratively optimize the policy. By integrating trajectory-level error analysis, environment modeling, and dynamic fine-tuning of the action policy, R2D2 jointly models webpage structure and task logic. Evaluated on the WEBARENA benchmark, R2D2 reduces navigation errors by 50% and triples the task completion rate. The authors suggest the framework could improve the generalization and practical utility of web agents in real-world applications such as customer service automation and digital assistants.
📝 Abstract
The proliferation of web agents necessitates advanced navigation and interaction strategies within complex web environments. Current models often struggle with efficient navigation and action execution because they have limited visibility into, and understanding of, web structures. Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect. The Remember paradigm uses a replay buffer that helps agents reconstruct the web environment dynamically, enabling them to build a detailed "map" of previously visited pages. This reduces navigational errors and improves decision-making during web interactions. The Reflect paradigm, in turn, lets agents learn from past mistakes by providing a mechanism for error analysis and strategy refinement, which improves overall task performance. We evaluate R2D2 on the WEBARENA benchmark and demonstrate significant improvements over existing methods, including a 50% reduction in navigation errors and a threefold increase in task completion rates. Our findings suggest that combining memory-enhanced navigation with reflective learning is a promising way to advance the capabilities of web agents, with potential benefits for applications such as automated customer service and personal digital assistants.
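The Remember paradigm described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PageMap`, the `(page, action, next_page)` step format, and the use of breadth-first search for routing are all assumptions made for illustration. The sketch only shows how stored trajectories could double as a graph of visited pages that an agent can query for a known route, and it does not cover the Reflect paradigm.

```python
from collections import defaultdict, deque


class PageMap:
    """Hypothetical replay buffer that also serves as a graph of visited pages."""

    def __init__(self):
        self.trajectories = []           # raw stored trajectories
        self.edges = defaultdict(dict)   # page -> {action: next_page}

    def remember(self, trajectory):
        """Store one trajectory, given as (page, action, next_page) steps."""
        steps = list(trajectory)
        self.trajectories.append(steps)
        for page, action, next_page in steps:
            self.edges[page][action] = next_page

    def route(self, start, goal):
        """Return the shortest known action sequence from start to goal.

        Uses breadth-first search over remembered transitions (an assumed
        choice) and returns None when no known route exists.
        """
        if start == goal:
            return []
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            page, actions = queue.popleft()
            for action, next_page in self.edges.get(page, {}).items():
                if next_page in seen:
                    continue
                if next_page == goal:
                    return actions + [action]
                seen.add(next_page)
                queue.append((next_page, actions + [action]))
        return None


# Two remembered trajectories over made-up page names and actions.
page_map = PageMap()
page_map.remember([("home", "click:Shop", "shop"), ("shop", "click:Cart", "cart")])
page_map.remember([("home", "click:Account", "account"),
                   ("account", "click:Orders", "orders")])

print(page_map.route("home", "orders"))  # ['click:Account', 'click:Orders']
```

In this toy setup the agent can reuse a route found in an earlier episode instead of exploring again, which is the intuition behind using the map to reduce navigational errors.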