🤖 AI Summary
Existing LLM-based web agents struggle with multi-step, goal-directed tasks in open-web environments, such as information retrieval, report generation, and online transactions, because of limited reasoning depth, ineffective backtracking, and low computational efficiency. To address these challenges, we propose Branch-and-Browse, a fine-grained web agent framework that combines explicit subtask decomposition with tree-structured exploration, a page action memory, web state replay, and background-reasoning-guided exploration. Together, these components enable fine-grained, backtrackable multi-step reasoning and the reuse of explored actions within and across sessions. On the WebArena benchmark, the approach achieves a 35.8% task success rate and reduces execution time by up to 40.4% relative to state-of-the-art methods. The core contributions are threefold: (1) tree-structured, subtask-level exploration that makes multi-branch reasoning controllable and backtrackable; (2) a shared page action memory that keeps explored interactions persistent and reusable; and (3) a unified framework that jointly targets reasoning depth, controllability, and execution efficiency.
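The tree-structured, backtrackable exploration with webpage state replay described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `TOY_SITE` and `ToyBrowser` stand in for a real browser, and `score` stands in for an LLM's estimate of how promising a page is. The abstract does not specify a search order, so best-first expansion is an assumption, and background reasoning is not modeled. Here, restoring a node's state means resetting the browser and replaying that node's action path from the root, which is one plausible reading of "web state replay".

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy website: page -> {action: next page}. Stands in for a real browser.
TOY_SITE = {
    "home": {"click:shop": "shop", "click:blog": "blog"},
    "shop": {"click:orders": "orders", "click:cart": "cart"},
    "blog": {"click:post": "post"},
    "orders": {"click:order_42": "order_42"},
    "cart": {},
    "post": {},
    "order_42": {},
}


class ToyBrowser:
    """Deterministic stand-in for a browser session (assumption, not a real API)."""

    def __init__(self, site: dict[str, dict[str, str]]):
        self.site = site
        self.page = "home"

    def reset(self) -> str:
        self.page = "home"
        return self.page

    def available_actions(self) -> list[str]:
        return list(self.site[self.page])

    def act(self, action: str) -> str:
        self.page = self.site[self.page][action]
        return self.page


@dataclass
class Node:
    path: tuple[str, ...]  # actions from the root; replaying them restores this node
    page: str


def replay(browser: ToyBrowser, path: tuple[str, ...]) -> str:
    """Restore a node's web state by resetting and re-executing its action path."""
    browser.reset()
    for action in path:
        browser.act(action)
    return browser.page


def tree_search(
    browser: ToyBrowser,
    is_goal: Callable[[str], bool],
    score: Callable[[str], float],  # hypothetical stand-in for an LLM value estimate
    max_expansions: int = 50,
) -> Optional[tuple[str, ...]]:
    """Best-first tree search; backtracking means replaying a different frontier node."""
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    root = Node(path=(), page=browser.reset())
    frontier = [(-score(root.page), next(tie), root)]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        if is_goal(node.page):
            return node.path
        replay(browser, node.path)
        actions = browser.available_actions()
        for action in actions:
            replay(browser, node.path)  # return to this node before trying each branch
            child = Node(path=node.path + (action,), page=browser.act(action))
            heapq.heappush(frontier, (-score(child.page), next(tie), child))
    return None


if __name__ == "__main__":
    def prefers_orders(page: str) -> float:
        return 1.0 if "order" in page else 0.0

    print(tree_search(ToyBrowser(TOY_SITE), lambda p: p == "order_42", prefers_orders))
    # ('click:shop', 'click:orders', 'click:order_42')
```

Because each frontier node stores only its action path, abandoning a branch costs nothing beyond popping a different node and replaying its path; the trade-off is that replay re-executes every step from the root.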
📝 Abstract
Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.
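The page action memory in (iii) can be pictured as a cache keyed by page identity. The following is a minimal, hypothetical Python sketch, not the authors' design: the class name `PageActionMemory`, the URL-plus-text-hash page signature, and JSON-file persistence across sessions are all assumptions. Within a session, every branch of the search consults the same instance, so an action that one branch has already tried on a page is not re-explored by another; saving and reloading the store carries that knowledge into later sessions.

```python
from __future__ import annotations

import hashlib
import json
from collections import defaultdict
from pathlib import Path


class PageActionMemory:
    """Hypothetical page action memory: page signature -> {action: observed outcome}.

    Shared by all branches of one session (in memory) and across sessions (via JSON).
    """

    def __init__(self) -> None:
        self._store: defaultdict[str, dict[str, str]] = defaultdict(dict)

    @staticmethod
    def signature(url: str, page_text: str) -> str:
        # Assumption: identify a page by its URL plus a short hash of its visible text.
        digest = hashlib.sha256(page_text.encode("utf-8")).hexdigest()[:16]
        return f"{url}#{digest}"

    def record(self, sig: str, action: str, outcome: str) -> None:
        """Remember that `action` was already tried on this page and what it led to."""
        self._store[sig][action] = outcome

    def explored(self, sig: str) -> dict[str, str]:
        # .get() avoids creating empty entries for pages never visited.
        return dict(self._store.get(sig, {}))

    def unexplored(self, sig: str, candidates: list[str]) -> list[str]:
        """Drop actions that any branch or earlier session has already tried here."""
        seen = self._store.get(sig, {})
        return [a for a in candidates if a not in seen]

    def save(self, path: str) -> None:
        Path(path).write_text(json.dumps(self._store, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: str) -> PageActionMemory:
        memory = cls()
        file = Path(path)
        if file.exists():  # a missing file simply means a fresh, empty memory
            for sig, actions in json.loads(file.read_text(encoding="utf-8")).items():
                memory._store[sig].update(actions)
        return memory


if __name__ == "__main__":
    memory = PageActionMemory()
    sig = PageActionMemory.signature("http://shop.example/orders", "Order list")
    memory.record(sig, "click:order_42", "opened order detail")
    # A sibling branch reaching the same page skips the action already tried.
    print(memory.unexplored(sig, ["click:order_42", "click:next_page"]))
    # ['click:next_page']
```

Hashing the visible text rather than relying on the URL alone distinguishes pages whose content changes under a fixed URL, at the cost of treating cosmetically different renders of the same page as distinct entries.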