🤖 AI Summary
Current autonomous web agents struggle with complex, dynamic, and long-horizon tasks because of rigid planning and hallucination-prone reasoning. This work proposes WebUncertainty, a framework built on a dual uncertainty-driven mechanism that operates at both the task level and the action level. At the task level, it adaptively selects a planning mode for unknown environments. At the action level, it integrates Monte Carlo Tree Search (MCTS) with a Confidence-induced Action Uncertainty (ConActU) strategy that quantifies both epistemic and aleatoric uncertainty, which guides the search and makes reasoning more robust. Experiments on the WebArena and WebVoyager benchmarks show that the method outperforms state-of-the-art baselines.
📝 Abstract
Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hallucination-prone reasoning. To address these limitations, we propose WebUncertainty, a novel autonomous agent framework designed to tackle dual-level uncertainty in planning and reasoning. Specifically, we design a Task Uncertainty-Driven Adaptive Planning Mechanism that adaptively selects planning modes to navigate unknown environments. Furthermore, we introduce an Action Uncertainty-Driven Monte Carlo tree search (MCTS) Reasoning Mechanism. This mechanism incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify both aleatoric uncertainty (AU) and epistemic uncertainty (EU), thereby optimizing the search process and guiding robust decision-making. Experimental results on the WebArena and WebVoyager benchmarks demonstrate that WebUncertainty achieves superior performance compared to state-of-the-art baselines.
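The abstract does not specify how ConActU computes AU and EU, so the following is only an illustrative sketch of one common way to separate the two, not the authors' method. It assumes the agent can be sampled several times for a probability distribution over candidate actions, and it uses the standard entropy decomposition: total uncertainty is the entropy of the averaged distribution, aleatoric uncertainty is the average entropy of the individual samples, and epistemic uncertainty is the difference between them. The function names and the action labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over action labels."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def decompose_uncertainty(
    samples: List[Dict[str, float]],
) -> Tuple[float, float, float]:
    """Split total uncertainty into aleatoric and epistemic parts.

    `samples` holds one action distribution per sampled reasoning pass
    (an assumption of this sketch, not something stated in the abstract).
    Returns (total, aleatoric, epistemic).
    """
    if not samples:
        raise ValueError("at least one sampled distribution is required")
    n = len(samples)
    actions = sorted({a for s in samples for a in s})
    # Average the sampled distributions action by action.
    mean = {a: sum(s.get(a, 0.0) for s in samples) / n for a in actions}
    total = entropy(mean)                              # entropy of the mean
    aleatoric = sum(entropy(s) for s in samples) / n   # mean of the entropies
    epistemic = max(0.0, total - aleatoric)            # disagreement between samples
    return total, aleatoric, epistemic


# Samples agree but each is undecided: the uncertainty is aleatoric.
agree = [{"click": 0.5, "type": 0.5}, {"click": 0.5, "type": 0.5}]
# Samples are individually certain but contradict each other: epistemic.
disagree = [{"click": 1.0}, {"type": 1.0}]
```

In a tree search, a score like the epistemic term could be used to decide where more exploration is worthwhile, while a high aleatoric term suggests the page state itself is ambiguous. How WebUncertainty actually uses the two quantities is described in the paper, not here.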