🤖 AI Summary
Existing web agents for dynamic web environments rely on greedy, unidirectional search strategies, so they tolerate faults poorly and struggle to recover from erroneous states. To address this, we propose an explicit, triggerable state-rollback mechanism. Our method combines large language model-driven trajectory modeling with lightweight state snapshot management, enabling agents to proactively revert to previously validated states during navigation and thereby overcome the limitations of conventional unidirectional search. This is the first approach to realize controllable and interpretable explicit rollback, and it supports both zero-shot use and fine-tuning. Evaluated on two real-world web navigation benchmarks, our method achieves significant improvements in task success rate and path efficiency under both zero-shot and fine-tuned settings, indicating that the rollback mechanism makes planning both more robust and more flexible.
📝 Abstract
With recent advancements in large language models, web agents have improved greatly. However, dealing with complex and dynamic web environments requires more advanced planning and search abilities. Previous studies usually adopt a greedy one-way search strategy, which may struggle to recover from erroneous states. In this work, we enhance web agents with an explicit rollback mechanism that enables the agent to revert to a previous state in its navigation trajectory. This mechanism gives the model the flexibility to directly control the search process, leading to an effective and efficient web navigation method. We conduct experiments on two live web navigation benchmarks under zero-shot and fine-tuning settings. The results demonstrate the effectiveness of our proposed approach.
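
To make the rollback idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the navigation trajectory can be modeled as an ordered list of visited states, that each state is just a URL string, and that rolling back means discarding every state after a chosen index. The names `Trajectory`, `step`, and `rollback` are hypothetical, and the sketch does not cover how a real agent restores a live web page.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Visited states, oldest first. States are plain URL strings (an assumption)."""
    states: list = field(default_factory=list)

    def step(self, new_state: str) -> None:
        # Ordinary forward action: record the state the agent reached.
        self.states.append(new_state)

    def rollback(self, index: int) -> str:
        # Explicit rollback: drop everything after states[index] and resume there.
        if not 0 <= index < len(self.states):
            raise IndexError(f"no state at index {index}")
        del self.states[index + 1:]
        return self.states[index]


traj = Trajectory()
traj.step("/home")
traj.step("/search?q=laptop")
traj.step("/product/wrong-item")   # hypothetical erroneous state
current = traj.rollback(1)         # the agent chooses to return to the search page
traj.step("/product/right-item")   # and continues from there
```

A greedy one-way agent in this toy setting could only keep appending states after the wrong product page; here the agent itself picks the index to return to, which is the kind of direct control over the search process the abstract describes.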