🤖 AI Summary
Current web agents generalize poorly to unseen environments, rely heavily on website-specific fine-tuning, and struggle to model the structural and dynamic properties of their environment, which leads to inefficient planning. This paper proposes a fine-tuning-free, memory-augmented modular architecture that constructs a lightweight cognitive map via exploratory interaction and integrates hierarchical planning, a world model, and forward-looking re-planning to enable action simulation and policy optimization within a learned cognitive space. Its core innovation lies in unifying the Actor-Critic framework with an executable action simulator and a critic module, jointly supporting plan execution, mental rehearsal, and real-time policy correction. Evaluated on WebArena-Lite, the approach achieves a 63.0% task success rate, substantially surpassing the prior state of the art (53.9%). Ablation studies confirm the significant contribution of each component.
📝 Abstract
We observe that current state-of-the-art web agents are unable to adapt effectively to new environments without neural network fine-tuning; without it, they produce inefficient execution plans because they lack awareness of the structure and dynamics of the new environment. To address this limitation, we introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented agent that makes plans grounded in a model of the environment by simulating the consequences of candidate actions in cognitive space. Our agent starts by building a "cognitive map" through lightweight curiosity-driven exploration of the environment. The planner proposes candidate actions; the simulator predicts their consequences in cognitive space; a critic compares the options to select the best rollout and update the original plan; and a browser executor performs the chosen action. On the WebArena-Lite benchmark, we achieve a 63.0% success rate, compared with 53.9% for the previously published state of the art. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablations show sizable drops without the world model, the hierarchical planner, and the look-ahead-based replanner, confirming their complementary roles within the design of our system.
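The planner–simulator–critic–executor loop described above can be sketched as a single decision step. This is a minimal illustrative sketch, not the paper's implementation: all names (`propose_actions`, `simulate`, `critique`, `Rollout`, the toy string-based cognitive map) are hypothetical stand-ins for the corresponding modules.

```python
from dataclasses import dataclass

# Hypothetical sketch of one ATLAS-style decision step:
# propose candidate actions, simulate each in cognitive space,
# score the rollouts with a critic, and return the best action
# for the browser executor. All names are illustrative.

@dataclass
class Rollout:
    action: str
    predicted_state: str
    score: float

def propose_actions(state: str, plan: list[str]) -> list[str]:
    # Planner: candidate next actions toward the current subgoal.
    return [f"click:{plan[0]}", f"type:{plan[0]}"]

def simulate(state: str, action: str, cognitive_map: dict) -> str:
    # World model: predict the action's consequence in cognitive space;
    # unknown actions are assumed to leave the state unchanged.
    return cognitive_map.get(action, state)

def critique(predicted_state: str, goal: str) -> float:
    # Critic: score a rollout by whether it reaches the goal (toy metric).
    return 1.0 if goal in predicted_state else 0.0

def step(state: str, plan: list[str], goal: str, cognitive_map: dict) -> Rollout:
    rollouts = []
    for action in propose_actions(state, plan):
        predicted = simulate(state, action, cognitive_map)
        rollouts.append(Rollout(action, predicted, critique(predicted, goal)))
    # Select the best-scoring rollout; the executor would then
    # perform rollout.action in the real browser.
    return max(rollouts, key=lambda r: r.score)

cognitive_map = {"click:search": "page:results for goal"}
best = step("page:home", ["search"], "goal", cognitive_map)
print(best.action)  # → click:search
```

In the full system each stub would be an LLM-backed module, and the look-ahead replanner would repeat this simulate-and-critique cycle before committing to execution.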