🤖 AI Summary
Existing environments for reinforcement learning (RL) web agents struggle to combine realistic browser-side interaction with controllable server-side state. The result is redundant context, nondeterministic actions caused by unsettled UI or network state, and containerized setups that scale poorly. This paper proposes WEBSERV, a browser-server coordinated RL environment with (1) a compact, site-agnostic browser observation space; (2) automatic UI-stability detection so that actions execute deterministically; and (3) a lightweight browser frontend paired with containerized web servers that launch and reset quickly, supporting isolated, highly concurrent deployment. Evaluated on the shopping CMS and GitLab tasks in WebArena, it achieves state-of-the-art single-prompt success rates. It reduces launch latency by about 5× and storage overhead by about 240×, and allows a single host to run more than 200 concurrent containers, which improves scalability, efficiency, and reproducibility in large-scale RL training and evaluation.
📝 Abstract
Training and evaluation of Reinforcement Learning (RL) web agents have gained increasing attention, yet a scalable and efficient environment that couples realistic, robust browser-side interaction with controllable server-side state is still missing. Existing environments tend to have one or more of the following issues: they overwhelm policy models with excessive and noisy context; they perform actions non-deterministically without waiting for the UI or network to stabilize; or they cannot scale isolated client-server containers effectively for parallel RL rollouts. We propose WEBSERV, an environment that includes 1) a compact, site-agnostic browser environment that balances context and action complexity, and 2) a scalable RL environment that launches and resets web servers efficiently to support RL training and evaluation. We evaluate WEBSERV on the shopping CMS and GitLab tasks in WebArena, achieving state-of-the-art single-prompt success rates while cutting launch latency by ~5x and storage needs by ~240x, with a comparable memory footprint, enabling 200+ concurrent containers on a single host.
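The abstract does not say how WEBSERV decides that the UI or network has stabilized. As a purely illustrative sketch of the general idea, one common approach is to poll a snapshot of the page state and act only once it has stayed unchanged for several consecutive polls. The function name `wait_until_stable`, its parameters, and the polling strategy are assumptions made for this example, not the paper's mechanism.

```python
import time
from typing import Callable, Hashable


def wait_until_stable(
    snapshot: Callable[[], Hashable],
    quiet_polls: int = 3,
    interval_s: float = 0.05,
    timeout_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Hypothetical helper: return True once `snapshot()` has been unchanged
    for `quiet_polls` consecutive polls, or False if `timeout_s` elapses first.

    `snapshot` stands in for whatever summarizes page state (for example a
    hash of the DOM plus the number of in-flight requests); that choice is
    an assumption of this sketch.
    """
    deadline = clock() + timeout_s
    last = snapshot()
    unchanged = 0
    while unchanged < quiet_polls:
        if clock() >= deadline:
            return False  # state never settled within the time budget
        sleep(interval_s)
        current = snapshot()
        if current == last:
            unchanged += 1  # one more quiet poll
        else:
            unchanged = 0   # state changed: restart the quiet window
            last = current
    return True
```

An agent loop would call such a helper before issuing each action, so that the action is applied to a settled page rather than one that is still rendering or loading.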