🤖 AI Summary
Training autonomous web agents in real-world website environments is severely hindered by the uncontrollable nature of live websites, the difficulty of resetting them, and the lack of verifiable feedback. To address this, this work proposes VeriEnv, a framework that leverages large language models to automatically clone real websites into executable, verifiable synthetic environments. VeriEnv provides controlled access to internal environment state through a Python SDK, enabling agents to self-generate tasks and receive programmatically defined reward signals. This facilitates safe, scalable, and self-evolving training. Experimental results demonstrate that agents trained with VeriEnv generalize to unseen websites, acquire site-specific skills, and consistently improve as the number of training environments grows.
📝 Abstract
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We propose VeriEnv, a framework that treats language models as environment creators, automatically cloning real-world websites into fully executable, verifiable synthetic environments. By exposing controlled internal access via a Python SDK, VeriEnv enables agents to self-generate tasks with deterministic, programmatically verifiable rewards, eliminating reliance on heuristic or LLM-based judges. This design decouples agent learning from unsafe real-world interaction while enabling scalable self-evolution through environment expansion. Through experiments on web agent benchmarks, we show that agents trained with VeriEnv generalize to unseen websites, achieve site-specific mastery through self-evolving training, and benefit from scaling the number of training environments. Code and resources will be released at https://github.com/kyle8581/VeriEnv upon acceptance.
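To make the core idea concrete, the sketch below illustrates what "controlled internal access with deterministic, programmatically verifiable rewards" could look like. All names here (`ShopEnv`, `make_task`, `verify`) are hypothetical illustrations, not the actual VeriEnv SDK:

```python
# Hypothetical sketch of a VeriEnv-style environment with a programmatic
# reward verifier. Class and function names are illustrative assumptions,
# not the real VeriEnv API.
from dataclasses import dataclass, field


@dataclass
class ShopEnv:
    """A toy synthetic e-commerce environment whose internal state
    (the cart) is directly inspectable by the verifier."""
    cart: list = field(default_factory=list)

    def add_to_cart(self, item: str) -> None:
        self.cart.append(item)


def make_task():
    """A self-generated task: a natural-language goal paired with a
    deterministic verifier over environment state."""
    goal = "Add 'mug' to the cart"

    def verify(env: ShopEnv) -> float:
        # Reward is computed from internal state, not from a
        # heuristic or an LLM judge.
        return 1.0 if "mug" in env.cart else 0.0

    return goal, verify


env = ShopEnv()
goal, verify = make_task()
env.add_to_cart("mug")   # the agent's action
reward = verify(env)     # deterministic, programmatic reward
print(reward)            # 1.0
```

The key design point mirrored here is that the reward function reads environment state directly, so task success is checked exactly rather than judged approximately.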