Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of reinforcement learning for vision-based web agents: existing training data consists either of offline trajectories suited to supervised fine-tuning or of a handful of simulated environments, and neither captures the diversity of the real web. The authors propose Weblica (Web Replica), a framework for building reproducible and scalable training environments. It combines HTTP-level caching, which captures and replays stable visual states while preserving interactive behavior, with LLM-based synthesis of environments grounded in real-world websites and core web navigation skills. Together these let RL training scale to thousands of diverse environments and tasks. The resulting Weblica-8B model outperforms open-weight baselines of similar size on multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.

📝 Abstract

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.
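
To make the idea of HTTP-level capture and replay concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `CachedResponse`, `request_key`, and `ReplayCache` are hypothetical, and the choice to key a response on method, URL, and request body is an assumption made for illustration. The first time a request is seen it is fetched and recorded; afterwards the recorded response is served, so an agent sees the same page state on every episode.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class CachedResponse:
    """A recorded HTTP response (hypothetical structure for this sketch)."""
    status: int
    body: bytes


def request_key(method: str, url: str, body: bytes = b"") -> str:
    """Derive a stable key from the parts of a request assumed to identify it."""
    digest = hashlib.sha256()
    digest.update(method.upper().encode("utf-8"))
    digest.update(b"\n")
    digest.update(url.encode("utf-8"))
    digest.update(b"\n")
    digest.update(body)
    return digest.hexdigest()


Fetch = Callable[[str, str, bytes], CachedResponse]


class ReplayCache:
    """Record responses on first sight, then replay them deterministically."""

    def __init__(self, fetch: Optional[Fetch] = None) -> None:
        self._store: Dict[str, CachedResponse] = {}
        self._fetch = fetch  # None means replay-only: no live traffic allowed

    def get(self, method: str, url: str, body: bytes = b"") -> CachedResponse:
        key = request_key(method, url, body)
        if key in self._store:
            return self._store[key]  # replay the recorded response
        if self._fetch is None:
            raise KeyError(f"no recorded response for {method} {url}")
        response = self._fetch(method, url, body)  # capture from the live source
        self._store[key] = response
        return response

    def freeze(self) -> None:
        """Switch to replay-only mode so later episodes cannot reach the live web."""
        self._fetch = None
```

In this sketch, a capture pass would populate the cache through a real fetch function, after which `freeze()` makes every training episode replay the same recorded responses. A real system would also have to handle headers, cookies, and requests whose parameters vary between visits, all of which are omitted here.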

Problem

Research questions and friction points this paper is trying to address.

visual web agents

training environments

web navigation

scalability

reproducibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weblica

visual web agents

HTTP-level caching