SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

📅 2026-02-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional A/B testing in e-commerce relies on real user traffic, which prolongs experiment cycles and can degrade the user experience. This work proposes SimGym, a large language model (LLM)-driven browser agent framework that builds synthetic buyers from personas and intents extracted from production data, then simulates their interactions offline under both control and treatment conditions. According to this summary, SimGym is the first approach to combine LLM-powered agents with real-world user behavioral patterns, replicating the impact of UI changes without involving actual users. Validated on a major e-commerce platform, SimGym reduces experiment duration from weeks to under an hour, even without alignment post-training.

📝 Abstract

A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model agents operating in a live browser. SimGym extracts per-shop buyer profiles and intents from production interaction data, identifies distinct behavioral archetypes, and simulates cohort-weighted sessions across control and treatment storefronts. We validate SimGym against real human outcomes from real UI changes on a major e-commerce platform under confounder control. Even without alignment post-training, SimGym agents achieve state-of-the-art alignment with observed outcome shifts and reduce experiment cycles from weeks to under an hour, enabling rapid experimentation without exposure to real buyers.
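
The abstract does not spell out how cohort-weighted sessions are aggregated, so the following is only a minimal sketch of one plausible reading: each behavioral archetype contributes its simulated outcome rate in proportion to its share of real traffic, and the estimated effect is the difference between the weighted treatment and control rates. The names `Archetype`, `cohort_weighted_rate`, `estimated_shift`, and `EXAMPLE_COHORTS` are hypothetical, the use of conversion rate as the outcome is an assumption, and the numbers are invented for illustration; none of this is taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    """One hypothetical behavioral archetype and its simulated outcomes."""
    name: str
    traffic_share: float    # share of real sessions attributed to this archetype
    control_rate: float     # simulated conversion rate on the control storefront
    treatment_rate: float   # simulated conversion rate on the treatment storefront


def cohort_weighted_rate(archetypes, arm):
    """Average the per-archetype rate for one arm, weighted by traffic share."""
    if arm not in ("control", "treatment"):
        raise ValueError(f"unknown arm: {arm!r}")
    total_share = sum(a.traffic_share for a in archetypes)
    if total_share <= 0:
        raise ValueError("traffic shares must sum to a positive value")
    weighted = sum(a.traffic_share * getattr(a, f"{arm}_rate") for a in archetypes)
    return weighted / total_share


def estimated_shift(archetypes):
    """Weighted treatment rate minus weighted control rate."""
    return (cohort_weighted_rate(archetypes, "treatment")
            - cohort_weighted_rate(archetypes, "control"))


# Made-up numbers for illustration only; these are not results from the paper.
EXAMPLE_COHORTS = [
    Archetype("casual browser", traffic_share=0.5, control_rate=0.02, treatment_rate=0.03),
    Archetype("targeted buyer", traffic_share=0.3, control_rate=0.10, treatment_rate=0.10),
    Archetype("deal seeker", traffic_share=0.2, control_rate=0.05, treatment_rate=0.04),
]

if __name__ == "__main__":
    print(f"estimated shift: {estimated_shift(EXAMPLE_COHORTS):+.4f}")
```

Weighting by traffic share keeps a small but unusual archetype from dominating the estimate, which is one way a simulation could stay grounded in the observed traffic mix.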

Problem

Research questions and friction points this paper is trying to address.

A/B testing

e-commerce

offline evaluation

user experience

traffic diversion

Innovation

Methods, ideas, or system contributions that make the work stand out.

SimGym

offline A/B testing

LLM agents