🤖 AI Summary
Traditional A/B testing in e-commerce is slow, diverts real user traffic, and risks degrading the user experience. This work proposes SimGym, an agent simulation framework built on vision-language models (VLMs). It constructs traffic-grounded buyer personas from production clickstream data and simulates end-to-end shopping sessions on control and treatment storefronts in a live browser. The agents combine multimodal perception, browser-level interaction, episodic memory, and guardrails, and an evaluation protocol compares simulated outcome shifts with the shifts observed in real buyer behavior. The framework is validated on A/B tests of UI theme changes from a major e-commerce platform, spanning diverse storefronts and product categories. Simulated results reach 77% directional alignment with the add-to-cart shifts observed in real traffic, and experimental cycles shrink from weeks to under an hour.
📝 Abstract
A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. It reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
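The abstract does not spell out how directional alignment is computed. One plausible reading, sketched below, is the fraction of A/B tests in which the sign of the simulated add-to-cart shift (treatment minus control) matches the sign of the shift observed in real traffic. The `ExperimentOutcome` type, its field names, and the tie-handling rule are assumptions made for illustration and are not taken from SimGym; the numbers in the example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentOutcome:
    """Add-to-cart rates for one A/B test (hypothetical field names)."""
    sim_control: float
    sim_treatment: float
    real_control: float
    real_treatment: float


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_alignment(outcomes: list[ExperimentOutcome]) -> float:
    """Fraction of tests where simulated and observed shifts share a sign.

    Assumption: a test counts as aligned only when both signs are equal,
    so two exactly-zero shifts also count as a match.
    """
    if not outcomes:
        raise ValueError("need at least one experiment")
    matches = sum(
        sign(o.sim_treatment - o.sim_control)
        == sign(o.real_treatment - o.real_control)
        for o in outcomes
    )
    return matches / len(outcomes)


# Invented example values, not results from the paper.
example = [
    ExperimentOutcome(0.10, 0.12, 0.08, 0.09),  # both up: aligned
    ExperimentOutcome(0.20, 0.18, 0.15, 0.14),  # both down: aligned
    ExperimentOutcome(0.05, 0.07, 0.06, 0.05),  # sim up, real down: not aligned
    ExperimentOutcome(0.11, 0.10, 0.09, 0.08),  # both down: aligned
]
print(directional_alignment(example))  # 0.75
```

Under this reading, a sign-based metric checks only whether the simulation predicts the right direction of change, not its magnitude, which suits a screening step that runs before a real A/B test.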