SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
Traditional A/B testing in e-commerce is slow, diverts real user traffic, and can degrade the user experience. This work proposes SimGym, a simulation framework built on vision-language model (VLM) agents. It derives traffic-grounded buyer personas (per-shop archetypes and intents) from production clickstream data, then has the agents carry out end-to-end shopping sessions on control and treatment storefronts in a live browser. The agents combine multimodal perception of visual and browser-structured observations with episodic memory and guardrails, and an evaluation protocol compares simulated outcome shifts with the shifts observed in real buyer behavior. The framework was validated on A/B tests of visually driven UI theme changes from a major e-commerce platform, spanning diverse storefronts and product categories. The simulated add-to-cart shifts matched the direction of real-buyer shifts in 77% of cases, and the framework reduces experimental cycles from weeks to under an hour.
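To make the persona step concrete, here is a minimal Python sketch of one way traffic-grounded personas could be derived, assuming each clickstream session is summarized by traffic source, device, and landing-page type. The `Session` fields, the archetype key, and the functions `archetype_weights` and `sample_personas` are hypothetical illustrations, not SimGym's pipeline, whose details are not given on this page. All the sketch shows is that each shop's persona mix can mirror that shop's observed traffic.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical clickstream summary; SimGym's actual schema is not given here.
    shop_id: str
    traffic_source: str  # e.g. "search", "social", "direct"
    device: str          # e.g. "mobile", "desktop"
    landing_page: str    # e.g. "product", "collection", "home"


def archetype_weights(sessions):
    """Return {shop_id: {archetype: share of that shop's sessions}}."""
    counts = defaultdict(Counter)
    for s in sessions:
        # Hypothetical archetype key: where the buyer came from, on what device,
        # and what kind of page they landed on.
        counts[s.shop_id][(s.traffic_source, s.device, s.landing_page)] += 1
    weights = {}
    for shop, counter in counts.items():
        total = sum(counter.values())
        weights[shop] = {arch: n / total for arch, n in counter.items()}
    return weights


def sample_personas(shop_weights, n, seed=0):
    """Draw n archetypes for one shop in proportion to their traffic share."""
    rng = random.Random(seed)  # seeded so a simulated test can be rerun
    archetypes = list(shop_weights)
    return rng.choices(archetypes, weights=[shop_weights[a] for a in archetypes], k=n)


# Illustrative, made-up sessions for a single shop.
sessions = [
    Session("shop-a", "search", "mobile", "product"),
    Session("shop-a", "search", "mobile", "product"),
    Session("shop-a", "social", "mobile", "home"),
    Session("shop-a", "direct", "desktop", "collection"),
]
weights = archetype_weights(sessions)
print(weights["shop-a"][("search", "mobile", "product")])  # 0.5
personas = sample_personas(weights["shop-a"], n=10, seed=42)
```

Each sampled archetype would then need to be expanded into a persona description and shopping intent for a VLM agent; that step, and how intents are inferred from clickstream data, is not sketched here.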
📝 Abstract
A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. SimGym reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
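Once each test has a simulated and an observed shift, the evaluation reduces to a simple agreement metric. The following minimal Python sketch assumes that "directional alignment" means the share of A/B tests in which the simulated add-to-cart shift (treatment rate minus control rate) has the same sign as the shift observed in real-buyer traffic. `ArmOutcome`, `atc_shift`, `sign`, `directional_alignment`, the tie rule, and every number below are hypothetical; the page does not give SimGym's exact definition, thresholds, or data.

```python
from dataclasses import dataclass


@dataclass
class ArmOutcome:
    """Simulated sessions on one storefront variant (control or treatment)."""
    sessions: int
    add_to_carts: int  # sessions with at least one add-to-cart (assumed definition)

    @property
    def atc_rate(self) -> float:
        return self.add_to_carts / self.sessions if self.sessions else 0.0


def atc_shift(control: ArmOutcome, treatment: ArmOutcome) -> float:
    """Treatment add-to-cart rate minus control add-to-cart rate."""
    return treatment.atc_rate - control.atc_rate


def sign(x: float, tol: float = 0.0) -> int:
    """+1, -1, or 0; shifts within +/- tol count as no change (assumed tie rule)."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def directional_alignment(pairs, tol: float = 0.0) -> float:
    """Fraction of (simulated, observed) shift pairs whose signs agree."""
    if not pairs:
        raise ValueError("need at least one A/B test")
    matches = sum(1 for sim, obs in pairs if sign(sim, tol) == sign(obs, tol))
    return matches / len(pairs)


# Made-up numbers: simulated control/treatment outcomes plus the observed
# real-traffic add-to-cart shift for four hypothetical A/B tests.
tests = [
    (ArmOutcome(200, 40), ArmOutcome(200, 50), +0.012),  # sim +0.05, agrees
    (ArmOutcome(200, 60), ArmOutcome(200, 48), -0.008),  # sim -0.06, agrees
    (ArmOutcome(200, 30), ArmOutcome(200, 36), -0.004),  # sim +0.03, disagrees
    (ArmOutcome(200, 44), ArmOutcome(200, 40), -0.010),  # sim -0.02, agrees
]
pairs = [(atc_shift(control, treatment), observed) for control, treatment, observed in tests]
print(directional_alignment(pairs))  # 0.75
```

A significance filter on the observed shifts, or a nonzero `tol`, would change which tests count as agreeing; whether SimGym applies either is not stated here.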
Problem

Research questions and friction points this paper is trying to address.

A/B testing
e-commerce
user experience
traffic diversion
statistical significance
Innovation

Methods, ideas, or system contributions that make the work stand out.

A/B testing simulation
vision-language model (VLM) agents
traffic-grounded personas
multimodal browser agents
e-commerce experimentation
Han Li
Shopify
Vibhor Malik
Shopify
Zahra Zanjani Foumani
Ph.D. student, University of California, Irvine
Data Fusion, Bayesian Optimization, Uncertainty Quantification
Alberto Castelo
Shopify
Shuang Xie
Shopify
Ailin Fan
Shopify
Keat Yang Koay
Shopify
Yuanzheng Zhu
Shopify
Meysam Feghhi
Shopify
Ronie Uliana
Shopify
Zhaoyu Zhang
Associate Professor, The Chinese University of Hong Kong, Shenzhen
Optoelectronics, semiconductor lasers, organic light emitting devices, perovskite light emitting devices
Angelo Ocana Martins
Shopify
Mingyu Zhao
Shopify
Francis Pelland
Shopify
Jonathan Faerman
Shopify
Nikolas LeBlanc
Shopify
Aaron Glazer
Shopify
Andrew McNamara
Director of Applied Science, Microsoft
NLP, Image Understanding, Recommendation Systems, LLM
Zhong Wu
Shopify
Lingyun Wang
Shopify