SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce

📅 2026-02-01
🤖 AI Summary
Traditional A/B testing in e-commerce relies on real user traffic, resulting in prolonged experiment cycles and potential degradation of user experience. This work proposes SimGym, a large language model (LLM)-driven browser agent framework that constructs high-fidelity synthetic buyers by extracting user personas and intents from production data, enabling offline simulation of their interactions under both control and treatment conditions. By grounding LLM-powered agents in real-world user behavioral patterns, SimGym replicates the impact of UI changes without involving actual users. Validated against real UI changes on a major e-commerce platform, SimGym reduces experiment duration from weeks to under an hour, even without alignment post-training, demonstrating substantial gains in testing efficiency and scalability.
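
A key design point in the summary is that the same synthetic buyer is run under both the control and the treatment condition, which makes the comparison paired. A minimal sketch of that idea, assuming a hypothetical `simulate(buyer, variant)` callable that stands in for an LLM browser session and returns whether the session converted; `SyntheticBuyer`, `paired_offline_ab`, and `toy_simulate` are illustrative names, not SimGym's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SyntheticBuyer:
    persona: str  # hypothetical persona label extracted from production data
    intent: str   # hypothetical shopping intent for this session


# Assumed simulator signature: (buyer, "control" | "treatment") -> converted?
Simulator = Callable[[SyntheticBuyer, str], bool]


def paired_offline_ab(buyers: Iterable[SyntheticBuyer], simulate: Simulator) -> dict:
    """Run every synthetic buyer through both storefront variants and compare outcomes."""
    n = 0
    control_conv = treatment_conv = 0
    only_treatment = only_control = 0  # discordant pairs: outcome flips between arms
    for buyer in buyers:
        c = simulate(buyer, "control")
        t = simulate(buyer, "treatment")
        n += 1
        control_conv += c
        treatment_conv += t
        if t and not c:
            only_treatment += 1
        elif c and not t:
            only_control += 1
    if n == 0:
        raise ValueError("need at least one synthetic buyer")
    return {
        "control_rate": control_conv / n,
        "treatment_rate": treatment_conv / n,
        # Concordant pairs cancel, so the rate difference equals this paired difference.
        "paired_diff": (only_treatment - only_control) / n,
        "discordant_pairs": only_treatment + only_control,
    }


def toy_simulate(buyer: SyntheticBuyer, variant: str) -> bool:
    # Stand-in for an LLM agent session: price-sensitive buyers convert only on treatment.
    if "price" in buyer.persona:
        return variant == "treatment"
    return True


buyers = [
    SyntheticBuyer("price-sensitive", "cheap socks"),
    SyntheticBuyer("loyal", "reorder coffee"),
]
result = paired_offline_ab(buyers, toy_simulate)
print(result)
# {'control_rate': 0.5, 'treatment_rate': 1.0, 'paired_diff': 0.5, 'discordant_pairs': 1}
```

Because each buyer sees both variants, the persona mix cannot differ between arms; only the discordant pairs (buyers whose outcome flips) move the estimate, which is also the quantity a McNemar-style test would use.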

📝 Abstract
A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym, a scalable system for rapid offline A/B testing using traffic-grounded synthetic buyers powered by Large Language Model (LLM) agents operating in a live browser. SimGym extracts per-shop buyer profiles and intents from production interaction data, identifies distinct behavioral archetypes, and simulates cohort-weighted sessions across control and treatment storefronts. We validate SimGym against real human outcomes from real UI changes on a major e-commerce platform under confounder control. Even without alignment post-training, SimGym agents achieve state-of-the-art alignment with observed outcome shifts and reduce experiment cycles from weeks to under an hour, enabling rapid experimentation without exposure to real buyers.
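
The abstract does not spell out how archetype cohorts are weighted or how "alignment with observed outcome shifts" is scored. A minimal sketch under two stated assumptions: each archetype's simulated conversion rate is weighted by its share of real traffic, and alignment is proxied by directional agreement (the fraction of experiments where simulated and observed lifts share a sign). The function names, archetype labels, and numbers are hypothetical; the paper's actual metric may differ.

```python
from typing import Mapping, Sequence


def cohort_weighted_rate(shares: Mapping[str, float], rates: Mapping[str, float]) -> float:
    """Average per-archetype simulated rates, weighted by each archetype's traffic share."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("archetype shares must sum to a positive number")
    return sum(share * rates[name] for name, share in shares.items()) / total


def relative_lift(shares, control_rates, treatment_rates) -> float:
    """Relative change of the cohort-weighted rate from control to treatment."""
    control = cohort_weighted_rate(shares, control_rates)
    treatment = cohort_weighted_rate(shares, treatment_rates)
    if control == 0:
        raise ZeroDivisionError("control rate is zero; relative lift is undefined")
    return (treatment - control) / control


def _sign(x: float) -> int:
    # -1, 0 or +1, so a zero lift only matches another zero lift.
    return (x > 0) - (x < 0)


def directional_agreement(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of experiments whose simulated and observed lifts share a sign."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("need equal-length, non-empty lift sequences")
    matches = sum(_sign(s) == _sign(o) for s, o in zip(simulated, observed))
    return matches / len(simulated)


# Illustrative numbers only (not from the paper): two hypothetical archetypes.
shares = {"bargain_hunter": 0.6, "casual_browser": 0.4}
control = {"bargain_hunter": 0.05, "casual_browser": 0.02}
treatment = {"bargain_hunter": 0.06, "casual_browser": 0.02}
print(round(relative_lift(shares, control, treatment), 4))  # → 0.1579

# Three hypothetical experiments: signs agree on the first and third only.
print(directional_agreement([0.10, -0.02, 0.03], [0.05, 0.01, 0.02]))
```

Weighting by traffic share keeps a rare archetype with a large simulated effect from dominating the shop-level estimate, which is one plausible reading of "cohort-weighted sessions".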

Problem

Research questions and friction points this paper is trying to address.

A/B testing
e-commerce
offline evaluation
user experience
traffic diversion

Innovation

Methods, ideas, or system contributions that make the work stand out.

SimGym
offline A/B testing
LLM agents
traffic-grounded simulation
e-commerce UI evaluation

Authors
Alberto Castelo
Shopify, Ottawa, Ontario, Canada
Zahra Zanjani Foumani
Ph.D. student, University of California Irvine
Data Fusion, Bayesian Optimization, Uncertainty Quantification
Ailin Fan
Shopify, Ottawa, Ontario, Canada
Keat Yang Koay
Shopify, Ottawa, Ontario, Canada
Vibhor Malik
Shopify, Ottawa, Ontario, Canada
Yuanzheng Zhu
Shopify, Ottawa, Ontario, Canada
Han Li
Shopify, Ottawa, Ontario, Canada
Meysam Feghhi
Shopify, Ottawa, Ontario, Canada
Ronie Uliana
Shopify, Ottawa, Ontario, Canada
Shuang Xie
Shopify, Ottawa, Ontario, Canada
Zhaoyu Zhang
Associate Professor, The Chinese University of Hong Kong, Shenzhen
Optoelectronics, semiconductor lasers, organic light emitting devices, perovskite light emitting devices
Angelo Ocana Martins
Shopify, Ottawa, Ontario, Canada
Mingyu Zhao
Shopify, Ottawa, Ontario, Canada
Francis Pelland
Shopify, Ottawa, Ontario, Canada
Jonathan Faerman
Shopify, Ottawa, Ontario, Canada
Nikolas LeBlanc
Shopify, Ottawa, Ontario, Canada
Aaron Glazer
Shopify, Ottawa, Ontario, Canada
Andrew McNamara
Director of Applied Science, Microsoft
NLP, Image Understanding, Recommendation Systems, LLM
Lingyun Wang
Shopify, Ottawa, Ontario, Canada
Zhong Wu
Shopify, Ottawa, Ontario, Canada