AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents

📅 2025-04-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional A/B testing relies on real user traffic, suffering from long experimental cycles, high operational costs, and poor scalability. To address these limitations, this paper introduces the first web-scale A/B testing paradigm powered by autonomous LLM agents. Our method constructs multi-persona agents that parse live DOM structures, model interactive behaviors, and generate realistic multi-step user trajectories directly within production web environments. The framework supports configurable personas and massive parallel execution—e.g., concurrently deploying 1,000 agents on Amazon.com—thereby drastically reducing dependence on organic traffic. Empirical evaluation demonstrates strong behavioral alignment between LLM agents and real users (similarity >92%), validating the fidelity, scalability, and statistical reliability of our simulation-based approach. This work establishes a novel, efficient, controllable, and fully reproducible infrastructure for web experimentation science.
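The summary above describes deploying many persona-conditioned agents in parallel, each producing a multi-step trajectory, with the two page variants compared between subjects. A minimal sketch of that deployment pattern, assuming toy personas, actions, and a thread pool (all names here are illustrative, not from the paper's codebase):

```python
import concurrent.futures
import random

# Hypothetical sketch: each agent gets a persona, independently produces a
# multi-step trajectory, and agents run concurrently; trajectories are pooled
# for a between-subject A/B comparison.
PERSONAS = ["bargain hunter", "brand loyalist", "impulse buyer"]
ACTIONS = ["search", "click", "filter", "purchase"]

def run_agent(agent_id: int, variant: str, steps: int = 4) -> dict:
    """Simulate one agent's multi-step trajectory on a page variant."""
    rng = random.Random(agent_id)  # deterministic per agent for reproducibility
    persona = PERSONAS[agent_id % len(PERSONAS)]
    trajectory = [rng.choice(ACTIONS) for _ in range(steps)]
    return {"id": agent_id, "variant": variant,
            "persona": persona, "trajectory": trajectory}

def deploy(n_agents: int) -> list[dict]:
    """Split agents between control (A) and treatment (B), run in parallel."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_agent, i, "A" if i % 2 == 0 else "B")
                   for i in range(n_agents)]
        return [f.result() for f in futures]

results = deploy(10)
```

In the paper's setting the toy `run_agent` would be replaced by an LLM agent acting on a live page, but the deployment shape (assign persona, assign variant, fan out, collect trajectories) is the same.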

📝 Abstract
A/B testing is a widely adopted method for evaluating UI/UX design decisions in modern web applications. Yet, traditional A/B testing remains constrained by its dependence on large-scale live traffic from human participants and by long waits for test results. Through formative interviews with six experienced industry practitioners, we identified critical bottlenecks in current A/B testing workflows. In response, we present AgentA/B, a novel system that leverages Large Language Model-based autonomous agents (LLM agents) to automatically simulate user interaction behaviors with real webpages. AgentA/B enables scalable deployment of LLM agents with diverse personas, each capable of navigating dynamic webpages and interactively executing multi-step interactions like search, clicking, filtering, and purchasing. In a demonstrative controlled experiment, we employ AgentA/B to simulate a between-subject A/B test with 1,000 LLM agents on Amazon.com, and compare agent behaviors with real human shopping behaviors at scale. Our findings suggest AgentA/B can emulate human-like behavior patterns.
Problem

Research questions and friction points this paper is trying to address.

Automating A/B testing to reduce human traffic dependency
Speeding up A/B testing results with simulated user interactions
Scaling A/B testing using diverse LLM agent personas
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agents simulate user interactions automatically
Scalable deployment with diverse personas
Multi-step interactive behaviors like humans
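The bullets above describe agents that parse the live page and execute human-like multi-step interactions. One way to picture that inner loop is observe → decide → act, repeated until the episode ends. A toy sketch, assuming a dictionary-based page model and a rule-based stand-in for the LLM policy (both are assumptions for illustration, not the paper's implementation):

```python
# Toy stand-ins for the agent's perceive-decide-act loop: observe the page's
# interactive elements, pick one conditioned on the persona, apply it, repeat.

def observe(page: dict) -> list[str]:
    """Return the interactive elements visible in the (toy) DOM."""
    return page["elements"]

def policy(persona: str, elements: list[str]) -> str:
    """Stand-in for the LLM: a persona-conditioned choice of element."""
    if persona == "bargain hunter" and "filter:price" in elements:
        return "filter:price"
    return elements[0]

def act(page: dict, element: str) -> dict:
    """Apply the chosen action and return the next page state."""
    transitions = {
        "search": {"elements": ["filter:price", "item:1"]},
        "filter:price": {"elements": ["item:1"]},
        "item:1": {"elements": ["buy"]},
        "buy": {"elements": []},
    }
    return transitions.get(element, {"elements": []})

def run_episode(persona: str, start: dict, max_steps: int = 5) -> list[str]:
    """One agent trajectory: observe, decide, act until nothing is clickable."""
    page, trajectory = start, []
    for _ in range(max_steps):
        elements = observe(page)
        if not elements:
            break
        choice = policy(persona, elements)
        trajectory.append(choice)
        page = act(page, choice)
    return trajectory

traj = run_episode("bargain hunter", {"elements": ["search"]})
# traj walks search -> filter:price -> item:1 -> buy
```

In the actual system the `observe` step would parse a live DOM and `policy` would be an LLM prompted with the persona, but the loop structure is the point: trajectories emerge from repeated page-conditioned decisions, not from a scripted click sequence.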
Dakuo Wang
Northeastern University
Human-AI Collaboration, Human-Centered AI, Human-Computer Interaction, AI for Healthcare, CSCW
Ting-Yao Hsu
Pennsylvania State University
Yuxuan Lu
Northeastern University
Limeng Cui
Amazon
Large Language Models, Generative AI, Recommendation Systems
Yaochen Xie
Applied Scientist, Amazon
Machine learning, Self-supervised learning
William Headean
Amazon
Bingsheng Yao
Northeastern University
Akash Veeragouni
Amazon
Jiapeng Liu
Amazon
Sreyashi Nag
Amazon Search
Jessie Wang
Amazon