Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
This work addresses the high cost, long duration, and user-experience disruption of online A/B testing in recommender systems, as well as the limitations of existing large language model-based user agents, which lack multimodal perception and realistic interaction capabilities and therefore fail to simulate user behavior accurately. To overcome these challenges, the authors propose A/B Agent, a novel user agent framework that processes multimodal inputs and supports multi-page interactions. Within a purpose-built recommendation sandbox environment, A/B Agent integrates user profiles, action memory, and a fatigue mechanism, enabling, for the first time, visually aware simulated users with cross-page behavioral modeling. Experimental results indicate that the framework can serve as a substitute for real-world A/B testing across the model, data, and feature dimensions, and that the synthetic data it generates significantly improves recommender system performance.

๐Ÿ“ Abstract
In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B tests often presents significant challenges, including substantial economic costs, degraded user experience, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns of real users because they lack realistic environments and visual perception capabilities. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal and multi-page interactions aligned with real user behavior on online platforms. The agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we found that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.
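
To make the idea of a fatigue-aware simulated user concrete, here is a minimal toy sketch in Python. It is not the A/B Agent implementation: there is no LLM and no visual perception, and every name in it (`SimulatedUser`, `liked_tags`, `fatigue_limit`, `run_ab_test`, the fatigue increments) is a hypothetical choice made for illustration. It only shows how a profile, a click memory, and a fatigue counter could interact when the same simulated user is exposed to two multi-page variants.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SimulatedUser:
    """Toy stand-in for a user agent; NOT the A/B Agent implementation."""
    liked_tags: set[str]            # hypothetical profile: tags the user likes
    fatigue_limit: float = 2.0      # hypothetical: session ends at this fatigue
    fatigue: float = 0.0
    memory: list[str] = field(default_factory=list)  # ids of clicked items

    def browse(self, pages: list[list[dict]]) -> int:
        """Walk through the pages in order and return the number of clicks."""
        clicks = 0
        for page in pages:
            for item in page:
                if self.fatigue >= self.fatigue_limit:
                    return clicks        # the user is tired and leaves
                if item["tag"] in self.liked_tags and item["id"] not in self.memory:
                    self.memory.append(item["id"])
                    clicks += 1
                    self.fatigue += 0.5  # relevant items tire the user slowly
                else:
                    self.fatigue += 1.0  # irrelevant or repeated items tire faster
        return clicks


def run_ab_test(pages_a, pages_b, liked_tags):
    """Expose an identical, fresh user to each variant and compare clicks."""
    clicks_a = SimulatedUser(set(liked_tags)).browse(pages_a)
    clicks_b = SimulatedUser(set(liked_tags)).browse(pages_b)
    return {"A": clicks_a, "B": clicks_b}


# Two variants showing the same four items in a different page order.
jazz = [{"id": "i2", "tag": "jazz"}, {"id": "i3", "tag": "jazz"}]
other = [{"id": "i0", "tag": "news"}, {"id": "i1", "tag": "sports"}]
result = run_ab_test([other, jazz], [jazz, other], {"jazz"})
print(result)  # {'A': 0, 'B': 2}
```

In this toy setup, variant A puts the irrelevant page first, so the user reaches the fatigue limit before seeing anything they like, while variant B collects two clicks before the user leaves. The point of the sketch is only that page order and fatigue can change the measured outcome, which is the kind of cross-page effect a sandbox with multi-page interaction is meant to capture.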
Problem

Research questions and friction points this paper is trying to address.

Recommender System
A/B Testing
User Simulation
Multimodal Perception
LLM-based Agent
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal user agent
A/B testing simulation
recommendation sandbox
LLM-based evaluation
user behavior modeling