AI Summary
This work addresses the high cost, prolonged duration, and user experience disruption associated with online A/B testing in traditional recommender systems. It also addresses the limitations of existing large language model-based user agents, which lack multimodal perception and authentic interactive capabilities and therefore fail to simulate user behavior accurately. To overcome these challenges, the authors propose A/B Agent, a novel user agent framework capable of processing multimodal inputs and supporting multi-page interactions. Within a constructed recommendation sandbox environment, A/B Agent integrates user profiles, action memory, and a fatigue mechanism, enabling, for the first time, visually aware simulated users with cross-page behavioral modeling. Experimental results demonstrate that this framework can effectively substitute for real-world A/B testing across model, data, and feature dimensions, and that the synthesized data it generates significantly enhances recommender system performance.
Abstract
In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, user experience degradation, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate users' perception process and interaction patterns because they lack realistic environments and visual perception capabilities. To address these challenges, we introduce a multimodal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing, enabling multimodal and multi-page interactions that align with real user behavior on online platforms. The designed agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we found that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.