Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This work addresses two problems. Online A/B testing in recommender systems is costly, slow, and disruptive to the user experience, and existing large language model–based user agents lack multimodal perception and realistic interaction, so they cannot accurately simulate user behavior. The authors propose A/B Agent, a user agent framework that processes multimodal inputs and supports multi-page interactions. Within a purpose-built recommendation sandbox environment, A/B Agent combines user profiles, action memory, and a fatigue mechanism to simulate visually aware users whose behavior spans multiple pages. Experiments assess the framework as a substitute for real-world A/B testing along model, data, and feature dimensions, and indicate that the synthetic data it generates can improve recommender system performance.

📝 Abstract

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, user experience degradation, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns of real users because they lack realistic environments and visual perception capability. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal and multi-page interactions aligned with real user behavior on online platforms. The designed agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validate the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we find that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.
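
To make the profile, action memory, and fatigue components concrete, here is a minimal Python sketch of how a simulated user could be used to compare two recommendation rankings. It is an illustration under stated assumptions, not the authors' implementation: the class `ToyUserAgent`, the function `simulate_arm`, the scoring rule, and all numeric parameters are hypothetical, and the real A/B Agent relies on an LLM with visual perception rather than a hand-written score.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUserAgent:
    """Hypothetical stand-in for a simulated user; not the paper's agent."""

    preferences: dict[str, float]   # profile: category -> interest in [0, 1]
    fatigue_step: float = 0.25      # assumed fatigue added per viewed item
    fatigue_limit: float = 1.0      # assumed fatigue level that ends the session
    memory: list[tuple[str, bool]] = field(default_factory=list)  # (category, clicked)
    fatigue: float = 0.0

    def click_score(self, category: str) -> float:
        """Combine profile interest, past clicks on the category, and fatigue."""
        base = self.preferences.get(category, 0.0)
        past = [clicked for cat, clicked in self.memory if cat == category]
        # Assumed rule: a small bonus proportional to the past click rate.
        memory_bonus = 0.1 * sum(past) / len(past) if past else 0.0
        return (base + memory_bonus) * (1.0 - self.fatigue)

    def browse(self, ranked_categories: list[str], threshold: float = 0.35) -> int:
        """View items in ranked order until fatigued; return the click count."""
        clicks = 0
        for category in ranked_categories:
            if self.fatigue >= self.fatigue_limit:
                break  # the simulated user leaves the session
            clicked = self.click_score(category) >= threshold
            self.memory.append((category, clicked))
            self.fatigue += self.fatigue_step
            clicks += int(clicked)
        return clicks


def simulate_arm(ranking: list[str], profiles: list[dict[str, float]]) -> float:
    """Average clicks per simulated user for one recommendation ranking."""
    totals = [ToyUserAgent(preferences=p).browse(ranking) for p in profiles]
    return sum(totals) / len(totals)


# Made-up profiles and rankings, used only to exercise the sketch.
profiles = [
    {"sports": 0.9, "news": 0.2},
    {"sports": 0.3, "news": 0.8},
    {"sports": 0.8, "news": 0.1},
]
arm_a = simulate_arm(["sports", "news", "sports"], profiles)
arm_b = simulate_arm(["news", "news", "sports"], profiles)
print("arm A is better" if arm_a > arm_b else "arm B is at least as good")
```

Each arm gets a fresh set of agents, so memory and fatigue from one arm do not leak into the other. This mirrors the way users in a real A/B test are split between variants.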

Problem

Research questions and friction points this paper is trying to address.

Recommender System

A/B Testing

User Simulation

Multimodal Perception

LLM-based Agent

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal user agent

A/B testing simulation

recommendation sandbox

LLM-based evaluation

user behavior modeling

🔎 Similar Papers

Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives

📅 2024-05-09

🏛️ arXiv.org

📈 Citations: 4