🤖 AI Summary
Evaluating how well large language models (LLMs) simulate users' online shopping behavior, particularly next-action prediction, is hindered by a scarcity of high-quality, fine-grained, cognition-aware behavioral data.
Method: We introduce OPeRA, the first publicly available shopping behavior simulation dataset featuring quadruple-aligned annotations: user personas, visual observations, action sequences, and intrinsic rationales. We design a digital-twin-oriented simulation benchmark, collecting real-world interactions via a custom browser extension and capturing cognitive processes through just-in-time structured questionnaires. Multimodal temporal alignment and privacy-preserving annotation yield hundreds of complete shopping sessions.
Contribution/Results: Joint prediction of actions and rationales significantly improves performance: baseline models achieve 17.3% higher accuracy than single-task counterparts, empirically validating LLMs’ capacity to model users’ dynamic intentions in e-commerce contexts.
📝 Abstract
Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPeRA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale given a persona and an interaction history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.
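To make the task setup concrete, the following is a minimal sketch of how a session with the four components (persona, observation, rationale, action) might be represented and turned into next-action prediction examples. All names here (`Step`, `Session`, `next_action_examples`, `exact_match_accuracy`), the string encoding of actions, and the exact-match scoring are assumptions for illustration; they are not the dataset's actual schema or the benchmark's official metric.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One user step (hypothetical layout, not the released schema)."""
    observation: str     # what the user saw, e.g. a simplified page description
    action: str          # the web action taken, e.g. "click:add_to_cart"
    rationale: str = ""  # self-reported reason; empty if none was collected


@dataclass
class Session:
    """One shopping session for one user (hypothetical layout)."""
    persona: str
    steps: list = field(default_factory=list)


def next_action_examples(session):
    """Yield (persona, history, current_observation, target_action) tuples.

    The history for step i holds all earlier steps, so a model only ever
    sees the past when asked to predict the action at step i.
    """
    for i, step in enumerate(session.steps):
        yield session.persona, session.steps[:i], step.observation, step.action


def exact_match_accuracy(predictions, targets):
    """Fraction of predicted actions that equal the recorded action exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)
```

A model under evaluation would receive the persona, the history, and the current observation from each tuple and return a predicted action string, which is then compared against the recorded action. Predicting the rationale alongside the action would need a separate text-similarity measure, which this sketch leaves out.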