SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding

📅 2025-07-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches struggle to model the dynamic evolution of e-commerce users’ intents across multiple browsing sessions, primarily due to overreliance on shallow textual features (e.g., titles and descriptions) and the absence of annotated data and evaluation benchmarks for cross-session intent transfer. Method: We propose the “Intent Tree”—a novel hierarchical structure that explicitly models intent evolution across sessions—and introduce SessionIntentBench, a large-scale, multimodal, multi-task benchmark comprising 1.97 million intent annotations and over 10 million derivable tasks. Contribution/Results: Experiments reveal that current Large Vision-Language Models (LVLMs) perform poorly on intent migration tasks. Integrating Intent Tree representations significantly improves model performance. This work establishes a new paradigm, dataset, and evaluation standard for understanding e-commerce user behavior, advancing research in cross-session intent modeling and multimodal session understanding.

📝 Abstract
Session history is a common way of recording user interaction behaviors throughout a browsing activity involving multiple products. For example, if a user clicks a product webpage and then leaves, it might be because certain features fail to satisfy the user, which serves as an important indicator of on-the-spot user preferences. However, prior works fail to capture and model customer intention effectively because they exploit insufficient information, relying only on surface features such as descriptions and titles. There is also a lack of data and corresponding benchmarks for explicitly modeling intention in e-commerce product purchase sessions. To address these issues, we introduce the concept of an intention tree and propose a dataset curation pipeline. Together, we construct a sibling multimodal benchmark, SessionIntentBench, that evaluates L(V)LMs' capability to understand inter-session intention shifts across four subtasks. With 1,952,177 intention entries, 1,132,145 session intention trajectories, and 13,003,664 available tasks mined from 10,905 sessions, we provide a scalable way to exploit existing session data for customer intention understanding. We conduct human annotation to collect ground-truth labels for a subset of the collected data, forming a gold evaluation set. Extensive experiments on the annotated data further confirm that current L(V)LMs fail to capture and utilize intention across complex session settings. Further analysis shows that injecting intention enhances LLMs' performance.
Problem

Research questions and friction points this paper is trying to address.

Modeling inter-session intention shifts in e-commerce user behavior
Addressing lack of data for intention modeling in purchase sessions
Enhancing LLMs' capability to understand complex customer intentions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the intention tree concept for modeling cross-session intent evolution
Proposes a dataset curation pipeline
Constructs the multimodal benchmark SessionIntentBench
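This page does not specify the intention tree's exact schema. As a rough illustration only, a hierarchical structure linking a session-level intent to finer-grained sub-intents, from which intention trajectories (root-to-leaf paths) can be enumerated, might be sketched as follows. All class, field, and example names here are hypothetical assumptions, not the paper's actual data format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IntentNode:
    """One node in a hypothetical intent tree: a (sub-)intent observed in a session."""
    label: str                      # e.g. "buy running shoes"
    session_id: str                 # session in which this intent was observed
    children: List["IntentNode"] = field(default_factory=list)

    def add_child(self, child: "IntentNode") -> "IntentNode":
        self.children.append(child)
        return child

    def trajectories(self) -> List[List[str]]:
        """Enumerate root-to-leaf intent paths, i.e. candidate intention trajectories."""
        if not self.children:
            return [[self.label]]
        return [[self.label] + path
                for child in self.children
                for path in child.trajectories()]

# Toy example: an intent that shifts across two sessions.
root = IntentNode("buy running shoes", "session-1")
cushioned = root.add_child(IntentNode("prefer extra cushioning", "session-1"))
cushioned.add_child(IntentNode("switch to trail-running shoes", "session-2"))
print(root.trajectories())
# [['buy running shoes', 'prefer extra cushioning', 'switch to trail-running shoes']]
```

Under this sketch, each root-to-leaf path is one intention trajectory, which matches the page's distinction between intention entries (nodes) and session intention trajectories (paths).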
👥 Authors
Yuqi Yang — Nankai University (Computer Vision, Semantic Segmentation)
Weiqi Wang — Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China; Amazon.com Inc, Palo Alto, CA, USA
Baixuan Xu — Hong Kong University of Science and Technology (Long-horizon Agent, Multimodal Understanding)
Wei Fan — Department of Computer Science and Engineering, HKUST, Hong Kong SAR, China
Qing Zong — HKUST (Natural Language Processing, Large Language Models, Factuality, Uncertainty Calibration)
Chunkit Chan — Ph.D. Student, HKUST; Applied Scientist Intern, Amazon, Palo Alto (Natural Language Processing, Large Language Models, Theory of Mind, Computational Linguistics)
Zheye Deng — HKUST (Large Language Models, Text-to-Structure, Agent Reinforcement Learning)
Xin Liu — Amazon.com Inc, Palo Alto, CA, USA
Yifan Gao — Amazon.com Inc, Palo Alto, CA, USA
Changlong Yu — Amazon.com Inc, Palo Alto, CA, USA
Chen Luo — Amazon.com Inc, Palo Alto, CA, USA
Yang Li — Amazon.com Inc, Palo Alto, CA, USA
Zheng Li — Amazon.com Inc, Palo Alto, CA, USA
Qingyu Yin — Amazon.com Inc, Palo Alto, CA, USA
Bing Yin — Amazon.com (NLP, Information Retrieval, Deep Learning, Knowledge Graphs)
Yangqiu Song — HKUST (Artificial Intelligence, Data Mining, Natural Language Processing, Knowledge Graphs, Commonsense Reasoning)