Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing sequential recommendation methods exhibit an inadequate understanding of the relationship between item IDs and modality-based features (e.g., textual descriptions): some assume modality embeddings can fully replace ID embeddings, while others rely on complex alignment mechanisms or multi-stage joint training—both overlooking the intrinsic complementarity between ID and modality signals. This paper is the first to systematically demonstrate the strong complementarity of ID and textual features in sequential recommendation. We propose a novel “independent modeling + lightweight integration” paradigm: ID-based models (e.g., SASRec) and pre-trained text encoders (e.g., BERT) are trained separately, and their outputs are fused via learnable weighted integration. Our approach eliminates intricate architectural designs and explicit alignment modules. Extensive experiments on multiple public benchmarks show consistent and significant improvements over state-of-the-art ID–text joint modeling methods, achieving new SOTA performance. Results empirically validate that both ID and textual signals are indispensable and that their synergistic gains can be efficiently realized.
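The "independent modeling + lightweight integration" paradigm described above can be sketched as a single weighted combination of per-item scores from two separately trained models. The sketch below is purely illustrative: the function names, score values, and the fixed weight `alpha` are assumptions, not the paper's implementation (in the paper, the integration weight is learnable and the scores would come from trained SASRec and BERT-based models).

```python
import numpy as np

def fuse_scores(id_scores, text_scores, alpha):
    """Convex combination of per-item scores from the two frozen models.

    alpha in [0, 1] plays the role of the learnable integration weight;
    here it is fixed for illustration.
    """
    return alpha * id_scores + (1.0 - alpha) * text_scores

def rank_items(scores, k):
    """Return indices of the top-k items by fused score."""
    return np.argsort(-scores)[:k]

# Toy example: 5 candidate items, scores from the two independent models.
id_scores = np.array([0.9, 0.1, 0.4, 0.2, 0.3])    # e.g. ID-based (SASRec) scores
text_scores = np.array([0.2, 0.7, 0.6, 0.1, 0.8])  # e.g. text-based (BERT) scores

fused = fuse_scores(id_scores, text_scores, alpha=0.6)
top2 = rank_items(fused, k=2)  # items ranked by the fused signal
```

Because the two models are trained independently, each retains its own signal, and the fusion step adds only a single parameter rather than an alignment module or multi-stage pipeline.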

📝 Abstract
Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, several works completely replace ID embeddings with modality embeddings, claiming that modality embeddings render ID embeddings unnecessary because they can match or even exceed ID embedding performance. On the other hand, many works jointly utilize ID and modality features, but posit that complex fusion strategies, such as multi-stage training and/or intricate alignment architectures, are necessary for this joint utilization. However, underlying both these lines of work is a lack of understanding of the complementarity of ID and modality features. In this work, we address this gap by studying the complementarity of ID- and text-based SR models. We show that these models do learn complementary signals, meaning that either should provide performance gain when used properly alongside the other. Motivated by this, we propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy. Despite this method's simplicity, we show it outperforms several competitive SR baselines, implying that both ID and text features are necessary to achieve state-of-the-art SR performance but complex fusion architectures are not.
Problem

Research questions and friction points this paper is trying to address.

Explores complementarity between ID and text features in sequential recommendation
Proposes a simple ensembling method to leverage this complementarity effectively
Shows both features are necessary for state-of-the-art performance without complex fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensembling ID and text models for complementarity
Independent training preserves ID-text feature signals
Simple fusion outperforms complex alignment architectures