🤖 AI Summary
This work addresses the limitations of existing sequential recommendation methods, which either neglect item textual information or, when leveraging large language model (LLM)-based embeddings, suffer from dimensional collapse and insufficient spectral utilization. To overcome these issues, the authors propose a spectral-domain adapter that employs a Transformer to perform attention-based selection and aggregation across the full spectrum of frequency components. A learnable spectral positional encoding is introduced as an inductive bias to guide the model toward critical spectral regions and enhance embedding diversity. By integrating singular value decomposition with a spectral attention mechanism, the method enables adaptive transformation of high-dimensional semantic embeddings. Evaluated on four real-world datasets and integrated with three mainstream sequential recommendation models, the approach achieves an average performance improvement of 9.17%, significantly outperforming strong existing baselines.
📝 Abstract
Traditional sequential recommendation (SR) models learn low-dimensional item ID embeddings from user-item interactions, often overlooking textual information such as item titles or descriptions. Recent advances in Large Language Models (LLMs) have inspired a surge of research that encodes item textual information with high-dimensional semantic embeddings and designs transformation methods to inject such embeddings into SR models. These embedding transformation strategies fall into two types, both of which exhibit notable drawbacks: 1) adapter-based methods suffer from pronounced dimension collapse, concentrating information into a few dominant dimensions; 2) SVD-based methods are rigid and manual, considering only a few principal spectral components while discarding rich information in the remaining spectrum. To address these limitations, we propose SpecTran, a spectral-aware transformer-based adapter that operates in the spectral domain, attending to the full spectrum to select and aggregate informative components. A learnable spectral positional encoding injects singular-value cues as an inductive bias, guiding attention toward salient spectral components and promoting diversity across embedding dimensions. Across four real-world datasets and three SR backbones, SpecTran consistently outperforms strong baselines, achieving an average improvement of 9.17%.
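To make the core idea concrete, below is a minimal NumPy sketch of the pipeline the abstract describes: SVD exposes the full spectrum of high-dimensional text embeddings, a positional encoding derived from singular values biases attention toward salient components, and attention then aggregates all components (not just the top few) into a low-dimensional SR embedding. All shapes, names (`spec`, `pe`, `Q`), and random initializations are illustrative assumptions, not the paper's actual implementation, which uses learned parameters trained end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, h = 64, 32, 8, 4  # items, LLM-embedding dim, SR-embedding dim, token dim

# High-dimensional LLM-based item text embeddings (stand-in data).
E = rng.standard_normal((n, d))

# 1) SVD exposes the full spectrum: each item gets a coordinate on
#    every spectral component, ordered by singular value (energy).
U, s, Vt = np.linalg.svd(E, full_matrices=False)
spec = U * s  # (n, d): per-item coordinates over all d spectral components


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# 2) Spectral positional encoding: inject singular-value cues as an
#    inductive bias (randomly initialized here; learned in the paper).
pe = np.log1p(s)[:, None] * rng.standard_normal((1, h))  # (d, h)

# 3) Treat each spectral component as a token; k query vectors attend
#    over ALL d components rather than truncating to a few principal ones.
w_val = rng.standard_normal(h)
tokens = spec[:, :, None] * w_val[None, None, :] + pe[None, :, :]  # (n, d, h)
Q = rng.standard_normal((k, h))  # one query per output dimension
scores = tokens @ Q.T / np.sqrt(h)  # (n, d, k)
A = softmax(scores, axis=1)  # attention weights over the d components

# 4) Attention-weighted aggregation of the full spectrum into the
#    low-dimensional embedding the SR backbone consumes.
out = np.einsum("ndk,nd->nk", A, spec)  # (n, k)
```

The contrast with a plain SVD-based transform is the last two steps: instead of hard-truncating to the top-k components, every component contributes through learned, singular-value-biased attention weights.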