🤖 AI Summary
In sequential recommendation, Transformer-based self-attention exhibits a low-pass filtering effect, impairing modeling of high-frequency short-term user interests. This work systematically reproduces and analyzes BSARec, introducing a metric to quantify the frequency of a user's interaction history. Through frequency-domain analysis, frequency-binned evaluation, and a comparison of signal-processing techniques (FFT vs. DWT), we dissect the actual contributions of its components. Key findings: (1) Non-uniform dynamic padding significantly boosts performance and constitutes the primary source of high-frequency modeling gains; (2) The frequency-domain rescaling module is redundant—residual connections alone achieve comparable effectiveness; (3) DWT yields only marginal improvement over FFT, which does not justify its added signal-processing complexity. Experiments across multiple benchmarks confirm BSARec’s superiority over baselines and clarify an effective pathway for high-frequency modeling: optimizing input representations is more impactful than designing intricate frequency-domain architectures.
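To make the frequency-rescaling idea concrete, the following is a minimal NumPy sketch of rescaling a sequence's high-frequency band via the FFT. It is an illustration of the general technique, not BSARec's actual layer: the function name, the band split by `cutoff_ratio`, and the fixed gain `beta` are assumptions for this example (BSARec learns its rescaling inside a Transformer encoder).

```python
import numpy as np

def rescale_high_freq(seq, cutoff_ratio=0.5, beta=2.0):
    """Amplify the high-frequency band of a 1-D sequence via the FFT.

    cutoff_ratio: fraction of the one-sided spectrum treated as the low band.
    beta: multiplicative gain applied to the high-frequency band.
    (Both are illustrative hyperparameters, not BSARec's.)
    """
    spec = np.fft.rfft(seq)                  # one-sided complex spectrum
    cutoff = int(len(spec) * cutoff_ratio)   # index separating low/high bands
    spec[cutoff:] *= beta                    # boost high-frequency components
    return np.fft.irfft(spec, n=len(seq))    # back to the time domain

# A rapidly alternating ("high-frequency") signal is amplified by beta,
# while a constant ("low-frequency") signal passes through unchanged --
# the opposite of self-attention's low-pass behavior.
alternating = np.array([1.0, -1.0] * 8)
constant = np.ones(16)
print(np.allclose(rescale_high_freq(alternating), 2.0 * alternating))  # True
print(np.allclose(rescale_high_freq(constant), constant))              # True
```

A DWT-based variant would replace the FFT band split with wavelet subbands (e.g., via PyWavelets), which is the comparison discussed above.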
📝 Abstract
In sequential recommendation (SR), the self-attention mechanism of Transformer-based models acts as a low-pass filter, limiting their ability to capture high-frequency signals that reflect short-term user interests. To overcome this, BSARec augments the Transformer encoder with a frequency layer that rescales high-frequency components using the Fourier transform. However, the overall effectiveness of BSARec and the roles of its individual components have yet to be systematically validated. We reproduce BSARec and show that it outperforms other SR methods on some datasets. To empirically assess whether BSARec improves performance on high-frequency signals, we propose a metric to quantify user history frequency and evaluate SR methods across different user groups. We compare digital signal processing (DSP) techniques and find that the discrete wavelet transform (DWT) offers only slight improvements over the Fourier transform, and that DSP methods provide no clear advantage over simple residual connections. Finally, we explore padding strategies and find that non-constant padding significantly improves recommendation performance, whereas constant padding hinders the frequency rescaler's ability to capture high-frequency signals.
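One way to quantify "user history frequency" as described above is the fraction of a history's spectral energy above a cutoff frequency; users can then be binned by this ratio for group-wise evaluation. This is a hedged sketch of such a metric, not necessarily the paper's exact definition: the function name `high_freq_ratio`, the numeric encoding of the history, and the `cutoff_ratio` parameter are assumptions for illustration.

```python
import numpy as np

def high_freq_ratio(history, cutoff_ratio=0.5):
    """Fraction of spectral energy above a cutoff for one user's history.

    `history` is a 1-D numeric encoding of the interaction sequence
    (e.g., item category IDs over time); a higher ratio indicates more
    rapidly changing, short-term interests. Illustrative only.
    """
    x = np.asarray(history, dtype=float)
    x = x - x.mean()                           # remove the DC component
    energy = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    if energy.sum() == 0:
        return 0.0                             # flat history: no oscillation
    cutoff = int(len(energy) * cutoff_ratio)
    return energy[cutoff:].sum() / energy.sum()

# A user who repeats one interest scores 0; a user who alternates
# every step concentrates all energy at the highest frequency.
stable_user = [3, 3, 3, 3, 3, 3, 3, 3]
erratic_user = [3, 7, 3, 7, 3, 7, 3, 7]
print(high_freq_ratio(stable_user))   # 0.0
print(high_freq_ratio(erratic_user))  # 1.0
```

Binning users by this ratio (e.g., into quartiles) gives the frequency-binned evaluation groups used to test whether a model actually helps high-frequency users.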