🤖 AI Summary
Existing sequential recommendation methods suffer from two key limitations: (1) low efficiency due to the quadratic computational complexity of Transformer-based models, and (2) insufficient multi-scale modeling—Mamba struggles to capture behavioral periodicity, suffers from semantic sparsity in user–item interactions, and exhibits weak multimodal feature fusion. To address these challenges, we propose FreqMamba, the first Mamba-based architecture integrating Fast Fourier Transform (FFT) for explicit frequency-domain modeling of user behavioral periodicity. We further incorporate large language model (LLM)-derived text embeddings to mitigate interaction sparsity and design a learnable gating mechanism for adaptive fusion of temporal, spectral, and semantic features. Extensive experiments demonstrate that FreqMamba achieves a 3.2% absolute improvement in Hit Rate@10 over state-of-the-art Mamba baselines across multiple benchmark datasets, while attaining 1.2× faster inference speed than Transformers. FreqMamba establishes new state-of-the-art performance in sequential recommendation.
📝 Abstract
Sequential recommendation systems aim to predict users' next preferences based on their interaction histories, but existing approaches face critical limitations in efficiency and multi-scale pattern recognition. While Transformer-based methods struggle with quadratic computational complexity, recent Mamba-based models improve efficiency but fail to capture periodic user behaviors, leverage rich semantic information, or effectively fuse multimodal features. To address these challenges, we propose model, a novel sequential recommendation framework that integrates multi-scale Mamba with Fourier analysis, Large Language Models (LLMs), and adaptive gating. First, we enhance Mamba with Fast Fourier Transform (FFT) to explicitly model periodic patterns in the frequency domain, separating meaningful trends from noise. Second, we incorporate LLM-based text embeddings to enrich sparse interaction data with semantic context from item descriptions. Finally, we introduce a learnable gate mechanism to dynamically balance temporal (Mamba), frequency (FFT), and semantic (LLM) features, ensuring harmonious multimodal fusion. Extensive experiments demonstrate that model achieves state-of-the-art performance, improving Hit Rate@10 by 3.2% over existing Mamba-based models while maintaining 20% faster inference than Transformer baselines. Our results highlight the effectiveness of combining frequency analysis, semantic understanding, and adaptive fusion for sequential recommendation. Code and datasets are available at: https://anonymous.4open.science/r/M2Rec.