AI Summary
This work addresses the inefficiency of traditional attention mechanisms in processing long user behavior sequences, which stems from their quadratic complexity, as well as the limitations of existing linear attention approaches: inadequate temporal signal modeling, weak positional encoding, and shallow architectures. To overcome these challenges, we propose FuXi-Linear, a linear-complexity recommendation model that decouples temporal and semantic signals, integrates a time-preserving channel and a linear positional channel, and incorporates a learnable positional kernel function to substantially enhance temporal modeling and positional awareness. FuXi-Linear is the first linear recommendation model to demonstrate power-law scaling on thousand-length sequences, outperforming state-of-the-art methods on sequences spanning several thousand items while achieving 10× and 21× speedups in the prefill and decoding stages, respectively, alongside improved recommendation accuracy and inference efficiency.
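The decode-stage speedup comes from a general property of linear attention rather than any detail specific to FuXi-Linear: kernelized attention can be rewritten as a recurrence over two running sums, so generating the next recommendation costs O(d·d_v) per step regardless of how long the user history is. The sketch below illustrates that recurrent view; the feature map `phi` and the dimensions are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)
d, dv = 4, 3  # illustrative query/key and value dimensions

def phi(x):
    # A simple positive feature map; linear attention only requires that
    # phi(q) . phi(k) is non-negative so the normalizer stays valid.
    return np.maximum(x, 0.0) + 1e-6

# Linear attention admits a recurrent form: maintain the running sums
#   S_t = S_{t-1} + phi(k_t) v_t^T   and   z_t = z_{t-1} + phi(k_t),
# then output phi(q_t) S_t / (phi(q_t) . z_t). Each decode step touches
# only the fixed-size state (S, z), never the full history.
S = np.zeros((d, dv))
z = np.zeros(d)
outputs = []
for _ in range(5):  # a stream of new user-behavior steps
    q, k, v = rng.normal(size=d), rng.normal(size=d), rng.normal(size=dv)
    S += np.outer(phi(k), v)
    z += phi(k)
    outputs.append(phi(q) @ S / (phi(q) @ z))
```

Softmax attention has no such fixed-size state: each decode step must re-attend over the whole cached history, which is why its per-step cost grows with sequence length.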
Abstract
Modern recommendation systems primarily rely on attention mechanisms with quadratic complexity, which limits their ability to handle long user sequences and slows down inference. While linear attention is a promising alternative, existing research faces three critical challenges: (1) temporal signals are often overlooked or integrated via naive coupling, causing mutual interference between temporal and semantic signals while neglecting behavioral periodicity; (2) existing linear frameworks provide insufficient positional information; and (3) prior work focuses primarily on short sequences and shallow architectures. To address these issues, we propose FuXi-Linear, a linear-complexity model designed for efficient long-sequence recommendation. Our approach introduces two key components: (1) a Temporal Retention Channel that independently computes periodic attention weights from temporal data, preventing crosstalk between temporal and semantic signals; and (2) a Linear Positional Channel that integrates positional information through learnable kernels within linear complexity. Moreover, we demonstrate that FuXi-Linear exhibits a robust power-law scaling property at thousand-length scale, a characteristic largely unexplored in prior linear recommendation studies. Extensive experiments on sequences of several thousand tokens demonstrate that FuXi-Linear outperforms state-of-the-art models in recommendation quality, while achieving up to 10$\times$ speedup in the prefill stage and up to 21$\times$ speedup in the decode stage compared to competitive baselines. Our code is publicly available at https://github.com/USTC-StarTeam/fuxi-linear.
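The abstract does not spell out how a periodic temporal channel can stay linear in sequence length, but one standard route is a factorized periodic kernel: since cos(ω(t_i − t_j)) = cos(ωt_i)cos(ωt_j) + sin(ωt_i)sin(ωt_j), a periodicity-aware score decomposes into per-position features and fits the usual (K^T V)-accumulation trick. The sketch below shows that factorization next to its explicit quadratic counterpart; the feature map `phi`, the single frequency `omega`, and the extra constant channel (which keeps the implied kernel positive) are illustrative assumptions, not FuXi-Linear's actual Temporal Retention Channel.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, dv = 6, 4, 3
Q = rng.normal(size=(N, d))
K = rng.normal(size=(N, d))
V = rng.normal(size=(N, dv))
t = np.sort(rng.uniform(0.0, 10.0, size=N))  # event timestamps
omega = 2 * np.pi / 7.0                      # hypothetical weekly period

def phi(x):
    return np.maximum(x, 0.0) + 1e-6  # simple positive feature map

def features(X, t):
    # Append periodic temporal features plus a constant channel, so the
    # implied kernel is phi(x_i).phi(x_j) + cos(omega*(t_i - t_j)) + 1 > 0.
    per = np.stack([np.cos(omega * t), np.sin(omega * t), np.ones_like(t)], axis=1)
    return np.concatenate([phi(X), per], axis=1)

Qf, Kf = features(Q, t), features(K, t)

# Linear form: accumulate K^T V once, O(N * d^2) overall.
KV = Kf.T @ V
Z = Qf @ Kf.sum(axis=0)
out_linear = (Qf @ KV) / Z[:, None]

# Explicit quadratic form, O(N^2), for verification only.
A = Qf @ Kf.T
out_quad = (A @ V) / A.sum(axis=1, keepdims=True)
assert np.allclose(out_linear, out_quad)
```

The key point is that the periodic weight depends on t_i and t_j only through separable per-event features, so adding temporal channels changes the feature dimension, not the linear-in-N cost.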