🤖 AI Summary
To address the limited expressiveness of sequential recommendation models caused by coarse-grained spatiotemporal modeling and the disconnect between explicit and implicit feature interactions, this paper proposes FuXi-α, a novel Transformer architecture designed for large-scale industrial scenarios. Methodologically, it introduces: (1) an adaptive multi-channel self-attention mechanism that decouples temporal, positional, and semantic feature modeling; (2) a multi-stage feed-forward network (FFN) that strengthens implicit high-order feature interactions; and (3) a scalable design that yields consistent performance gains as the parameter count grows, demonstrated for the first time in sequential recommendation. Offline experiments show significant improvements over state-of-the-art methods. An online A/B test on the Huawei Music mobile app demonstrates practical efficacy: average plays per user increase by 4.76% and average listening duration rises by 5.10%, validating both technical effectiveness and industrial deployability.
📝 Abstract
Inspired by scaling laws and large language models, research on large-scale recommendation models has gained significant attention. Recent advancements have shown that expanding sequential recommendation models into large-scale recommendation models can be an effective strategy. Current state-of-the-art sequential recommendation models primarily use self-attention mechanisms for explicit feature interactions among items, while implicit interactions are handled by Feed-Forward Networks (FFNs). However, these models often inadequately integrate temporal and positional information, either by adding it to attention weights or by blending it with latent representations, which limits their expressive power. A recent model, HSTU, further reduces the focus on implicit feature interactions, constraining its performance. We propose a new model called FuXi-α to address these issues. This model introduces an Adaptive Multi-channel Self-attention mechanism that distinctly models temporal, positional, and semantic features, along with a Multi-stage FFN to enhance implicit feature interactions. Our offline experiments demonstrate that our model outperforms existing models, with its performance continuously improving as the model size increases. Additionally, we conducted an online A/B test within the Huawei Music app, which showed a 4.76% increase in the average number of songs played per user and a 5.10% increase in the average listening duration per user. Our code has been released at https://github.com/USTC-StarTeam/FuXi-alpha.
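To make the "multi-channel" idea concrete: the abstract describes attention that models temporal, positional, and semantic features distinctly rather than summing them into a single score. A minimal numpy sketch of this decomposition is below; the channel structure, the learned per-channel gates, and all function names are illustrative assumptions, not the paper's actual formulation (see the linked repository for that).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_channel_attention(q, k, v, pos_bias, time_bias, gates):
    """Toy attention whose logits combine three separate channels:
    a semantic (query-key) channel, a relative-position channel, and a
    time-gap channel, mixed by per-channel gates (hypothetical design).

    q, k, v:   (L, d) query/key/value matrices for a length-L sequence
    pos_bias:  (L, L) positional-bias logits
    time_bias: (L, L) temporal-bias logits (e.g. from inter-event gaps)
    gates:     (3,)   channel-mixing weights
    """
    d = q.shape[-1]
    semantic = q @ k.T / np.sqrt(d)          # content-based channel
    logits = (gates[0] * semantic
              + gates[1] * pos_bias          # positional channel
              + gates[2] * time_bias)        # temporal channel
    return softmax(logits) @ v               # (L, d) attended values

# Usage on random inputs:
rng = np.random.default_rng(0)
L, d = 5, 8
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
pos, tim = rng.normal(size=(L, L)), rng.normal(size=(L, L))
out = multi_channel_attention(q, k, v, pos, tim, np.array([1.0, 0.5, 0.5]))
```

Keeping the three channels as separate logit terms, rather than folding positions and timestamps into the item embeddings, is what lets each channel be weighted independently per layer.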