🤖 AI Summary
This work addresses representational rank collapse in large-scale advertising recommendation systems: as models grow deeper, the effective rank of token representations does not rise consistently and can even degrade, so additional parameters do not necessarily add expressiveness. To mitigate this, the authors propose RankUp, an architecture that builds a higher-rank representation space through randomized permutation splitting of sparse features, a multi-embedding paradigm, global token integration, crossed pretrained embedding tokens, and decoupling of task-specific tokens. RankUp is fully deployed in WeChat (Weixin) advertising across Channels (Video Accounts), Official Accounts, and Moments, with reported GMV gains of 3.41%, 4.81%, and 2.21%, respectively. These gains suggest that the approach is effective in real-world industrial settings.
📝 Abstract
The scaling laws for recommender systems have been increasingly validated, where MetaFormer-based architectures consistently benefit from increased model depth, hidden dimensionality, and user behavior sequence length. However, whether representation capacity scales proportionally with parameter growth remains largely unexplored. Prior studies on RankMixer reveal that the effective rank of token representations exhibits a damped oscillatory trajectory across layers, failing to increase consistently with depth and even degrading in deeper layers. Motivated by this observation, we propose RankUp, an architecture designed to mitigate representation collapse and enhance expressive capacity through randomized permutation splitting over sparse features, a multi-embedding paradigm, global token integration, crossed pretrained embedding tokens and task-specific token decoupling. RankUp has been fully deployed in large-scale production across Weixin Video Accounts, Official Accounts and Moments, yielding GMV improvements of 3.41%, 4.81% and 2.21%, respectively.
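The abstract refers to the effective rank of token representations without stating which definition is used. As a rough illustration only, the sketch below assumes the common entropy-based definition (the exponential of the Shannon entropy of the normalized singular values) applied to a single layer's token matrix. The function name `effective_rank`, the matrix shapes, and the absence of centering are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def effective_rank(tokens: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (num_tokens, hidden_dim) matrix.

    Assumed definition: exp(H(p)), where p is the singular-value
    distribution normalized to sum to 1. The paper may use another one.
    """
    # Singular values of the raw token matrix (no centering; an assumption).
    s = np.linalg.svd(tokens, compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0  # all-zero matrix: no meaningful rank
    p = s / total
    p = p[p > eps]  # drop numerically zero entries before taking the log
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 16 hypothetical tokens with hidden size 64, roughly full rank.
    full = rng.standard_normal((16, 64))
    # A collapsed case: every token is a scaled copy of one direction.
    collapsed = np.outer(rng.standard_normal(16), rng.standard_normal(64))
    print("full:", effective_rank(full))
    print("collapsed:", effective_rank(collapsed))
```

Under this definition the value ranges from 1 (all tokens lie along one direction) up to the smaller of the two matrix dimensions (all singular values equal), so tracking it layer by layer is one way to observe the kind of trajectory the abstract describes.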