🤖 AI Summary
This work addresses the inference-latency challenge of modeling high-dimensional sparse feature interactions in billion-scale live-streaming recommendation systems. To this end, we propose the Zenith architecture, which tokenizes high-dimensional features and introduces two key components, Token Fusion and Token Boost, to efficiently identify and prioritize a small set of critical features (termed Prime Tokens). By enhancing token heterogeneity, Zenith improves model performance while keeping inference cost under control and exhibits more favorable scaling behavior than prior approaches. The approach demonstrates strong empirical gains: after deployment on TikTok Live, it achieves a 1.05% increase in CTR AUC, a 1.10% reduction in Logloss, and lifts of 9.93% and 8.11% in the number and duration, respectively, of high-quality watch sessions per user.
📝 Abstract
Accurately capturing feature interactions is essential in recommender systems, and recent trends show that scaling up model capacity can be a key driver of next-level predictive performance. While prior work has explored various model architectures to capture multi-granularity feature interactions, relatively little attention has been paid to efficient feature handling and to scaling model capacity without incurring excessive inference latency. In this paper, we address this gap by presenting Zenith, a scalable and efficient ranking architecture that learns complex feature interactions with minimal runtime overhead. Zenith concentrates capacity on a few high-dimensional Prime Tokens via its Token Fusion and Token Boost modules and, thanks to its improved token heterogeneity, exhibits superior scaling laws compared with other state-of-the-art ranking methods. Its real-world effectiveness is demonstrated by deploying the architecture on TikTok Live, a leading online livestreaming platform that attracts billions of users globally. Our A/B test shows that Zenith achieves +1.05%/-1.10% in online CTR AUC and Logloss, and realizes gains of +9.93% in Quality Watch Session / User and +8.11% in Quality Watch Duration / User.
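The abstract does not specify how Token Fusion and Token Boost operate internally, but the core idea of keeping a small set of Prime Tokens at full resolution while compressing the remainder can be sketched as below. Everything here is an illustrative assumption, not the paper's actual design: the linear importance scorer, the softmax-weighted pooling used for "fusion", and the sigmoid-based "boost" rule are all placeholders.

```python
import numpy as np

def select_and_fuse_tokens(tokens, scorer_w, k=4, n_fused=2):
    """Hypothetical sketch of Prime-Token handling (not the paper's method).

    tokens:   (n_tokens, d) feature-token embeddings
    scorer_w: (d,) weights of an assumed linear importance scorer
    k:        number of Prime Tokens kept at full resolution
    n_fused:  number of summary tokens produced from the rest
    """
    scores = tokens @ scorer_w                       # per-token importance
    order = np.argsort(-scores)                      # most important first
    prime_idx, rest_idx = order[:k], order[k:]

    # "Token Fusion" (assumed): softmax-weighted pooling of the
    # non-prime tokens into a few fused summary tokens.
    rest = tokens[rest_idx]
    w = np.exp(scores[rest_idx] - scores[rest_idx].max())
    chunks = np.array_split(np.arange(len(rest)), n_fused)
    fused = np.stack([(w[c][:, None] * rest[c]).sum(0) / w[c].sum()
                      for c in chunks])

    # "Token Boost" (assumed): upweight Prime Tokens by 1 + sigmoid(score).
    boost = 1.0 + 1.0 / (1.0 + np.exp(-scores[prime_idx]))
    prime = tokens[prime_idx] * boost[:, None]

    # Downstream interaction layers would now see only k + n_fused tokens,
    # which is how a design like this keeps inference cost bounded.
    return np.concatenate([prime, fused], axis=0)    # (k + n_fused, d)
```

The point of the sketch is the cost argument: attention-style interaction layers scale quadratically in the number of tokens, so reducing a large token set to `k + n_fused` tokens bounds inference latency regardless of how many raw features exist.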