🤖 AI Summary
Predictable scaling laws remain elusive for recommender systems, primarily because of computational inefficiency and suboptimal resource allocation, particularly in models that process both user history and contextual features. To address this, the authors propose Kunlun, a unified architecture that pairs efficient low-level modules, namely Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and sliding-window attention, with high-level strategies such as Computation Skip (CompSkip) and event-level personalization. On NVIDIA B200 GPUs, this co-design raises Model FLOPs Utilization (MFU) from 17% to 37% and doubles scaling efficiency relative to state-of-the-art methods. The techniques are deployed in major Meta Ads models, where the authors report significant production impact.
📝 Abstract
Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing massive-scale recommendation systems and allocating resources to them. While such laws are established for large language models, deriving them remains challenging for recommendation systems, especially those that process both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling; it stems from inefficient modules with low Model FLOPs Utilization (MFU) and from suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations are Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
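To make the MFU figures concrete, the sketch below uses the commonly used definition of Model FLOPs Utilization: achieved model FLOPs per second divided by the hardware's peak FLOPs per second. The function names, the per-step FLOP count, the step time, and the peak value are all hypothetical placeholders chosen for illustration; they are not taken from the paper and are not B200 specifications. Only the 17% and 37% values come from the abstract.

```python
def mfu(model_flops_per_step: float, step_time_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilization: achieved model FLOPs/s over hardware peak FLOPs/s."""
    if step_time_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("step time and peak throughput must be positive")
    return model_flops_per_step / step_time_s / peak_flops_per_s


def throughput_gain(mfu_before: float, mfu_after: float) -> float:
    """Step-throughput ratio implied by an MFU change, assuming the same
    hardware peak and the same model FLOPs per step before and after."""
    if mfu_before <= 0:
        raise ValueError("baseline MFU must be positive")
    return mfu_after / mfu_before


if __name__ == "__main__":
    # Hypothetical workload: placeholder numbers, not measurements.
    PEAK_FLOPS_PER_S = 1.0e15   # assumed peak, NOT a real B200 figure
    FLOPS_PER_STEP = 3.4e14     # assumed model FLOPs per training step
    STEP_TIME_S = 2.0           # assumed wall-clock time per step

    print(f"example MFU: {mfu(FLOPS_PER_STEP, STEP_TIME_S, PEAK_FLOPS_PER_S):.2f}")

    # MFU values reported in the abstract: 17% before, 37% after.
    print(f"implied throughput gain: {throughput_gain(0.17, 0.37):.2f}x")
```

Under the stated assumption of a fixed FLOP count per step on the same hardware, moving from 17% to 37% MFU corresponds to roughly 2.18x more steps per second. This is a property of the MFU definition alone and says nothing about how the abstract measures scaling efficiency.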