Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of predictable scaling laws for recommender systems. The authors attribute this gap to inefficient modules with low utilization and to suboptimal resource allocation, particularly in models that process both user history and contextual features. They propose Kunlun, a unified architecture that pairs efficient low-level modules (Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and sliding-window attention) with high-level strategies, namely Computation Skip (CompSkip) and event-level personalization. On NVIDIA B200 GPUs, this co-design raises Model FLOPs Utilization (MFU) from 17% to 37% and doubles scaling efficiency relative to state-of-the-art methods. Kunlun is deployed in major Meta Ads models, where the authors report significant production impact.
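To make the utilization figures above concrete, here is a minimal sketch of how a FLOPs-utilization ratio is commonly computed: achieved model FLOPs per second divided by the hardware's peak FLOPs per second. The function name, its parameters, and every number below are illustrative assumptions, not values from the paper; in particular, `PEAK_FLOPS_PER_GPU` is a placeholder and not the B200 specification.

```python
def model_flops_utilization(
    flops_per_step: float,
    step_time_s: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Return achieved model FLOPs/s divided by aggregate peak FLOPs/s."""
    achieved_flops_per_s = flops_per_step / step_time_s
    peak_flops_per_s = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_s / peak_flops_per_s


# Placeholder peak throughput (assumption, NOT a real hardware spec).
PEAK_FLOPS_PER_GPU = 1.0e15

# Hypothetical workload: 3.7e14 model FLOPs per step, 1 s per step, 1 GPU.
mfu = model_flops_utilization(
    flops_per_step=3.7e14,
    step_time_s=1.0,
    num_gpus=1,
    peak_flops_per_gpu=PEAK_FLOPS_PER_GPU,
)

# With model FLOPs and hardware held fixed, step time is inversely
# proportional to utilization, so 17% -> 37% implies this speedup factor.
implied_speedup = 0.37 / 0.17
```

Under these assumptions `mfu` is 0.37, and the reported jump from 17% to 37% corresponds to roughly a 2.2x shorter step time for the same model on the same hardware.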

📝 Abstract

Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
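As a rough illustration of what "predictable power-law scaling" means, the sketch below fits a curve of the form loss = a * compute^(-b) by ordinary least squares in log-log space. The functional form, the `fit_power_law` helper, and the synthetic data points are assumptions made for illustration; the abstract does not state which metric or parameterization the authors fit.

```python
import math


def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) via least squares on log-log data.

    Returns (a, b). Assumes all inputs are strictly positive.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated from a = 2.0, b = 0.1 (hypothetical).
compute = [1e18, 1e19, 1e20, 1e21]
loss = [2.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate the fitted law to a larger compute budget.
predicted_loss = a * 1e22 ** -b
```

On this synthetic data the fit recovers the generating parameters. In this framing, a larger exponent `b` means each additional unit of compute buys more improvement, which is one way to read the abstract's "scaling efficiency".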

Problem

Research questions and friction points this paper is trying to address.

scaling laws

recommendation systems

computational investment

model performance

resource allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling Laws

Model FLOPs Utilization

Unified Architecture

Computation Skip

Event-level Personalization
