Bending the Scaling Law Curve in Large-Scale Recommendation Systems

Date: 2026-02-18
AI Summary
This work addresses two obstacles that sequential recommendation models face as sequences grow longer and architectures deeper: the quadratic computational cost of self-attention, and the limited expressiveness of the cross-attention mechanisms that many recent approaches adopt to avoid that cost. The authors propose ULTRA-HSTU, a framework built through end-to-end model and system co-design that combines structured input sequences, sparse attention mechanisms, and a restructured model topology. This approach preserves the expressive power of self-attention while removing scalability bottlenecks in both training and inference. Benchmarks show over 5× faster training scaling and 21× faster inference scaling compared with conventional models. In online deployment, the system delivers 4%–8% improvements in user consumption and engagement metrics.

Abstract
Learning from user interaction history through sequential models has become a cornerstone of large-scale recommender systems. Recent advances in large language models have revealed promising scaling laws, sparking a surge of research into long-sequence modeling and deeper architectures for recommendation tasks. However, many recent approaches rely heavily on cross-attention mechanisms to address the quadratic computational bottleneck in sequential modeling, which can limit the representational power gained from self-attention. We present ULTRA-HSTU, a novel sequential recommendation model developed through end-to-end model and system co-design. By innovating in the design of input sequences, sparse attention mechanisms, and model topology, ULTRA-HSTU achieves substantial improvements in both model quality and efficiency. Comprehensive benchmarking shows that ULTRA-HSTU achieves large scaling efficiency gains (over 5x faster training scaling and 21x faster inference scaling compared to conventional models) while delivering superior recommendation quality. Our solution is fully deployed at scale, serving billions of users daily and driving 4% to 8% improvements in consumption and engagement in real-world production environments.
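To make the "quadratic computational bottleneck" concrete, the sketch below counts query-key score entries per attention layer under three generic cost models: full self-attention, cross-attention where only a small set of query tokens attends to the history, and causal sliding-window attention as one common sparse pattern. These are textbook formulas for illustration only, not ULTRA-HSTU's actual attention design, which this page does not describe; the function names, `num_queries`, and `window` are hypothetical parameters chosen for the example.

```python
# Generic attention-cost model: counts query-key score entries per layer.
# Illustrative textbook formulas only; NOT the ULTRA-HSTU attention design,
# which this page does not specify. All names here are hypothetical.

def self_attention_scores(seq_len: int) -> int:
    """Full self-attention: each of the seq_len history items attends to every item."""
    return seq_len * seq_len


def cross_attention_scores(seq_len: int, num_queries: int) -> int:
    """Cross-attention: only num_queries query tokens (a hypothetical small set,
    e.g. candidate items) attend to the seq_len history items."""
    return num_queries * seq_len


def causal_local_window_scores(seq_len: int, window: int) -> int:
    """Causal sliding-window attention (one generic sparse pattern): position i
    attends to itself and at most window - 1 preceding positions."""
    return sum(min(i + 1, window) for i in range(seq_len))


if __name__ == "__main__":
    num_queries = 64  # hypothetical number of query tokens for cross-attention
    window = 512      # hypothetical sliding-window size
    for seq_len in (1_000, 4_000, 16_000):
        full = self_attention_scores(seq_len)
        cross = cross_attention_scores(seq_len, num_queries)
        local = causal_local_window_scores(seq_len, window)
        print(f"L={seq_len:>6}: full={full:>12,} cross={cross:>10,} "
              f"local={local:>10,} full/cross={full / cross:.0f}x")
```

Doubling the sequence length quadruples the full self-attention count but only doubles the other two, which is why long-sequence recommenders look for alternatives. The abstract's point is that the cross-attention shortcut buys this saving at the cost of representational power, whereas ULTRA-HSTU instead relies on sparse attention combined with changes to the input sequence and model topology.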
Problem

Research questions and friction points this paper is trying to address.

scaling laws
sequential recommendation
computational bottleneck
attention mechanisms
large-scale recommender systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse attention
sequence modeling
model-system co-design
scaling laws
large-scale recommendation
Authors
Qin Ding
Meta Recommendation Systems
Kevin Course
Meta Recommendation Systems
Linjian Ma
Research Scientist, Meta Platforms, Inc.
Numerical Algorithms, Tensors, Quantum Simulation, High Performance Computing, Machine Learning
Jianhui Sun
University of Virginia
Data Mining, Optimization, Deep Learning
Rouchen Liu
Meta Recommendation Systems
Zhao Zhu
Meta Recommendation Systems
Chunxing Yin
Meta Recommendation Systems
Wei Li
Huawei Noah's Ark Lab
Low-level Vision, Computer Vision, AIGC
Dai Li
Meta Recommendation Systems
Yu Shi
Meta Recommendation Systems
Xuan Cao
Meta Recommendation Systems
Ze Yang
Meta Recommendation Systems
Han Li
Meta Recommendation Systems
Xing Liu
Research Scientist, Meta Platforms, Inc.
recommendation systems, machine learning, artificial intelligence, high-performance computing
Bi Xue
Meta Recommendation Systems
Hongwei Li
Meta Recommendation Systems
Rui Jian
Meta Recommendation Systems
Daisy Shi He
Meta Recommendation Systems
Jing Qian
Meta Recommendation Systems
Matt Ma
Meta Recommendation Systems
Qunshu Zhang
Meta Recommendation Systems
Rui Li
Meta
search, data mining, machine learning, database, recommendation systems