Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

📅 2026-04-27
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the data redundancy, storage overhead, and I/O bottlenecks caused by ultra-long user interaction histories in recommendation systems under the "Fat Row" paradigm, particularly the resource inefficiencies in multi-tenant environments. To tackle these challenges, the authors propose a versioned late materialization paradigm that stores interaction histories once in an immutable, normalized layer and reconstructs sequences on the fly during training via lightweight version pointers. The approach combines decoupled preprocessing, pipelined I/O prefetching, and data-affinity optimizations to keep training GPU-compute-bound. It further supports multi-dimensional projection pushdown and uses a dual-channel protocol to prevent future leakage, thereby eliminating redundancy while preserving online-offline consistency. Deployed in production, the method reduces infrastructure costs, enables training with longer sequences, and improves model quality.
📝 Abstract
Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length, driving the frontier toward ultra-long User Interaction History (UIH). However, the industry-standard "Fat Row" paradigm, which pre-materializes these sequences into every training example, creates a storage and I/O wall: data redundancy pushes data infrastructure usage beyond GPU training capacity. This redundancy is amplified in multi-tenant environments, where models with vastly different sequence length requirements share a union dataset. We present a versioned late materialization paradigm that eliminates this redundancy by storing UIH once in a normalized, immutable tier and reconstructing sequences just-in-time during training via lightweight versioned pointers. The system ensures Online-to-Offline (O2O) consistency through a bifurcated protocol that prevents future leakage across both streaming and batch training, while a read-optimized immutable storage layer provides multi-dimensional projection pushdown for heterogeneous model tenants. Disaggregated data preprocessing with pipelined I/O prefetching and data-affinity optimizations masks the latency of training-time sequence reconstruction, keeping training throughput compute-bound by GPUs. Deployed on production DLRMs, the system reduces training data infrastructure resource usage while enabling aggressive sequence length scaling that delivers significant model quality gains. It serves as the foundational data infrastructure for modern recommendation model architectures, including HSTU and ULTRA-HSTU.
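The core idea in the abstract (store UIH once, keep only a versioned pointer in each training example, and rebuild the sequence at read time) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `UIHStore`, `VersionedPointer`, `materialize`, `max_len`, and `fields` are hypothetical, and the "version" is assumed here to be a simple as-of timestamp. The sketch shows only two properties: events newer than the pointer's version are never returned (no future leakage), and each tenant reads only the sequence length and columns it asks for (a toy stand-in for projection pushdown).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # event timestamp
    item_id: int
    action: str


@dataclass(frozen=True)
class VersionedPointer:
    """What a training example stores instead of the full sequence (assumed shape)."""
    user_id: int
    as_of_ts: int  # only events with ts <= as_of_ts are visible


class UIHStore:
    """Hypothetical append-only store: one time-ordered event log per user."""

    def __init__(self):
        self._events = {}      # user_id -> list[Event], sorted by ts
        self._timestamps = {}  # user_id -> list[int], parallel to _events

    def append(self, user_id, event):
        events = self._events.setdefault(user_id, [])
        stamps = self._timestamps.setdefault(user_id, [])
        if stamps and event.ts < stamps[-1]:
            raise ValueError("events must be appended in time order")
        events.append(event)
        stamps.append(event.ts)

    def materialize(self, ptr, max_len, fields):
        """Rebuild the sequence a pointer refers to, at read time."""
        events = self._events.get(ptr.user_id, [])
        stamps = self._timestamps.get(ptr.user_id, [])
        # Cut off everything after the pointer's version: no future leakage.
        end = bisect_right(stamps, ptr.as_of_ts)
        # Keep only the most recent max_len events for this tenant.
        start = max(0, end - max_len)
        # Return only the requested columns.
        return [tuple(getattr(e, f) for f in fields) for e in events[start:end]]


store = UIHStore()
for ts, item in [(10, 100), (20, 101), (30, 102), (40, 103)]:
    store.append(1, Event(ts=ts, item_id=item, action="click"))

ptr = VersionedPointer(user_id=1, as_of_ts=30)
short_seq = store.materialize(ptr, max_len=2, fields=("item_id",))
long_seq = store.materialize(ptr, max_len=100, fields=("ts", "item_id"))
```

In this sketch the same pointer serves a short-sequence tenant and a long-sequence tenant without duplicating any history, which is the redundancy the "Fat Row" layout would otherwise pay for in every example.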
Problem

Research questions and friction points this paper is trying to address.

Ultra-long Sequence Training
Data Redundancy
Storage and I/O Bottleneck
Multi-tenant Recommendation Systems
User Interaction History
Innovation

Methods, ideas, or system contributions that make the work stand out.

versioned late materialization
ultra-long sequence training
data redundancy elimination
immutable storage
multi-tenant recommendation systems
🔎 Similar Papers
Liang Guo
Meta Platforms, Inc., Menlo Park, CA, USA
Ge Song
Meta Platforms, Inc., Menlo Park, CA, USA
Litao Deng
Meta Platforms, Inc., Menlo Park, CA, USA
Jianhui Sun
University of Virginia
Data Mining, Optimization, Deep Learning
Chufeng Hu
Meta Platforms, Inc., Menlo Park, CA, USA
Lu Zhang
Meta Platforms, Inc., Menlo Park, CA, USA
Zhen Ma
Meta Platforms, Inc., Menlo Park, CA, USA
Shouwei Chen
Meta Platforms, Inc., Menlo Park, CA, USA
Weiran Liu
Staff Security Engineer, Alibaba Group
Cryptography, Differential Privacy, Multi-Party Computation
Sarang Masti Sreeshylan
Meta Platforms, Inc., Menlo Park, CA, USA
Xiaoxuan Meng
Meta Platforms, Inc., Menlo Park, CA, USA