Request-Only Optimization for Recommendation Systems

📅 2025-07-24
🤖 AI Summary
To address the low storage and training efficiency of large-scale deep learning recommendation models (DLRMs) under trillion-parameter scales and massive log data, this paper proposes a “request-level optimization paradigm,” the first to treat user requests—not individual samples—as the fundamental training unit. This paradigm jointly optimizes data, model, and system through request-granular data storage, intrinsic feature deduplication, communication compression, and a request-aware neural network architecture. Compared to conventional sample-level training, it significantly reduces log redundancy and storage overhead, enables emerging architectures such as generative recommendation, improves model quality in trillion-FLOP-scale scenarios, and cuts training resource consumption by over 30%.
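The storage saving from request-granular logging can be illustrated with a small sketch (hypothetical field and class names; the paper does not specify its log schema). Impression-level logging repeats the request's user-side features for every candidate item shown, while request-level logging stores them once per request:

```python
from dataclasses import dataclass, field

@dataclass
class Impression:
    user_features: dict   # duplicated per impression in sample-level logging
    item_features: dict

@dataclass
class RequestLog:
    user_features: dict   # stored once for the whole request
    impressions: list = field(default_factory=list)  # item-side features only

def to_request_log(impressions):
    """Deduplicate the user features shared by all impressions of one request."""
    assert impressions, "a request has at least one impression"
    return RequestLog(
        user_features=impressions[0].user_features,
        impressions=[imp.item_features for imp in impressions],
    )

# A request that showed 3 items: user features logged once instead of 3 times.
user = {"user_id": 7, "history": list(range(100))}
imps = [Impression(user, {"item_id": i}) for i in range(3)]
log = to_request_log(imps)
print(len(log.impressions))  # 3 item records share one copy of user features
```

The longer the user history relative to the item-side features, the larger the saving, which is why the paper targets long-history, trillion-FLOP models.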


📝 Abstract
Deep Learning Recommendation Models (DLRMs) represent one of the largest machine learning applications on the planet. Industry-scale DLRMs are trained with petabytes of recommendation data to serve billions of users every day. To utilize the rich user signals in the long user history, DLRMs have been scaled up to unprecedented complexity, up to trillions of floating-point operations (TFLOPs) per example. This scale, coupled with the huge amount of training data, necessitates new storage and training algorithms to efficiently improve the quality of these complex recommendation systems. In this paper, we present a Request-Only Optimizations (ROO) training and modeling paradigm. ROO simultaneously improves the storage and training efficiency as well as the model quality of recommendation systems. We holistically approach this challenge through co-designing data (i.e., request-only data), infrastructure (i.e., request-only based data processing pipeline), and model architecture (i.e., request-only neural architectures). Our ROO training and modeling paradigm treats a user request as a unit of the training data. Compared with the established practice of treating a user impression as a unit, our new design achieves native feature deduplication in data logging, consequently saving data storage. Furthermore, by de-duplicating computations and communications across multiple impressions in a request, this new paradigm enables highly scaled-up neural network architectures to better capture user interest signals, such as Generative Recommenders (GRs) and other request-only friendly architectures.
Problem

Research questions and friction points this paper is trying to address.

Optimizes storage and training for large-scale recommendation systems
Enhances model quality with request-only data and architectures
Reduces computational overhead by deduplicating features and operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Request-Only Optimizations (ROO) paradigm
Request-only data and processing pipeline
Request-only neural architectures for efficiency
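The computation deduplication these bullets describe can be sketched as follows (a toy stand-in, not the paper's actual architecture): the expensive user-side encoder over the long history runs once per request, and only a cheap item-side interaction runs per impression.

```python
def user_encoder(user_features):
    # Stand-in for the expensive long-history user tower.
    return sum(user_features) / len(user_features)

def item_scorer(user_repr, item_feature):
    # Cheap per-impression interaction with the shared user representation.
    return user_repr * item_feature

def score_request(user_features, item_features_list):
    user_repr = user_encoder(user_features)  # computed once per request
    return [item_scorer(user_repr, f) for f in item_features_list]

# One request with three impressions: the user tower is evaluated once,
# not three times as in impression-level training.
scores = score_request([1.0, 2.0, 3.0], [0.5, 1.0, 2.0])
print(scores)  # [1.0, 2.0, 4.0]
```

Under impression-level training the user tower would be re-run for every item in the request; amortizing it across impressions is what makes much larger user-side architectures (such as Generative Recommenders) affordable.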
👥 Authors

Liang Guo (Meta Platforms, Inc., USA)
Wei Li (Meta Platforms, Inc., USA)
Lucy Liao (Meta Platforms, Inc., USA)
Huihui Cheng (Meta Platforms, Inc., USA)
Rui Zhang (Meta Platforms, Inc., USA)
Yu Shi (Meta Platforms, Inc., USA)
Yueming Wang (Zhejiang University; brain-computer interfaces, pattern recognition, machine learning, neural signal processing)
Yanzun Huang (Meta Platforms, Inc., USA)
Keke Zhai (unknown affiliation; HPC, parallel computing)
Pengchao Wang (Meta Platforms, Inc., USA)
Timothy Shi (Meta Platforms, Inc., USA)
Xuan Cao (Meta Platforms, Inc., USA)
Shengzhi Wang (Meta Platforms, Inc., USA)
Renqin Cai (University of Virginia; fairness, transparency, robustness, information retrieval, web search, recommender systems)
Zhaojie Gong (Meta Platforms, Inc., USA)
Omkar Vichare (Meta Platforms, Inc., USA)
Rui Jian (Meta Platforms, Inc., USA)
Leon Gao (Meta Platforms, Inc., USA)
Shiyan Deng (Meta Platforms, Inc., USA)
Xingyu Liu (Meta Platforms, Inc., USA)
Xiongfei Zhang (Meta Platforms, Inc., USA)
Fu Li (Meta Platforms, Inc., USA)
Wenlei Xie (AI startup; parallel and distributed computing)
Bin Wen (Kuaishou; MLLM)
Rui Li (Meta Platforms, Inc., USA)
Xing Liu (Meta Platforms, Inc., USA)
Jiaqi Zhai (Meta Platforms, Inc., USA)