🤖 AI Summary
Industrial training systems for generative recommendation models (GRMs) suffer from inefficient sparse embedding updates, GPU load imbalance, and suboptimal embedding lookup performance. To address these challenges, this paper introduces MTGRBoost, an efficient and scalable system tailored for large-scale GRM training. Its method features: (1) a dynamic hash table replacing static embedding tables to enable real-time embedding insertion/deletion and low-latency lookup; (2) a dynamic sequence balancing strategy coupled with embedding ID deduplication and automatic table merging to mitigate long-tail distribution effects and redundant parameter updates; and (3) integration of mixed-precision training, gradient accumulation, operator fusion, and fault-tolerant checkpointing. Experiments demonstrate 1.6×–2.4× higher training throughput and near-linear scalability up to 100 GPUs. The system has been deployed in production at Meituan, serving hundreds of millions of requests daily.
📝 Abstract
Recommendation is crucial for both user experience and company revenue, and generative recommendation models (GRMs) have recently been shown to produce high-quality recommendations. However, existing systems are limited by insufficient functionality support and inefficient implementations for training GRMs in industrial scenarios. As such, we introduce MTGRBoost, an efficient and scalable system for GRM training. Specifically, to handle the real-time insertion/deletion of sparse embedding entries, MTGRBoost employs dynamic hash tables to replace static tables. To improve efficiency, MTGRBoost conducts dynamic sequence balancing to address the computation load imbalances among GPUs, and adopts embedding ID deduplication alongside automatic table merging to accelerate embedding lookup. MTGRBoost also incorporates implementation optimizations including checkpoint resuming, mixed precision training, gradient accumulation, and operator fusion. Extensive experiments show that MTGRBoost improves training throughput by $1.6\times$–$2.4\times$ while achieving good scalability when running on over 100 GPUs. MTGRBoost has been deployed for many applications in Meituan and is now handling hundreds of millions of requests on a daily basis.
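To make the two core embedding ideas concrete, here is a minimal sketch, in plain Python with NumPy, of a dynamic hash-table embedding store with ID deduplication. The class name, method names, and eviction API are hypothetical illustrations of the general technique, not MTGRBoost's actual implementation (which runs on GPUs with custom kernels):

```python
import numpy as np

class DynamicEmbeddingTable:
    """Illustrative sketch: a hash table (dict) maps sparse feature IDs to
    embedding rows, so entries can be inserted and deleted at runtime
    instead of pre-allocating a static table."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.table = {}  # sparse ID -> embedding vector
        self.rng = np.random.default_rng(seed)

    def lookup(self, ids):
        # Deduplicate IDs so each unique embedding is fetched (and would be
        # updated) only once, then scatter rows back to the original order.
        unique_ids, inverse = np.unique(np.asarray(ids), return_inverse=True)
        rows = []
        for i in unique_ids:
            if i not in self.table:  # real-time insertion of unseen IDs
                self.table[i] = self.rng.standard_normal(self.dim).astype(np.float32)
            rows.append(self.table[i])
        return np.stack(rows)[inverse]

    def evict(self, ids):
        # real-time deletion of stale or low-frequency entries
        for i in ids:
            self.table.pop(i, None)

# Usage: duplicate ID 7 resolves to the same stored row.
table = DynamicEmbeddingTable(dim=4)
emb = table.lookup([7, 3, 7, 9])          # shape (4, 4)
assert np.allclose(emb[0], emb[2])
table.evict([3])                          # entry 3 is removed
```

The deduplication step mirrors why it speeds up training: in long user-behavior sequences the same item IDs recur often, so looking up (and applying gradients to) only the unique IDs avoids redundant memory traffic and parameter updates.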