MTGR: Industrial-Scale Generative Recommendation Framework in Meituan

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative recommendation methods sacrifice the high-performing cross features of traditional deep models for scalability, causing a performance degradation that cannot be recovered by scaling model size. This paper proposes MTGR, a generative recommendation framework designed for industrial-scale deployment which, uniquely among generative architectures, fully retains the cross features of traditional deep learning recommendation models (DLRM). Methodologically, MTGR introduces Group-Layer Normalization (GLN) to model multiple semantic embedding spaces and a dynamic masking mechanism to prevent information leakage; it further adds user-level sample compression and training-framework optimizations on top of the HSTU architecture. Experiments scale per-sample forward-inference FLOPs to 65× that of DLRM while delivering the largest offline and online gains in nearly two years, and the framework is fully deployed on Meituan's primary traffic scenarios.
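As a hedged sketch of the GLN idea (the page does not give the exact formulation, so the function name and per-group affine parameters below are illustrative assumptions): tokens from different semantic groups (e.g. user profile, behaviour history, candidate item) share per-token layer-norm statistics but get separate learnable scale and bias per group.

```python
import numpy as np

def group_layer_norm(x, group_ids, gamma, beta, eps=1e-6):
    """Illustrative Group-Layer Normalization sketch (not the paper's code).

    x:         (seq_len, dim) token embeddings
    group_ids: (seq_len,) integer semantic-group id per token
    gamma:     (n_groups, dim) per-group learnable scale
    beta:      (n_groups, dim) per-group learnable bias
    """
    # standard per-token LayerNorm statistics over the feature dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # the "group" part: each semantic group applies its own affine transform
    return gamma[group_ids] * x_hat + beta[group_ids]
```

The design intuition is that embeddings from heterogeneous sources live in different semantic spaces, so a single shared affine transform under-fits; indexing the affine parameters by group keeps normalization cheap while letting each space rescale independently.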

📝 Abstract
Scaling laws have been extensively validated in domains such as natural language processing and computer vision. In recommendation systems, recent work has adopted generative recommendation to achieve scalability, but these generative approaches require abandoning the carefully constructed cross features of traditional recommendation models. We found that this significantly degrades model performance, and that scaling up cannot compensate for it at all. In this paper, we propose MTGR (Meituan Generative Recommendation) to address this issue. MTGR is built on the HSTU architecture and retains the original deep learning recommendation model (DLRM) features, including cross features. Additionally, MTGR achieves training and inference acceleration through user-level compression to ensure efficient scaling. We also propose Group-Layer Normalization (GLN) to enhance encoding within different semantic spaces, and a dynamic masking strategy to avoid information leakage. We further optimize the training framework, enabling support for models with 10 to 100 times the computational complexity of DLRM without significant cost increases. MTGR reached 65x the FLOPs of the DLRM model for single-sample forward inference, yielding the largest offline and online gains in nearly two years. This breakthrough was successfully deployed on Meituan, the world's largest food-delivery platform, where it now handles the main traffic.
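The dynamic masking strategy mentioned in the abstract can be sketched as an attention-mask builder (a hypothetical reconstruction, not the paper's implementation): history tokens attend causally to earlier history, while each candidate attends to the full history but not to other candidates, so one candidate's label cannot leak into another's score.

```python
import numpy as np

def build_candidate_mask(n_history, n_candidates):
    """Illustrative leakage-avoiding attention mask (assumed layout:
    user-history tokens first, then candidate tokens).
    Returns a boolean (L, L) matrix where True = attention allowed."""
    L = n_history + n_candidates
    mask = np.zeros((L, L), dtype=bool)
    # causal mask within the user-history prefix
    mask[:n_history, :n_history] = np.tril(
        np.ones((n_history, n_history), dtype=bool)
    )
    # candidates see the whole history ...
    mask[n_history:, :n_history] = True
    # ... and themselves, but never each other
    for i in range(n_candidates):
        mask[n_history + i, n_history + i] = True
    return mask
```

Blocking candidate-to-candidate attention is what makes it safe to score many candidates of the same user in a single forward pass, which is also what the user-level compression relies on.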
Problem

Research questions and friction points this paper is trying to address.

Retaining cross features in generative recommendation models
Accelerating training and inference for large-scale systems
Enhancing performance without significant cost increases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retains DLRM features including cross features
Uses user-level compression for acceleration
Implements Group-Layer Normalization for better encoding
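The user-level compression bullet can be illustrated with a minimal sketch (assumed data layout; field names are hypothetical): DLRM-style training emits one row per (user, candidate) pair, re-encoding the user's behaviour history for every candidate, whereas grouping rows by user lets one forward pass over the history score all of that user's candidates at once.

```python
from collections import defaultdict

def compress_by_user(samples):
    """Group per-impression rows into one record per user.

    samples: iterable of (user_id, history, candidate) tuples,
    where `history` is identical for rows sharing a user_id.
    """
    grouped = defaultdict(lambda: {"history": None, "candidates": []})
    for user_id, history, candidate in samples:
        # the history is stored once per user instead of once per row
        grouped[user_id]["history"] = history
        grouped[user_id]["candidates"].append(candidate)
    return dict(grouped)
```

Under this grouping, the cost of encoding the (long) history is amortized over all of a user's candidates, which is one plausible reading of how MTGR accelerates both training and inference.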
👥 Authors
Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang (Meituan, Beijing, China); Xiaoguang Li (Noah's Ark Lab, Huawei); Chunzhen Jing, Yueming Han, Menglei Zhou, Lei Yu (Meituan, Beijing, China); Chuan Liu (University of Rochester); Wei Lin (Meituan, Beijing, China)