MTGR: Industrial-Scale Generative Recommendation Framework in Meituan

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative recommendation methods sacrifice the high-performing cross features of traditional deep models for scalability, causing a performance degradation that cannot be recovered by scaling model size. This paper proposes MTGR, a generative recommendation framework designed for industrial-scale deployment which, uniquely among generative architectures, fully retains the cross features of traditional deep learning recommendation models (DLRM). Methodologically, MTGR introduces Group-Layer Normalization (GLN) to model multiple semantic embedding spaces and a dynamic masking mechanism to prevent information leakage; it further adds user-level sample compression and training-framework optimizations on top of the HSTU architecture. Experiments scale per-sample forward-inference FLOPs to 65× that of DLRM while delivering the largest offline and online gains in nearly two years, and the framework is fully deployed on Meituan's primary traffic scenarios.
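As a hedged sketch of the GLN idea (the page does not give the exact formulation, so the function name and per-group affine parameters below are illustrative assumptions): tokens from different semantic groups (e.g. user profile, behaviour history, candidate item) share per-token layer-norm statistics but get separate learnable scale and bias per group.

```python
import numpy as np

def group_layer_norm(x, group_ids, gamma, beta, eps=1e-6):
    """Illustrative Group-Layer Normalization sketch (not the paper's code).

    x:         (seq_len, dim) token embeddings
    group_ids: (seq_len,) integer semantic-group id per token
    gamma:     (n_groups, dim) per-group learnable scale
    beta:      (n_groups, dim) per-group learnable bias
    """
    # standard per-token LayerNorm statistics over the feature dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # the "group" part: each semantic group applies its own affine transform
    return gamma[group_ids] * x_hat + beta[group_ids]
```

The design intuition is that embeddings from heterogeneous sources live in different semantic spaces, so a single shared affine transform under-fits; indexing the affine parameters by group keeps normalization cheap while letting each space rescale independently.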

📝 Abstract
Scaling laws have been extensively validated in domains such as natural language processing and computer vision. In recommendation systems, recent work has adopted generative recommendation to achieve scalability, but these generative approaches require abandoning the carefully constructed cross features of traditional recommendation models. We found that this significantly degrades model performance, and that scaling up cannot compensate for it at all. In this paper, we propose MTGR (Meituan Generative Recommendation) to address this issue. MTGR is built on the HSTU architecture and retains the original deep learning recommendation model (DLRM) features, including cross features. Additionally, MTGR achieves training and inference acceleration through user-level compression to ensure efficient scaling. We also propose Group-Layer Normalization (GLN) to enhance encoding within different semantic spaces, and a dynamic masking strategy to avoid information leakage. We further optimize the training framework, enabling support for models with 10 to 100 times the computational complexity of DLRM without significant cost increases. MTGR reached 65x the FLOPs of the DLRM model for single-sample forward inference, yielding the largest offline and online gains in nearly two years. This breakthrough was successfully deployed on Meituan, the world's largest food-delivery platform, where it now handles the main traffic.
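The dynamic masking strategy mentioned in the abstract can be sketched as an attention-mask builder (a hypothetical reconstruction, not the paper's implementation): history tokens attend causally to earlier history, while each candidate attends to the full history but not to other candidates, so one candidate's label cannot leak into another's score.

```python
import numpy as np

def build_candidate_mask(n_history, n_candidates):
    """Illustrative leakage-avoiding attention mask (assumed layout:
    user-history tokens first, then candidate tokens).
    Returns a boolean (L, L) matrix where True = attention allowed."""
    L = n_history + n_candidates
    mask = np.zeros((L, L), dtype=bool)
    # causal mask within the user-history prefix
    mask[:n_history, :n_history] = np.tril(
        np.ones((n_history, n_history), dtype=bool)
    )
    # candidates see the whole history ...
    mask[n_history:, :n_history] = True
    # ... and themselves, but never each other
    for i in range(n_candidates):
        mask[n_history + i, n_history + i] = True
    return mask
```

Blocking candidate-to-candidate attention is what makes it safe to score many candidates of the same user in a single forward pass, which is also what the user-level compression relies on.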
Problem

Research questions and friction points this paper is trying to address.

Retaining cross features in generative recommendation models
Accelerating training and inference for large-scale systems
Enhancing performance without significant cost increases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retains DLRM features including cross features
Uses user-level compression for acceleration
Implements Group-Layer Normalization for better encoding
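The user-level compression bullet can be illustrated with a minimal sketch (assumed data layout; field names are hypothetical): DLRM-style training emits one row per (user, candidate) pair, re-encoding the user's behaviour history for every candidate, whereas grouping rows by user lets one forward pass over the history score all of that user's candidates at once.

```python
from collections import defaultdict

def compress_by_user(samples):
    """Group per-impression rows into one record per user.

    samples: iterable of (user_id, history, candidate) tuples,
    where `history` is identical for rows sharing a user_id.
    """
    grouped = defaultdict(lambda: {"history": None, "candidates": []})
    for user_id, history, candidate in samples:
        # the history is stored once per user instead of once per row
        grouped[user_id]["history"] = history
        grouped[user_id]["candidates"].append(candidate)
    return dict(grouped)
```

Under this grouping, the cost of encoding the (long) history is amortized over all of a user's candidates, which is one plausible reading of how MTGR accelerates both training and inference.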
👥 Authors
Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang (Meituan, Beijing, China); Xiaoguang Li (Noah's Ark Lab, Huawei); Chunzhen Jing, Yueming Han, Menglei Zhou, Lei Yu (Meituan, Beijing, China); Chuan Liu (University of Rochester); Wei Lin (Meituan, Beijing, China)