LEMUR: Large scale End-to-end MUltimodal Recommendation

📅 2025-11-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the cold-start and poor-generalization problems of traditional ID-based recommendation systems, as well as the objective misalignment and inflexible updating caused by two-stage training in industrial-scale multimodal recommendation, this paper proposes the first large-scale end-to-end recommendation framework that operates directly on raw multimodal data. The method jointly optimizes multimodal representation learning and recommendation objectives to eliminate stage-wise decoupling, and introduces an incremental, differentiable memory bank for efficient accumulation and real-time updating of multimodal sequential representations, integrating deep multimodal encoding, sequential modeling, and memory-augmented mechanisms. Deployed in Douyin Search, the framework achieves a 0.843% reduction in query change rate decay, a 0.81% improvement in QAUC, and significant gains in core offline metrics for Douyin Advertisement.

πŸ“ Abstract
Traditional ID-based recommender systems often struggle with cold-start and generalization challenges. Multimodal recommendation systems, which leverage textual and visual data, offer a promising solution to mitigate these issues. However, existing industrial approaches typically adopt a two-stage training paradigm: first pretraining a multimodal model, then applying its frozen representations to train the recommendation model. This decoupled framework suffers from misalignment between multimodal learning and recommendation objectives, as well as an inability to adapt dynamically to new data. To address these limitations, we propose LEMUR, the first large-scale multimodal recommender system trained end-to-end from raw data. By jointly optimizing both the multimodal and recommendation components, LEMUR ensures tighter alignment with downstream objectives while enabling real-time parameter updates. Constructing multimodal sequential representations from user history often entails prohibitively high computational costs. To alleviate this bottleneck, we propose a novel memory bank mechanism that incrementally accumulates historical multimodal representations throughout the training process. After one month of deployment in Douyin Search, LEMUR has led to a 0.843% reduction in query change rate decay and a 0.81% improvement in QAUC. Additionally, LEMUR has shown significant gains across key offline metrics for Douyin Advertisement. Our results validate the superiority of end-to-end multimodal recommendation in real-world industrial scenarios.
Problem

Research questions and friction points this paper is trying to address.

Addressing cold-start and generalization issues in ID-based recommenders
Resolving misalignment between multimodal learning and recommendation objectives
Reducing computational costs of constructing multimodal sequential representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end training of multimodal recommendation system
Joint optimization of multimodal and recommendation components
Memory bank mechanism for incremental historical representations
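The memory bank described above can be illustrated with a minimal sketch. All names and the blending rule here are assumptions for illustration, not the paper's actual implementation: the idea is that each item's freshly encoded multimodal embedding is written into (or blended into) a per-item slot, so user-history sequences are assembled by cheap lookup instead of re-running the multimodal encoder over every historical item.

```python
import numpy as np

class MemoryBank:
    """Hypothetical incremental memory bank (illustrative, not LEMUR's
    exact design): caches per-item multimodal embeddings so that
    sequential user-history representations can be built by lookup."""

    def __init__(self, num_items: int, dim: int, momentum: float = 0.9):
        self.bank = np.zeros((num_items, dim), dtype=np.float32)
        self.seen = np.zeros(num_items, dtype=bool)
        self.momentum = momentum  # assumed blend factor for later writes

    def update(self, item_ids, embeddings):
        # First write stores the embedding directly; later writes blend it
        # with the cached value so the bank tracks the evolving encoder.
        for idx, emb in zip(item_ids, embeddings):
            if self.seen[idx]:
                self.bank[idx] = (self.momentum * self.bank[idx]
                                  + (1 - self.momentum) * emb)
            else:
                self.bank[idx] = emb
                self.seen[idx] = True

    def lookup(self, history_ids):
        # Assemble a user's sequential multimodal representation by lookup;
        # unseen items fall back to zero vectors.
        return self.bank[np.asarray(history_ids)]

# Example: cache embeddings for items 3 and 7, then fetch a history of 3 items.
bank = MemoryBank(num_items=100, dim=4)
bank.update([3, 7], np.ones((2, 4), dtype=np.float32))
seq = bank.lookup([3, 7, 9])  # item 9 is unseen, so its row is zeros
```

In an end-to-end setup, `update` would be called with the encoder's outputs on the current mini-batch, so the cached representations stay close to the latest encoder parameters without recomputing full histories.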
Xintian Han
ByteDance
Machine Learning
Honggang Chen
ByteDance, Beijing, China
Quan Lin
ByteDance, Beijing, China
Jingyue Gao
ByteDance, Beijing, China
Xiangyuan Ren
ByteDance, Beijing, China
Lifei Zhu
ByteDance, Hangzhou, China
Zhisheng Ye
PhD @ School of Computer Science, Peking University
Distributed Systems · Resource Management · Large Language Models
Shikang Wu
ByteDance, Beijing, China
Xionghang Xie
ByteDance, Beijing, China
Xiaochu Gan
ByteDance, Beijing, China
Bingzheng Wei
ByteDance, Beijing, China
Peng Xu
ByteDance, San Jose, USA
Zhe Wang
ByteDance, Beijing, China
Yuchao Zheng
ByteDance, Hangzhou, China
Jingjian Lin
ByteDance, Beijing, China
Di Wu
ByteDance, Beijing, China
Junfeng Ge
ByteDance, Beijing, China