SynGR: Unleashing the Potential of Cross-Modal Synergy for Generative Recommendation

📅 2026-05-18

🤖 AI Summary
This work addresses a limitation of existing multimodal generative recommendation methods: they rely heavily on alignment strategies for multimodal fusion and therefore struggle to capture emergent item semantics that arise from cross-modal interactions. To overcome this, the authors propose SynGR, a framework that explicitly models cross-modal synergy in generative recommendation instead of relying on alignment alone. Built on a sequence-to-sequence architecture, SynGR uses a synergistic constraint mechanism to dynamically modulate the contribution of each modality. This mitigates over-reliance on dominant modalities and lets the model learn fine-grained emergent semantics beyond shared or modality-specific signals. Experiments on three benchmark datasets show that SynGR outperforms state-of-the-art baselines, supporting the effectiveness of cross-modal synergistic modeling for user preference representation and recommendation performance.
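The summary describes a constraint that modulates each modality's contribution so that a dominant modality does not take over. The code below is a minimal sketch of one generic way such a constraint could look. It is an assumption for illustration and is not SynGR's actual mechanism: softmax gate weights fuse per-modality feature vectors, and a penalty equal to the KL divergence from the uniform distribution (log K minus the gate entropy) grows as one modality dominates. The names `fuse_modalities`, `dominance_penalty`, `regularized_loss`, the weight `lam`, and the toy "text"/"image" features are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch of a modality-balancing constraint; NOT SynGR's actual mechanism.


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(
    features: Dict[str, List[float]], gate_logits: Dict[str, float]
) -> Tuple[List[float], Dict[str, float]]:
    """Weighted sum of per-modality feature vectors using softmax gate weights."""
    names = sorted(features)
    weights = softmax([gate_logits[n] for n in names])
    dim = len(features[names[0]])
    fused = [sum(w * features[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return fused, dict(zip(names, weights))


def dominance_penalty(weights: Dict[str, float]) -> float:
    """KL(weights || uniform) = log K - H(weights).

    Zero when all modalities contribute equally; grows as one modality dominates.
    """
    k = len(weights)
    entropy = -sum(w * math.log(w) for w in weights.values() if w > 0)
    return math.log(k) - entropy


def regularized_loss(task_loss: float, weights: Dict[str, float], lam: float = 0.1) -> float:
    """Hypothetical training objective: task loss plus a weighted dominance penalty."""
    return task_loss + lam * dominance_penalty(weights)


features = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
fused_eq, w_eq = fuse_modalities(features, {"text": 0.0, "image": 0.0})
fused_sk, w_sk = fuse_modalities(features, {"text": 3.0, "image": 0.0})
print(w_eq, dominance_penalty(w_eq))  # balanced gates: penalty is 0 up to floating-point error
print(w_sk, dominance_penalty(w_sk))  # text dominates: penalty is positive
```

A balancing penalty of this kind only discourages over-reliance on one modality; on its own it does not measure or guarantee synergistic information, which depends on how the modalities interact.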
📝 Abstract
Generative Recommendation (GR) has emerged as a promising paradigm by formulating item recommendation as a sequence-to-sequence generation task over item identifiers. Recent studies have incorporated multimodal signals to provide richer token-level evidence for generation. However, existing approaches largely rely on alignment-centric fusion and underexplore synergistic information across modalities. In practice, synergistic information plays a critical role in capturing emergent item properties that cannot be inferred from any single modality alone. Such properties encode intrinsic item semantics and guide user preferences, enabling models to move beyond surface-level feature matching. To address this limitation, we propose SynGR, a synergistic generative recommendation framework that explicitly encourages the exploitation of cross-modal dependencies during generation. By constraining overreliance on dominant modalities, SynGR enables the model to capture emergent item semantics beyond shared or modality-specific signals. Extensive experiments across three benchmark datasets demonstrate that SynGR achieves superior performance.
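The abstract's point that synergistic information captures properties "that cannot be inferred from any single modality alone" has a standard information-theoretic illustration: if an item property is the XOR of two binary modality cues, each cue alone carries zero mutual information about the property, while the pair of cues carries one full bit. The toy computation below only illustrates that concept. The items, the "visual" and "text" cue names, and the `mutual_information` helper are hypothetical and are not part of SynGR.

```python
import math
from collections import Counter
from itertools import product
from typing import Hashable, Sequence, Tuple


def mutual_information(pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """I(A;B) in bits, treating each (a, b) pair as one equally likely observation."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


# Hypothetical toy items: a binary visual cue, a binary text cue,
# and an emergent property defined as their XOR.
items = [(v, t, v ^ t) for v, t in product([0, 1], repeat=2)]

mi_visual = mutual_information([(v, y) for v, _, y in items])
mi_text = mutual_information([(t, y) for _, t, y in items])
mi_joint = mutual_information([((v, t), y) for v, t, y in items])
print(mi_visual, mi_text, mi_joint)  # 0.0 0.0 1.0
```

Neither cue alone predicts the property better than chance, yet together they determine it exactly; this is the kind of information that alignment of shared signals between modalities would not surface.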
Problem

Research questions and friction points this paper is trying to address.

Generative Recommendation
Cross-Modal Synergy
Multimodal Fusion
Emergent Semantics
Item Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal synergy
generative recommendation
synergistic information
multimodal fusion
emergent semantics
Wei Chen
School of Artificial Intelligence, Beihang University, Beijing, China
Xingyu Guo
School of Artificial Intelligence, Beihang University, Beijing, China
Shuang Li
School of Artificial Intelligence, Beihang University, Beijing, China
Fuwei Zhang
School of Artificial Intelligence, Beihang University, Beijing, China
Meng Yuan
Marie Skłodowska-Curie Fellow, Chalmers University of Technology
Mechatronics, Energy system, Model predictive control, Robotics
Jing Fan
Research Assistant, Vanderbilt University
Human Robot Interaction, Brain Computer Interface, Artificial Intelligence, Machine Learning, Virtual Reality
Zhao Zhang
School of Computer Science and Engineering, Beihang University, Beijing, China
Deqing Wang
School of Computer Science and Engineering, Beihang University, Beijing, China
Fuzhen Zhuang
School of Artificial Intelligence, Beihang University, Beijing, China