🤖 AI Summary
This work addresses a limitation of existing multimodal generative recommendation methods: they rely heavily on alignment-based fusion and struggle to capture emergent item semantics that arise from cross-modal interactions. To overcome this, the authors propose SynGR, a framework that introduces an explicit cross-modal synergy mechanism into generative recommendation, going beyond conventional alignment-based fusion. Built on a sequence-to-sequence architecture, SynGR uses a synergistic constraint mechanism to dynamically modulate the contribution of each modality. This mitigates over-reliance on dominant modalities and lets the model learn fine-grained emergent semantics beyond shared or modality-specific signals. Experiments on three benchmark datasets show that SynGR outperforms state-of-the-art methods, supporting the effectiveness of cross-modal synergistic modeling for representing user preferences and improving recommendation performance.
📝 Abstract
Generative Recommendation (GR) has emerged as a promising paradigm by formulating item recommendation as a sequence-to-sequence generation task over item identifiers. Recent studies have incorporated multimodal signals to provide richer token-level evidence for generation. However, existing approaches largely rely on alignment-centric fusion and underexplore synergistic information across modalities. In practice, synergistic information plays a critical role in capturing emergent item properties that cannot be inferred from any single modality alone. Such properties encode intrinsic item semantics and guide user preferences, enabling models to move beyond surface-level feature matching. To address this limitation, we propose SynGR, a synergistic generative recommendation framework that explicitly encourages the exploitation of cross-modal dependencies during generation. By constraining over-reliance on dominant modalities, SynGR enables the model to capture emergent item semantics beyond shared or modality-specific signals. Extensive experiments across three benchmark datasets demonstrate that SynGR achieves superior performance.
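To make "synergistic information" concrete, the toy sketch below uses the standard information-theoretic XOR example; it is an illustration of the concept only, not SynGR's actual mechanism or objective. Two hypothetical binary features, labeled `text` and `image` purely for illustration, are independent and uniform, and the target item property is their XOR. Each feature alone carries zero bits about the property, the two features share zero bits with each other (so there is nothing for alignment to match on), yet together they determine it fully (1 bit). The helper `mutual_information` is a hypothetical name introduced here.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Return I(A;B) in bits, treating each (a, b) sample as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((p_a[a] / n) * (p_b[b] / n)))
    return mi


# Hypothetical toy item: two independent, uniform binary "modality" features.
# The target property y is their XOR, so it depends on both features jointly.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

i_shared = mutual_information([(x1, x2) for x1, x2, _ in samples])     # text vs. image
i_text = mutual_information([(x1, y) for x1, _, y in samples])         # text alone
i_image = mutual_information([(x2, y) for _, x2, y in samples])        # image alone
i_joint = mutual_information([((x1, x2), y) for x1, x2, y in samples]) # both together

print(f"I(text; image)    = {i_shared:.3f} bits")  # 0.000
print(f"I(text; y)        = {i_text:.3f} bits")    # 0.000
print(f"I(image; y)       = {i_image:.3f} bits")   # 0.000
print(f"I(text, image; y) = {i_joint:.3f} bits")   # 1.000
```

In this extreme case the entire bit about `y` is synergistic: a model that fuses modalities only through their shared signal, or that leans on a single dominant modality, has nothing to exploit, which is the kind of emergent property the abstract argues alignment-centric fusion can miss.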