🤖 AI Summary
This work addresses the semantic incompatibility between multimodal representations generated by large language models and sparse ID-based features in recommender systems, which limits recommendation performance. To bridge this gap, the authors propose RecGOAT, a novel framework that leverages graph attention networks to model user–user, user–item, and item–item relationships, thereby enhancing collaborative semantics. RecGOAT introduces, for the first time, a theoretically grounded dual-granularity progressive alignment mechanism that integrates cross-modal contrastive learning with optimal adaptive transport to achieve unified semantic alignment between multimodal and ID features at both instance and distribution levels. Extensive experiments demonstrate that RecGOAT achieves state-of-the-art performance on three public benchmarks and exhibits strong effectiveness and industrial scalability in a large-scale online advertising platform.
📝 Abstract
Multimodal recommendation systems typically integrate user behavior with multimodal data from items, thereby capturing more accurate user preferences. Concurrently, with the rise of large models (LMs), multimodal recommendation increasingly leverages their strengths in semantic understanding and contextual reasoning. However, LM representations are inherently optimized for general semantic tasks, while recommendation models rely heavily on sparse user/item unique identity (ID) features. Existing works overlook this fundamental representational divergence between large models and recommendation systems, resulting in incompatible multimodal representations and suboptimal recommendation performance. To bridge this gap, we propose RecGOAT, a novel yet simple dual semantic alignment framework for LLM-enhanced multimodal recommendation with theoretically guaranteed alignment capability. RecGOAT first employs graph attention networks to enrich collaborative semantics by modeling item-item, user-item, and user-user relationships, leveraging user/item LM representations and interaction history. Furthermore, we design a dual-granularity progressive multimodality-ID alignment framework, which achieves instance-level and distribution-level semantic alignment via cross-modal contrastive learning (CMCL) and optimal adaptive transport (OAT), respectively. Theoretically, we demonstrate that the unified representations derived from our alignment framework exhibit superior semantic consistency and comprehensiveness. Extensive experiments on three public benchmarks show that RecGOAT achieves state-of-the-art performance, empirically validating our theoretical insights. Additionally, deployment on a large-scale online advertising platform confirms the model's effectiveness and scalability in industrial recommendation scenarios. Code is available at https://github.com/6lyc/RecGOAT-LLM4Rec.
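To make the dual-granularity idea concrete, the sketch below illustrates the two alignment levels the abstract describes with standard building blocks: an InfoNCE-style contrastive loss for instance-level multimodality-ID alignment, and an entropy-regularized (Sinkhorn) optimal-transport plan for distribution-level alignment. This is a minimal illustrative sketch, not the authors' implementation; all function names, the temperature, and the regularization weight are hypothetical assumptions, and the paper's OAT component may differ from plain Sinkhorn.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_alignment_loss(mm_emb, id_emb, temperature=0.1):
    """Instance-level alignment (CMCL-style, hypothetical): the i-th
    multimodal and ID embeddings form a positive pair; all other rows
    in the batch act as negatives (InfoNCE)."""
    z_m, z_id = l2_normalize(mm_emb), l2_normalize(id_emb)
    logits = z_m @ z_id.T / temperature              # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # pull positives together

def sinkhorn_transport(cost, reg=0.05, n_iters=200):
    """Distribution-level alignment (illustrative stand-in for OAT):
    entropy-regularized optimal transport between the two embedding
    populations, solved by Sinkhorn iterations."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    K = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]               # transport plan

# Toy batch: LLM-derived multimodal embeddings vs. sparse-ID embeddings.
rng = np.random.default_rng(0)
mm = rng.normal(size=(8, 16))
ids = rng.normal(size=(8, 16))

loss = contrastive_alignment_loss(mm, ids)
cost = 1.0 - l2_normalize(mm) @ l2_normalize(ids).T  # cosine distance cost
plan = sinkhorn_transport(cost)                      # rows sum to 1/n
```

In a full model the two losses would be combined (the abstract's "progressive" scheduling of instance- then distribution-level alignment is not shown here), with the transport cost driving the multimodal and ID embedding distributions toward each other.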