Gated Multimodal Graph Learning for Personalized Recommendation

📅 2025-05-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address cold-start and data sparsity challenges in collaborative filtering, this paper proposes a lightweight, scalable multimodal recommendation framework. Methodologically: (1) a gated multimodal fusion module dynamically weights image and text features to adaptively mitigate inter-modal quality variance; (2) a two-layer LightGCN encoder—without nonlinear transformations—models the user-item interaction graph to efficiently capture high-order collaborative signals. The framework balances performance, efficiency, and interpretability: it significantly outperforms state-of-the-art collaborative filtering and multimodal GNN baselines on Amazon multimodal benchmarks, achieving substantial gains in Top-K metrics (e.g., Recall@20, NDCG@20). Crucially, it reduces parameter count and computational overhead by orders of magnitude, enabling practical large-scale deployment.

Technology Category

Application Category

📝 Abstract
Multimodal recommendation has emerged as a promising solution to alleviate the cold-start and sparsity problems in collaborative filtering by incorporating rich content information, such as product images and textual descriptions. However, effectively integrating heterogeneous modalities into a unified recommendation framework remains a challenge. Existing approaches often rely on fixed fusion strategies or complex architectures , which may fail to adapt to modality quality variance or introduce unnecessary computational overhead. In this work, we propose RLMultimodalRec, a lightweight and modular recommendation framework that combines graph-based user modeling with adaptive multimodal item encoding. The model employs a gated fusion module to dynamically balance the contribution of visual and textual modalities, enabling fine-grained and content-aware item representations. Meanwhile, a two-layer LightGCN encoder captures high-order collaborative signals by propagating embeddings over the user-item interaction graph without relying on nonlinear transformations. We evaluate our model on a real-world dataset from the Amazon product domain. Experimental results demonstrate that RLMultimodalRec consistently outperforms several competitive baselines, including collaborative filtering, visual-aware, and multimodal GNN-based methods. The proposed approach achieves significant improvements in top-K recommendation metrics while maintaining scalability and interpretability, making it suitable for practical deployment.
Problem

Research questions and friction points this paper is trying to address.

Alleviate cold-start and sparsity in recommendation systems
Integrate heterogeneous modalities adaptively and efficiently
Balance visual and textual contributions dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gated fusion balances visual and textual modalities
LightGCN captures high-order collaborative signals
Modular framework adapts to modality quality variance
🔎 Similar Papers
No similar papers found.
S
Sibei Liu
Miami Herbert Business School, University of Miami, FL, United States
Yuanzhe Zhang
Yuanzhe Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing
X
Xiang Li
Department of Electrical & Computer Engineering, Rutgers University, Sunnyvale, United States
Y
Yunbo Liu
Department of Electrical and Computer Engineering, Duke University, NC, United States
C
Chengwei Feng
School of Engineering, Computer & Mathematical Sciences (ECMS), Auckland University of Technology, Auckland, New Zealand
H
Hao Yang
Department Of Computer Science, Universiti Putra Malaysia, Kuala Lumpur, Malaysia