Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion

📅 2026-05-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the performance degradation that multimodal recommender systems suffer in real-world settings when modalities are missing. The authors propose GRE-MC, a graph retrieval-enhanced modality completion framework whose modality-aware subgraph retrieval mechanism extracts semantically relevant subgraphs from the global interaction graph. A graph transformer then jointly encodes the query node and the retrieved subgraph to reconstruct the missing modality features, and a learnable sparse-routing codebook regularizes the latent embeddings for robustness. Experiments on multimodal recommendation benchmarks show that GRE-MC consistently outperforms state-of-the-art methods, supporting the effectiveness of subgraph retrieval and joint encoding for modality completion.
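The retrieve-then-complete idea can be illustrated with a deliberately simplified sketch. It is not the GRE-MC implementation: subgraph retrieval is replaced here by top-k node retrieval using cosine similarity on the available modality, and the graph transformer is replaced by a single softmax attention pooling step. The function names (`retrieve`, `complete_image`), the `temperature` parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_text, nodes, k):
    """Hypothetical retrieval step: keep nodes that HAVE image features,
    ranked by similarity to the query in the text modality."""
    candidates = [n for n in nodes if n["image"] is not None]
    candidates.sort(key=lambda n: cosine(query_text, n["text"]), reverse=True)
    return candidates[:k]

def complete_image(query_text, retrieved, temperature=0.1):
    """Hypothetical completion step: softmax-weighted average of the
    retrieved nodes' image features (a stand-in for attention)."""
    if not retrieved:
        raise ValueError("no retrieved nodes to complete from")
    scores = [cosine(query_text, n["text"]) / temperature for n in retrieved]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(retrieved[0]["image"])
    return [
        sum(w * n["image"][d] for w, n in zip(weights, retrieved)) / total
        for d in range(dim)
    ]

# Toy graph nodes: node "d" and the query both lack image features.
nodes = [
    {"id": "a", "text": [1.0, 0.0], "image": [1.0, 0.0, 0.0]},
    {"id": "b", "text": [0.9, 0.1], "image": [0.8, 0.2, 0.0]},
    {"id": "c", "text": [0.0, 1.0], "image": [0.0, 0.0, 1.0]},
    {"id": "d", "text": [0.1, 0.9], "image": None},
]
query = {"id": "q", "text": [0.95, 0.05], "image": None}

retrieved = retrieve(query["text"], nodes, k=2)
completed = complete_image(query["text"], retrieved)
```

In this toy setup the query's text features are closest to nodes `a` and `b`, so the reconstructed image vector is a blend of their image features, and node `c` contributes nothing because it is never retrieved.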
📝 Abstract
Multimodal data plays a critical role in web-based recommendation systems, where information from diverse modalities such as vision and text enhances representation learning. However, real-world multimodal datasets often suffer from modality incompleteness due to sensor failures, annotation scarcity, or privacy constraints, which substantially degrades model performance and reliability. One effective solution to this issue is modality completion, which reconstructs missing features to provide modality-complete graphs for downstream tasks. Given a query node with missing multimodal features, existing modality completion methods typically infer information from the node itself or its neighbors to reconstruct the missing modality. However, these methods may overlook semantically relevant context elsewhere in the graph, which contains valuable cues that are non-trivial to capture through simple methods like neighborhood aggregation. In this work, we propose GRE-MC, a Graph Retrieval-Enhanced Modality Completion framework, to overcome these limitations. By introducing a modality-aware subgraph retrieval mechanism, GRE-MC selects semantically relevant subgraphs from the entire graph, providing richer contextual information for completing missing modalities. Subsequently, a graph transformer jointly encodes the query node and the retrieved subgraph via global attention to complete the missing features, while a learnable sparse-routing codebook regularizes latent embeddings into compact bases for improved robustness. Extensive experiments on multimodal recommendation benchmarks demonstrate that GRE-MC consistently outperforms state-of-the-art methods, validating the effectiveness of subgraph retrieval and the joint-encoding graph transformer for robust modality completion.
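The abstract does not specify how the sparse-routing codebook works, so the following is only one plausible reading, sketched under stated assumptions: each latent embedding is routed to its `top_k` best-matching codebook vectors by dot product, and is replaced by a softmax-weighted combination of just those vectors. The function name `sparse_route`, the `top_k` parameter, and the fixed toy codebook are hypothetical; in the paper the codebook is learnable, which this sketch does not model.

```python
import math

def sparse_route(z, codebook, top_k=2):
    """Route embedding z to its top_k codebook vectors (by dot product) and
    return (routing weights over the whole codebook, regularized embedding)."""
    scores = [sum(a * b for a, b in zip(z, c)) for c in codebook]
    # Keep only the indices of the top_k highest-scoring codebook entries.
    chosen = sorted(range(len(codebook)), key=lambda i: scores[i], reverse=True)[:top_k]
    top = max(scores[i] for i in chosen)  # subtract the max for numerical stability
    exp = {i: math.exp(scores[i] - top) for i in chosen}
    total = sum(exp.values())
    # Entries that were not chosen get exactly zero weight (the "sparse" part).
    weights = [exp.get(i, 0.0) / total for i in range(len(codebook))]
    out = [sum(weights[i] * codebook[i][d] for i in chosen) for d in range(len(z))]
    return weights, out

# Toy codebook of four 2-D basis vectors (an assumption for illustration).
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
weights, regularized = sparse_route([2.0, 1.0], codebook, top_k=2)
```

Restricting each embedding to a few bases is one way to obtain the "compact bases" behavior the abstract describes: a noisy embedding is replaced by a combination of a small number of shared vectors rather than kept as an arbitrary point in latent space.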
Problem

Research questions and friction points this paper is trying to address.

multimodal recommendation
modality incompleteness
missing modalities
graph-based recommendation
robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality completion
graph retrieval
graph transformer
multimodal recommendation
sparse routing