Sequences as Nodes for Contrastive Multimodal Graph Recommendation

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the cold-start and data sparsity challenges in recommender systems by proposing a novel sequential modeling paradigm that eliminates the need for manual data augmentation. The approach transforms user interaction sequences into graph nodes via attention pooling, constructing a Sequence-Item (SI) graph on which multi-view propagation is performed to jointly integrate collaborative, sequential, and multimodal signals. A key innovation is an ID-guided multimodal gating mechanism that aligns textual and visual features and suppresses modality noise. Extensive experiments demonstrate that the proposed method significantly outperforms existing baselines on the Amazon Baby, Sports, and Electronics datasets, with particularly pronounced gains for users with short interaction histories, thereby effectively mitigating cold-start and sparsity issues.
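The sequence-to-node step described above could be sketched as single-query attention pooling over a user's interacted-item embeddings. This is an illustrative reconstruction, not the authors' released code: the module name, the learnable pooling query, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn


class SequenceNodePooling(nn.Module):
    """Attention pooling that collapses an interaction sequence into one
    sequence-node embedding (hypothetical sketch of the paper's idea)."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learnable pooling query
        self.key = nn.Linear(dim, dim)               # projects items to keys

    def forward(self, item_emb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # item_emb: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real item
        scores = self.key(item_emb) @ self.query          # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)           # attention over items
        # weighted sum of item embeddings -> one node per sequence
        return (weights.unsqueeze(-1) * item_emb).sum(dim=1)  # (batch, dim)
```

The resulting sequence nodes would then be linked to their constituent items to form the SI graph on which propagation runs.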

📝 Abstract
To tackle cold-start and data sparsity issues in recommender systems, numerous multimodal, sequential, and contrastive techniques have been proposed. While these augmentations can boost recommendation performance, they tend to add noise and disrupt useful semantics. To address this, we propose MuSICRec (Multimodal Sequence-Item Contrastive Recommender), a multi-view graph-based recommender that combines collaborative, sequential, and multimodal signals. We build a sequence-item (SI) view by attention pooling over the user's interacted items to form sequence nodes. We propagate over the SI graph, obtaining a second view organically as an alternative to artificial data augmentation, while simultaneously injecting sequential context signals. Additionally, to mitigate modality noise and align the multimodal information, the contribution of text and visual features is modulated by an ID-guided gate. We evaluate under a strict leave-two-out split against a broad range of sequential, multimodal, and contrastive baselines. On the Amazon Baby, Sports, and Electronics datasets, MuSICRec outperforms state-of-the-art baselines across all model types. We observe the largest gains for short-history users, mitigating sparsity and cold-start challenges. Our code is available at https://anonymous.4open.science/r/MuSICRec-3CEE/ and will be made publicly available.
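The ID-guided gate mentioned in the abstract could take the following minimal form: the item's ID embedding produces per-modality sigmoid gates that scale projected text and visual features before fusion, damping noisy modality signals. The layer names, projection-to-ID-space alignment, and additive fusion are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn


class IDGuidedGate(nn.Module):
    """Sketch of ID-guided multimodal gating: ID embeddings control how much
    each modality contributes (assumed form, not the authors' exact design)."""

    def __init__(self, id_dim: int, txt_dim: int, vis_dim: int):
        super().__init__()
        # align each modality to the ID embedding space
        self.txt_proj = nn.Linear(txt_dim, id_dim)
        self.vis_proj = nn.Linear(vis_dim, id_dim)
        # gates are computed from the ID embedding alone
        self.txt_gate = nn.Linear(id_dim, id_dim)
        self.vis_gate = nn.Linear(id_dim, id_dim)

    def forward(self, id_emb, txt_feat, vis_feat):
        g_t = torch.sigmoid(self.txt_gate(id_emb))  # per-dimension gate in (0, 1)
        g_v = torch.sigmoid(self.vis_gate(id_emb))
        # gated modality features fused with the collaborative ID embedding
        return id_emb + g_t * self.txt_proj(txt_feat) + g_v * self.vis_proj(vis_feat)
```

Under this reading, a gate near zero effectively mutes a noisy modality for that item, while the ID embedding always passes through.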
Problem

Research questions and friction points this paper is trying to address.

cold-start
data sparsity
multimodal noise
semantic disruption
recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal recommendation
contrastive learning
sequence-item graph
attention pooling
modality alignment