🤖 AI Summary
Multimodal recommendation faces three key challenges: content noise, feedback noise, and misalignment between modality content and user behavior. To address them, this paper proposes DA-MRS, a denoising and alignment framework. DA-MRS constructs a cross-modal consistency graph that links items whose content similarity agrees across modalities; introduces a content-guided denoised Bayesian Personalized Ranking (BPR) loss that ties the probability of observed feedback to multimodal content; and adds a dual-alignment mechanism, guided by user preferences and by graded item relations, to improve representation consistency. The framework is plug-and-play and works with a range of backbone models. Experiments on multiple benchmark datasets and under several noise settings show that DA-MRS consistently and significantly improves recommendation performance and remains robust under high-noise conditions, which supports its applicability to noisy real-world multimodal scenarios.
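The cross-modal consistency idea can be illustrated with a minimal sketch. It assumes that an item-item edge is kept only when the neighbor appears in the top-k cosine neighbors of every modality; the summary does not state this exact rule, and the function names, the value of `k`, and the toy features are hypothetical.

```python
import numpy as np

def knn_adjacency(features: np.ndarray, k: int) -> np.ndarray:
    """Boolean adjacency: edge i -> j if j is among i's top-k cosine neighbors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the ranking
    topk = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, topk.ravel()] = True
    return adj

def consistent_graph(modal_features: list[np.ndarray], k: int) -> np.ndarray:
    """Keep only edges present in the kNN graph of every modality (assumed rule)."""
    adj = knn_adjacency(modal_features[0], k)
    for feats in modal_features[1:]:
        adj &= knn_adjacency(feats, k)
    return adj

# Toy data: four items with 2-d visual and textual features (made up).
visual = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
textual = np.array([[1.0, 0.0], [0.0, 1.0], [0.3, 0.7], [0.2, 0.8]])

graph = consistent_graph([visual, textual], k=1)
print(np.argwhere(graph).tolist())  # [[2, 3], [3, 2]]
```

In this toy case items 0 and 1 look alike visually but not textually, so their edge is dropped, while items 2 and 3 are neighbors in both modalities and stay connected.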
📝 Abstract
Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. To tackle these challenges, we propose the Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. Extensive experiments verify that DA-MRS is a plug-and-play framework and achieves significant and consistent improvements across various datasets, backbone models, and noisy scenarios.
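To make the feedback-denoising step more concrete, here is a hedged sketch of one way a BPR loss can be down-weighted for interactions that are likely noisy. The abstract does not give the actual denoised BPR formulation, so the weighting scheme below is an assumption for illustration only: `confidence` stands in for a content-derived probability that an observed interaction is genuine, and all names and numbers are hypothetical.

```python
import numpy as np

def bpr_loss(pos_scores: np.ndarray, neg_scores: np.ndarray) -> float:
    """Standard BPR: mean of -log sigmoid(pos - neg) over (user, pos, neg) triples."""
    diff = pos_scores - neg_scores
    # logaddexp(0, -x) equals -log(sigmoid(x)) and is numerically stable.
    return float(np.mean(np.logaddexp(0.0, -diff)))

def confidence_weighted_bpr(pos_scores: np.ndarray,
                            neg_scores: np.ndarray,
                            confidence: np.ndarray) -> float:
    """Assumed variant: each triple is weighted by a confidence in [0, 1]
    that its observed (positive) interaction is genuine."""
    diff = pos_scores - neg_scores
    per_triple = np.logaddexp(0.0, -diff)
    total = np.clip(np.sum(confidence), 1e-12, None)
    return float(np.sum(confidence * per_triple) / total)

# Toy triples: the second observed interaction is ranked below its negative
# and is assumed to disagree with the item content, so it gets low confidence.
pos = np.array([2.0, -1.0])
neg = np.array([0.0, 1.0])
conf = np.array([1.0, 0.1])

plain = bpr_loss(pos, neg)
weighted = confidence_weighted_bpr(pos, neg, conf)
print(plain > weighted)  # True: the suspicious triple contributes less
```

With all confidences set to 1 the weighted loss reduces to the standard BPR loss, so in this sketch the weighting only changes training where the content signal casts doubt on the observed feedback.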