🤖 AI Summary
This work addresses the vulnerability of multimodal recommender systems to noise, such as misleading visual or textual content, which can significantly degrade performance. It presents the first systematic investigation into trustworthiness from both modality and interaction perspectives, proposing a lightweight, plug-and-play correction module. The method leverages Sinkhorn soft matching to learn semantic alignment between items and their multimodal features, thereby suppressing mismatched signals. Additionally, it uncovers the dual role of pseudo-interactions and graph propagation under noisy conditions. Notably, the approach requires no modification to the backbone model and consistently enhances recommendation robustness across multiple datasets and noise levels, demonstrating the critical importance of interaction-level noise mitigation.
📝 Abstract
Recent advances in multimodal recommendation have demonstrated the effectiveness of incorporating visual and textual content into collaborative filtering. However, real-world deployments raise an increasingly important yet underexplored issue: trustworthiness. On modern e-commerce platforms, multimodal content can be misleading or unreliable (e.g., visually inconsistent product images or clickbait titles), injecting untrustworthy signals into multimodal representations and making existing recommenders brittle under modality corruption. In this work, we take a step towards trustworthy multimodal recommendation from both a method and an analysis perspective. First, we propose a plug-and-play modality-level rectification component that mitigates untrustworthy modality features by learning soft correspondences between items and multimodal features. Using lightweight projections and Sinkhorn-based soft matching, the rectification suppresses mismatched modality signals while preserving semantic consistency, and can be integrated into existing multimodal recommenders without architectural modifications. Second, we present two practical insights on interaction-level trustworthiness under noisy collaborative signals: (i) training-set pseudo interactions can help or hurt performance under noise depending on prior-signal alignment; and (ii) propagation-graph pseudo edges can also help or hurt robustness, as message passing may amplify misalignment. Extensive experiments on multiple datasets and backbones under varying corruption levels demonstrate improved robustness from modality rectification and validate the above interaction-level observations.
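To make the rectification idea concrete, here is a minimal NumPy sketch of Sinkhorn-based soft matching between items and modality features. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the projection matrices `W_item` and `W_mod` are randomly initialized here (in the actual module they would be learned jointly with the recommender), the temperature `eps` and iteration count are arbitrary choices, and all variable names are hypothetical.

```python
import numpy as np

def sinkhorn(scores, eps=1.0, n_iters=50):
    """Sinkhorn normalization: turn an item-feature similarity matrix
    into an (approximately) doubly-stochastic soft-matching plan by
    alternating row and column normalization of a Gibbs kernel."""
    K = np.exp(scores / eps)                      # positive kernel
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)      # normalize rows
        K = K / K.sum(axis=0, keepdims=True)      # normalize columns
    return K

rng = np.random.default_rng(0)
n_items, d_id, d_mod, d = 5, 16, 32, 8

item_emb = rng.normal(size=(n_items, d_id))       # item ID embeddings
mod_feat = rng.normal(size=(n_items, d_mod))      # raw modality features

# Lightweight linear projections into a shared space (randomly
# initialized for this sketch; learned in practice).
W_item = rng.normal(size=(d_id, d)) / np.sqrt(d_id)
W_mod = rng.normal(size=(d_mod, d)) / np.sqrt(d_mod)

scores = (item_emb @ W_item) @ (mod_feat @ W_mod).T   # similarity matrix
plan = sinkhorn(scores)

# Rectified features: each item's modality feature becomes the
# plan-weighted mixture, so mismatched (untrustworthy) features
# contribute little weight to the items they poorly align with.
rectified = (plan / plan.sum(axis=1, keepdims=True)) @ mod_feat
```

Because the soft-matching plan is differentiable, this kind of rectification can sit between a frozen feature extractor and any multimodal backbone without architectural changes, which matches the plug-and-play claim in the abstract.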