🤖 AI Summary
In multimodal recommendation, unreliable modality data often degrades fusion performance below unimodal baselines. Existing weighted fusion methods lack effective supervision for modality reliability, leading to inaccurate weight learning. To address this, we propose a reliability-guided dynamic multimodal fusion framework. First, we derive a proxy label for modality reliability from the Bayesian Personalized Ranking (BPR) objective and design a confidence-aware mechanism that adaptively calibrates supervision strength, thereby mitigating erroneous supervision. Second, we introduce a modality-specific score discrepancy modeling module and an end-to-end differentiable fusion module. Extensive experiments on three real-world datasets demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of our reliability supervision mechanism in enhancing both fusion accuracy and robustness.
📝 Abstract
Multimodal recommendation suffers from a performance degradation problem: unimodal recommendation sometimes achieves better performance. A possible reason is that unreliable item modality data hurts the fusion result. Several existing studies introduce weights for different modalities to reduce the contribution of unreliable modality data when predicting the final user rating. However, they fail to provide appropriate supervision for learning the modality weights, making the learned weights imprecise. Therefore, we propose a modality-reliability-guided multimodal recommendation framework that learns the modality weights under the supervision of modality reliability. Since no explicit label is provided for modality reliability, we identify it automatically through the Bayesian Personalized Ranking (BPR) recommendation objective. In particular, we define a modality reliability vector as the supervision label, computed from the difference between the modality-specific user ratings for positive and negative items; a larger difference indicates higher reliability of the modality, as the BPR objective is better satisfied. Furthermore, to enhance the effectiveness of the supervision, we compute a confidence level for the modality reliability vector, which dynamically adjusts the supervision strength and eliminates harmful supervision. Extensive experiments on three real-world datasets show the effectiveness of the proposed method.
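The reliability label and confidence weighting described in the abstract can be illustrated with a minimal sketch. The abstract only states that the label comes from the difference between modality-specific ratings for positive and negative items and that a confidence level scales the supervision. Everything else here is an assumption made for illustration, not the authors' formulation: the softmax normalization, the `tanh`-based confidence, the cross-entropy loss, and all function names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_vector(pos_scores, neg_scores):
    """Per-modality gap between the positive and negative item scores.

    A larger gap means the modality satisfies the BPR ordering better.
    Normalizing the gaps with softmax is an assumption of this sketch.
    """
    gaps = [p - n for p, n in zip(pos_scores, neg_scores)]
    return softmax(gaps), gaps


def confidence(gaps):
    """Hypothetical confidence level in [0, 1).

    Assumption: if no modality ranks the positive item above the negative
    one, the label is treated as harmful and the supervision is switched off.
    """
    best = max(gaps)
    return math.tanh(best) if best > 0 else 0.0


def supervision_loss(weights, pos_scores, neg_scores):
    """Confidence-scaled cross-entropy between the label and fusion weights."""
    label, gaps = reliability_vector(pos_scores, neg_scores)
    ce = -sum(r * math.log(w + 1e-12) for r, w in zip(label, weights))
    return confidence(gaps) * ce


# Two modalities (e.g. visual, textual) scoring one positive and one negative item.
label, gaps = reliability_vector([2.0, 0.5], [0.5, 1.0])
print("reliability label:", label)
print("loss:", supervision_loss([0.9, 0.1], [2.0, 0.5], [0.5, 1.0]))
```

In this toy case the first modality ranks the positive item above the negative one and the second does not, so the label favors the first modality, and fusion weights that agree with it incur a lower loss than weights that contradict it.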