Training-Free Graph Filtering via Multimodal Feature Refinement for Extremely Fast Multimodal Recommendation

📅 2025-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high training overhead and inference latency of neural-network-based approaches in multimodal recommendation, this paper proposes MM-GF, a training-free graph filtering framework. MM-GF constructs cross-modal similarity graphs, aligns heterogeneous modality features (e.g., text and images) via robust scaling and vector shifting, and replaces complex neural fusion modules with an interpretable linear low-pass filter that refines and aggregates multimodal information. As the first training-free multimodal graph filtering paradigm, MM-GF eliminates parameter optimization entirely: it improves recommendation accuracy by up to 13.35% over the best competitor across multiple real-world datasets while reducing end-to-end runtime to under 10 seconds, outperforming existing trainable state-of-the-art models in both efficiency and effectiveness.
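The feature-alignment step described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the robust-scaling convention (median/IQR centering) and the non-negative shift are assumptions about how "robust scaling and vector shifting" might be realized to make heterogeneous modality embeddings comparable; the embedding matrices are synthetic placeholders.

```python
import numpy as np

def robust_scale(X, eps=1e-8):
    # Robust scaling per feature dimension: center by the median and
    # divide by the interquartile range (IQR), which is insensitive to outliers.
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return (X - med) / (q3 - q1 + eps)

def shift_nonnegative(X):
    # Vector shifting: translate each dimension so all entries are nonnegative,
    # putting modalities with different value ranges on a comparable footing.
    return X - X.min(axis=0, keepdims=True)

# Hypothetical item embeddings from two modalities with mismatched scales.
txt = np.random.default_rng(0).normal(5.0, 3.0, size=(100, 8))    # text features
img = np.random.default_rng(1).normal(-2.0, 0.5, size=(100, 16))  # image features

txt_aligned = shift_nonnegative(robust_scale(txt))
img_aligned = shift_nonnegative(robust_scale(img))
```

After alignment, per-modality item-item similarity graphs (e.g., cosine similarity on the aligned features) can be built on a common numeric footing.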

📝 Abstract
Multimodal recommender systems improve the performance of canonical recommender systems, which lack item features, by utilizing diverse content types such as text, images, and videos, alleviating the inherent sparsity of user-item interactions and accelerating user engagement. However, current neural-network-based models often incur significant computational overhead due to the complex training required to learn and integrate information from multiple modalities. To overcome this limitation, we propose MultiModal-Graph Filtering (MM-GF), a training-free method based on the notion of graph filtering (GF) for efficient and accurate multimodal recommendation. Specifically, MM-GF first constructs multiple similarity graphs through nontrivial multimodal feature refinement, such as robust scaling and vector shifting, to address the heterogeneous characteristics across modalities. Then, MM-GF optimally fuses multimodal information using linear low-pass filters across the different modalities. Extensive experiments on real-world benchmark datasets demonstrate that MM-GF not only improves recommendation accuracy by up to 13.35% compared to the best competitor but also dramatically reduces computational costs, achieving a runtime of less than 10 seconds.
Problem

Research questions and friction points this paper is trying to address.

Reduces computational overhead in multimodal recommender systems
Improves recommendation accuracy without complex training
Efficiently integrates diverse content types like text, images, videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free graph filtering for recommendations
Multimodal feature refinement via robust scaling
Linear low-pass filters for information fusion
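The training-free low-pass filtering idea in the list above can be illustrated with a minimal single-modality sketch. This follows the general graph-filtering recipe for recommendation (symmetrically normalized interaction matrix, item-item similarity graph as a linear low-pass filter); the exact normalization, multimodal graph construction, and fusion weights used by MM-GF are not specified here, so treat this as an assumed baseline, not the paper's method.

```python
import numpy as np

def low_pass_scores(R, eps=1e-8):
    """Training-free recommendation scores via a linear low-pass graph filter.

    R: binary user-item interaction matrix of shape (num_users, num_items).
    Returns a score matrix of the same shape; higher score = stronger recommendation.
    """
    # Symmetric degree normalization of the interaction matrix.
    user_deg = R.sum(axis=1, keepdims=True) + eps
    item_deg = R.sum(axis=0, keepdims=True) + eps
    R_norm = R / np.sqrt(user_deg) / np.sqrt(item_deg)

    # Item-item similarity graph; acting with it on R smooths (low-pass
    # filters) each user's interaction signal over similar items.
    P = R_norm.T @ R_norm
    return R @ P  # no parameters, no training: one matrix product at inference

rng = np.random.default_rng(0)
R = (rng.random((6, 5)) < 0.4).astype(float)  # toy interaction matrix
scores = low_pass_scores(R)
print(scores.shape)  # (6, 5)
```

In a multimodal setting, one such filter would be built per modality graph (interactions, text similarity, image similarity) and the resulting score matrices combined linearly, which is what keeps the whole pipeline closed-form and fast.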