🤖 AI Summary
To address the challenges of distribution heterogeneity and fine-grained semantic alignment in multimodal federated recommendation, this paper proposes the Group-aware Federated Multimodal Fusion framework (GFMFR). GFMFR offloads high-capacity multimodal encoders to the server, enabling cross-client semantic alignment while preserving user privacy. It introduces a group-aware fusion mechanism that clusters clients by user similarity and performs fine-grained feature interaction within each group, balancing knowledge sharing with personalization. A dedicated fusion loss is designed to optimize alignment quality, and the framework is modular and plug-and-play. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal federated recommendation methods, achieving significant improvements in recommendation accuracy and generalization.
📝 Abstract
Federated Recommendation (FR) is a learning paradigm that tackles the learn-to-rank problem in a privacy-preserving manner. How to integrate multi-modality features into federated recommendation remains an open challenge in terms of efficiency, distribution heterogeneity, and fine-grained alignment. To address these challenges, we propose a novel multimodal fusion mechanism for federated recommendation settings (GFMFR). Specifically, it offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead. Moreover, a group-aware item representation fusion approach enables fine-grained knowledge sharing among similar users while retaining individual preferences. The proposed fusion loss can simply be plugged into any existing federated recommender system, empowering it with multi-modality features. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines.
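The group-aware fusion idea can be sketched in miniature. This is only an illustrative approximation, not the paper's actual algorithm: it assumes the server clusters clients by the similarity of their (uploaded) user embeddings, then blends each client's item representations with its group's mean, where a hypothetical mixing weight `alpha` trades group-level knowledge sharing against individual preference retention.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means; used here to group clients by user similarity."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each client to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

def group_aware_fusion(user_emb, item_emb_per_client, k=2, alpha=0.5):
    """Fuse item representations within client groups (illustrative sketch).

    user_emb:             (n_clients, d) user embeddings seen by the server
    item_emb_per_client:  (n_clients, n_items, d) per-client item representations
    alpha:                share of the group-level representation in the blend
    """
    # cosine-style grouping: L2-normalize before clustering
    U = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    labels = kmeans(U, k)
    fused = np.empty_like(item_emb_per_client)
    for j in range(k):
        members = labels == j
        if not members.any():
            continue
        group_mean = item_emb_per_client[members].mean(axis=0)
        # blend shared group knowledge with each member's own representation
        fused[members] = alpha * group_mean + (1 - alpha) * item_emb_per_client[members]
    return fused, labels
```

With `alpha = 0` each client keeps its personal item representations untouched; with `alpha = 1` all clients in a group share one fused representation. The real framework performs this interaction at a finer granularity and trains it with the fusion loss, but the blend above conveys the sharing-versus-personalization trade-off.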