Multimodal-enhanced Federated Recommendation: A Group-wise Fusion Approach

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of distribution heterogeneity and fine-grained semantic alignment in multimodal federated recommendation, this paper proposes the Group-aware Federated Multimodal Fusion for Recommendation (GFMFR) framework. GFMFR offloads high-capacity multimodal encoders to the server, enabling cross-client semantic alignment while preserving user privacy. It introduces a group-aware fusion mechanism that clusters clients by user similarity and performs fine-grained feature interaction within each group, balancing knowledge sharing with personalization. A dedicated fusion loss further optimizes alignment quality, and the framework is modular and plug-and-play. Extensive experiments on five public benchmark datasets show that GFMFR consistently outperforms state-of-the-art multimodal federated recommendation methods in both recommendation accuracy and generalization.
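The group-aware fusion idea described above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: it stands in for the server-side encoder with random multimodal item features, groups clients with plain k-means over user embeddings (the paper's actual similarity-based grouping is unspecified here), blends ID-based and multimodal item embeddings with a per-group mixing weight `alpha` (a guessed fusion form), and uses a simple MSE alignment term where the paper defines a dedicated fusion loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, purely illustrative.
n_clients, n_items, d = 12, 20, 8

# Server side: a high-capacity encoder would map raw item content to
# multimodal representations; random features stand in for them here.
multimodal_item_emb = rng.normal(size=(n_items, d))

# Client side: each client holds a lightweight, locally learned user embedding.
user_emb = rng.normal(size=(n_clients, d))

def cluster_users(emb, k, n_iter=10):
    """Plain k-means over user embeddings, standing in for the paper's
    user-similarity grouping (details not given in this summary)."""
    centers = emb[rng.choice(len(emb), k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(emb[:, None] - centers[None], axis=-1)
        assign = dists.argmin(axis=1)
        for g in range(k):
            members = emb[assign == g]
            if len(members):
                centers[g] = members.mean(axis=0)
    return assign

def group_fuse(id_item_emb, mm_item_emb, alpha):
    """Group-specific fusion: convex blend of collaborative (ID-based) and
    multimodal item embeddings; one mixing weight per group (assumed form)."""
    return alpha * id_item_emb + (1.0 - alpha) * mm_item_emb

def fusion_loss(fused, mm_item_emb):
    """Alignment term pulling fused representations toward the server-side
    multimodal ones (MSE here; the paper's actual fusion loss differs)."""
    return float(np.mean((fused - mm_item_emb) ** 2))

k = 3
assign = cluster_users(user_emb, k)
id_item_emb = rng.normal(size=(n_items, d))  # collaborative item embeddings

for g in range(k):
    alpha = 0.5  # in practice this weight could be learned per group
    fused = group_fuse(id_item_emb, multimodal_item_emb, alpha)
    print(f"group {g}: {np.sum(assign == g)} clients, "
          f"fusion loss {fusion_loss(fused, multimodal_item_emb):.3f}")
```

The blend keeps personalization (each group gets its own mixing weight) while the shared multimodal embeddings carry cross-client knowledge, which is the trade-off the summary attributes to the group-aware mechanism.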

📝 Abstract
Federated Recommendation (FR) is a new learning paradigm that tackles the learn-to-rank problem in a privacy-preserving manner. How to integrate multi-modality features into federated recommendation remains an open challenge in terms of efficiency, distribution heterogeneity, and fine-grained alignment. To address these challenges, we propose a novel multimodal fusion mechanism for federated recommendation settings (GFMFR). Specifically, it offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead. Moreover, a group-aware item representation fusion approach enables fine-grained knowledge sharing among similar users while retaining individual preferences. The proposed fusion loss can simply be plugged into any existing federated recommender system, augmenting it with multi-modality features. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines.
Problem

Research questions and friction points this paper is trying to address.

Integrating multi-modality features into federated recommendation systems efficiently
Addressing distribution heterogeneity and fine-grained alignment challenges in FR
Reducing client-side overhead while maintaining privacy preservation in recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Server-side multimodal representation learning
Group-aware fusion for fine-grained sharing
Plug-and-play fusion loss integration
Chunxu Zhang
Jilin University, China
Weipeng Zhang
Jilin University, China
Guodong Long
Associate Professor, Faculty of Engineering and IT, University of Technology Sydney
Zhiheng Xue
Jilin University, China
Riting Xia
Inner Mongolia University, China
Bo Yang
Jilin University, China