🤖 AI Summary
To address the challenges of distribution heterogeneity and fine-grained semantic alignment in multimodal federated recommendation, this paper proposes the Group-aware Federated Multimodal Fusion framework (GFMFR). GFMFR offloads high-capacity multimodal encoders to the server, enabling cross-client semantic alignment while preserving user privacy. It introduces a group-aware fusion mechanism that clusters clients by user similarity and performs fine-grained feature interaction within each group, balancing knowledge sharing with personalization. A dedicated fusion loss is designed to optimize alignment quality, and the framework is modular and plug-and-play. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal federated recommendation methods, achieving significant improvements in recommendation accuracy and generalization.
📝 Abstract
Federated Recommendation (FR) is a learning paradigm that tackles the learn-to-rank problem in a privacy-preserving manner. How to integrate multi-modality features into federated recommendation remains an open challenge in terms of efficiency, distribution heterogeneity, and fine-grained alignment. To address these challenges, we propose a novel multimodal fusion mechanism for federated recommendation settings (GFMFR). Specifically, it offloads multimodal representation learning to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, alleviating client-side overhead. Moreover, a group-aware item representation fusion approach enables fine-grained knowledge sharing among similar users while retaining individual preferences. The proposed fusion loss can simply be plugged into any existing federated recommender system, empowering it with multi-modality features. Extensive experiments on five public benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines.
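The group-aware fusion idea can be sketched in miniature. This is only an illustrative approximation, not the paper's actual algorithm: it assumes the server clusters clients by the similarity of their (uploaded) user embeddings, then blends each client's item representations with its group's mean, where a hypothetical mixing weight `alpha` trades group-level knowledge sharing against individual preference retention.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means; used here to group clients by user similarity."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each client to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

def group_aware_fusion(user_emb, item_emb_per_client, k=2, alpha=0.5):
    """Fuse item representations within client groups (illustrative sketch).

    user_emb:             (n_clients, d) user embeddings seen by the server
    item_emb_per_client:  (n_clients, n_items, d) per-client item representations
    alpha:                share of the group-level representation in the blend
    """
    # cosine-style grouping: L2-normalize before clustering
    U = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    labels = kmeans(U, k)
    fused = np.empty_like(item_emb_per_client)
    for j in range(k):
        members = labels == j
        if not members.any():
            continue
        group_mean = item_emb_per_client[members].mean(axis=0)
        # blend shared group knowledge with each member's own representation
        fused[members] = alpha * group_mean + (1 - alpha) * item_emb_per_client[members]
    return fused, labels
```

With `alpha = 0` each client keeps its personal item representations untouched; with `alpha = 1` all clients in a group share one fused representation. The real framework performs this interaction at a finer granularity and trains it with the fusion loss, but the blend above conveys the sharing-versus-personalization trade-off.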