🤖 AI Summary
This work addresses the challenge of aligning heterogeneous vision-language models (VLMs) in federated learning, where clients differ in resources, tasks, and architectures while requiring strict data privacy. To this end, the authors propose the MoR framework, which pioneers the use of preference signals, rather than model parameters, for federated alignment. Each client trains a local reward model on its own preference annotations, eliminating the need to transmit raw data. MoR further introduces a hybrid reward mechanism and an adaptive routing-based fusion strategy to harmonize feedback from heterogeneous clients, and the global model is optimized with Group Relative Policy Optimization (GRPO) under KL regularization toward a frozen reference model. Notably, MoR operates without sharing data or enforcing uniform model architectures, achieving state-of-the-art performance across three visual question answering (VQA) benchmarks with superior generalization, robustness, and cross-client adaptability.
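The adaptive routing fusion described above can be pictured as a softmax gate over per-client reward scores. The following is a minimal illustrative sketch, not the authors' implementation: the routing parameters `router_w` and the function name `route_and_fuse` are hypothetical, and the real framework routes over learned reward models rather than precomputed scalar scores.

```python
import numpy as np

def route_and_fuse(features, client_rewards, router_w):
    """Fuse per-client reward scores with a softmax routing gate.

    features      : (d,) feature vector of a (prompt, response) pair
    client_rewards: (K,) scalar rewards from K client reward models
    router_w      : (K, d) routing weights (hypothetical parameters)
    """
    logits = router_w @ features           # one routing logit per client
    gates = np.exp(logits - logits.max())  # numerically stable softmax
    gates /= gates.sum()
    # Convex combination: the fused reward always lies between the
    # smallest and largest client reward.
    return float(gates @ client_rewards)
```

Because the gate is a softmax, the fused signal is a convex mixture of the client rewards, which keeps any single heterogeneous client from pushing the global reward outside the range the clients collectively report.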
📝 Abstract
Vision-language models (VLMs) have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. Federated learning (FL) mitigates this issue by enabling decentralized training, but practical deployments face client heterogeneity in computational resources, application requirements, and model architectures. We argue that while replacing data with model parameters characterizes the present of FL, replacing parameters with preferences represents a more scalable and privacy-preserving future. Motivated by this perspective, we propose MoR, a federated alignment framework based on Group Relative Policy Optimization (GRPO) with a Mixture-of-Rewards for heterogeneous VLMs. MoR initializes a visual foundation model as a KL-regularized reference, while each client locally trains a reward model from its own preference annotations, capturing client-specific evaluation signals without exposing raw data. To reconcile heterogeneous rewards, we introduce a routing-based fusion mechanism that adaptively aggregates client reward signals, and the server then performs GRPO with this mixed reward to optimize the base VLM. Experiments on three public visual question answering (VQA) benchmarks demonstrate that MoR consistently outperforms federated alignment baselines in generalization, robustness, and cross-client adaptability, providing a scalable solution for privacy-preserving alignment of heterogeneous VLMs in federated settings.
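The server-side optimization step combines two standard ingredients named in the abstract: GRPO's group-relative advantage (each sampled response's reward normalized against its sampling group) and a KL penalty toward the frozen reference model. The sketch below shows only these two scalar computations under simplifying assumptions; the function names, the per-sample log-prob penalty form, and the coefficient `beta` are illustrative, not the paper's exact formulation.

```python
import numpy as np

def grpo_advantages(group_rewards):
    """Group-relative advantages: center and scale each response's
    reward by the mean and std of its sampling group (the core
    normalization idea in GRPO)."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon avoids div-by-zero

def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Mixed reward minus a KL-style penalty that discourages the
    policy from drifting away from the frozen reference model.
    (Assumed per-sample log-prob form; beta is a hypothetical choice.)"""
    return reward - beta * (logp_policy - logp_ref)
```

In this toy form, responses rewarded above their group's mean get positive advantages and are reinforced, while the penalty term shrinks the effective reward whenever the policy assigns much higher log-probability to a response than the reference does.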