Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of aligning heterogeneous vision-language models in federated learning, where clients differ in resources, tasks, and architectures while requiring strict data privacy. To this end, the authors propose the MoR framework, which pioneers the use of preference signals—rather than model parameters—for federated alignment. Each client trains a local reward model using preference annotations, eliminating the need to transmit raw data. MoR further introduces a hybrid reward mechanism and an adaptive routing fusion strategy to harmonize feedback from heterogeneous clients. The global model is optimized via the GRPO algorithm with KL regularization. Notably, MoR operates without sharing data or enforcing uniform model architectures, achieving state-of-the-art performance across three VQA benchmarks, with superior generalization, robustness, and cross-client adaptability.

📝 Abstract
Vision-language models (VLMs) have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. Federated learning (FL) mitigates this issue by enabling decentralized training, but practical deployments face challenges due to client heterogeneity in computational resources, application requirements, and model architectures. We argue that while replacing data with model parameters characterizes the present of FL, replacing parameters with preferences represents a more scalable and privacy-preserving future. Motivated by this perspective, we propose MoR, a federated alignment framework based on GRPO with Mixture-of-Rewards for heterogeneous VLMs. MoR initializes a visual foundation model as a KL-regularized reference, while each client locally trains a reward model from local preference annotations, capturing client-specific evaluation signals without exposing raw data. To reconcile heterogeneous rewards, we introduce a routing-based fusion mechanism that adaptively aggregates client reward signals. Finally, the server performs GRPO with this mixed reward to optimize the base VLM. Experiments on three public VQA benchmarks demonstrate that MoR consistently outperforms federated alignment baselines in generalization, robustness, and cross-client adaptability. Our approach provides a scalable solution for privacy-preserving alignment of heterogeneous VLMs under federated settings.
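The two core ideas in the abstract — routing-based fusion of per-client reward signals, and GRPO's group-relative advantage normalization — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the softmax router, function names, and the epsilon term are assumptions.

```python
# Hedged sketch of Mixture-of-Rewards fusion and a GRPO-style
# group-relative advantage; details are illustrative assumptions,
# not the paper's actual code.
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mix_rewards(client_rewards: List[float], router_logits: List[float]) -> float:
    """Adaptively fuse heterogeneous per-client reward scores for one
    sampled response, weighting each client by a (hypothetical) router."""
    weights = softmax(router_logits)
    return sum(w * r for w, r in zip(weights, client_rewards))

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO replaces a learned value baseline with group statistics:
    each response's reward is normalized against its sampling group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]
```

In a full pipeline, `mix_rewards` would score each of the G responses sampled per prompt, and the resulting advantages would weight the policy-gradient update alongside the KL penalty against the reference model mentioned in the abstract.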
Problem

Research questions and friction points this paper is trying to address.

federated learning
vision-language models
model heterogeneity
privacy-preserving alignment
preference-based alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Alignment
Preference-based Learning
Mixture-of-Rewards
Heterogeneous VLMs
GRPO
Shule Lu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; Institute of Artificial Intelligence, Beihang University, China
Yujing Wang
Beihang University
NLP, Large Language Model
Hainan Zhang
Beihang University
Dialogue Generation, Text Generation, Federated Learning, Natural Language Processing
Xiaoshan Yang
MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Hongwei Zheng
Shanghai Jiao Tong University
Computer Vision, Federated Learning
Yongxin Tong
Institute of Artificial Intelligence, Beihang University, China
Changsheng Xu
Professor, Institute of Automation, Chinese Academy of Sciences
Multimedia, Computer Vision
Zhiming Zheng
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; Institute of Artificial Intelligence, Beihang University, China