Decoupled Training with Local Reinforcement Fine-Tuning in Federated Learning

📅 2026-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses optimization inconsistency and over-specialization in federated learning of pre-trained vision-language models, which arise from client heterogeneity and full-data local updates. The authors propose FedDTL, a framework that decouples the image and text encoders between the server and clients and introduces a modality alignment mechanism to keep global semantic updates consistent. FedDTL also employs a two-stage local fine-tuning strategy: an initial supervised fine-tuning phase for a rapid warm start, followed by a reinforcement learning phase to enhance generalization. The authors present this combination of decoupled encoder training and reinforcement learning-based local fine-tuning as novel for federated vision-language learning. Experiments indicate an effective balance between global task adaptation and generalization across diverse data distributions (including label skew and feature shift) in both few-shot and full-data settings.
📝 Abstract
Federated Learning (FL) with pre-trained Vision-Language Models (VLMs) has emerged as a promising paradigm for various downstream tasks. By leveraging the strong representations of VLMs, recent studies improve task adaptation under insufficient local data while preserving generalization. However, these methods emphasize fully local optimization with simple parameter aggregation, which can amplify inter-client optimization inconsistency and intra-client over-specialization under heterogeneous and full-data FL settings, making it difficult to balance global task adaptation and generalization. To address these challenges, we propose FedDTL, a novel federated VLM framework that decouples the image encoder and text encoder across clients and the server. Through decoupled encoder training with server-client modality alignment, FedDTL promotes coherent global semantic updates and reduces inter-client optimization inconsistency, improving global task adaptation. To further mitigate intra-client over-specialization, we introduce a two-stage local fine-tuning strategy, where a supervised fine-tuning stage enables a rapid and reliable warm start, followed by a reinforcement learning stage that enhances generalization. Extensive experiments on multiple benchmarks, including label skew and feature shift, demonstrate that FedDTL achieves an effective balance between global task adaptation and generalization under various FL data distributions in both few-shot and full-data regimes.
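The two-stage local fine-tuning described in the abstract can be illustrated with a toy sketch. This is not the FedDTL implementation: the abstract does not specify the model, the reinforcement learning algorithm, or the reward. The sketch assumes a linear softmax classifier in place of a VLM, REINFORCE with a mean-reward baseline for the second stage, and a reward of 1 for a correctly sampled label. All function names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, x):
    # weights holds one row per class; x is a feature vector.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def apply_grad(weights, x, coeffs, lr):
    # In-place update: weights[k] += lr * coeffs[k] * x
    for k, row in enumerate(weights):
        for j in range(len(row)):
            row[j] += lr * coeffs[k] * x[j]


def sft_stage(weights, data, epochs, lr):
    """Stage 1 (assumed form): supervised warm start via cross-entropy."""
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(logits_for(weights, x))
            # d log p(y|x) / d logit_k = 1[k == y] - p_k
            coeffs = [(1.0 if k == y else 0.0) - p for k, p in enumerate(probs)]
            apply_grad(weights, x, coeffs, lr)


def rl_stage(weights, data, epochs, lr, rng, samples=4):
    """Stage 2 (assumed form): REINFORCE with a mean-reward baseline."""
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(logits_for(weights, x))
            actions = rng.choices(range(len(probs)), weights=probs, k=samples)
            rewards = [1.0 if a == y else 0.0 for a in actions]  # assumed reward
            baseline = sum(rewards) / samples
            for a, r in zip(actions, rewards):
                adv = r - baseline
                # Policy gradient: advantage * d log p(a|x) / d logit_k
                coeffs = [adv * ((1.0 if k == a else 0.0) - p) / samples
                          for k, p in enumerate(probs)]
                apply_grad(weights, x, coeffs, lr)


def accuracy(weights, data):
    hits = 0
    for x, y in data:
        scores = logits_for(weights, x)
        hits += int(scores.index(max(scores)) == y)
    return hits / len(data)


def two_stage_local_finetune(data, num_classes, dim, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(num_classes)]
    sft_stage(weights, data, epochs=20, lr=0.5)          # warm start
    rl_stage(weights, data, epochs=20, lr=0.1, rng=rng)  # RL refinement
    return weights


# Toy "client" dataset: two features plus a constant bias term, two classes.
toy_data = [([1.0, 0.0, 1.0], 0), ([0.9, 0.1, 1.0], 0),
            ([0.0, 1.0, 1.0], 1), ([0.1, 0.9, 1.0], 1)]
weights = two_stage_local_finetune(toy_data, num_classes=2, dim=3)
```

The sketch only shows the ordering of the two stages on a single client. It omits the decoupled image and text encoders, the server-client modality alignment, and any aggregation step, none of which are specified in enough detail in the abstract to reproduce.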
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Vision-Language Models
Optimization Inconsistency
Over-specialization
Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Training
Federated Learning
Vision-Language Models
Reinforcement Fine-Tuning
Modality Alignment