FedSQ: Optimized Weight Averaging via Fixed Gating

📅 2026-04-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the instability in federated aggregation caused by non-i.i.d. data and client drift by proposing a structural-quantitative decoupled learning paradigm. Leveraging a DualCopy network decomposition, the approach freezes a structural copy of the pre-trained model during federated fine-tuning to generate fixed binary gating masks, while exclusively optimizing and aggregating the affine parameters of its quantitative counterpart. By constraining learning to lightweight within-regime affine adjustments under the fixed gating, the method exploits pre-trained structural knowledge to stabilize optimization and reduce sensitivity to global model oscillations. Experiments demonstrate that this strategy significantly enhances robustness under both i.i.d. and Dirichlet-based non-i.i.d. data partitions, reduces the number of communication rounds required to reach the best validation performance, and maintains high accuracy in transfer scenarios.
📝 Abstract
Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d.\ client data) and by instability of naive weight averaging under client drift. In many cross-silo deployments, FL is warm-started from a strong pretrained backbone (e.g., ImageNet-1K) and then adapted to local domains. Motivated by recent evidence that ReLU-like gating regimes (structural knowledge) stabilize earlier than the remaining parameter values (quantitative knowledge), we propose FedSQ (Federated Structural-Quantitative learning), a transfer-initialized neural federated procedure based on a DualCopy, piecewise-linear view of deep networks. FedSQ freezes a structural copy of the pretrained model to induce fixed binary gating masks during federated fine-tuning, while only a quantitative copy is optimized locally and aggregated across rounds. Fixing the gating reduces learning to within-regime affine refinements, which stabilizes aggregation under heterogeneous partitions. Experiments on two convolutional neural network backbones under i.i.d.\ and Dirichlet splits show that FedSQ improves robustness and can reduce rounds-to-best validation performance relative to standard baselines while preserving accuracy in the transfer setting.
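The core mechanism described in the abstract — a frozen structural copy that fixes the binary ReLU gating masks while only a quantitative copy is trained and averaged — can be illustrated with a minimal NumPy sketch. This is an illustrative reconstruction, not the authors' implementation: the layer shapes, the single-layer setup, and the `forward`/`fedavg` helper names are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen structural copy (pretrained weights): it only defines the gating regime.
W_s = rng.standard_normal((4, 3))
b_s = rng.standard_normal(4)

# Quantitative copy: initialized from the same pretrained weights, then fine-tuned.
W_q = W_s.copy()
b_q = b_s.copy()

def forward(x, W_q, b_q):
    # The binary gating mask comes from the FROZEN structural copy, so the
    # active-ReLU pattern stays fixed throughout federated fine-tuning.
    mask = (W_s @ x + b_s > 0).astype(float)
    # Under a fixed mask, the layer is affine in the quantitative parameters,
    # which is what makes round-wise weight averaging better behaved.
    return mask * (W_q @ x + b_q)

def fedavg(client_params):
    # FedAvg-style aggregation touches only the quantitative parameters;
    # the structural copy is never communicated or updated.
    Ws, bs = zip(*client_params)
    return np.mean(Ws, axis=0), np.mean(bs, axis=0)
```

Because the mask depends only on the frozen structural copy, perturbing the quantitative weights (as local client updates would) cannot flip any gate: the same units stay active, and aggregation averages models that live in the same piecewise-linear region.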
Problem

Research questions and friction points this paper is trying to address.

federated learning
statistical heterogeneity
non-i.i.d.
client drift
weight averaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Structural-Quantitative Learning
Fixed Gating
Non-IID Data
Model Fine-tuning
Cristian Pérez-Corral
Universitat Politècnica de València
Jose I. Mestre
Universitat Politècnica de València
Alberto Fernández-Hernández
Universitat Politècnica de València
Manuel F. Dolz
Universitat Jaume I
High Performance Computing · Energy Efficiency · Parallel Programming Models · Performance Analysis · Deep Learning
José Duato
Openchip & Software Technologies
Enrique S. Quintana-Ortí
Universitat Politècnica de València, Spain