🤖 AI Summary
This work addresses the instability of federated aggregation caused by non-i.i.d. data and client drift by proposing a structural-quantitative decoupled learning paradigm. Leveraging a DualCopy network decomposition, the approach freezes a structural copy of the pre-trained model during federated fine-tuning to generate fixed binary gating masks, while exclusively optimizing and aggregating the affine parameters of its quantitative counterpart. By constraining learning to lightweight within-regime adjustments under the fixed gating, the method exploits pre-trained structural knowledge to stabilize optimization and reduce sensitivity to global model oscillations. Experiments demonstrate that this strategy significantly enhances robustness under both i.i.d. and Dirichlet-based non-i.i.d. data partitions, reduces the number of communication rounds required to reach the best validation performance, and maintains high accuracy in transfer scenarios.
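The aggregation step described above, averaging only the quantitative copy's trainable parameters while the structural copy stays frozen everywhere, can be sketched as an ordinary size-weighted federated average. This is a minimal illustration under assumed names (`aggregate_quantitative`, dict-of-arrays parameters); it is not the paper's implementation.

```python
import numpy as np

def aggregate_quantitative(client_params, client_sizes):
    """Size-weighted average (FedAvg-style) over the quantitative copy only.

    client_params: list of dicts mapping parameter name -> np.ndarray,
        one dict per client (only the trainable affine parameters;
        the frozen structural copy is never communicated or averaged).
    client_sizes: list of local dataset sizes, used as aggregation weights.
    """
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * params[name]
                  for n, params in zip(client_sizes, client_params))
        for name in client_params[0]
    }
```

Because the gating masks are identical across clients, the averaged affine parameters stay within the same piecewise-linear regime, which is the intuition behind the improved aggregation stability.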
📄 Abstract
Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d. client data) and by the instability of naive weight averaging under client drift. In many cross-silo deployments, FL is warm-started from a strong pre-trained backbone (e.g., ImageNet-1K) and then adapted to local domains. Motivated by recent evidence that ReLU-like gating regimes (structural knowledge) stabilize earlier than the remaining parameter values (quantitative knowledge), we propose FedSQ (Federated Structural-Quantitative learning), a transfer-initialized federated procedure based on a DualCopy, piecewise-linear view of deep networks. FedSQ freezes a structural copy of the pre-trained model to induce fixed binary gating masks during federated fine-tuning, while only a quantitative copy is optimized locally and aggregated across rounds. Fixing the gating reduces learning to within-regime affine refinements, which stabilizes aggregation under heterogeneous partitions. Experiments on two convolutional neural network backbones under i.i.d. and Dirichlet splits show that FedSQ improves robustness and can reduce rounds-to-best validation performance relative to standard baselines while preserving accuracy in the transfer setting.
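The DualCopy, piecewise-linear view can be made concrete with a single ReLU layer: the frozen structural copy determines the binary gating mask, and the quantitative copy supplies the affine values that pass through it. The sketch below uses assumed names (`dualcopy_forward` and its parameters) for illustration only; note that when both copies coincide, the layer reduces to an ordinary ReLU layer.

```python
import numpy as np

def dualcopy_forward(x, w_struct, b_struct, w_quant, b_quant):
    """One DualCopy-style ReLU layer (illustrative sketch, not the paper's code).

    The structural copy (w_struct, b_struct) is frozen and only decides
    the binary gating mask; the quantitative copy (w_quant, b_quant) is
    the trainable part whose affine output is gated by that mask.
    """
    # Fixed gating: which units are "on" is decided by the frozen copy,
    # so local training cannot flip activation regimes.
    gate = (x @ w_struct + b_struct > 0).astype(x.dtype)
    # Within the fixed regime, learning is a purely affine refinement.
    return gate * (x @ w_quant + b_quant)
```

With identical copies, `dualcopy_forward(x, w, b, w, b)` equals `relu(x @ w + b)`; during federated fine-tuning only `(w_quant, b_quant)` would be updated and aggregated.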