On the Convergence and Stability of Distributed Sub-model Training

📅 2025-11-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of training large models locally on resource-constrained devices in federated learning, this paper proposes Distributed Randomized Submodel Training (DRSMT). DRSMT pre-partitions the global model into multiple submodules; the server then distributes distinct submodels to clients per round—following a rotation strategy analogous to SGD’s random shuffling—where each client updates only its assigned submodel. The server reconstructs the full model during aggregation. We theoretically establish that DRSMT achieves the same convergence rate as standard FedAvg—namely, O(1/√T)—and further demonstrate, via algorithmic stability analysis, that it attains a tighter generalization error bound. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that DRSMT significantly accelerates convergence and improves test accuracy, particularly under communication constraints and across heterogeneous device environments.
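The round structure described above (partition, shuffled assignment, client-local updates, server-side reconstruction) can be sketched in a few lines of Python. This is a minimal illustrative toy, not the paper's implementation: all function names are ours, a single process simulates server and clients, and the objective is separable so each sub-model block can be updated independently of the others.

```python
import random

def partition(model, num_blocks):
    """Server: split a flat parameter list into contiguous sub-models."""
    size = (len(model) + num_blocks - 1) // num_blocks
    return [model[i:i + size] for i in range(0, len(model), size)]

def local_update(block, grad_fn, lr=0.1, steps=5):
    """Client: run a few SGD steps on its assigned sub-model only."""
    for _ in range(steps):
        g = grad_fn(block)
        block = [w - lr * gi for w, gi in zip(block, g)]
    return block

def drsmt_round(model, num_clients, grad_fn):
    """One round: shuffle sub-model assignments, update, reassemble."""
    blocks = partition(model, num_clients)
    order = list(range(len(blocks)))
    random.shuffle(order)                 # server-side shuffled assignment
    for idx in order:                     # each client updates one distinct block
        blocks[idx] = local_update(blocks[idx], grad_fn)
    return [w for b in blocks for w in b]  # server reconstructs the full model

# Toy objective f(w) = sum(w_i^2), so grad is 2*w per coordinate.
grad_fn = lambda block: [2.0 * w for w in block]
model = [1.0, -2.0, 3.0, -4.0]
for _ in range(20):
    model = drsmt_round(model, num_clients=2, grad_fn=grad_fn)
# After enough rounds every coordinate is driven toward zero,
# since each block is visited (and fully updated) once per round.
```

Note that with a separable objective the shuffle order is irrelevant; its benefit in the paper's setting comes from coupled gradients across sub-models, where rotating assignments avoids the bias of repeatedly sampling the same blocks.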

📝 Abstract
As learning models continue to grow in size, enabling on-device local training has emerged as a critical challenge in federated learning. A popular solution is sub-model training, where the server distributes only randomly sampled sub-models to edge clients, and clients update only these small models. However, random sampling of sub-models may not yield satisfactory convergence performance. In this paper, motivated by the success of SGD with shuffling, we propose distributed shuffled sub-model training: the full model is partitioned into several sub-models in advance; at each round, the server shuffles these sub-models and sends one to each client; at the end of the local update period, clients send back the updated sub-models and the server averages them. We establish the convergence rate of this algorithm. We also study the generalization of distributed sub-model training via stability analysis and find that sub-model training can improve generalization by amplifying the stability of the training process. Extensive experiments further validate our theoretical findings.
Problem

Research questions and friction points this paper is trying to address.

Optimizing convergence in federated sub-model training through structured shuffling
Analyzing generalization improvements via training stability in distributed systems
Addressing performance limitations of random sub-model sampling in federated learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed shuffled sub-model training for federated learning
Pre-partitioned sub-models with server-side shuffling mechanism
Convergence analysis and stability-enhanced generalization performance