SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
Decentralized deep learning faces two coupled challenges: data heterogeneity causes local overfitting and poor global generalization, and training over sparse network topologies incurs high communication overhead. To address both, this work introduces Sharpness-Aware Minimization (SAM) into decentralized training. It proposes two SAM variants designed for sparse graph-structured networks that combine communication compression with consensus optimization while preserving convergence guarantees and improving robustness to compression. Experiments across multiple heterogeneous datasets show test-accuracy improvements of 1–20%. The methods also tolerate up to 4× communication compression with only a 1% average accuracy drop, balancing generalization performance and communication efficiency.

📝 Abstract
Decentralized training enables learning with distributed datasets generated at different locations without relying on a central server. In realistic scenarios, the data distribution across these sparsely connected learning agents can be significantly heterogeneous, leading to local model over-fitting and poor global model generalization. Another challenge is the high communication cost of training models in such a peer-to-peer fashion without any central coordination. In this paper, we jointly tackle these two-fold practical challenges by proposing SADDLe, a set of sharpness-aware decentralized deep learning algorithms. SADDLe leverages Sharpness-Aware Minimization (SAM) to seek a flatter loss landscape during training, resulting in better model generalization as well as enhanced robustness to communication compression. We present two versions of our approach and conduct extensive experiments to show that SADDLe leads to 1-20% improvement in test accuracy compared to other existing techniques. Additionally, our proposed approach is robust to communication compression, with an average drop of only 1% in the presence of up to 4x compression.
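The SAM update mentioned in the abstract can be sketched on a toy problem. The snippet below is a minimal single-agent illustration, not the SADDLe algorithm itself: the least-squares loss, the helper names `loss_and_grad` and `sam_step`, and the values of `lr` and `rho` are all assumptions chosen for the example. It shows the two-gradient pattern of SAM: first step to a nearby point where the loss is (to first order) highest within a radius `rho`, then descend using the gradient evaluated there.

```python
import numpy as np

def loss_and_grad(w, A, b):
    """Toy least-squares loss 0.5 * ||A w - b||^2 / n and its gradient.
    Stands in for a network loss; purely illustrative."""
    r = A @ w - b
    n = len(b)
    return 0.5 * (r @ r) / n, A.T @ r / n

def sam_step(w, A, b, lr=0.1, rho=0.05):
    """One SAM update (hypothetical hyperparameters lr and rho)."""
    _, g = loss_and_grad(w, A, b)
    # Ascent step: move a distance of about rho along the normalized gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: use the gradient taken at the perturbed point.
    _, g_sharp = loss_and_grad(w + eps, A, b)
    return w - lr * g_sharp

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 4))
w_true = np.ones(4)
b = A @ w_true          # realizable targets, so the minimum loss is zero

w = np.zeros(4)
initial_loss, _ = loss_and_grad(w, A, b)
for _ in range(300):
    w = sam_step(w, A, b)
final_loss, _ = loss_and_grad(w, A, b)
print(initial_loss, final_loss)
```

Each SAM step costs two gradient evaluations instead of one. In a decentralized setting, each agent would presumably apply a step like this to its local loss before exchanging parameters with neighbors, but the abstract does not spell out that detail.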
Problem

Research questions and friction points this paper is trying to address.

Heterogeneous data across sparsely connected agents leads to local over-fitting and poor global generalization.
Peer-to-peer training without central coordination incurs high communication cost.
Models need better generalization and robustness to communication compression.
Innovation

Methods, ideas, or system contributions that make the work stand out.

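Applies SAM to decentralized training to seek a flatter loss landscape
Improves test accuracy by 1-20% on heterogeneous data compared to existing techniques
Stays robust to communication compression, with an average accuracy drop of only 1% at up to 4x compression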
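
To make the communication side concrete, here is a small sketch of compressed peer-to-peer averaging on a ring. The page does not say which compressor or consensus scheme SADDLe uses, so everything here is an assumption: the ring topology, the top-k sparsifier (keeping 1 in 4 entries as a stand-in for "4x compression"), the error-compensated update in the style of CHOCO-Gossip, the step size `gamma`, and all function names. Each agent transmits only the compressed difference between its parameters and a public copy that its neighbors track.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: each agent weights
    itself and its two neighbors equally (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compressed_gossip_round(X, X_hat, W, k, gamma=0.1):
    """One round of error-compensated compressed gossip.
    Rows of X are agents' parameters; rows of X_hat are the public
    copies that neighbors hold for each agent."""
    # Each agent compresses the gap between its parameters and its public copy.
    Q = np.stack([top_k(row, k) for row in X - X_hat])
    # Only Q is transmitted; every holder of a public copy adds it.
    X_hat = X_hat + Q
    # Consensus step computed from public copies only.
    X = X + gamma * (W - np.eye(len(W))) @ X_hat
    return X, X_hat

def disagreement(X):
    """Distance of the agents' parameters from their common mean."""
    return np.linalg.norm(X - X.mean(axis=0))

rng = np.random.default_rng(0)
n_agents, dim = 8, 16
k = dim // 4                      # send 4 of 16 entries per round
W = ring_mixing_matrix(n_agents)
X0 = rng.normal(size=(n_agents, dim))

X, X_hat = X0.copy(), np.zeros_like(X0)
for _ in range(50):
    X, X_hat = compressed_gossip_round(X, X_hat, W, k)

print(disagreement(X0), disagreement(X))
```

Because `W` is doubly stochastic, the consensus step leaves the network-wide average of the parameters unchanged no matter how aggressive the compressor is. Comparing the two printed disagreement values is a quick way to check whether a chosen `gamma` is small enough for the agents to move toward agreement.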