Layer-wise Quantization for Quantized Optimistic Dual Averaging

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks are highly heterogeneous across layers: residual blocks and multi-head attention modules, for example, differ markedly in dimensionality, activation patterns, and representation characteristics. In distributed variational inequality (VI) optimization, this heterogeneity leads to excessive communication overhead and slow convergence. Method: the paper introduces layer-aware quantization into the VI optimization framework for the first time, proposing a layer-adaptive quantization mechanism and the Quantized Optimistic Dual Averaging (QODA) algorithm. Contribution/Results: tight bounds on quantization variance and minimum code length are derived, and an adaptive step-size strategy guarantees the optimal $O(1/T)$ convergence rate for monotone VIs. Evaluated on a cluster of 12+ GPUs training Wasserstein GANs, QODA achieves up to a 150% end-to-end speedup, substantially outperforming existing quantization-based and distributed VI approaches.
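
To make the layer-adaptive mechanism concrete, below is a minimal sketch of per-layer unbiased stochastic quantization in the style of QSGD. The level-allocation heuristic in `quantize_layerwise` and all names here are illustrative assumptions, not the paper's exact scheme (the paper derives optimal per-layer allocations from its variance and code-length bounds).

```python
import numpy as np

def quantize_layer(v, num_levels):
    """Unbiased stochastic quantization of one layer's gradient (QSGD-style).

    Normalizes by the layer norm, then randomly rounds each coordinate to one
    of `num_levels` uniform levels so that E[quantize_layer(v)] = v.
    """
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * (num_levels - 1)
    lower = np.floor(scaled)
    prob = scaled - lower  # probability of rounding up to the next level
    rounded = lower + (np.random.rand(*v.shape) < prob)
    return np.sign(v) * rounded * norm / (num_levels - 1)

def quantize_layerwise(grads, bit_budget=4):
    """Quantize each layer separately, adapting the number of levels.

    Hypothetical heuristic: spend more levels on layers whose coordinates are
    more dispersed relative to their mean magnitude; only a stand-in for the
    paper's optimal per-layer allocation.
    """
    out = []
    for v in grads:
        dispersion = np.std(v) / (np.abs(v).mean() + 1e-12)
        levels = max(2, int(2 ** bit_budget * min(1.0, dispersion)))
        out.append(quantize_layer(v, levels))
    return out
```

Because each per-layer quantizer is unbiased, the aggregated message remains an unbiased gradient estimate; the variance bounds then quantify, layer by layer, how much noise a given bit budget introduces.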

📝 Abstract
Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150\%$ speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
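
For readers unfamiliar with the template, here is a generic optimistic dual averaging update with an AdaGrad-style adaptive step size, where the quantized feedback $\hat{g}$ takes the place of the exact operator. This is an illustrative formulation under standard assumptions (unconstrained, Euclidean setup), not necessarily the paper's exact update or step-size rule.

```latex
% Illustrative optimistic dual averaging template;
% \hat{g}_{s+1/2} is the aggregated layer-wise quantized operator feedback.
\begin{align*}
  X_{t+1/2} &= X_1 - \eta_t \Big( \sum_{s=1}^{t-1} \hat{g}_{s+1/2} + \hat{g}_{t-1/2} \Big)
  && \text{(optimistic half-step)} \\
  X_{t+1}   &= X_1 - \eta_t \sum_{s=1}^{t} \hat{g}_{s+1/2}
  && \text{(dual averaging step)} \\
  \eta_t    &= \eta_0 \Big( 1 + \sum_{s=1}^{t-1} \big\| \hat{g}_{s+1/2} - \hat{g}_{s-1/2} \big\|^2 \Big)^{-1/2}
  && \text{(adaptive step size)}
\end{align*}
```

Under monotonicity of the operator, averaged iterates of schemes of this form attain the $O(1/T)$ rate quoted in the summary.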
Problem

Research questions and friction points this paper is trying to address.

Layer-wise heterogeneity in deep neural networks makes one-size-fits-all quantization suboptimal
Communication overhead and slow convergence hamper distributed variational inequality (VI) training
Existing quantization schemes lack tight variance and code-length bounds for this setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise quantization with tight variance bounds
Quantized Optimistic Dual Averaging algorithm
Adaptive learning rates for distributed VIs (see the communication sketch after this list)
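
Building on the quantization sketch above, here is a hypothetical communication round for the distributed setting: each worker quantizes its local operator layer by layer, and the averaged decoded messages drive the update. `Worker`, `operator`, and the plain averaging are stand-ins for whatever collective communication the actual implementation uses.

```python
import numpy as np

class Worker:
    """Toy worker holding a local operator F_i (e.g., a local GAN gradient)."""
    def __init__(self, operator):
        self.operator = operator  # params -> list of per-layer gradients

def distributed_round(workers, params, step, quantize=quantize_layerwise):
    # Each worker evaluates and layer-wise quantizes its local operator.
    msgs = [quantize(w.operator(params)) for w in workers]
    # The server (or an all-reduce) averages the decoded per-layer messages.
    avg = [np.mean([m[i] for m in msgs], axis=0) for i in range(len(params))]
    # Plain gradient-style update for illustration; QODA would instead feed
    # `avg` into its optimistic dual averaging update with adaptive step size.
    return [p - step * g for p, g in zip(params, avg)]
```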
👥 Authors
Anh Duc Nguyen
National University of Singapore (NUS); work done mostly at LIONS, EPFL
Ilia Markov
Neural Magic
Frank Zhengqing Wu
Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL)
Ali Ramezani-Kebrya
Associate Professor (with tenure) in CS at the University of Oslo
Machine Learning
Kimon Antonakopoulos
LIONS-EPFL
Convex Optimization, Continuous Optimization, Variational Inequalities
Dan Alistarh
Professor at IST Austria
Machine Learning, Algorithms, Distributed Computing
V. Cevher
Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL)