🤖 AI Summary
Deep neural networks are heterogeneous across layers: residual blocks and multi-head attention modules, for example, differ markedly in dimensionality, activation patterns, and representation characteristics. In distributed variational inequality (VI) optimization, this inter-layer heterogeneity leads to excessive communication overhead and slow convergence.
Method: This paper is the first to introduce layer-aware quantization into the VI optimization framework, proposing a layer-adaptive quantization mechanism and the Quantized Optimistic Dual Averaging (QODA) algorithm.
Contribution/Results: We derive tight bounds on quantization variance and minimum code length, and design an adaptive step-size strategy that ensures an optimal $O(1/T)$ convergence rate for monotone VIs. Evaluated on training Wasserstein GANs on a cluster of 12+ GPUs, our method achieves up to a 150% end-to-end speedup, substantially outperforming existing quantization-based and distributed VI approaches.
📝 Abstract
Modern deep neural networks exhibit heterogeneity across their many layers of various types, such as residual and multi-head attention layers, owing to their varying structure (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds that adapts to these heterogeneities over the course of training. We then apply this new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150\%$ speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.
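To make the layer-wise idea concrete, below is a minimal sketch of an unbiased stochastic uniform quantizer whose resolution is chosen per layer. This is an illustration only, not the paper's actual quantizer: the per-layer bit-allocation rule (more levels for higher-variance layers), the layer names, and all thresholds are hypothetical.

```python
import numpy as np

def quantize_layer(grad, num_levels):
    """Quantize a gradient tensor to `num_levels` magnitude levels,
    scaled by the layer's max absolute value, with stochastic rounding
    so the quantizer is unbiased in expectation."""
    scale = np.max(np.abs(grad))
    if scale == 0:
        return np.zeros_like(grad), scale
    normalized = np.abs(grad) / scale * (num_levels - 1)
    lower = np.floor(normalized)
    # Round up with probability equal to the fractional part (unbiased).
    prob = normalized - lower
    levels = lower + (np.random.rand(*grad.shape) < prob)
    return np.sign(grad) * levels / (num_levels - 1), scale

def dequantize(q, scale):
    # Recover an (unbiased) estimate of the original gradient.
    return q * scale

if __name__ == "__main__":
    # Hypothetical layer-adaptive rule: give higher-variance layers more levels.
    rng = np.random.default_rng(0)
    grads = {"residual_block": rng.normal(0, 1.0, 64),
             "attention_head": rng.normal(0, 5.0, 64)}
    for name, g in grads.items():
        levels = 4 if np.std(g) < 2.0 else 16  # crude per-layer allocation
        q, s = quantize_layer(g, levels)
        err = np.linalg.norm(dequantize(q, s) - g) / np.linalg.norm(g)
        print(f"{name}: levels={levels}, relative error={err:.3f}")
```

Fewer levels per layer means shorter codes and less communication; the variance and code-length bounds in the paper quantify this trade-off rigorously, which this toy sketch does not attempt.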