LOTION: Smoothing the Optimization Landscape for Quantized Training

📅 2025-10-09

🤖 AI Summary
Quantization-aware training (QAT) is hard to optimize because quantizers are piecewise constant. Gradients vanish almost everywhere, and the loss is non-differentiable at the quantization thresholds. To address this, we propose a loss-smoothing technique based on unbiased stochastic rounding. We construct a surrogate loss, the expectation of the quantized loss under stochastic rounding, which is differentiable and is a statistically consistent approximation of the original quantized loss. We prove that standard optimizers keep their guarantees of convergence to local minima on this surrogate, and that the surrogate has the same global optima as the original problem. Our method brings Nesterov smoothing principles to low-precision optimization, introducing controllable stochastic noise during backpropagation to enable end-to-end differentiability. Experiments on synthetic data and on large language models (150M and 300M parameters) show significant improvements over conventional QAT, with better training stability, faster convergence, and higher final accuracy.
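The summary relies on unbiased stochastic rounding, which rounds a value up with probability equal to its fractional distance from the lower grid point, so that its expected value equals the input. Below is a minimal NumPy sketch under stated assumptions: a uniform quantization grid with step `delta`. The names `stochastic_round`, `delta` and `rng` are hypothetical, and the paper's actual number format (scales, bit-widths) is not described here.

```python
import numpy as np

def stochastic_round(w, delta, rng):
    """Unbiased stochastic rounding of w onto the hypothetical uniform grid delta * k.

    Each value rounds up with probability equal to its fractional distance
    from the lower grid point, so E[stochastic_round(w)] == w.
    """
    scaled = np.asarray(w, dtype=np.float64) / delta
    lower = np.floor(scaled)
    p_up = scaled - lower                      # probability of rounding up, in [0, 1)
    go_up = rng.random(scaled.shape) < p_up    # one Bernoulli(p_up) draw per element
    return (lower + go_up) * delta

rng = np.random.default_rng(0)
samples = stochastic_round(np.full(200_000, 0.37), delta=0.25, rng=rng)
print(np.unique(samples))   # only the two neighbouring grid points appear
print(samples.mean())       # close to 0.37; round-to-nearest would give 0.25
```

Round-to-nearest maps every one of these samples to 0.25. Stochastic rounding splits them between 0.25 and 0.5 in proportions that keep the mean at 0.37, which is the unbiasedness property the summary refers to.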

πŸ“ Abstract
Optimizing neural networks for quantized objectives is fundamentally challenging because the quantizer is piecewise constant. It yields zero gradients everywhere except at the quantization thresholds, where the derivative is undefined. Most existing methods deal with this issue by relaxing gradient computations with techniques like Straight-Through Estimators (STE), and they do not provide any guarantees of convergence. In this work, taking inspiration from Nesterov smoothing, we approximate the quantized loss surface with a continuous loss surface. In particular, we introduce LOTION (Low-precision Optimization via sTochastic-noIse smOothiNg), a principled smoothing framework that replaces the raw quantized loss with its expectation under unbiased randomized-rounding noise. In this framework, standard optimizers are guaranteed to converge to a local minimum of the smoothed loss surface. Moreover, when using noise derived from stochastic rounding, we show that the global minima of the original quantized loss are preserved. We empirically demonstrate that this method outperforms standard QAT on synthetic testbeds and on 150M- and 300M-parameter language models.
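The abstract's central construction replaces the quantized loss with its expectation under randomized rounding. For a single scalar weight this expectation has a closed form. Between the grid points `lower` and `lower + delta`, it linearly interpolates the loss at the two neighbours, so it is continuous and has a non-zero slope, whereas the raw quantized loss is flat.

The sketch below makes several assumptions, all hypothetical: one scalar weight, a uniform grid with step `delta`, and a toy quadratic loss. It also computes the straight-through estimator (STE) gradient mentioned in the abstract, for comparison. It illustrates the smoothing idea in one dimension and is not the paper's implementation for networks, where the expectation runs over many weights at once.

```python
import numpy as np

TARGET = 0.3  # hypothetical optimum of the toy loss, deliberately off the grid

def loss(x):
    # Toy full-precision loss (hypothetical).
    return (x - TARGET) ** 2

def loss_grad(x):
    return 2 * (x - TARGET)

def quantized_loss(w, delta):
    # Raw quantized objective: round-to-nearest (NumPy rounds halves to even).
    return loss(np.round(w / delta) * delta)

def smoothed_loss(w, delta):
    # Exact expectation of loss(q) when q is the unbiased stochastic rounding
    # of w: q = lower with prob. 1 - p and q = lower + delta with prob. p.
    lower = np.floor(w / delta) * delta
    p = (w - lower) / delta
    return (1 - p) * loss(lower) + p * loss(lower + delta)

def smoothed_grad(w, delta):
    # Inside a grid cell the smoothed loss is linear in w, so its slope is the
    # difference quotient between the two neighbouring grid points.
    lower = np.floor(w / delta) * delta
    return (loss(lower + delta) - loss(lower)) / delta

def ste_grad(w, delta):
    # Straight-through estimator: treat rounding as the identity on the
    # backward pass, i.e. use the full-precision derivative at the rounded point.
    return loss_grad(np.round(w / delta) * delta)

delta, w = 0.25, 0.37
print(quantized_loss(w, delta), smoothed_loss(w, delta))
print(smoothed_grad(w, delta), ste_grad(w, delta))
```

At w = 0.37 the exact derivative of `quantized_loss` is zero. The STE gradient is -0.1, which pushes w toward the continuous optimum 0.3. The smoothed gradient is 0.15, which pushes w toward 0.25, the neighbouring grid point with the lower loss. In this 1-D toy the smoothed loss is piecewise linear, with kinks at the grid points.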
Problem

Research questions and friction points this paper is trying to address.

Addressing the zero-gradient issue in quantized neural network training
Providing convergence guarantees for low-precision optimization methods
Preserving global minima while smoothing quantized loss surfaces
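The zero-gradient issue in the first item can be checked numerically. This sketch assumes a uniform grid with step 0.25 and a toy quadratic loss, both hypothetical. It takes central finite differences of the quantized loss at points that lie away from the rounding thresholds, which sit at odd multiples of 0.125.

```python
import numpy as np

def quantized_loss(w, delta=0.25):
    # Toy loss (w_q - 0.3)^2 evaluated on the round-to-nearest quantized weight.
    w_q = np.round(w / delta) * delta
    return (w_q - 0.3) ** 2

def central_diff(f, w, h=1e-6):
    # Numerical derivative; round() is flat wherever its derivative exists.
    return (f(w + h) - f(w - h)) / (2 * h)

ws = np.linspace(-1.0, 1.0, 1001)   # step 0.002, never lands on a threshold
grads = central_diff(quantized_loss, ws)
print(np.count_nonzero(grads), "of", ws.size, "points have a non-zero gradient")
```

At an exact threshold such as w = 0.375, the same finite difference instead becomes very large, because the loss jumps from 0.0025 to 0.04 across it. This is the non-differentiable case.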
Innovation

Methods, ideas, or system contributions that make the work stand out.

Approximates the quantized loss with a continuous surface
Uses stochastic noise to smooth the loss for optimization
Preserves the global minima of the original quantized loss
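The global-minimum claim has a simple one-dimensional intuition. Under unbiased stochastic rounding, the smoothed loss at any w is a convex combination of the losses at the two neighbouring grid points. It therefore never drops below the best grid point, and it equals the quantized loss on the grid itself.

The sketch below checks this on a hypothetical non-convex toy loss and a uniform grid of 17 points on [-2, 2]. It illustrates the statement in a toy setting and is not the paper's proof.

```python
import numpy as np

def loss(x):
    # Hypothetical non-convex full-precision loss, for illustration only.
    return np.sin(3 * x) + 0.5 * x ** 2

delta = 0.25
grid = np.arange(-8, 9) * delta              # representable values in [-2, 2]

def quantized_loss(w):
    # Round-to-nearest onto the grid, clipped to its range.
    idx = np.clip(np.round(w / delta), -8, 8)
    return loss(idx * delta)

def smoothed_loss(w):
    # Expectation under unbiased stochastic rounding between neighbouring grid points.
    lo = np.clip(np.floor(w / delta), -8, 7)
    p = w / delta - lo                       # probability of rounding up
    return (1 - p) * loss(lo * delta) + p * loss((lo + 1) * delta)

ws = np.linspace(-2.0, 2.0, 4001)
print(quantized_loss(ws).min(), smoothed_loss(ws).min(), loss(grid).min())
print(ws[np.argmin(smoothed_loss(ws))])      # minimizer of the smoothed loss
```

The three minima agree, and the smoothed loss attains its minimum at the grid point w = -0.5, which is also the best quantized value. Smoothing removed the flat regions without creating a new, lower minimum between grid points.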