Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
The straight-through estimator (STE) is widely used in quantized training to approximate gradients through non-differentiable discrete operations, yet its finite-sample theoretical guarantees have remained elusive. Method: Leveraging tools from compressed sensing and dynamical systems theory, we develop a finite-sample convergence analysis for STE in quantized neural network training. Contribution/Results: We derive an explicit upper bound on the sample complexity required for global convergence, scaling with the data dimension, and rigorously prove that STE converges to the global optimum given finitely many samples. Moreover, we uncover a novel oscillatory behavior in the presence of label noise: STE iterates periodically escape from and return to the optimal solution. Our results establish sample size as a critical determinant of STE's success and provide the first theoretically grounded framework for analyzing gradient approximation in quantized training.
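For concreteness, below is a minimal sketch of the standard binary STE construction the paper analyzes: the forward pass applies a non-differentiable sign quantizer, and the backward pass substitutes a surrogate gradient (here the common clipped-identity, or "hard tanh", variant). This is the generic recipe, not necessarily the exact surrogate studied in the paper.

```python
import torch

class BinarySTE(torch.autograd.Function):
    """Binarize in the forward pass; use a surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Non-differentiable quantizer (note: torch.sign maps 0 to 0;
        # binary-weight training typically breaks this tie to +1).
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: treat the quantizer as the identity,
        # clipped to |x| <= 1 so gradients vanish where the sign has saturated.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)
```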

📝 Abstract
Training quantized neural networks requires addressing the non-differentiable and discrete nature of the underlying optimization problem. To tackle this challenge, the straight-through estimator (STE) has become the most widely adopted heuristic, allowing backpropagation through discrete operations by introducing surrogate gradients. However, its theoretical properties remain largely unexplored, and the few existing works simplify the analysis by assuming an infinite amount of training data. In contrast, this work presents the first finite-sample analysis of STE in the context of neural network quantization. Our theoretical results highlight the critical role of sample size in the success of STE, a key insight absent from existing studies. Specifically, by analyzing the quantization-aware training of a two-layer neural network with binary weights and activations, we derive a sample complexity bound, in terms of the data dimensionality, that guarantees the convergence of STE-based optimization to the global minimum. Moreover, in the presence of label noise, we uncover an intriguing recurrence property of the STE gradient method, where the iterates repeatedly escape from and return to the optimal binary weights. Our analysis leverages tools from compressed sensing and dynamical systems theory.
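To illustrate the setting the abstract describes, the sketch below runs quantization-aware training of a two-layer network with binary weights and activations, reusing the BinarySTE function from the sketch above. The sizes, synthetic data, and squared loss are illustrative assumptions; the paper's precise teacher model and training regime are not reproduced here.

```python
import torch

torch.manual_seed(0)
d, hidden, n = 16, 32, 512  # illustrative sizes, not taken from the paper

class TwoLayerBinaryNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Real-valued latent weights; only their binarized versions enter the forward pass.
        self.W = torch.nn.Parameter(torch.randn(hidden, d) / d ** 0.5)
        self.v = torch.nn.Parameter(torch.randn(hidden) / hidden ** 0.5)

    def forward(self, x):
        # Binary weights and binary (sign) activations in both layers.
        h = BinarySTE.apply(x @ BinarySTE.apply(self.W).t())
        return h @ BinarySTE.apply(self.v)

model = TwoLayerBinaryNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(n, d)           # synthetic inputs
y = torch.sign(torch.randn(n))  # placeholder +/-1 labels

for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()  # gradients flow through the STE surrogates
    opt.step()       # latent weights move; their binarizations may flip
```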
Problem

Research questions and friction points this paper is trying to address.

How does STE behave with finitely many training samples, rather than in the infinite-data regime assumed by prior analyses?
What sample complexity guarantees convergence of STE-based quantized training to the global minimum?
How do STE iterates behave when labels are corrupted by noise?
Innovation

Methods, ideas, or system contributions that make the work stand out.

First finite-sample analysis of STE for neural network quantization
Explicit sample complexity bound in terms of the data dimensionality
Application of compressed sensing and dynamical systems tools to quantized training