A probabilistic framework for dynamic quantization

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of balancing input adaptivity and memory overhead in neural network dynamic quantization, this paper proposes a sample-wise dynamic rescaling method based on probabilistic modeling. A lightweight proxy network models the pre-activation distribution to estimate optimal quantization parameters in real time, eliminating the need for explicit storage or repeated statistical computation. This work is the first to integrate probabilistic modeling into dynamic quantization frameworks, enabling input-adaptive quantization scheduling without additional GPU memory consumption. Evaluated on mainstream vision models—including ResNet and ViT—and tasks such as ImageNet classification and COCO object detection, our method incurs negligible accuracy degradation (<0.3% Top-1 drop) while significantly reducing computational overhead compared to existing dynamic quantization approaches. It achieves superior accuracy–efficiency trade-offs relative to both post-training quantization (PTQ) and quantization-aware training (QAT) baselines.
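The core idea, as described above, is to let a lightweight proxy predict the pre-activation distribution and derive the quantization parameters from that prediction instead of scanning the actual activations. The sketch below is illustrative only: it assumes a Gaussian prediction `(mu, sigma)` and a symmetric k-sigma clipping range; the paper's actual probabilistic model and proxy architecture are not detailed in this summary.

```python
import numpy as np

def dynamic_quantize(a, mu, sigma, bits=8, k=3.0):
    """Quantize pre-activations `a` using a clipping range derived from the
    predicted distribution (mu, sigma), avoiding a min/max pass over `a`.
    The Gaussian assumption and the k-sigma rule are illustrative choices."""
    lo, hi = mu - k * sigma, mu + k * sigma       # predicted clipping range
    scale = (hi - lo) / (2**bits - 1)             # uniform step size
    q = np.clip(np.round((a - lo) / scale), 0, 2**bits - 1)
    return q.astype(np.uint8), scale, lo

def dequantize(q, scale, lo):
    """Map integer codes back to the real-valued range."""
    return q.astype(np.float64) * scale + lo

# Toy demo: activations roughly matching the proxy's predicted statistics.
rng = np.random.default_rng(0)
a = rng.normal(0.5, 2.0, size=1024)
q, scale, lo = dynamic_quantize(a, mu=0.5, sigma=2.0)
hi = lo + scale * (2**8 - 1)
max_err = np.abs(dequantize(q, scale, lo) - np.clip(a, lo, hi)).max()
```

Because the range comes from predicted statistics rather than the data itself, no activation statistics need to be stored or recomputed per layer; within the predicted range, the reconstruction error is bounded by half a quantization step.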

📝 Abstract
We propose a probabilistic framework for dynamic quantization of neural networks that allows for computationally efficient, input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, enabling adaptive adjustment of the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Our method achieves the best trade-off between performance and computational overhead compared to standard quantization strategies.
Problem

Research questions and friction points this paper is trying to address.

Dynamic quantization of neural networks
Input-adaptive rescaling of quantization parameters
Balancing performance and computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic framework for dynamic quantization
Lightweight surrogate enables adaptive adjustment
Best performance and computational overhead tradeoff