🤖 AI Summary
Addressing the challenge of balancing input adaptivity against memory overhead in dynamic neural network quantization, this paper proposes a sample-wise dynamic rescaling method based on probabilistic modeling. A lightweight proxy network models the pre-activation distribution to estimate optimal quantization parameters in real time, eliminating the need to explicitly store statistics or repeatedly recompute them. The work is the first to integrate probabilistic modeling into a dynamic quantization framework, enabling input-adaptive quantization without additional GPU memory consumption. Evaluated on mainstream vision models (including ResNet and ViT) and tasks such as ImageNet classification and COCO object detection, the method incurs negligible accuracy degradation (<0.3% Top-1 drop) while significantly reducing computational overhead compared to existing dynamic quantization approaches, achieving a superior accuracy-efficiency trade-off relative to both post-training quantization (PTQ) and quantization-aware training (QAT) baselines.
📝 Abstract
We propose a probabilistic framework for dynamic quantization of neural networks that enables computationally efficient, input-adaptive rescaling of the quantization parameters. Our framework applies a probabilistic model to the network's pre-activations through a lightweight surrogate, adaptively adjusting the quantization parameters on a per-input basis without significant memory overhead. We validate our approach on a set of popular computer vision tasks and models, observing only a negligible loss in performance. Our method strikes a better trade-off between performance and computational overhead than standard quantization strategies.
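The per-input rescaling idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear-on-statistics proxy, its weights, and the symmetric uniform quantizer are all assumptions standing in for the paper's probabilistic pre-activation model.

```python
import numpy as np

def proxy_scale(x_stats, w, b):
    # Hypothetical lightweight proxy: predicts a per-sample quantization
    # scale from cheap summary statistics of the input (here mean and std),
    # so no activation histograms need to be stored or recomputed.
    return float(np.exp(w @ x_stats + b))  # exp keeps the scale positive

def quantize_dequantize(x, scale, bits=8):
    # Symmetric uniform quantization with an input-adaptive scale.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=1024)       # one sample's pre-activations
stats = np.array([x.mean(), x.std()])     # cheap per-sample statistics
# Illustrative proxy weights chosen so ~3 standard deviations fit the grid.
scale = proxy_scale(stats, w=np.array([0.0, 0.02]), b=np.log(2.0 * 3 / 127))
x_hat = quantize_dequantize(x, scale)
err = np.mean((x - x_hat) ** 2)           # quantization MSE for this sample
```

Because the scale is recomputed from each sample's statistics, a sample with a wider pre-activation distribution automatically gets a coarser grid, which is the input-adaptive behavior the framework targets.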