🤖 AI Summary
To address the trade-off between accuracy degradation and computational overhead in post-training quantization, this paper proposes an efficient quantization method grounded in parameter sensitivity analysis. The approach integrates column-wise sensitivity clustering with a row-parallel quantization framework, coupled with a globally shared inverse Hessian update mechanism that enables error compensation and low-complexity optimization without iterative parameter updates. This design significantly mitigates accuracy loss under high compression ratios. Experiments on ResNet-50 and YOLOv5s demonstrate that the method achieves 20–200× faster quantization than Optimal Brain Quantization (OBQ), with average accuracy loss below 0.3%. The core innovations are a sensitivity-aware compensation mechanism and a scalable parallel quantization architecture, which together balance efficiency and accuracy for edge deployment.
📝 Abstract
Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods rely on iterative parameter updates to preserve accuracy under high compression ratios, incurring significant computational complexity and resource overhead, which limits their applicability in resource-constrained edge computing and real-time inference scenarios. This paper proposes an efficient PTQ method guided by parameter sensitivity analysis. The approach quantizes high-sensitivity parameters first and uses the still-unquantized low-sensitivity parameters to compensate for quantization errors, thereby mitigating accuracy degradation. Furthermore, by exploiting column-wise clustering of parameter sensitivity, the method introduces a row-parallel quantization framework with a globally shared inverse Hessian update mechanism, reducing computational complexity by an order of magnitude. Experimental results on ResNet-50 and YOLOv5s demonstrate a 20- to 200-fold quantization speedup over the Optimal Brain Quantization (OBQ) baseline, with mean accuracy loss below 0.3%, confirming the method's efficacy in balancing efficiency and accuracy.
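The row-parallel, shared-inverse-Hessian idea described above is close in spirit to OBQ/GPTQ-style error compensation: columns are quantized one at a time, and each row's quantization error is spread onto its remaining unquantized columns using a single inverse Hessian shared by every row. A minimal NumPy sketch, assuming a symmetric uniform quantizer with one layer-wide scale and a fixed column order (the function name and these simplifications are ours, not the paper's):

```python
import numpy as np

def quantize_with_shared_hinv(W, H_inv, bits=4):
    """Illustrative sketch, not the paper's implementation: quantize W column
    by column, compensating the unquantized columns of all rows in parallel
    with a single inverse Hessian H_inv shared across rows."""
    W = np.asarray(W, dtype=np.float64).copy()
    Hinv = np.asarray(H_inv, dtype=np.float64).copy()
    rows, cols = W.shape
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax        # one scale for the whole layer, for brevity
    Q = np.zeros_like(W)
    for j in range(cols):
        d = Hinv[j, j]                    # curvature term for column j
        w = W[:, j]
        q = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
        Q[:, j] = q
        err = (w - q) / d
        # spread each row's quantization error onto its unquantized columns;
        # one rank-1 update handles every row at once
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
        # eliminate column j from the shared inverse Hessian (OBS-style downdate)
        Hinv = Hinv - np.outer(Hinv[:, j], Hinv[j, :]) / d
    return Q
```

Because the inverse Hessian is shared, the per-column compensation is a single rank-1 matrix update over all rows rather than an independent solve per row, which is the kind of restructuring that makes an order-of-magnitude complexity reduction plausible.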