🤖 AI Summary
To address the practical barriers to deploying Bayesian neural networks (BNNs) on edge devices, namely their prohibitive computational and memory overhead, this paper proposes a systematic multi-level quantization framework tailored to stochastic variational inference (SVI)-based BNNs. It distinguishes three quantization strategies: Variational Parameter Quantization (VPQ), Sampled Parameter Quantization (SPQ), and Joint Quantization (JQ), and pairs them with logarithmic quantization of variance parameters and specialized activation functions that preserve distributional structure. The method maintains accurate disentanglement of epistemic and aleatoric uncertainty down to 4-bit precision. On the Dirty-MNIST benchmark it preserves classification accuracy, keeps uncertainty estimation error below 3%, and reduces the memory footprint by 8×. These results support the deployment of lightweight "Bayesian machines" in resource-constrained environments.
📝 Abstract
Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantization techniques have successfully reduced resource requirements in standard deep learning models, their application to probabilistic models remains largely unexplored. We introduce a systematic multi-level quantization framework for Stochastic Variational Inference (SVI)-based BNNs that distinguishes between three quantization strategies: Variational Parameter Quantization (VPQ), Sampled Parameter Quantization (SPQ), and Joint Quantization (JQ). We show that logarithmic quantization of variance parameters and specialized activation functions that preserve distributional structure are essential for calibrated uncertainty estimation. Through comprehensive experiments on Dirty-MNIST, we demonstrate that BNNs can be quantized down to 4-bit precision while maintaining both classification accuracy and uncertainty disentanglement. At 4 bits, Joint Quantization achieves up to 8× memory reduction compared to floating-point implementations with minimal degradation in epistemic and aleatoric uncertainty estimation. These results enable deployment of BNNs on resource-constrained edge devices and provide design guidelines for future analog "Bayesian Machines" operating at inherently low precision.
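The abstract highlights logarithmic quantization of variance parameters as a key ingredient. The paper's exact scheme is not specified here, so the following is a minimal sketch of one plausible realization: a uniform quantizer applied in log-space, so that small standard deviations (which dominate calibrated posteriors) retain relative precision at low bit widths. The function name, grid bounds `sigma_min`/`sigma_max`, and bit width are illustrative assumptions, not the paper's API.

```python
import numpy as np

def log_quantize_variance(sigma, n_bits=4, sigma_min=1e-4, sigma_max=1.0):
    """Quantize standard deviations on a logarithmic grid (illustrative sketch).

    sigma is clipped to [sigma_min, sigma_max], mapped to log-space,
    uniformly quantized to 2**n_bits levels, and mapped back. This keeps
    the *relative* quantization error roughly constant across scales,
    which matters because posterior std-devs span orders of magnitude.
    """
    lo, hi = np.log(sigma_min), np.log(sigma_max)
    log_sigma = np.log(np.clip(sigma, sigma_min, sigma_max))
    levels = 2 ** n_bits - 1
    # uniform quantization in log-space, then back to linear scale
    q = np.round((log_sigma - lo) / (hi - lo) * levels)
    return np.exp(lo + q / levels * (hi - lo))

# Example: 4-bit quantization of a range of posterior std-devs
sigma = np.array([0.001, 0.01, 0.1, 0.5])
sigma_q = log_quantize_variance(sigma, n_bits=4)
```

A linear (fixed-step) quantizer with the same bit budget would collapse all small variances into one or two levels, destroying the epistemic/aleatoric decomposition; the log-domain grid is one simple way to avoid that.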