Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation

📅 2025-11-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Bayesian neural networks (BNNs) suffer from high inference overhead on resource-constrained devices due to repeated weight sampling, which hinders practical deployment. Method: This paper proposes a single-forward-pass uncertainty estimation method based on Gaussian propagation, replacing the conventional stochastic variational inference (SVI) paradigm, which requires multiple weight samples. The authors design a Gaussian-propagation operator library supporting both MLPs and CNNs and integrate it with the TVM compiler and automated tuning strategies for efficient end-to-end deployment. Contribution/Results: Evaluated on Dirty-MNIST, the approach matches standard BNNs in classification accuracy and out-of-distribution (OOD) detection performance, while accelerating batched inference by up to 4200× and substantially reducing computational cost. This enables feasible deployment of BNNs in safety-critical embedded systems.

๐Ÿ“ Abstract
Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-critical settings. Traditional neural networks often fail to detect out-of-distribution (OOD) data and may output confident yet incorrect predictions. Bayesian neural networks (BNNs) address this by providing probabilistic estimates, but incur high computational cost because predictions require sampling weight distributions and multiple forward passes. The Probabilistic Forward Pass (PFP) offers a highly efficient approximation to Stochastic Variational Inference (SVI) by assuming Gaussian-distributed weights and activations, enabling fully analytic uncertainty propagation and replacing sampling with a single deterministic forward pass. We present an end-to-end pipeline for training, compiling, optimizing, and deploying PFP-based BNNs on embedded ARM CPUs. Using the TVM deep learning compiler, we implement a dedicated library of Gaussian-propagating operators for multilayer perceptrons and convolutional neural networks, combined with manual and automated tuning strategies. Ablation studies show that PFP consistently outperforms SVI in computational efficiency, achieving speedups of up to 4200× for small mini-batches. PFP-BNNs match SVI-BNNs on Dirty-MNIST in accuracy, uncertainty estimation, and OOD detection while greatly reducing compute cost. These results highlight the potential of combining Bayesian approximations with code generation to enable efficient BNN deployment on resource-constrained systems.
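To make the "fully analytic uncertainty propagation" idea concrete, here is a minimal sketch of how a single probabilistic forward pass can propagate means and variances through one linear layer, assuming mean-field (independent) Gaussian weights and activations. The function name `pfp_linear` and the NumPy formulation are illustrative assumptions, not the paper's TVM operator library.

```python
import numpy as np

def pfp_linear(x_mean, x_var, w_mean, w_var):
    """Propagate an independent-Gaussian input through a linear layer
    with independent Gaussian weights (mean-field assumption).

    For y_j = sum_i w_ji * x_i, with weights independent of inputs:
      E[y_j]   = sum_i mu_w[j,i] * mu_x[i]
      Var[y_j] = sum_i mu_w[j,i]^2 * s_x[i]
                      + s_w[j,i] * mu_x[i]^2
                      + s_w[j,i] * s_x[i]
    """
    y_mean = w_mean @ x_mean
    y_var = (w_mean ** 2) @ x_var + w_var @ (x_mean ** 2 + x_var)
    return y_mean, y_var

# One deterministic pass replaces many sampled forward passes.
x_mean, x_var = np.array([3.0]), np.array([1.0])
w_mean, w_var = np.array([[2.0]]), np.array([[0.25]])
m, v = pfp_linear(x_mean, x_var, w_mean, w_var)
# m = [6.0], v = [2^2*1 + 0.25*3^2 + 0.25*1] = [6.5]
```

Because both outputs are computed with ordinary dense matrix products, such an operator maps naturally onto a compiler stack like TVM, where the same scheduling and tuning machinery used for standard layers applies.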
Problem

Research questions and friction points this paper is trying to address.

Accelerating Bayesian neural networks through single-pass probabilistic inference
Reducing computational costs of uncertainty estimation in neural networks
Enabling efficient BNN deployment on resource-constrained embedded systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single deterministic forward pass replaces sampling
Gaussian-propagating operators enable analytic uncertainty propagation
Code generation optimizes deployment on embedded ARM CPUs
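Gaussian propagation through a nonlinearity such as ReLU also admits a closed form: the mean and variance of a rectified Gaussian can be matched back to a Gaussian analytically. The sketch below illustrates this standard moment-matching step; the name `pfp_relu` is hypothetical and not taken from the paper.

```python
import math

def pfp_relu(mu, var):
    """Moment-match ReLU(N(mu, var)) back to a Gaussian, using the
    closed-form mean and second moment of a rectified Gaussian."""
    s = math.sqrt(var)
    z = mu / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    out_mean = mu * cdf + s * pdf
    out_sq = (mu * mu + var) * cdf + mu * s * pdf  # E[ReLU(x)^2]
    out_var = max(out_sq - out_mean * out_mean, 0.0)
    return out_mean, out_var

m, v = pfp_relu(0.0, 1.0)
# m = 1/sqrt(2*pi) ~ 0.3989, v = 0.5 - 1/(2*pi) ~ 0.3408
```

Chaining such closed-form layer rules is what lets a single deterministic pass carry predictive uncertainty end to end, with no weight sampling at inference time.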