A 65 nm Bayesian Neural Network Accelerator with 360 fJ/Sample In-Word GRNG for AI Uncertainty Estimation

📅 2025-01-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Bayesian neural networks (BNNs) face significant deployment bottlenecks on edge hardware, including the computational overhead of random number generation, frequent memory writes, and poor compatibility with compute-in-memory architectures, which hinders their adoption in safety-critical applications such as autonomous driving and medical diagnosis. To address these challenges, this work presents an application-specific integrated circuit (ASIC) for in-memory BNN inference with uncertainty estimation. The chip integrates an energy-efficient Gaussian random number generator (GRNG) directly into the SRAM memory words at 360 fJ/sample, enabling fully parallel in-memory Bayesian sampling and inference. Fabricated in 65 nm CMOS, it delivers 5.12 GSa/s RNG throughput and 102 GOp/s neural-network throughput within a compact 0.45 mm² area. The design improves both energy efficiency and throughput for uncertainty-aware inference at the edge, advancing trustworthy AI deployment on resource-constrained platforms.
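As a back-of-the-envelope check of the headline figures (my calculation, not a number reported in the summary): at 360 fJ per Gaussian sample, sustaining the full sampling rate costs roughly

P_GRNG ≈ 360 fJ/sample × 5.12 × 10⁹ samples/s ≈ 1.84 mW,

which suggests the in-word GRNG itself consumes only a small share of an edge power budget.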

📝 Abstract
Uncertainty estimation is an indispensable capability for AI-enabled, safety-critical applications, e.g., autonomous vehicles or medical diagnosis. Bayesian neural networks (BNNs) use Bayesian statistics to provide both classification predictions and uncertainty estimation, but they suffer from high computational overhead associated with random number generation (RNG) and repeated sample iterations. Furthermore, BNNs are not immediately amenable to acceleration through compute-in-memory architectures due to the frequent memory writes necessary after each RNG operation. To address these challenges, we present an ASIC that integrates a 360 fJ/sample Gaussian RNG directly into the SRAM memory words. This integration reduces RNG overhead and enables fully parallel compute-in-memory operations for BNNs. The prototype chip achieves 5.12 GSa/s RNG throughput and 102 GOp/s neural network throughput while occupying 0.45 mm², bringing AI uncertainty estimation to edge computation.
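To make the sampling overhead concrete, below is a minimal NumPy sketch of the kind of Monte Carlo BNN inference the chip accelerates: every forward pass draws a fresh Gaussian sample for every weight, and the prediction plus an uncertainty estimate come from averaging over T passes. The layer shape, weight statistics, pass count, and uncertainty metric (predictive entropy) are illustrative assumptions, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) layer size and learned weight statistics:
# each weight is N(mu, sigma^2), so every forward pass needs one fresh
# Gaussian sample per weight -- the RNG cost this chip moves into SRAM.
mu_w, sigma_w = rng.standard_normal((64, 10)), 0.1 * np.ones((64, 10))
mu_b, sigma_b = np.zeros(10), 0.1 * np.ones(10)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bnn_predict(x, T=32):
    """Monte Carlo BNN inference: average over T stochastic forward passes."""
    probs = []
    for _ in range(T):
        w = mu_w + sigma_w * rng.standard_normal(mu_w.shape)  # one Gaussian draw per weight
        b = mu_b + sigma_b * rng.standard_normal(mu_b.shape)
        probs.append(softmax(x @ w + b))
    probs = np.stack(probs)            # shape (T, batch, classes)
    mean = probs.mean(axis=0)          # predictive mean over samples
    # predictive entropy as a simple uncertainty estimate
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, entropy

x = rng.standard_normal((4, 64))       # dummy inputs (assumed 64 features)
mean, uncertainty = bnn_predict(x)
print(mean.argmax(axis=-1), uncertainty)
```

Each of the T passes redraws every weight, which is exactly the repeated RNG-plus-memory-write pattern the abstract identifies as the obstacle to conventional compute-in-memory acceleration.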
Problem

Research questions and friction points this paper is trying to address.

Bayesian Neural Networks
Computational Cost
Memory Architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

65 nm Technology
Bayesian Neural Networks
Energy-efficient Computing