Feature Preserving Shrinkage on Bayesian Neural Networks via the R2D2 Prior.

📅 2025-05-23

🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence

📈 Citations: 0

✨ Influential: 0

career value

173K/year

🤖 AI Summary

In Bayesian neural networks (BNNs), selecting weight priors faces a fundamental trade-off: overly strong shrinkage degrades discriminative feature representation, while insufficient shrinkage fails to suppress noise. To address this, we propose the R²-induced Dirichlet Decomposition (R2D2) prior—the first to integrate the R² statistic into the Dirichlet decomposition framework—enabling dynamic balancing between sparsity and preservation of task-critical features. We design a variational Gibbs inference algorithm that ensures convergence stability and posterior consistency under non-convex shrinkage objectives. Theoretically, we characterize both the evidence lower bound (ELBO) and the posterior contraction rate. Experiments across natural and medical image classification, as well as uncertainty estimation tasks, demonstrate that R2D2 significantly improves predictive accuracy, calibration, and robustness, while effectively distinguishing noise-corrupted weights from discriminative signal components.

Technology Category

Application Category

📝 Abstract

Bayesian neural networks (BNNs) treat neural network weights as random variables, which aim to provide posterior uncertainty estimates and avoid overfitting by performing inference on the posterior weights. However, selection of appropriate prior distributions remains a challenging task, and BNNs may suffer from catastrophic inflated variance or poor predictive performance when poor choices are made for the priors. Existing BNN designs apply different priors to weights, while the behaviours of these priors make it difficult to sufficiently shrink noisy signals or they are prone to overshrinking important signals in the weights. To alleviate this problem, we propose a novel R2D2-Net, which imposes the $R^{2}$-induced Dirichlet Decomposition (R2D2) prior to the BNN weights. The R2D2-Net can effectively shrink irrelevant coefficients towards zero, while preventing key features from over-shrinkage. To approximate the posterior distribution of weights more accurately, we further propose a variational Gibbs inference algorithm that combines the Gibbs updating procedure and gradient-based optimization. This strategy enhances stability and consistency in estimation when the variational objective involving the shrinkage parameters is non-convex. We also analyze the evidence lower bound (ELBO) and the posterior concentration rates from a theoretical perspective. Experiments on both natural and medical image classification and uncertainty estimation tasks demonstrate satisfactory performances of our method.

Problem

Research questions and friction points this paper is trying to address.

Selecting appropriate priors for Bayesian neural networks

Avoiding overshrinking or insufficient shrinkage of weight signals

Accurately approximating posterior distribution of BNN weights

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses R2D2 prior for Bayesian neural networks

Applies variational Gibbs inference algorithm

Shrinks irrelevant coefficients, preserves key features

🔎 Similar Papers

Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks

2024-07-30Citations: 1

Bosch Group

Renningen, BW, DE

Master Thesis AI-Based Keypoint Refinement for Autonomous Driving

Bosch Group

Hildesheim, NDS, DE

Authors to Follow