Explaining Bayesian Neural Networks

📅 2021-08-23
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 28
Influential: 0
🤖 AI Summary
While Bayesian neural networks (BNNs) offer transparent weight priors, they lack instance-level predictive interpretability. Method: We propose the first local attribution method tailored to the Bayesian framework—modeling an explanation distribution via posterior weight sampling. Specifically, we approximate the posterior using variational inference, sample weights, and apply gradient-based attribution (e.g., Integrated Gradients) to generate a collection of attribution maps; these are then aggregated to quantify attribution uncertainty and stability. Contribution/Results: This transforms deterministic, point-estimate attributions into probabilistic ones, explicitly characterizing explanation diversity and robustness. Experiments on synthetic benchmarks, standard image datasets (e.g., CIFAR-10, ImageNet), and real-world histopathological data demonstrate that our approach significantly improves explanation credibility and discriminative power, enabling uncertainty-aware, interpretable AI decision-making.
📝 Abstract
To advance the transparency of learning machines such as Deep Neural Networks (DNNs), the field of Explainable AI (XAI) was established to provide interpretations of DNNs' predictions. While different explanation techniques exist, a popular approach is given in the form of attribution maps, which illustrate, given a particular data point, the relevant patterns the model has used for making its prediction. Although Bayesian models such as Bayesian Neural Networks (BNNs) have a limited form of transparency built in through their prior weight distribution, they lack explanations of their predictions for given instances. In this work, we take a step toward combining these two perspectives by examining how local attributions can be extended to BNNs. Within the Bayesian framework, network weights follow a probability distribution; hence, the standard point explanation extends naturally to an explanation distribution. Viewing explanations probabilistically, we aggregate and analyze multiple local attributions drawn from an approximate posterior to explore variability in explanation patterns. The diversity of explanations offers a way to further explore how predictive rationales may vary across posterior samples. Quantitative and qualitative experiments on toy and benchmark data, as well as on a real-world pathology dataset, illustrate that our framework enriches standard explanations with uncertainty information and may support the visualization of explanation stability.
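The sample-then-attribute pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a toy linear model with a factorized Gaussian approximate posterior (standing in for the variational posterior) and uses gradient × input attribution, which for a linear model with a zero baseline coincides with Integrated Gradients. All names (`sample_weights`, `attribution`, the posterior parameters) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical approximate posterior over the weights of a toy linear
# "network" f(x) = w @ x: a factorized Gaussian with the parameters below.
w_mean = np.array([1.0, -2.0, 0.5])
w_std = np.array([0.1, 0.3, 0.05])

def sample_weights(n):
    """Draw n weight vectors from the factorized Gaussian posterior."""
    return w_mean + w_std * rng.standard_normal((n, w_mean.size))

def attribution(w, x):
    """Gradient x input attribution; for a linear model the gradient is w."""
    return w * x

# One instance to explain: attribute f(x) under each posterior sample,
# yielding a collection of attribution maps (the explanation distribution).
x = np.array([2.0, 1.0, -1.0])
maps = np.stack([attribution(w, x) for w in sample_weights(1000)])

# Aggregate the explanation distribution: the mean map is the point-style
# explanation, the per-feature standard deviation its attribution uncertainty.
attr_mean = maps.mean(axis=0)
attr_std = maps.std(axis=0)
print(attr_mean, attr_std)
```

In this linear toy case the aggregates have closed forms (mean map `w_mean * x`, uncertainty `w_std * |x|`), which makes the sampling estimates easy to sanity-check; with a real BNN the maps would instead come from running an attribution method on each sampled network.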
Problem

Research questions and friction points this paper is trying to address.

Extending local attribution methods to Bayesian Neural Networks
Analyzing explanation variability through posterior sampling distributions
Enriching standard explanations with uncertainty and stability visualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends local attributions to Bayesian Neural Networks
Models explanations as probability distributions from posteriors
Enriches explanations with uncertainty and stability visualization