🤖 AI Summary
This work addresses the challenge in practical regression tasks where aleatoric and epistemic uncertainties are difficult to disentangle, and mean prediction accuracy remains suboptimal. We propose a novel co-training framework integrating Bayesian neural networks (BNNs) with a mean-variance estimation (MVE) network, enabling natural separation of the two uncertainty types without hand-crafted regularization, while jointly improving mean prediction accuracy. Our key contribution is the first end-to-end differentiable heteroscedastic Bayesian regression architecture—rigorous in theory yet lightweight in implementation. Evaluated on multiple standard regression benchmarks and a newly constructed time-varying heteroscedastic dataset, our method achieves significant improvements: expected calibration error (ECE) reduced by 32–58% and root-mean-square error (RMSE) decreased by 11–27%. The approach is architecture-agnostic, seamlessly compatible with mainstream neural network backbones, and demonstrates strong scalability.
📝 Abstract
Real-world data contains aleatoric uncertainty - irreducible noise arising from imperfect measurements or from incomplete knowledge about the data generation process. Mean variance estimation (MVE) networks can learn this type of uncertainty but require ad-hoc regularization strategies to avoid overfitting and are unable to predict epistemic uncertainty (model uncertainty). Conversely, Bayesian neural networks predict epistemic uncertainty but are notoriously difficult to train due to the approximate nature of Bayesian inference. We propose to cooperatively train a variance network with a Bayesian neural network and demonstrate that the resulting model disentangles aleatoric and epistemic uncertainties while improving the mean estimation. We demonstrate the effectiveness and scalability of this method across a diverse range of datasets, including a time-dependent heteroscedastic regression dataset we created where the aleatoric uncertainty is known. The proposed method is straightforward to implement, robust, and adaptable to various model architectures.