🤖 AI Summary
Addressing the longstanding trade-off between computational efficiency and estimation reliability in uncertainty quantification (UQ) for deep neural networks, this paper proposes a post-hoc, sampling-based Bayesian UQ method. Our approach performs stochastic gradient descent (SGD) sampling on an empirically linearized network—using the empirical neural tangent kernel (NTK)—to construct an efficient deep ensemble. To our knowledge, this is the first work to integrate empirical NTK linearization with SGD sampling, enabling high-fidelity Gaussian process posterior approximation. The method achieves both low inference overhead and high UQ reliability, breaking the conventional dichotomy between “cheap but unreliable” and “reliable but expensive” approaches. On standard regression and classification benchmarks, it attains state-of-the-art performance across key UQ metrics—including negative log-likelihood (NLL), expected calibration error (ECE), and area under the ROC curve (AUROC)—while reducing inference cost by 3–5× compared to existing Bayesian UQ methods.
📝 Abstract
While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems. Several Bayesian uncertainty quantification (UQ) methods exist that are either cheap or reliable, but not both. We propose a post-hoc, sampling-based UQ method for over-parameterized networks at the end of training. Our approach constructs efficient and meaningful deep ensembles by employing a (stochastic) gradient-descent sampling process on appropriately linearized networks. We demonstrate that our method effectively approximates the posterior of a Gaussian process using the empirical Neural Tangent Kernel. Through a series of numerical experiments, we show that our method not only outperforms competing approaches in computational efficiency (often reducing costs by multiple factors) but also maintains state-of-the-art performance across a variety of UQ metrics for both regression and classification tasks.