🤖 AI Summary
In high-reliability engineering applications, AI models face a fundamental trade-off between calibration on in-distribution (ID) inputs and detection of out-of-distribution (OOD) inputs. Method: This paper proposes a selective calibration framework grounded in variational Bayesian learning that jointly incorporates a temperature-scaling-based calibration regularizer, an OOD confidence-minimization penalty, and an adaptive input-rejection mechanism. Contribution/Results: To our knowledge, this is the first approach to jointly optimize ID calibration accuracy and OOD detection capability, overcoming the performance compromise inherent to conventional Bayesian ensembles. Evaluated on multiple benchmark datasets, the method achieves state-of-the-art performance, attaining lower ID calibration error and higher OOD detection AUC. It incurs only a controlled input-rejection rate, thereby establishing a novel paradigm for deploying trustworthy AI in safety-critical engineering systems.
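The composite objective described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `cbnn_ocm_loss`, the specific calibration term (cross-entropy under temperature-scaled probabilities), and the hyperparameters `T`, `lam_cal`, and `lam_ocm` are all illustrative assumptions. The OOD penalty here is a KL divergence to the uniform distribution, one common way to realize confidence minimization.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens confidence, T < 1 sharpens it.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cbnn_ocm_loss(logits_id, labels_id, logits_ood,
                  T=1.5, lam_cal=0.1, lam_ocm=0.1):
    """Toy composite objective (illustrative, not the paper's exact loss):
    ID cross-entropy, plus a temperature-scaled calibration term, plus an
    OOD confidence-minimization penalty pulling OOD outputs toward uniform."""
    n_cls = logits_id.shape[-1]
    rows = np.arange(len(labels_id))
    # ID fit: standard cross-entropy on the hard labels.
    p_id = softmax(logits_id)
    ce = -np.mean(np.log(p_id[rows, labels_id] + 1e-12))
    # Calibration regularizer: same loss under temperature-scaled probabilities.
    p_id_T = softmax(logits_id, T=T)
    cal = -np.mean(np.log(p_id_T[rows, labels_id] + 1e-12))
    # OCM penalty: KL(p_ood || uniform), zero iff the OOD output is uniform.
    p_ood = softmax(logits_ood)
    kl_unif = np.mean(np.sum(p_ood * np.log(n_cls * p_ood + 1e-12), axis=-1))
    return ce + lam_cal * cal + lam_ocm * kl_unif
```

As a sanity check, confident OOD logits incur a larger penalty than uniform OOD logits, which is exactly the pressure that confidence minimization applies during training.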
📝 Abstract
The application of artificial intelligence (AI) models in fields such as engineering is limited by the known difficulty of quantifying the reliability of an AI's decisions. A well-calibrated AI model must correctly report its accuracy on in-distribution (ID) inputs, while also enabling the detection of out-of-distribution (OOD) inputs. A conventional approach to improving calibration is Bayesian ensembling. However, owing to computational limitations and model misspecification, practical ensembling strategies do not necessarily enhance calibration. This paper proposes an extension of variational inference (VI)-based Bayesian learning that integrates calibration regularization for improved ID performance, confidence minimization for OOD detection, and selective calibration to ensure a synergistic use of calibration regularization and confidence minimization. The scheme is constructed successively by first introducing calibration-regularized Bayesian learning (CBNN), then incorporating out-of-distribution confidence minimization (OCM) to yield CBNN-OCM, and finally integrating selective calibration to produce selective CBNN-OCM (SCBNN-OCM). Selective calibration rejects inputs for which the calibration performance is expected to be insufficient. Numerical results illustrate the trade-offs between ID accuracy, ID calibration, and OOD calibration attained by both frequentist and Bayesian learning methods. Among the main conclusions, SCBNN-OCM is seen to achieve the best ID and OOD performance compared to existing state-of-the-art approaches, at the cost of rejecting a sufficiently large number of inputs.
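The selective-calibration idea, rejecting inputs whose calibration is expected to be poor, can be sketched in a few lines. This is a generic selective-prediction skeleton, not the paper's learned selection function: `accept_score` stands in for whatever per-input gating score is used (here, any array, e.g. the maximum predicted probability), and `threshold` controls the accuracy/coverage trade-off.

```python
import numpy as np

def selective_predict(probs, accept_score, threshold):
    """Minimal selective-prediction sketch: keep an input only when its
    gating score meets the threshold; otherwise abstain (reject).
    Returns class predictions, the accept mask, and the coverage, i.e.
    the fraction of inputs that were not rejected."""
    accept_score = np.asarray(accept_score, dtype=float)
    accepted = accept_score >= threshold
    preds = np.asarray(probs).argmax(axis=-1)
    coverage = accepted.mean()
    return preds, accepted, coverage
```

Raising `threshold` lowers coverage but restricts predictions to inputs the gate trusts, which is the mechanism behind the paper's observation that the best ID/OOD performance comes at the cost of rejecting more inputs.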