🤖 AI Summary
This paper addresses the disconnect between aggregate calibration metrics and the local trustworthiness of individual predictions in probabilistic inference. To bridge this gap, we propose the I-trustworthy framework, the first to formally integrate the competence-based theory of trust into local calibration assessment. Our core contributions are: (1) defining the Local Calibration Error (LCE); (2) constructing a kernelized test statistic, the Kernel Local Calibration Error (KLCE), with provable convergence bounds for its unbiased estimator; and (3) designing interpretable calibration-bias diagnostics and visualization tools. Extensive experiments on synthetic and real-world datasets demonstrate that KLCE effectively identifies untrustworthy predictions, whereas mainstream recalibration methods, including temperature scaling and isotonic regression, fail to satisfy the I-trustworthy criterion. Theoretically, the framework provides rigorous statistical guarantees; practically, it delivers an open-source evaluation toolkit. This work establishes a paradigm for trustworthy AI that unifies formal calibration theory with actionable reliability assessment.
📝 Abstract
As probabilistic models continue to permeate many facets of our society and contribute to scientific advancement, it becomes necessary to go beyond traditional metrics such as predictive accuracy and error rates and to assess their trustworthiness. Grounded in the competence-based theory of trust, this work formalizes the I-trustworthy framework -- a novel framework for assessing the trustworthiness of probabilistic classifiers for inference tasks by linking local calibration to trustworthiness. To assess I-trustworthiness, we use the local calibration error (LCE) and develop a hypothesis-testing method. This method uses a kernel-based test statistic, the Kernel Local Calibration Error (KLCE), to test the local calibration of a probabilistic classifier. The study provides theoretical guarantees by deriving convergence bounds for an unbiased estimator of KLCE. Additionally, we present a diagnostic tool designed to identify and measure biases in cases of miscalibration. The effectiveness of the proposed test statistic is demonstrated on both simulated and real-world datasets. Finally, we study the LCE of related recalibration methods and provide evidence that existing methods are insufficient to achieve I-trustworthiness.
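To make the kernel-based statistic concrete, here is a minimal sketch of a U-statistic estimator of a kernelized local calibration error for a binary classifier. This is an illustration under assumed choices, not the paper's exact construction: the function names (`rbf_kernel`, `klce_unbiased`), the RBF kernel on raw features, and the `gamma` parameter are all assumptions for this sketch. The idea it demonstrates is pairing calibration residuals `(y_i - p_i)` through a kernel and averaging over off-diagonal pairs, which makes the estimator unbiased for the population quantity.

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=1.0):
    # Illustrative kernel choice on feature vectors; the paper's kernel
    # may differ (e.g., it may also act on the predicted probabilities).
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def klce_unbiased(X, y, p, gamma=1.0):
    """Sketch of a U-statistic estimator of a kernelized LCE.

    X: (n, d) features; y: (n,) binary labels in {0, 1};
    p: (n,) predicted probabilities of class 1.
    Averages r_i * r_j * k(x_i, x_j) over all pairs i != j, where
    r_i = y_i - p_i is the calibration residual at point i.
    """
    n = len(y)
    r = y - p  # calibration residuals
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += r[i] * r[j] * rbf_kernel(X[i], X[j], gamma)
    return total / (n * (n - 1))
```

If the classifier is locally calibrated, residuals at nearby points carry no shared signal and the statistic concentrates around zero; systematic local over- or under-confidence makes nearby residuals co-vary and pushes the statistic away from zero, which is what the hypothesis test detects.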