🤖 AI Summary
Deep neural networks often yield miscalibrated output probabilities, undermining their reliability despite strong predictive performance. To address this, we propose the *h-calibration* framework, a novel, differentiable, and theoretically grounded calibration paradigm with bounded approximation error. Our method formalizes probabilistic learning and ideal calibration, guides the design of a post-hoc recalibration algorithm, and establishes a rigorous convergence relationship between statistical estimator consistency and theoretical error bounds. Unlike conventional approaches based on proper scoring rules, h-calibration overcomes ten fundamental limitations of existing methods, including non-differentiability, lack of theoretical guarantees, and poor generalization across architectures and datasets. Theoretical analysis demonstrates its superiority in both calibration fidelity and optimization tractability, and in extensive experiments on standard post-hoc calibration benchmarks the method achieves state-of-the-art performance, validating its effectiveness, generalizability, and robustness across diverse models and datasets.
📝 Abstract
Deep neural networks have demonstrated remarkable performance across numerous learning tasks but often suffer from miscalibration, resulting in unreliable probability outputs. This has inspired many recent works on mitigating miscalibration, particularly through post-hoc recalibration methods that aim to obtain calibrated probabilities without sacrificing the classification performance of pre-trained models. In this study, we summarize and categorize previous works into three general strategies: intuitively designed methods, binning-based methods, and methods based on formulations of ideal calibration. Through theoretical and practical analysis, we highlight ten common limitations of previous approaches. To address these limitations, we propose a probabilistic learning framework for calibration called h-calibration, which theoretically constructs an equivalent learning formulation for canonical calibration with bounded error. On this basis, we design a simple yet effective post-hoc calibration algorithm. Our method not only overcomes the ten identified limitations but also achieves markedly better performance than traditional methods, as validated by extensive experiments. We further analyze, both theoretically and experimentally, the relationship between our learning objective and traditional proper scoring rules, and the advantages of the former. In summary, our probabilistic framework derives an approximately equivalent differentiable objective for learning error-bounded calibrated probabilities, elucidating the correspondence and convergence properties of computational statistics with respect to theoretical bounds in canonical calibration. Its theoretical effectiveness is verified on standard post-hoc calibration benchmarks, where our method achieves state-of-the-art performance. This research offers a valuable reference for learning reliable likelihoods in related fields.
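As background for readers unfamiliar with post-hoc recalibration, the sketch below illustrates a standard baseline of the kind the abstract contrasts against: temperature scaling fitted by negative log-likelihood on held-out logits, evaluated with a binned expected calibration error (ECE). This is a generic illustration, not the paper's h-calibration method, and all function names here are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the true labels at temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.1, 10.0, 200)):
    """Post-hoc calibration: choose the single scalar T minimizing NLL on a
    held-out set. The argmax prediction is unchanged, so accuracy is preserved."""
    return min(grid, key=lambda T: nll(logits, labels, T))

def ece(probs, labels, n_bins=15):
    """Expected calibration error: mass-weighted gap between accuracy and
    mean confidence within equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            err += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return err
```

On synthetic overconfident logits (e.g. well-separated logits multiplied by a constant greater than 1), the fitted temperature exceeds 1 and the ECE of `softmax(logits, T)` drops relative to the uncalibrated `T = 1` probabilities, while predictions stay identical.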