🤖 AI Summary
Standard calibration evaluation conditions only on the model's confidence score, so it cannot capture input-dependent heterogeneity in local calibration. This work formulates the problem of discovering hidden miscalibration regimes without predefined data slices and proposes a diagnostic framework for it. The method learns a calibration-aware input representation and applies kernel smoothing in that space to estimate signed local miscalibration. This automatically reveals regions where the model is systematically overconfident or underconfident. Experiments on four real-world LLM benchmarks and twelve large language models show that input-dependent calibration heterogeneity is prevalent. The estimated miscalibration field also supports local confidence correction, reducing calibration error in systematically miscalibrated regions where confidence-based methods such as temperature scaling and isotonic regression are less effective.
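One way to make "signed local miscalibration" concrete is sketched below. It assumes a deterministic confidence score $c(x)\in[0,1]$, a binary correctness variable $Y$, and a learned representation $\phi$. This is an illustrative reading, not necessarily the paper's definition or notation. The Gaussian kernel and bandwidth $h$ are also illustrative choices. The middle line follows from the tower property, because $c(X)$ is a function of $X$. It shows why a confidence-only diagnostic can look calibrated even when the local field is not: at a fixed confidence level $p$, regions with $m>0$ and $m<0$ average out.

```latex
% Assumed notation (illustrative, not necessarily the paper's):
% c(x) = model confidence, Y = correctness in {0,1}, \phi = learned representation, h = bandwidth
m(x) = \mathbb{E}\left[\, c(X) - Y \mid X = x \,\right] = c(x) - \Pr(Y = 1 \mid X = x)

\mathbb{E}\left[\, c(X) - Y \mid c(X) = p \,\right] = \mathbb{E}\left[\, m(X) \mid c(X) = p \,\right]

\hat{m}(x) = \frac{\sum_{i=1}^{n} K_h\big(\phi(x), \phi(x_i)\big)\,(c_i - y_i)}{\sum_{i=1}^{n} K_h\big(\phi(x), \phi(x_i)\big)},
\qquad K_h(u, v) = \exp\!\left(-\frac{\lVert u - v \rVert^2}{2h^2}\right)
```

Here $\hat{m}(x) > 0$ indicates local overconfidence and $\hat{m}(x) < 0$ indicates local underconfidence. The estimator $\hat{m}$ is a standard Nadaraya-Watson kernel regression of the residual $c_i - y_i$ on the representation.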
📝 Abstract
Calibration is commonly evaluated by comparing model confidence with its empirical correctness, implicitly treating reliability as a function of the confidence score alone. However, this view can hide substantial structure: models may be systematically overconfident on some kinds of inputs and underconfident on others, causing global reliability diagnostics to obscure localised calibration failures. To address this, we formulate the problem of discovering hidden miscalibration regimes without assuming access to predefined data slices. We define the corresponding miscalibration field and propose a diagnostic framework for estimating it. Our approach learns a calibration-aware representation of the input space and estimates signed local miscalibration by kernel smoothing in the learned geometry. Across four real-world LLM benchmarks and twelve LLMs, we find that input-dependent calibration heterogeneity is prevalent. We further show that the discovered fields are actionable: they support local confidence correction and reduce calibration error in systematically miscalibrated regions where confidence-based methods such as isotonic regression and temperature scaling are less effective.
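Below is a minimal NumPy sketch of the kernel-smoothing step. It rests on several assumptions. First, the input embeddings are taken as given; the sketch does not reproduce how the paper learns its calibration-aware representation. Second, it uses a Gaussian kernel with a hand-picked bandwidth. Third, "local confidence correction" is shown as a simple additive shift by the estimated field, which may differ from the correction the paper uses. The function names (`kernel_weights`, `local_miscalibration`, `locally_corrected`) and the synthetic data are hypothetical.

```python
import numpy as np


def kernel_weights(query_emb, ref_emb, bandwidth):
    """Gaussian kernel weights between each query and each reference embedding.

    Each row is rescaled so its largest weight is 1 (a max-shift in log space).
    This avoids all-zero rows from exp underflow, and the rescaling cancels in
    the normalised average computed below.
    """
    # Squared Euclidean distances, shape (n_query, n_ref).
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    log_w = -d2 / (2.0 * bandwidth ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)
    return np.exp(log_w)


def local_miscalibration(query_emb, ref_emb, ref_conf, ref_correct, bandwidth=1.0):
    """Nadaraya-Watson estimate of E[confidence - correctness | embedding].

    Positive values mean local overconfidence, negative values underconfidence.
    """
    w = kernel_weights(query_emb, ref_emb, bandwidth)
    residual = ref_conf - ref_correct  # signed per-example calibration residual
    return (w @ residual) / w.sum(axis=1)


def locally_corrected(conf, field):
    """Hypothetical additive correction: shift confidence by the local bias, clip to [0, 1]."""
    return np.clip(conf - field, 0.0, 1.0)


# Synthetic (hypothetical) data: two input regions with the same confidence 0.7
# but different accuracy, so a confidence-only view sees no miscalibration.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.05, size=(100, 2)),   # region A, clustered near (0, 0)
    rng.normal(10.0, 0.05, size=(100, 2)),  # region B, clustered near (10, 10)
])
conf = np.full(200, 0.7)
correct = np.concatenate([
    np.tile([1.0, 0.0], 50),           # region A: 50% correct
    np.r_[np.ones(90), np.zeros(10)],  # region B: 90% correct
])

global_gap = conf.mean() - correct.mean()  # confidence-only (global) gap
centres = np.array([[0.0, 0.0], [10.0, 10.0]])
field = local_miscalibration(centres, emb, conf, correct, bandwidth=1.0)
corrected = locally_corrected(np.array([0.7, 0.7]), field)

print(f"global gap: {global_gap:+.3f}")
print("local field at A, B:", np.round(field, 3))
print("corrected confidence at A, B:", np.round(corrected, 3))
```

With these synthetic numbers the global gap is essentially zero. The estimated field, however, is roughly +0.2 in region A and -0.2 in region B. This is the kind of cancellation a confidence-only reliability diagnostic hides. In practice the field would be estimated on held-out data and the bandwidth tuned, for example by cross-validation; the sketch omits both.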