AI Summary
This work identifies a fundamental tension between alignment fine-tuning and calibration of large language models (LLMs): while alignment improves instruction-following capability, it systematically degrades confidence calibration. Method: We develop a multidimensional evaluation framework, conducting controlled cross-model and cross-task comparisons across model architectures, task types, calibration metrics, and confidence estimation methods (logits, softmax, token-prob). Contribution/Results: Under rigorously controlled conditions, we provide the first empirical evidence of this trade-off: alignment fine-tuning increases Expected Calibration Error (ECE) by 37–62%. These findings challenge the prevailing assumption that alignment and calibration are mutually compatible, call for a paradigm shift in confidence evaluation, and establish critical theoretical and empirical foundations for designing new algorithms that jointly optimize instruction following and calibration.
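The summary compares several confidence estimation methods, including softmax over logits. As a minimal sketch (the function name and inputs are illustrative, not the paper's implementation), the softmax-based confidence of a generated token is simply its normalized probability over the vocabulary:

```python
import numpy as np

def softmax_confidence(logits, token_id):
    # Confidence of a generated token, taken as its softmax
    # probability over the vocabulary given the raw logits.
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                      # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()      # softmax normalization
    return p[token_id]
```

Logit- and token-probability-based variants differ mainly in whether this normalization step is applied and at what granularity (token vs. sequence) the scores are aggregated.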
Abstract
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation in LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods. Initial analysis suggested that the relationship between alignment and calibration is not always a trade-off; under stricter analysis conditions, however, we found that the alignment process consistently harms calibration. This highlights the need for (1) a careful approach to measuring model confidences and calibration errors and (2) future research into algorithms that enable LLMs to achieve both instruction following and calibration without sacrificing either.