CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

📅 2025-10-28

🤖 AI Summary

This paper addresses miscalibrated confidence estimates from large language models (LLMs) in high-stakes scenarios. It proposes calibrating confidence with natural language critiques. The approach is motivated by the fact that precise gold-standard confidence labels are hard to obtain. Methodologically, it introduces Self-Critique, in which an LLM critiques and refines its own confidence, and the CritiCal training framework, which turns interpretable, open-ended natural language critiques into confidence calibration signals. The key contribution is to model confidence from human-interpretable, free-form critiques rather than from numeric labels or direct numerical optimization. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines and even surpasses its teacher model, GPT-4o, on complex reasoning tasks. It also generalizes robustly to out-of-distribution settings, offering a promising path toward trustworthy AI.

📝 Abstract

Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods that mimic reference confidence expressions often fail to capture the reasoning needed for accurate confidence assessment. We propose natural language critiques as a solution. They are ideally suited for confidence calibration because precise gold confidence labels are hard to obtain and often require multiple generations. This paper studies how natural language critiques can enhance verbalized confidence, addressing: (1) What to critique: uncertainty (question-focused) or confidence (answer-specific)? Analysis shows that confidence suits multiple-choice tasks, while uncertainty excels in open-ended scenarios. (2) How to critique: self-critique or critique calibration training? We propose Self-Critique, which enables LLMs to critique and optimize their confidence beyond mere accuracy, and CritiCal, a novel Critique Calibration training method that leverages natural language critiques to improve confidence calibration, moving beyond direct numerical optimization. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines, even surpassing its teacher model, GPT-4o, in complex reasoning tasks. CritiCal also shows robust generalization in out-of-distribution settings, advancing the reliability of LLMs.
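The abstract treats calibration as agreement between a model's stated confidence and how often it is actually right, but this page does not name an evaluation metric. The following is a minimal sketch, assuming two things. First, the model verbalizes its confidence as "Confidence: NN%", which is a hypothetical output format. Second, calibration is scored with expected calibration error (ECE), which is a standard metric but not necessarily the one used in the paper. The helpers parse_verbalized_confidence and expected_calibration_error are illustrative names, not the authors' code.

```python
import re
from typing import List, Optional, Sequence

# Hypothetical output format (an assumption): the answer ends with "Confidence: NN%".
_CONFIDENCE_PATTERN = re.compile(
    r"confidence\s*[:=]\s*(\d{1,3}(?:\.\d+)?)\s*%", re.IGNORECASE
)


def parse_verbalized_confidence(text: str) -> Optional[float]:
    """Return the stated confidence as a probability in [0, 1], or None if absent."""
    match = _CONFIDENCE_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1)) / 100.0
    return value if 0.0 <= value <= 1.0 else None


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally long, non-empty inputs")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence {c} is outside [0, 1]")
        # Bin k covers [k/n_bins, (k+1)/n_bins); a confidence of 1.0 joins the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Toy, made-up outputs: two answers stated at 90%, two at 60%.
    outputs = [
        "Answer: B. Confidence: 90%",
        "Answer: C. Confidence: 90%",
        "Answer: A. Confidence: 60%",
        "Answer: D. Confidence: 60%",
    ]
    is_correct = [True, False, True, True]
    confs = [parse_verbalized_confidence(o) for o in outputs]
    assert all(c is not None for c in confs), "every output should state a confidence"
    print(expected_calibration_error(confs, is_correct))
```

In the toy example, the 90% answers are right only half the time, so they are overconfident. The 60% answers are always right, so they are underconfident. Each bin is off by 0.4, and the ECE is therefore about 0.4. A well-calibrated model, which is what methods like CritiCal aim for, would push this value toward 0.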

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM confidence calibration through natural language critiques

Comparing question-focused uncertainty and answer-specific confidence as critique targets

Developing critique-based methods that move beyond direct numerical optimization of confidence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses natural language critiques for confidence calibration

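Proposes Self-Critique, in which an LLM critiques and refines its own confidence

Introduces CritiCal, a critique calibration training method that learns from natural language critiques rather than direct numerical targets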