🤖 AI Summary
It remains unclear whether the verbalized confidence of large language models (LLMs) genuinely reflects their risk perception and decision-making behavior. This work proposes RiskEval, the first systematic evaluation framework to introduce error penalties and an abstention option, enabling assessment of whether LLMs dynamically adjust their confidence expressions and abstention strategies in risk-sensitive scenarios. Experimental results show that current LLMs fail to calibrate their stated confidence to penalty costs and lack the strategic awareness to abstain proactively and avoid high-risk losses, substantially reducing their practical utility. The study reveals a critical disconnect between LLMs' expressed confidence and their actual decision behavior, establishing a new paradigm for evaluating trustworthy AI systems.
📝 Abstract
Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the model's reasoning, knowledge, or decision-making. To test this, we introduce $\textbf{RiskEval}$: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not suffice for trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal, risk-sensitive decisions.
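To make the abstention logic concrete, here is a minimal sketch of the decision rule such a penalty-based evaluation implies. The scoring scheme (+1 for a correct answer, −penalty for a wrong one, 0 for abstaining), along with the function names, is an illustrative assumption rather than the paper's actual protocol; under it, answering is rational only when confidence $p > \text{penalty} / (1 + \text{penalty})$.

```python
# Hypothetical sketch of a risk-sensitive decision rule (assumed scoring
# scheme, not the paper's protocol): correct answers earn +1, wrong answers
# cost -penalty, and abstaining scores 0.

def expected_utility(p: float, penalty: float) -> float:
    """Expected score for answering with confidence p under a given penalty."""
    return p * 1.0 + (1.0 - p) * (-penalty)

def optimal_action(p: float, penalty: float) -> str:
    """Answer only if the expected utility of answering beats abstaining (0)."""
    return "answer" if expected_utility(p, penalty) > 0.0 else "abstain"

# Example: at 70% confidence, a mild penalty favors answering, while an
# extreme penalty makes abstention the mathematically optimal strategy.
print(optimal_action(0.7, penalty=1.0))   # answer  (EU = +0.4)
print(optimal_action(0.7, penalty=10.0))  # abstain (EU = -2.3)
```

Under these assumptions, a penalty of 10 pushes the answering threshold to roughly 91% confidence, which illustrates why a model that never abstains would see its utility collapse in the high-penalty regime the abstract describes.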