🤖 AI Summary
This work addresses the limitations of entropy-based uncertainty estimation in selective prediction with large language models, which often fails to reliably trigger rejection mechanisms under low target error rates due to model-dependent biases. To enhance prediction safety in high-stakes scenarios, the authors propose a novel uncertainty assessment method that integrates entropy with correctness-probe signals. Extensive experiments across three question-answering benchmarks—TriviaQA, BioASQ, and MedicalQA—and four model families demonstrate that the proposed approach consistently outperforms entropy-only baselines in both risk–coverage trade-offs and calibration performance, effectively mitigating the shortcomings of conventional entropy-based measures.
📝 Abstract
Selective prediction systems can mitigate harms resulting from language model hallucinations by abstaining from answering in high-risk cases. Uncertainty quantification techniques are often employed to identify such cases, but are rarely evaluated in the context of the wider selective prediction policy and its ability to operate at low target error rates. We identify a model-dependent failure mode of entropy-based uncertainty methods that leads to unreliable abstention behaviour, and address it by combining entropy scores with a correctness probe signal. We find that across three QA benchmarks (TriviaQA, BioASQ, MedicalQA) and four model families, the combined score generally improves both the risk–coverage trade-off and calibration performance relative to entropy-only baselines. Our results highlight the importance of deployment-facing evaluation of uncertainty methods, using metrics that directly reflect whether a system can be trusted to operate at a stated risk level.
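To make the idea concrete, here is a minimal sketch of a selective prediction policy that blends normalized answer entropy with a correctness-probe signal. The linear combination, the mixing weight `alpha`, and the function names are illustrative assumptions, not the paper's actual method:

```python
import math


def combined_score(answer_probs, probe_correct_prob, alpha=0.5):
    """Blend predictive entropy with a correctness-probe signal.

    answer_probs: probability distribution over candidate answers (sums to 1).
    probe_correct_prob: the probe's estimated probability the answer is correct.
    alpha: hypothetical mixing weight (an assumption, not from the paper).
    Returns an uncertainty score in [0, 1]; higher means more uncertain.
    """
    entropy = -sum(p * math.log(p) for p in answer_probs if p > 0)
    max_entropy = math.log(len(answer_probs))
    norm_entropy = entropy / max_entropy if max_entropy > 0 else 0.0
    # Combine normalized entropy with the probe's incorrectness estimate.
    return alpha * norm_entropy + (1 - alpha) * (1 - probe_correct_prob)


def selective_predict(answer, answer_probs, probe_correct_prob, threshold=0.5):
    """Return the answer, or None (abstain) if uncertainty is too high.

    In deployment, the threshold would be tuned on held-out data to hit
    a target risk level; 0.5 here is an arbitrary placeholder.
    """
    score = combined_score(answer_probs, probe_correct_prob)
    return answer if score <= threshold else None
```

A confident answer (peaked distribution, high probe score) passes through, while a uniform distribution with a doubtful probe triggers abstention; the threshold is the knob that the risk–coverage evaluation in the paper stresses.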