Quantifying Uncertainty in Natural Language Explanations of Large Language Models for Question Answering

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches lack reliable methods for uncertainty quantification (UQ) of natural language explanations generated by large language models (LLMs), particularly in noise-sensitive domains such as medicine; the autoregressive nature of LLMs further complicates confidence estimation. Method: We propose the first post-hoc, model-agnostic UQ framework designed specifically for LLM-generated explanations. It estimates explanation credibility without modifying the base LLM by jointly analyzing generation paths and the semantic consistency of the explanation text, augmented with a robust calibration mechanism. Contribution/Results: Experiments across diverse question-answering benchmarks demonstrate significant improvements in both uncertainty calibration and discriminative capability. Our method is the first to provide valid, explanation-level confidence guarantees for natural language outputs, enabling trustworthy interpretability in high-stakes applications.
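The paper does not spell out its calibration mechanism in this summary, but a post-hoc, model-agnostic guarantee of this kind is commonly obtained with split conformal prediction over a scalar confidence score. The sketch below is an illustrative assumption, not the paper's method: `cal_scores` stands in for nonconformity scores of calibration explanations (e.g. one minus a semantic-consistency score), and the threshold `tau` then flags new explanations as credible with roughly 1 − alpha coverage.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold: a fresh score from the same distribution
    falls at or below this quantile with probability >= 1 - alpha."""
    n = len(cal_scores)
    # finite-sample corrected quantile level
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

# Hypothetical calibration scores (placeholders for real nonconformity
# scores such as 1 - semantic consistency of explanation vs. answer).
rng = np.random.default_rng(0)
cal_scores = rng.uniform(0.0, 1.0, size=500)

tau = calibrate_threshold(cal_scores, alpha=0.1)

# A new explanation is deemed credible if its score stays below tau.
new_score = 0.3
is_credible = new_score <= tau
```

The guarantee holds under exchangeability of calibration and test scores, which is why no access to the base LLM's weights is needed.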

📝 Abstract
Large language models (LLMs) have shown strong capabilities, enabling concise, context-aware answers in question answering (QA) tasks. The lack of transparency in complex LLMs has inspired extensive research aimed at developing methods to explain LLM behaviors. Among existing explanation methods, natural language explanations stand out due to their ability to explain LLMs in a self-explanatory manner and enable the understanding of model behaviors even when the models are closed-source. However, despite these promising advancements, no existing work studies how to provide valid uncertainty guarantees for these generated natural language explanations. Such uncertainty quantification is critical for understanding the confidence behind these explanations. Notably, generating valid uncertainty estimates for natural language explanations is particularly challenging due to the auto-regressive generation process of LLMs and the presence of noise in medical inquiries. To bridge this gap, we first propose a novel uncertainty estimation framework for these generated natural language explanations, which provides valid uncertainty guarantees in a post-hoc and model-agnostic manner. We also design a novel robust uncertainty estimation method that maintains valid uncertainty guarantees even under noise. Extensive experiments on QA tasks demonstrate the effectiveness of our methods.
Problem

Research questions and friction points this paper is trying to address.

Quantifying uncertainty in natural language explanations
Providing valid uncertainty guarantees for LLM explanations
Maintaining uncertainty estimation under noisy medical inquiries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-hoc model-agnostic uncertainty estimation framework
Robust uncertainty estimation under noisy conditions
Valid uncertainty guarantees for natural language explanations
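The robustness claim above concerns noisy inputs such as garbled medical inquiries. The paper's exact mechanism is not described here; one generic way to robustify a per-query confidence score (an illustrative assumption, with `toy_score` and `toy_perturb` as hypothetical stand-ins) is to aggregate it over several perturbed copies of the query and take the median, which damps a few noise-corrupted evaluations:

```python
import random
import statistics

def robust_score(query, score_fn, perturb_fn, k=5, seed=0):
    """Median of confidence scores over k perturbed copies of the query."""
    rng = random.Random(seed)
    scores = [score_fn(perturb_fn(query, rng)) for _ in range(k)]
    return statistics.median(scores)

# Illustrative stand-ins: a score that spikes when a noise token appears,
# and a perturbation that occasionally injects such a token.
def toy_score(q):
    return 0.9 if "typo" in q else 0.2

def toy_perturb(q, rng):
    # simulate a noisy medical inquiry 20% of the time
    return q + " typo" if rng.random() < 0.2 else q

s = robust_score("what dose of aspirin is safe", toy_score, toy_perturb, k=7)
```

The median (rather than the mean) keeps the aggregate score bounded away from outliers, which is the usual motivation for this style of smoothing.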