🤖 AI Summary
In retrieval-augmented generation (RAG) systems, unreliable confidence estimates from large language models (LLMs) hinder high-stakes decision-making in domains such as finance and healthcare. To address this, we propose a lightweight uncertainty modeling method grounded in feed-forward network (FFN) activations: we directly use raw FFN activations from layer 16 of Llama 3.1 8B as autoregressive confidence signals, bypassing the output projection and softmax to preserve information fidelity. Confidence prediction is formulated as a sequence classification task and optimized with a Huber loss to enhance robustness against noisy human annotations. Evaluated on a real-world financial customer-service benchmark under strict latency constraints, our approach significantly outperforms strong baselines. Crucially, it achieves high accuracy and low inference latency using activations from only a single layer, with no auxiliary modules or fine-tuning of the base model. This work establishes a deployable, architecture-aware confidence estimation paradigm for trustworthy RAG.
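For context on the Huber loss mentioned above, here is a minimal sketch (not from the paper; the threshold `delta` is an assumed default): the loss is quadratic for small residuals and linear for large ones, so a mislabeled example contributes a bounded penalty instead of dominating training the way a squared error would.

```python
import numpy as np

def huber_loss(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Quadratic for |r| <= delta, linear beyond, so outliers
    (e.g. noisy human correctness labels) are penalized only linearly."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * residual ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

residuals = np.array([0.1, 0.5, 4.0])
print(huber_loss(residuals))  # the 4.0 outlier costs 3.5 rather than 8.0
```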
📝 Abstract
We propose a method for confidence estimation in retrieval-augmented generation (RAG) systems that aligns closely with the correctness of large language model (LLM) outputs. Confidence estimation is especially critical in high-stakes domains such as finance and healthcare, where the cost of an incorrect answer far outweighs the cost of abstaining. Our approach extends prior uncertainty quantification methods by leveraging raw feed-forward network (FFN) activations as autoregressive signals, avoiding the information loss that token logits and probabilities incur after projection and softmax normalization. We model confidence prediction as a sequence classification task and regularize training with a Huber loss term to improve robustness against noisy supervision. Applied in a real-world financial customer-support setting with complex knowledge bases, our method outperforms strong baselines and maintains high accuracy under strict latency constraints. Experiments with the Llama 3.1 8B model show that using activations from only the 16th layer preserves accuracy while reducing response latency. Our results demonstrate that activation-based confidence modeling offers a scalable, architecture-aware path toward trustworthy RAG deployment.