Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

2026-04-17
Citations: 0
Influential: 0

AI Summary
This work addresses the unreliability of output-level uncertainty signals (token probabilities, entropy, and self-consistency) in large language models when the calibration and deployment distributions differ, which undermines the trustworthiness of predictions. The authors propose a conformal prediction framework for LLM question answering that is grounded in internal model representations. They introduce the Layer-Wise Information (LI) score, which measures how conditioning on the input reshapes predictive entropy across model depth, and use it as the nonconformity score in a standard split conformal prediction pipeline. Evaluated on both closed-ended and open-domain question answering benchmarks, the method achieves a better validity-efficiency trade-off than strong text-level baselines, with the clearest gains in cross-domain settings, while maintaining competitive in-domain reliability at the same nominal risk level.
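
For readers unfamiliar with the split conformal pipeline mentioned above, the standard guarantee can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $s(x, y)$ is any nonconformity score (the LI score plays this role in the paper), $s_{(k)}$ is the $k$-th smallest of $n$ calibration scores, and $\alpha$ is the nominal risk level.

```latex
\hat{q} = s_{\left(\lceil (n+1)(1-\alpha) \rceil\right)},
\qquad
C(x) = \{\, y : s(x, y) \le \hat{q} \,\},
\qquad
\Pr\!\left[\, Y_{n+1} \in C(X_{n+1}) \,\right] \ge 1 - \alpha
```

The inequality holds when the calibration examples and the test example are exchangeable. Under cross-domain shift that assumption is no longer guaranteed, which is why the choice of score matters in practice.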

๐Ÿ“ Abstract
Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration-deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity-efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
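
The split conformal step described in the abstract can be sketched in a few lines of Python. This is a minimal, generic illustration and not the authors' implementation: the function names `conformal_threshold` and `prediction_set` are hypothetical, and the scores are plain numbers standing in for whatever nonconformity score is used (the paper uses LI scores, whose exact definition is not given in this summary).

```python
import math
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    # 1-indexed rank of the order statistic used as the threshold.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: accept every candidate.
        return float("inf")
    return float(scores[rank - 1])


def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose score is at or below the threshold."""
    return [ans for ans, s in candidate_scores.items() if s <= threshold]


if __name__ == "__main__":
    # Placeholder calibration scores (lower means the true answer conformed better).
    calibration = [9, 1, 8, 2, 7, 3, 6, 4, 5]
    q_hat = conformal_threshold(calibration, alpha=0.2)
    # Placeholder scores for three candidate answers to one test question.
    candidates = {"A": 3.0, "B": 8.0, "C": 8.5}
    print(q_hat, prediction_set(candidates, q_hat))
```

A smaller prediction set at the same coverage level is what the abstract calls better efficiency; a more informative score tends to separate correct from incorrect answers more cleanly and so shrinks the set.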
Problem

Research questions and friction points this paper is trying to address.

conformal prediction
large language models
distribution shift
uncertainty estimation
internal representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Prediction
Internal Representations
Layer-Wise Information
Distribution Shift
Uncertainty Quantification