Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

📅 2026-04-17

🤖 AI Summary

This work addresses the unreliability of output-level uncertainty signals, such as token probabilities, entropy, and self-consistency, in large language models under calibration-deployment mismatch, which undermines the trustworthiness of predictions. The authors propose a conformal prediction framework for LLM question answering grounded in internal model representations, introducing Layer-Wise Information (LI) scores as nonconformity scores. These scores measure how conditioning on the input reshapes predictive entropy across model depth, and they plug into a standard split conformal prediction pipeline. Evaluated on closed-ended and open-domain question answering benchmarks, the method achieves a better validity-efficiency trade-off than strong text-level baselines, with the clearest gains under cross-domain shift, while maintaining competitive in-domain reliability at the same nominal risk level.

📝 Abstract

Large language models are increasingly deployed in settings where reliability matters, yet output-level uncertainty signals such as token probabilities, entropy, and self-consistency can become brittle under calibration-deployment mismatch. Conformal prediction provides finite-sample validity under exchangeability, but its practical usefulness depends on the quality of the nonconformity score. We propose a conformal framework for LLM question answering that uses internal representations rather than output-facing statistics: specifically, we introduce Layer-Wise Information (LI) scores, which measure how conditioning on the input reshapes predictive entropy across model depth, and use them as nonconformity scores within a standard split conformal pipeline. Across closed-ended and open-domain QA benchmarks, with the clearest gains under cross-domain shift, our method achieves a better validity-efficiency trade-off than strong text-level baselines while maintaining competitive in-domain reliability at the same nominal risk level. These results suggest that internal representations can provide more informative conformal scores when surface-level uncertainty is unstable under distribution shift.
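
The standard split conformal pipeline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how LI scores are computed, so the scores here are plain placeholder numbers, and the function names (`conformal_threshold`, `prediction_set`, `coverage_and_size`) are hypothetical. It assumes a lower score means a more plausible answer, and that each calibration score is the nonconformity score of the known correct answer.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores: nonconformity scores of the true answers on held-out
    calibration data (placeholders here; the paper would use LI scores).
    alpha: nominal risk level, so the target coverage is 1 - alpha.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # "keep everything" threshold is valid.
        return math.inf
    return float(scores[k - 1])

def prediction_set(candidate_scores, threshold):
    """Indices of candidate answers whose score does not exceed the threshold."""
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]

def coverage_and_size(test_scores, true_idx, threshold):
    """Empirical coverage (validity) and mean set size (efficiency)."""
    sets = [prediction_set(s, threshold) for s in test_scores]
    coverage = np.mean([t in ps for t, ps in zip(true_idx, sets)])
    avg_size = np.mean([len(ps) for ps in sets])
    return float(coverage), float(avg_size)
```

Under exchangeability of calibration and test examples, sets built this way contain the correct answer with probability at least 1 - alpha. The quality of the score only affects how small the sets are, which is why the choice of nonconformity score drives the validity-efficiency trade-off.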

Problem

Research questions and friction points this paper is trying to address.

conformal prediction

large language models

distribution shift

uncertainty estimation

internal representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Prediction

Internal Representations

Layer-Wise Information