Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models

📅 2026-04-15

🤖 AI Summary
Large language models often generate factually incorrect content. Existing conformal prediction methods for factuality are typically not prompt-adaptive, so for a given task or prompt they can over-cover or under-cover. This work proposes an adaptive conformal prediction approach that extends conformal score transformation methods to LLMs, enabling prompt-dependent calibration. The method retains marginal coverage guarantees while improving conditional coverage, and it supports selective prediction, so unreliable claims or answer choices can be filtered out downstream. Evaluated on multiple white-box models across diverse domains, covering long-form generation and multiple-choice question answering, it significantly outperforms existing baselines in terms of conditional coverage.

📝 Abstract
Large language models (LLMs) are prone to generating factually incorrect outputs. Recent work has applied conformal prediction to provide uncertainty estimates and statistical guarantees for the factuality of LLM generations. However, existing approaches are typically not prompt-adaptive, limiting their ability to capture input-dependent variability. As a result, they may filter out too few items (leading to over-coverage) or too many (under-coverage) for a given task or prompt. We propose an adaptive conformal prediction approach that extends conformal score transformation methods to LLMs, with applications to long-form generation and multiple-choice question answering. This enables prompt-dependent calibration, retaining marginal coverage guarantees while improving conditional coverage. In addition, the approach naturally supports selective prediction, allowing unreliable claims or answer choices to be filtered out in downstream applications. We evaluate our approach on multiple white-box models across diverse domains and show that it significantly outperforms existing baselines in terms of conditional coverage.
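
For orientation, here is a minimal sketch of the non-adaptive split-conformal filtering that the abstract contrasts against. It is not the paper's adaptive method: the prompt-dependent score transformation is not shown, and a single global threshold is used for every prompt. The function names, the per-claim confidence scores, and the data layout are all hypothetical, chosen only for illustration.

```python
import math

def calibration_score(claims):
    """Score one calibration prompt.

    claims: list of (confidence, is_correct) pairs (hypothetical layout).
    Returns the highest confidence assigned to a false claim, i.e. the
    smallest threshold above which every retained claim is correct.
    """
    return max((conf for conf, ok in claims if not ok), default=-math.inf)

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration prompts for this alpha: retain nothing.
        return math.inf
    return sorted(cal_scores)[k - 1]

def filter_claims(claims, threshold):
    """Selective prediction: keep claims whose confidence exceeds the threshold.

    claims: list of (text, confidence) pairs for a new prompt.
    """
    return [text for text, conf in claims if conf > threshold]
```

Under exchangeability of calibration and test prompts, every retained claim is correct with probability at least 1 - alpha on average over prompts (marginal coverage). Because the threshold is the same for all prompts, it can be too strict for easy prompts and too lenient for hard ones, which is the gap that prompt-dependent calibration is meant to close.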

Problem

Research questions and friction points this paper is trying to address.

factuality
large language models
conformal prediction
prompt-adaptive
coverage

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive conformal prediction
conditional coverage
large language models
selective prediction
factuality calibration