Domain-Shift-Aware Conformal Prediction for Large Language Models

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from hallucination and miscalibrated uncertainty under distribution shift, leading to insufficient coverage and unreliable prediction sets. To address this, we propose Domain-Shift-Aware Conformal Prediction (DS-CP), a conformal prediction framework that explicitly incorporates distribution-shift awareness. DS-CP dynamically reweights calibration examples using a context-aware distance metric that quantifies the similarity between the test prompt and the calibration set, enabling adaptive uncertainty calibration of LLM outputs. We theoretically establish its finite-sample coverage guarantee under distributional shift. Experiments on benchmarks including MMLU demonstrate that DS-CP significantly improves coverage over standard conformal methods under substantial domain shifts, while maintaining computational efficiency. Our approach thus enhances both the reliability and the practical applicability of uncertainty quantification for LLMs.

📝 Abstract
Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
Problem

Research questions and friction points this paper is trying to address.

Addressing unreliable LLM outputs under domain shift conditions
Improving conformal prediction coverage during distribution shifts
Providing trustworthy uncertainty quantification for real-world LLM deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reweights calibration samples by test prompt proximity
Adapts conformal prediction for domain shift scenarios
Maintains coverage guarantees while enhancing model adaptivity
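The reweighting idea described above can be illustrated with a minimal sketch of proximity-weighted split conformal prediction. This is an assumption-laden toy version, not the paper's exact procedure: the Gaussian kernel, the Euclidean distance on prompt embeddings, and the bandwidth `tau` are all illustrative choices standing in for the paper's context-aware distance metric.

```python
import numpy as np

def weighted_conformal_set(cal_scores, cal_embeds, test_embed,
                           candidate_scores, alpha=0.1, tau=1.0):
    """Toy proximity-weighted split conformal prediction.

    cal_scores:       nonconformity scores of calibration examples, shape (n,)
    cal_embeds:       embeddings of calibration prompts, shape (n, d)
    test_embed:       embedding of the test prompt, shape (d,)
    candidate_scores: nonconformity score of each candidate answer, shape (k,)

    Returns the indices of candidates included in the prediction set.
    """
    # Weight each calibration example by its similarity to the test prompt.
    # (Gaussian kernel on Euclidean embedding distance -- an illustrative
    # stand-in for the paper's context-aware distance metric.)
    dists = np.linalg.norm(cal_embeds - test_embed, axis=1)
    w = np.exp(-dists**2 / (2 * tau**2))

    # Standard weighted-conformal construction: the test point gets weight 1
    # and its (unknown) score is treated as +inf.
    p = np.append(w, 1.0)
    p = p / p.sum()
    scores = np.append(cal_scores, np.inf)

    # Weighted (1 - alpha) quantile of the augmented score distribution.
    order = np.argsort(scores)
    cum = np.cumsum(p[order])
    qhat = scores[order][np.searchsorted(cum, 1 - alpha)]

    # Prediction set: every candidate whose score falls below the threshold.
    return [i for i, s in enumerate(candidate_scores) if s <= qhat]
```

When all calibration prompts are equally close to the test prompt, the weights are uniform and this reduces to ordinary split conformal prediction; under shift, distant calibration examples are down-weighted, so the quantile threshold adapts to the test domain.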
Zhexiao Lin
University of California, Berkeley
Statistics · Causal Inference · Econometrics · Deep Learning
Yuanyuan Li
Munich RE
Neeraj Sarna
Munich RE
data-driven methods · model-order reduction · scientific computing
Yuanyuan Gao
Department of Statistics, University of California, Berkeley
Michael von Gablenz
Munich RE