🤖 AI Summary
Large language models (LLMs) suffer from hallucination and miscalibrated uncertainty under distributional shift, leading to under-coverage and unreliable prediction sets. To address this, we propose Domain-Shift-Aware Conformal Prediction (DS-CP), a conformal prediction framework that explicitly incorporates awareness of distribution shift. DS-CP dynamically reweights calibration examples using a context-aware distance metric that quantifies the similarity between the test prompt and each calibration sample, enabling adaptive uncertainty calibration of LLM outputs. We theoretically establish a finite-sample coverage guarantee under distributional shift. Experiments on benchmarks including MMLU demonstrate that DS-CP delivers significantly more reliable coverage than standard conformal prediction under substantial domain shifts while remaining computationally efficient. Our approach thus enhances both the reliability and the practical applicability of uncertainty quantification for LLMs.
📝 Abstract
Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose Domain-Shift-Aware Conformal Prediction (DS-CP), a framework that adapts conformal prediction to large language models under domain shift by systematically reweighting calibration samples according to their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
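To make the reweighting idea concrete, the sketch below shows a generic weighted split-conformal procedure: calibration examples receive weights from a similarity kernel between the test prompt's embedding and each calibration embedding, and the prediction-set threshold is a weighted quantile of calibration nonconformity scores. This is a minimal illustration of the general technique, not the paper's implementation; the function names, the softmax-of-negative-distance kernel, and the temperature parameter `tau` are assumptions for the sake of the example.

```python
import numpy as np

def distance_weights(test_emb, calib_embs, tau=1.0):
    """Softmax weights from negative Euclidean distance between the test
    prompt embedding and each calibration embedding (an illustrative
    kernel; DS-CP's actual context-aware metric may differ)."""
    d = np.linalg.norm(calib_embs - test_emb, axis=1)
    w = np.exp(-d / tau)
    return w / w.sum()

def weighted_conformal_threshold(calib_scores, weights, alpha=0.1):
    """Smallest calibration score q at which the cumulative weight of
    scores <= q reaches 1 - alpha (a weighted empirical quantile)."""
    order = np.argsort(calib_scores)
    s, w = calib_scores[order], weights[order]
    cum = np.cumsum(w)
    idx = min(np.searchsorted(cum, 1 - alpha), len(s) - 1)
    return s[idx]

def prediction_set(candidate_scores, q):
    """Keep every candidate answer whose nonconformity score is <= q."""
    return [i for i, sc in enumerate(candidate_scores) if sc <= q]
```

Under no shift the weights are roughly uniform and the procedure reduces to standard split conformal prediction; under shift, calibration samples far from the test prompt are down-weighted, which is what restores coverage.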