Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models

📅 2025-06-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses uncertainty quantification for black-box generative models, such as large language models (LLMs), under limited query budgets, where prediction-set coverage, query efficiency, and informativeness are difficult to optimize jointly. It proposes CPQ, a novel conformal prediction paradigm that introduces missing-mass estimation into the conformal framework for the first time, establishing theoretical foundations for the optimal querying strategy and the optimal mapping from queried samples to prediction sets. By combining a novel derivative estimator with the Good–Turing method, CPQ constructs prediction sets efficiently and robustly. Evaluated on three open-ended tasks across two mainstream LLM families, CPQ achieves a 23.6% average improvement in prediction-set informativeness over existing conformal methods, while maintaining statistical coverage guarantees and minimizing query overhead. Ablation studies confirm the contribution of each component.

📝 Abstract
Uncertainty quantification (UQ) is essential for the safe deployment of generative AI models such as large language models (LLMs), especially in high-stakes applications. Conformal prediction (CP) offers a principled uncertainty quantification framework, but classical methods focus on regression and classification, relying on geometric distances or softmax scores: tools that presuppose structured outputs. We depart from this paradigm by studying CP in a query-only setting, where prediction sets must be constructed solely from finite queries to a black-box generative model, introducing a new trade-off between coverage, test-time query budget, and informativeness. We introduce Conformal Prediction with Query Oracle (CPQ), a framework characterizing the optimal interplay between these objectives. Our finite-sample algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets. Remarkably, both are rooted in the classical missing mass problem in statistics. Specifically, the optimal query policy depends on the rate of decay, or the derivative, of the missing mass, for which we develop a novel estimator. Meanwhile, the optimal mapping hinges on the missing mass itself, which we estimate using Good–Turing estimators. We then turn our focus to implementing our method for language models, where outputs are vast, variable, and often under-specified. Fine-grained experiments on three real-world open-ended tasks and two LLMs show CPQ's applicability to any black-box LLM and highlight: (1) the individual contribution of each principle to CPQ's performance, and (2) CPQ's ability to yield significantly more informative prediction sets than existing conformal methods for language uncertainty quantification.
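The Good–Turing estimator of the missing mass referenced in the abstract has a well-known closed form: the fraction of the sample made up of singletons (values seen exactly once). A minimal sketch of that classical estimator, not of the paper's CPQ algorithm itself (the function name is illustrative):

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the missing mass.

    The missing mass is the total probability of outcomes never observed
    in `samples`. Good-Turing estimates it as N1 / n, where N1 is the
    number of distinct values seen exactly once and n is the sample size.
    """
    counts = Counter(samples)
    n1 = sum(1 for c in counts.values() if c == 1)  # number of singletons
    return n1 / len(samples)
```

In a generative-model setting, `samples` would be the responses returned by repeated queries to the black-box model; a large estimate signals that substantial probability mass remains unseen, so further querying is warranted.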
Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in black-box generative models with limited queries
Optimize coverage, query budget, and informativeness in conformal prediction
Apply novel missing mass estimators to language model uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

CPQ framework jointly optimizes coverage, query budget, and informativeness
Novel estimator of the missing mass's decay rate drives the query policy
Good–Turing estimators of the missing mass yield the optimal mapping to prediction sets