Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of constructing minimal valid prediction sets for black-box large language models (e.g., CodeLlama, GPT) in code and mathematical text generation, where validity is defined by a user-specified admissibility criterion (for example, at least one output in the set passing all test cases) and coverage of an admissible output must be guaranteed at a pre-specified confidence level (e.g., 90%). We formulate generative prediction set construction within the conformal regression framework for the first time, exploiting the distributional structure of the minimum number of samples needed to obtain an admissible output to achieve provable statistical coverage guarantees and compact set sizes. Our method operates solely via black-box model sampling, requiring no gradient access or inspection of internal parameters. Evaluated on multiple code and math word problem benchmarks, our approach reduces average prediction set size by 35% at 90% confidence compared to state-of-the-art methods, while strictly satisfying the validity constraint.
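To make the statistical core concrete: if, for each of n calibration examples, k_i denotes the minimum number of samples drawn from the model until an admissible output appears, a standard split-conformal quantile over the k_i yields a test-time sample budget with the desired coverage. The following is a sketch in our own notation (alpha is the allowed miscoverage), not necessarily the paper's exact scoring rule:

```latex
% Split-conformal quantile rule over minimum sample counts.
% Notation is ours: k_i = minimum samples to admissibility on calibration
% example i; alpha = allowed miscoverage (e.g., 0.1 for 90% confidence).
\[
  \hat{k} \;=\; k_{(\lceil (n+1)(1-\alpha) \rceil)}
  \quad \text{(the } \lceil (n+1)(1-\alpha) \rceil\text{-th order statistic of } k_1,\dots,k_n\text{)},
\]
\[
  \Pr\!\left[\text{the first } \hat{k} \text{ samples for a fresh input contain an admissible output}\right] \;\ge\; 1-\alpha .
\]
```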

📝 Abstract
We consider the problem of generating valid and small prediction sets by sampling outputs (e.g., software code and natural language text) from a black-box deep generative model for a given input (e.g., a textual prompt). The validity of a prediction set is determined by a user-defined binary admissibility function that depends on the target application; for example, in a code generation application, at least one program in the set must pass all test cases. To address this problem, we develop a simple and effective conformal inference algorithm referred to as Generative Prediction Sets (GPS). Given a set of calibration examples and black-box access to a deep generative model, GPS can generate prediction sets with provable guarantees. The key insight behind GPS is to exploit the inherent structure within the distribution over the minimum number of samples needed to obtain an admissible output, which reduces the problem to a simple conformal regression over that quantity. Experiments on multiple datasets for code and math word problems using different large language models demonstrate the efficacy of GPS over state-of-the-art methods.
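As a concrete illustration of the calibrate-then-sample reduction, here is a minimal Python sketch. The helper names (sample_output, is_admissible), the censoring cap, and the exact quantile handling are our illustrative assumptions; the paper's actual GPS algorithm may differ in its conformal score and treatment of censored calibration points.

```python
# Minimal sketch of a GPS-style calibrate-then-sample procedure, assuming
# black-box access to a sampler and a binary admissibility check.
import math

def min_samples_to_admissible(prompt, sample_output, is_admissible, max_draws=100):
    """Draw samples until one is admissible; return the draw count.

    Returns max_draws + 1 if no admissible output appears (right-censored).
    """
    for k in range(1, max_draws + 1):
        if is_admissible(prompt, sample_output(prompt)):
            return k
    return max_draws + 1

def calibrate_set_size(calibration_prompts, sample_output, is_admissible, alpha=0.1):
    """Split-conformal quantile of the minimum-sample counts.

    With n calibration examples, the ceil((n+1)(1-alpha))-th smallest count
    gives >= 1 - alpha marginal coverage by exchangeability.
    """
    counts = sorted(
        min_samples_to_admissible(p, sample_output, is_admissible)
        for p in calibration_prompts
    )
    n = len(counts)
    rank = math.ceil((n + 1) * (1 - alpha))
    return counts[min(rank, n) - 1]

def prediction_set(prompt, sample_output, k_hat):
    """At test time, the prediction set is simply the first k_hat samples."""
    return [sample_output(prompt) for _ in range(k_hat)]
```

At test time one calls prediction_set with the calibrated k_hat; censored calibration counts (max_draws + 1) only inflate the quantile, so the coverage guarantee is preserved conservatively.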
Problem

Research questions and friction points this paper is trying to address.

Generating valid prediction sets from deep generative models.
Ensuring prediction sets meet user-defined admissibility criteria.
Developing a conformal inference algorithm with provable guarantees.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Prediction Sets (GPS) algorithm
Conformal regression for deep generative models
Provable guarantees for prediction set validity
👥 Authors
Hooman Shahrokhi
Washington State University
Large Language Models, Conformal Prediction
Devjeet Raj Roy
School of EECS, Washington State University
Yan Yan
School of EECS, Washington State University
Venera Arnaoudova
Washington State University
empirical software engineering, program comprehension, source code analysis, program lexicon, software evolution
Janardhan Rao Doppa
School of EECS, Washington State University