🤖 AI Summary
Generative models lack statistical reliability guarantees, which limits their use in safety-critical applications. To address this, we propose SCOPE-Gen, a framework that combines sequential conformal prediction with greedy filtering to enforce *conformal admissibility control*, i.e., ensuring that the prediction set contains at least one admissible (valid) sample with a user-specified high probability. Its key idea is that admissibility of the final prediction set factorizes as a Markov chain, so each factor can be calibrated separately with conformal prediction, which reduces the number of admissibility evaluations needed during calibration. Evaluated on natural language generation and molecular graph extension tasks, SCOPE-Gen keeps prediction sets compact while requiring far fewer admissibility evaluations than prior work. This matters in safety-critical settings, where such evaluations must be carried out manually by domain experts. The method thus pairs a rigorous statistical guarantee with practical calibration efficiency.
📝 Abstract
Generative models lack rigorous statistical guarantees for their outputs and are therefore unreliable in safety-critical applications. In this work, we propose Sequential Conformal Prediction for Generative Models (SCOPE-Gen), a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee called conformal admissibility control. This guarantee states that, with high probability, the prediction sets contain at least one admissible (or valid) example. To this end, our method first samples an initial set of i.i.d. examples from a black-box generative model. Then, this set is iteratively pruned via so-called greedy filters. As a consequence of the iterative generation procedure, admissibility of the final prediction set factorizes as a Markov chain. This factorization is crucial because it allows us to control each factor separately using conformal prediction. In comparison to prior work, our method demonstrates a large reduction in the number of admissibility evaluations during calibration. This reduction is important in safety-critical applications, where these evaluations must be conducted manually by domain experts and are therefore costly and time-consuming. We highlight the advantages of our method in terms of admissibility evaluations and cardinality of the prediction sets through experiments in natural language generation and molecular graph extension tasks.
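The sample-then-prune pipeline and the role of the factorization can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the distance-based pruning rule, and the even split of the error budget across stages are all assumptions made for illustration, and the threshold `min_distance` is passed in by hand rather than calibrated with conformal prediction. The only idea taken from the abstract is that, if admissibility factorizes into per-stage factors and each factor is controlled at level 1 - α_k, the overall guarantee is the product of the (1 - α_k).

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def per_stage_level(alpha: float, num_stages: int) -> float:
    """Hypothetical even split of the error budget: returns a per-stage
    alpha_k such that (1 - alpha_k) ** num_stages == 1 - alpha."""
    if not 0.0 < alpha < 1.0 or num_stages < 1:
        raise ValueError("need 0 < alpha < 1 and num_stages >= 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / num_stages)


def greedy_diversity_filter(
    candidates: Sequence[T],
    distance: Callable[[T, T], float],
    min_distance: float,
) -> List[T]:
    """Illustrative greedy filter (an assumption, not the paper's rule):
    scan candidates in order and keep one only if it is at least
    min_distance away from every candidate kept so far."""
    kept: List[T] = []
    for cand in candidates:
        if all(distance(cand, k) >= min_distance for k in kept):
            kept.append(cand)
    return kept


def sample_then_filter(
    sampler: Callable[[], T],
    n_samples: int,
    distance: Callable[[T, T], float],
    min_distance: float,
) -> List[T]:
    """Draw n_samples examples from a black-box sampler, then prune them
    with the greedy filter above."""
    initial = [sampler() for _ in range(n_samples)]
    return greedy_diversity_filter(initial, distance, min_distance)


if __name__ == "__main__":
    # Three stages (e.g. initial sampling plus two filters), 90% overall target.
    print(per_stage_level(alpha=0.1, num_stages=3))
    # Near-duplicates of an already kept value are pruned.
    print(greedy_diversity_filter([0.0, 0.05, 0.5, 0.52, 1.0],
                                  lambda a, b: abs(a - b), 0.1))
```

In a real calibration procedure, each stage's threshold would be chosen from calibration data so that its factor holds at the per-stage level. The even split above is just one simple way to allocate the budget.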