Empirical Bayes Conformal Prediction for Vision and Language Models

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of conventional conformal prediction: it ranks candidates from a single nonconformity score, cannot tell a stable signal from noise, and can therefore admit high-variance incorrect candidates into the prediction set. The authors propose a framework that combines empirical Bayes with conformal prediction, using r-values to turn the variability of multiple nonconformity scores into an uncertainty-aware score that accounts for both a candidate's mean score and its uncertainty. The method preserves the target coverage, reduces the inclusion of high-variance false candidates under mild regularity conditions, and behaves like standard conformal prediction when variability vanishes. Experiments on image classification, CLIP-based vision-language benchmarks, and large language models report improved ranking stability and smaller prediction sets when score variability is informative.

📝 Abstract

Conformal prediction (CP) gives distribution-free coverage for modern vision and language models, but it is often forced to make a ranking decision from a single unstable nonconformity score. Standard CP uses one realization, while average-then-calibrate variants smooth multiple realizations into a point estimate. Both options discard the inconsistency across realizations, which can help identify whether a candidate is actually stable. A weak answer can enter the conformal set even when the evidence for it is not strong, simply because one posterior sample or prompt phrasing made it look strong. Variability, however, can help distinguish a stable signal from noise-driven fluctuations. We describe an empirical Bayes conformal prediction framework that uses $r$-values to convert score variability into an uncertainty-informed nonconformity score. The resulting $r$-value estimates how likely it is that a candidate's latent score belongs to the top-ranked group after accounting for both its mean score and its uncertainty. It admits both a closed-form Normal-Normal empirical Bayes estimator and a nonparametric posterior-sampling estimator. Using the $r$-value as the nonconformity score preserves the target conformal coverage while provably reducing the inclusion of high-variance false candidates under mild regularity conditions. Across image classification, CLIP-based VLM benchmarks, and LLMs, we show that $r$-value conformal prediction preserves target coverage while improving ranking stability and reducing set size when variability is informative, and that it reverts to CP-like behavior when variability vanishes.
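The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a Normal-Normal model fitted by the method of moments, and it uses the posterior probability that a candidate's latent score exceeds the top-`alpha` quantile of the fitted prior as a simplified stand-in for the $r$-value. All function names (`top_group_probability`, `conformal_threshold`, `prediction_set`) and the toy numbers are hypothetical, and each candidate is assumed to have at least two score realizations, with higher scores being better.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def top_group_probability(scores, alpha=0.2):
    """Posterior probability that each candidate's latent score lies in the
    top `alpha` fraction of the fitted Normal prior (a simplified proxy)."""
    n_reps = len(scores[0])
    xbar = [mean(s) for s in scores]
    # Sampling variance of each candidate's mean score.
    s2 = [variance(s) / n_reps for s in scores]
    # Method-of-moments prior fit: theta_i ~ N(mu, tau2).
    mu = mean(xbar)
    tau2 = max(variance(xbar) - mean(s2), 1e-12)
    threshold = NormalDist(mu, sqrt(tau2)).inv_cdf(1 - alpha)
    probs = []
    for xb, v in zip(xbar, s2):
        shrink = tau2 / (tau2 + v)           # 1 when variability vanishes
        post_mean = mu + shrink * (xb - mu)  # noisy means shrink toward mu
        post_sd = sqrt(shrink * v)
        if post_sd == 0.0:
            probs.append(1.0 if post_mean >= threshold else 0.0)
        else:
            probs.append(1.0 - NormalDist(post_mean, post_sd).cdf(threshold))
    return probs


def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    k = ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this level
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Keep candidates whose nonconformity (1 - probability) is at most qhat."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy example: candidates 0 and 1 share the same mean score, but candidate 1
# is far noisier across realizations; candidates 2 to 4 are consistently low.
toy_scores = [
    [0.80, 0.82, 0.78],
    [0.20, 0.80, 1.40],
    [0.10, 0.12, 0.08],
    [0.20, 0.22, 0.18],
    [0.15, 0.17, 0.13],
]
toy_probs = top_group_probability(toy_scores, alpha=0.2)
toy_set = prediction_set(toy_probs, qhat=0.5)
```

In this toy setting the noisy candidate gets a lower top-group probability than the stable one despite the identical mean, so a threshold calibrated on such scores can exclude it, whereas an average-then-calibrate score would treat the two identically. In practice `qhat` would come from `conformal_threshold` applied to the nonconformity scores of the true labels on a held-out calibration set.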

Problem

Research questions and friction points this paper is trying to address.

Conformal Prediction

Nonconformity Score

Score Variability

Ranking Stability

Vision and Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Empirical Bayes

Conformal Prediction

r-values