Reliable Statistical Guarantees for Conformal Predictors with Small Datasets

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional conformal prediction (CP) provides only marginal coverage guarantees; under small-sample calibration, the coverage distribution exhibits high variance and frequently falls below the nominal level, undermining reliability in uncertainty quantification. To address this, we propose a novel conformal prediction framework that, for the first time, delivers probabilistic coverage guarantees for individual predictors, e.g. $\mathbb{P}(\text{Coverage} \geq 1-\alpha) \geq 1-\delta$, overcoming the fundamental limitation of marginal guarantees. This guarantee holds rigorously even with limited calibration data and asymptotically recovers the classical CP guarantee for large samples. Our method leverages nonparametric concentration inequalities, requires no assumptions on the error distribution, and integrates seamlessly with mainstream CP libraries. Experiments demonstrate substantial improvements in coverage stability and safety in low-data regimes, providing verifiable statistical guarantees for uncertainty quantification in resource-constrained settings.
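The dispersion problem the summary describes can be made concrete. For a split conformal predictor, the coverage attained by one fixed predictor follows an exact Beta distribution over the draw of the calibration set (a standard distribution-free result, not this paper's new method), so the probability of actually meeting the nominal level admits a closed form. The sketch below is a minimal illustration assuming standard split CP with `n` exchangeable calibration scores:

```python
import math

def prob_coverage_at_least(n: int, alpha: float) -> float:
    """Probability that a single split-CP predictor, calibrated on n points,
    attains coverage >= 1 - alpha on future data.

    Uses the exact distribution-free result that the coverage of a split
    conformal predictor built from the k-th smallest calibration score,
    with k = ceil((n + 1) * (1 - alpha)), follows a Beta(k, n + 1 - k) law.
    For integer shapes the Beta CDF reduces to a binomial tail, so
        P(Coverage >= 1 - alpha) = P(Binomial(n, 1 - alpha) <= k - 1).
    """
    k = math.ceil((n + 1) * (1 - alpha))
    p = 1 - alpha
    # Exact binomial CDF at k - 1, computed term by term with math.comb.
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))

# With only 19 calibration points at alpha = 0.1, the chance that a given
# predictor actually reaches 90% coverage is only about 58%:
print(round(prob_coverage_at_least(19, 0.1), 4))  # ~0.5797
```

Because the coverage distribution is centred near $1-\alpha$ but widely dispersed for small $n$, a single small-sample predictor meets its nominal level only a little over half of the time, which is precisely the unreliability the proposed guarantee targets.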

📝 Abstract
Surrogate models (including deep neural networks and other supervised machine learning algorithms) can approximate arbitrarily complex, high-dimensional input-output problems in science and engineering, but require a thorough data-agnostic uncertainty quantification analysis before they can be deployed in any safety-critical application. The standard approach for data-agnostic uncertainty quantification is conformal prediction (CP), a well-established framework for building uncertainty models with proven statistical guarantees that assume no particular shape for the error distribution of the surrogate model. However, the classic statistical guarantee offered by CP bounds only the marginal coverage. For small calibration set sizes, which are frequent in realistic surrogate modelling that aims to quantify error in different regions, the potentially strong dispersion of the coverage distribution around its average undermines the reliability of the uncertainty model, often yielding coverages below the expected value and making the framework less applicable. After providing a gentle presentation of uncertainty quantification for surrogate models aimed at machine learning practitioners, in this paper we bridge the gap by proposing a new statistical guarantee that offers probabilistic information about the coverage of a single conformal predictor. We show that the proposed framework converges to the standard CP solution for large calibration set sizes and, unlike the classic guarantee, still offers reliable information about the coverage of a conformal predictor for small data sizes. We illustrate and validate the methodology in a suite of examples, and implement an open-access software solution that can be used alongside common conformal prediction libraries to obtain uncertainty models that fulfil the new guarantee.
Problem

Research questions and friction points this paper is trying to address.

Enhances conformal prediction reliability for small datasets
Provides probabilistic coverage guarantees for single predictors
Addresses coverage dispersion issues in safety-critical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

New statistical guarantee for single predictor coverage
Converges to standard CP for large calibration sets
Reliable coverage information for small dataset sizes
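The convergence claim above can be sketched with the same standard Beta/order-statistic characterisation of split-CP coverage (an illustrative construction, not the paper's own algorithm): to satisfy $\mathbb{P}(\text{Coverage} \geq 1-\alpha) \geq 1-\delta$, pick a more conservative order statistic of the calibration scores, and the chosen index approaches the classical CP index as $n$ grows.

```python
import math

def pac_index(n: int, alpha: float, delta: float):
    """Smallest order-statistic index k such that a split conformal predictor
    built from the k-th smallest of n calibration scores satisfies
    P(Coverage >= 1 - alpha) >= 1 - delta.

    Relies on Coverage ~ Beta(k, n + 1 - k), whose CDF at 1 - alpha equals
    the binomial probability P(Binomial(n, 1 - alpha) <= k - 1).  Returns
    None when no index works, i.e. more calibration data is needed.
    """
    p = 1 - alpha
    cdf = 0.0  # running value of P(Binomial(n, p) <= k - 1)
    for k in range(1, n + 1):
        cdf += math.comb(n, k - 1) * p ** (k - 1) * (1 - p) ** (n - k + 1)
        if cdf >= 1 - delta:
            return k
    return None

# Classical CP uses k_cp = ceil((n + 1) * (1 - alpha)); the PAC-style index
# is more conservative for small n and approaches k_cp as n grows.
```

For example, with `n = 100`, `alpha = 0.1` the classical index is 91, while requiring the coverage guarantee to hold with probability `1 - delta = 0.9` pushes the index a few positions higher, i.e. a slightly wider prediction set in exchange for a per-predictor guarantee.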
Miguel Sánchez-Domínguez
ETSIAE-UPM - School of Aeronautics, Universidad Politécnica de Madrid, Plaza Cardenal Cisneros 3, E-28040, Madrid, Spain
Lucas Lacasa
Instituto de Fisica Interdisciplinar y Sistemas Complejos, IFISC (CSIC-UIB)
Complex Systems, Networks, Machine Learning, Time Series, Chaos
Javier de Vicente
ETSIAE-UPM - School of Aeronautics, Universidad Politécnica de Madrid, Plaza Cardenal Cisneros 3, E-28040, Madrid, Spain
Gonzalo Rubio
ETSIAE-UPM (School of Aeronautics in Madrid)
Applied Mathematics, CFD, High-order methods, Multiphase Flows
Eusebio Valero
Universidad Politécnica de Madrid
Computational Fluid Dynamics, Flow Control, Stability Analysis, High Order Schemes, Numerical Methods