Conformalized Super Learner

📅 2026-04-24
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work proposes a framework that combines the learner weights of the Super Learner ensemble with conformal prediction to construct prediction intervals for continuous responses. Learner-specific conformity scores are aggregated through a weighted majority vote. The authors characterize the resulting intervals under exchangeability, under potential violations of exchangeability, and under data-generating mechanisms with heteroscedasticity or sparsity. A simulation study reports valid finite-sample coverage, and an application to creatinine level prediction illustrates how a carefully chosen learner library handles nonlinear relationships, interaction effects, and outliers.
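As background for the finite-sample coverage mentioned above, here is a minimal sketch of standard split conformal prediction for a single learner. It assumes absolute residuals as the conformity score and exchangeable calibration data. The function names `conformal_quantile` and `split_conformal_interval` are hypothetical and are not taken from the paper, which may use different scores.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the interval must be unbounded, so return infinity.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(scores[k - 1])

def split_conformal_interval(y_cal, pred_cal, pred_new, alpha=0.1):
    """Symmetric interval around pred_new built from held-out calibration residuals."""
    residuals = np.abs(np.asarray(y_cal, dtype=float) - np.asarray(pred_cal, dtype=float))
    q = conformal_quantile(residuals, alpha)
    return pred_new - q, pred_new + q
```

The `(n + 1)` correction in the rank is what gives the marginal coverage guarantee at level `1 - alpha` for any calibration sample size under exchangeability; with very few calibration points the sketch returns an unbounded interval rather than an invalid one.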

📝 Abstract
The Super Learner (SL) is a widely used ensemble method that combines predictions from a library of learners based on their predictive performance. Interval predictions are of considerable practical interest because they allow uncertainty in predictions produced by an individual learner or an ensemble to be quantified. Several methods have been proposed for constructing interval predictions based on the SL; however, these approaches are typically justified using asymptotic arguments or rely on computationally intensive procedures such as the bootstrap. Conformal prediction (CP) is a machine learning framework for constructing prediction intervals with finite-sample and asymptotic coverage guarantees under mild conditions. We propose coupling CP with the SL through a natural construction that mirrors the original SL framework, using individual learner weights and combining learner-specific conformity scores via a weighted majority vote. We characterize the properties of the resulting SL-based prediction intervals for continuous outcomes. We cover settings under exchangeability, potential violations of exchangeability, and data-generating mechanisms exhibiting heteroscedasticity, sparsity, and other forms of distributional heterogeneity. A comprehensive simulation study shows that the conformalized SL achieves valid finite-sample coverage with competitive performance relative to the true data-generating mechanism. A central contribution of this work is an application to predicting creatinine levels using socio-demographic, biometric, and laboratory measurements. This example demonstrates the benefits of an ensemble with carefully selected learners designed to capture key aspects of complex regression functions, including non-linear effects, interactions, sparsity, heteroscedasticity, and robustness to outliers.
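The abstract does not spell out the weighted majority vote, so the following is only an illustrative sketch of one common form of the idea: a candidate value is kept when the learners whose intervals contain it carry more than half of the total weight. It assumes each learner has already produced a finite conformal interval and that the weights play the role of SL weights. The function name `weighted_majority_interval`, the threshold of 0.5, and returning the hull of the winning region are assumptions made here, not the authors' construction.

```python
import numpy as np

def weighted_majority_interval(lowers, uppers, weights, threshold=0.5):
    """Hull of all y whose weighted vote across learner intervals exceeds threshold.

    lowers, uppers: finite endpoints of each learner's closed interval.
    weights: nonnegative learner weights (normalized to sum to one here).
    Returns (lo, hi), or None if no point wins the vote.
    """
    lowers = np.asarray(lowers, dtype=float)
    uppers = np.asarray(uppers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # The vote is piecewise constant between consecutive endpoints, so it is
    # enough to evaluate it at every endpoint and at every midpoint between them.
    points = np.unique(np.concatenate([lowers, uppers]))
    mids = (points[:-1] + points[1:]) / 2
    candidates = np.concatenate([points, mids])

    # inside[k, j] is True when learner k's interval contains candidates[j].
    inside = (lowers[:, None] <= candidates) & (candidates <= uppers[:, None])
    vote = weights @ inside.astype(float)

    kept = candidates[vote > threshold]
    if kept.size == 0:
        return None
    return float(kept.min()), float(kept.max())
```

For example, with intervals `[0, 2]`, `[1, 3]`, `[5, 6]` and weights `0.4, 0.4, 0.2`, only the overlap of the first two intervals gathers more than half of the weight, so the sketch returns `(1.0, 2.0)` and the outlying low-weight interval is ignored. Taking the hull can be conservative when the winning region is a union of disjoint pieces.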
Problem

Research questions and friction points this paper is trying to address.

Super Learner
prediction intervals
conformal prediction
uncertainty quantification
ensemble methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Prediction
Super Learner
Prediction Intervals
Ensemble Learning
Finite-sample Coverage