coverforest: Conformal Predictions with Random Forest in Python

📅 2025-01-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Cross-conformal prediction on random forests is computationally expensive, which makes it hard to get both data efficiency and theoretical coverage guarantees at a reasonable cost. To address this, the authors present an efficient conformal prediction package built around out-of-bag (OOB) scores. It supports both regression and classification tasks through split conformal, CV+, Jackknife+-after-bootstrap, and adaptive prediction sets. The method reuses the random forest's OOB estimates for cross-conformal prediction and reduces redundant computation through parallelization and Cython optimizations. Experiments show that the predictions achieve the desired coverage level and that training and prediction can be 2–9× faster than an existing implementation. The implementation is publicly available.
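The following minimal sketch makes the adaptive prediction set (APS) idea concrete. It shows split-conformal APS for classification and assumes scikit-learn's `RandomForestClassifier` as the underlying model. It does not use coverforest's API. The helper `aps_score_matrix` and all dataset settings are hypothetical choices for illustration. The score is the non-randomized APS score: the cumulative predicted probability of every class ranked at or above the candidate label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def aps_score_matrix(proba):
    """Hypothetical helper. Non-randomized APS score S[i, c]: total predicted probability
    of the classes ranked at or above class c for sample i (sorted by decreasing probability)."""
    order = np.argsort(-proba, axis=1, kind="stable")  # most likely class first
    cum_sorted = np.cumsum(np.take_along_axis(proba, order, axis=1), axis=1)
    ranks = np.argsort(order, axis=1)  # inverse permutation: position of each class in `order`
    return np.take_along_axis(cum_sorted, ranks, axis=1)


alpha = 0.1  # target miscoverage level
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Calibration: APS score of the true label for each held-out sample
cal_scores = aps_score_matrix(clf.predict_proba(X_cal))[np.arange(len(y_cal)), y_cal]
n = len(cal_scores)
q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample corrected level
q_hat = np.quantile(cal_scores, q_level, method="higher")

# Prediction sets: every class whose score does not exceed the calibrated threshold
pred_sets = aps_score_matrix(clf.predict_proba(X_test)) <= q_hat  # (n_test, n_classes) bool
coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
avg_size = pred_sets.sum(axis=1).mean()
print(f"coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

The corrected level ⌈(n+1)(1−α)⌉/n gives marginal coverage of at least 1−α under exchangeability of calibration and test data. With this non-randomized score, a set can be empty when the top class's probability alone exceeds the threshold.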


๐Ÿ“ Abstract
Conformal prediction provides a framework for uncertainty quantification, specifically in the form of prediction intervals and prediction sets with distribution-free coverage guarantees. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial computational costs due to the required pairwise comparisons between training and test samples' out-of-bag scores. Observing that these methods extend naturally from ensemble models, particularly random forests, we leverage existing optimized random forest implementations to enable efficient cross-conformal predictions. We present coverforest, a Python package that implements efficient conformal prediction methods specifically optimized for random forests. coverforest supports both regression and classification tasks through various conformal prediction methods, including split conformal, CV+, Jackknife+-after-bootstrap, and adaptive prediction sets. Our package leverages parallel computing and Cython optimizations to speed up out-of-bag calculations. Our experiments demonstrate that coverforest's predictions achieve the desired level of coverage. In addition, its training and prediction times can be 2–9 times faster than those of an existing implementation. The source code for coverforest is hosted on GitHub at https://github.com/donlapark/coverforest.
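A minimal Jackknife+-after-bootstrap sketch for regression shows where the pairwise comparison comes from. This is an illustration under stated assumptions, not coverforest's implementation. It hand-rolls a bagged forest of scikit-learn `DecisionTreeRegressor`s with `max_features="sqrt"` so that the out-of-bag mask is explicit. The helpers `fit_bagged_trees` and `jackknife_plus_ab` are hypothetical names. The `(n_train, n_test)` matrix `mu_oob_test` holds one prediction for every training sample i and test point x: the prediction at x of the trees that did not see i. Building and sorting this matrix is the cost that grows with the product of the training and test set sizes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=100, seed=0):
    """Hypothetical helper: hand-rolled random forest that records out-of-bag membership."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    oob = np.ones((n_trees, n), dtype=bool)  # oob[b, i] is True if sample i is out-of-bag for tree b
    for b in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample drawn with replacement
        oob[b, idx] = False
        tree = DecisionTreeRegressor(max_features="sqrt", random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees, oob


def jackknife_plus_ab(trees, oob, X_train, y_train, X_test, alpha=0.1):
    """Hypothetical helper: Jackknife+-after-bootstrap intervals with mean aggregation."""
    keep = oob.any(axis=0)  # drop samples that happen to be in-bag for every tree
    oob, X_train, y_train = oob[:, keep], X_train[keep], y_train[keep]
    counts = oob.sum(axis=0)  # number of OOB trees for each training sample
    P_train = np.stack([t.predict(X_train) for t in trees])  # (B, n)
    P_test = np.stack([t.predict(X_test) for t in trees])    # (B, n_test)

    # OOB prediction and absolute residual for each training sample
    mu_oob_train = (oob * P_train).sum(axis=0) / counts
    resid = np.abs(y_train - mu_oob_train)

    # Pairwise (n, n_test) matrix: OOB ensemble of sample i evaluated at each test point
    mu_oob_test = (oob.T.astype(float) @ P_test) / counts[:, None]

    n, n_test = len(y_train), len(X_test)
    k_lo = int(np.floor(alpha * (n + 1)))       # 1-based order statistics
    k_hi = int(np.ceil((1 - alpha) * (n + 1)))
    lower = (np.sort(mu_oob_test - resid[:, None], axis=0)[k_lo - 1]
             if k_lo >= 1 else np.full(n_test, -np.inf))
    upper = (np.sort(mu_oob_test + resid[:, None], axis=0)[k_hi - 1]
             if k_hi <= n else np.full(n_test, np.inf))
    return lower, upper


X, y = make_regression(n_samples=1500, n_features=10, noise=10.0, random_state=0)
X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

trees, oob = fit_bagged_trees(X_train, y_train, n_trees=100)
lower, upper = jackknife_plus_ab(trees, oob, X_train, y_train, X_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"coverage={coverage:.3f}, mean width={np.mean(upper - lower):.1f}")
```

In the original Jackknife+-after-bootstrap analysis, the guarantee is 1−2α rather than 1−α, and it assumes that the number of bootstrap draws is itself randomized. This sketch uses a fixed count for simplicity, so observed coverage near 1−α is typical but not guaranteed.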
Problem

Research questions and friction points this paper is trying to address.

Random Forest Optimization
Prediction Efficiency
Regression and Classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

coverforest
parallel computing
Cython acceleration
Panisara Meehinkong
Department of Statistics, Chiang Mai University, Chiang Mai 50200, Thailand
Donlapark Ponnoprat
Chiang Mai University
optimal transport, causal inference, differential privacy, machine learning