🤖 AI Summary
This paper addresses the lack of localized uncertainty quantification in random forests. We propose modeling the distribution of out-of-bag (OOB) errors using tree-based proximities: proximity weights between samples are used to construct localized error distributions, yielding sample-wise prediction intervals for regression and trust scores for classification. Our key contribution lies in reinterpreting proximity not merely as a similarity metric but as a foundation for local statistical inference, supporting coverage control and prediction-rejection mechanisms. Experiments demonstrate that our approach improves prediction interval coverage and sharpness in regression tasks, while in classification, rejecting low-trust samples substantially enhances the accuracy-rejection AUC. Overall, the method outperforms existing uncertainty quantification techniques across both regression and classification benchmarks.
📝 Abstract
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets (out-of-bag samples) and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of out-of-bag (OOB) errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility to reflect the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances model accuracy and yields higher accuracy-rejection AUC scores than competing methods.
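The regression side of the idea can be sketched with scikit-learn. This is a simplified illustration, not the paper's exact estimator: it uses a basic leaf-co-occurrence proximity (fraction of trees in which two points share a leaf) rather than the paper's proximity definition, and forms a prediction interval for each test point from the empirical quantiles of OOB errors of its `k` proximity-nearest training points. The dataset, `k`, and `alpha` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test = X[:250], X[250:]
y_train, y_test = y[:250], y[250:]

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# OOB residual for each training point; with 200 trees every training
# point is out-of-bag in some tree, so oob_prediction_ has no gaps here.
oob_errors = y_train - rf.oob_prediction_

# Leaf indices per tree: shape (n_samples, n_trees).
train_leaves = rf.apply(X_train)
test_leaves = rf.apply(X_test)

def prediction_interval(i, k=50, alpha=0.1):
    """Interval for test point i from the OOB errors of its k
    proximity-nearest training points (illustrative proximity)."""
    # Proximity = fraction of trees where the test and training
    # points fall in the same leaf.
    prox = (train_leaves == test_leaves[i]).mean(axis=1)
    nearest = np.argsort(prox)[-k:]          # k most similar training points
    errs = oob_errors[nearest]               # localized OOB error distribution
    lo, hi = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    pred = rf.predict(X_test[i:i + 1])[0]
    return pred + lo, pred + hi
```

Shrinking `k` localizes the error distribution more aggressively (sharper but noisier intervals), while growing `k` moves toward a global OOB-error interval; this is the coverage/flexibility trade-off the abstract describes.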