Localized Uncertainty Quantification in Random Forests via Proximities

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the lack of localized uncertainty quantification in random forests. The authors propose modeling the distribution of out-of-bag (OOB) errors using tree-based proximities: proximity weights between samples are used to construct localized error distributions, yielding sample-wise prediction intervals for regression and trust scores for classification. The key contribution is reinterpreting proximity not merely as a similarity metric but as a foundation for local statistical inference, supporting adjustable coverage control and prediction-rejection mechanisms. Experiments show that the approach improves prediction-interval coverage and sharpness in regression, while in classification, trust-driven sample rejection raises the accuracy-rejection AUC above that of competing uncertainty quantification methods.
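As a concrete reference point, one standard definition of random forest proximity is the fraction of trees in which two samples land in the same terminal node. The sketch below computes this in-bag variant with scikit-learn; the paper builds on OOB-based proximities, so treat this as an illustrative simplification rather than the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# leaves[i, t] is the index of the leaf that sample i reaches in tree t.
leaves = rf.apply(X)

def proximity(i, j, leaves):
    """Fraction of trees in which samples i and j share a leaf."""
    return np.mean(leaves[i] == leaves[j])

p = proximity(0, 1, leaves)
```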

📝 Abstract
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of OOB errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility that reflects the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances the accuracy of the model and provides higher accuracy-rejection AUC scores than competing methods.
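The regression mechanism described in the abstract can be sketched as follows: collect signed OOB residuals on the training set, find the test point's most proximate training points via leaf sharing, and take quantiles of their residuals as interval offsets. This is a minimal sketch under assumed simplifications (in-bag proximities, fixed neighborhood size `k`), not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# oob_score=True makes scikit-learn retain per-sample OOB predictions.
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)
oob_res = y_tr - rf.oob_prediction_      # signed OOB residuals on training data

leaves_tr = rf.apply(X_tr)               # (n_train, n_trees) leaf indices
leaves_te = rf.apply(X_te)

def prediction_interval(t, k=50, alpha=0.1):
    """Interval for test point t from the OOB residuals of its k most
    proximate training points (proximity = fraction of shared leaves)."""
    prox = np.mean(leaves_te[t][None, :] == leaves_tr, axis=1)
    nearest = np.argsort(prox)[-k:]
    lo, hi = np.quantile(oob_res[nearest], [alpha / 2, 1 - alpha / 2])
    pred = rf.predict(X_te[t:t + 1])[0]
    return pred + lo, pred + hi

lo_b, hi_b = prediction_interval(0)
```

Because the residual distribution is local, interval width adapts to the forest's certainty around each test point.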
Problem

Research questions and friction points this paper is trying to address.

Quantifying localized prediction uncertainty in random forests
Using proximity-based OOB errors for prediction intervals
Providing adjustable trust scores for classification reliability
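The trust-score idea in the last point can be sketched by scoring each sample on how often the forest's OOB predictions were correct among its most proximate neighbors. The neighborhood size `k` and the in-bag proximity are assumptions of this hypothetical simplification, not the paper's exact construction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Whether each training sample's OOB prediction matched its label.
oob_correct = rf.oob_decision_function_.argmax(axis=1) == y

leaves = rf.apply(X)                     # (n_samples, n_trees) leaf indices

def trust_score(i, k=30):
    """Fraction of sample i's k most proximate neighbors (self excluded)
    that the forest classified correctly out-of-bag."""
    prox = np.mean(leaves[i][None, :] == leaves, axis=1)
    prox[i] = -1.0                       # exclude the sample itself
    nearest = np.argsort(prox)[-k:]
    return oob_correct[nearest].mean()

scores = np.array([trust_score(i) for i in range(len(X))])
```

Predictions with the lowest scores can then be flagged as unclassifiable and rejected.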
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses random forest proximities for uncertainty quantification
Forms localized OOB error distributions for prediction intervals
Adjusts intervals via nearby points for flexible coverage
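The coverage-adjustment point above amounts to tuning the neighborhood size on held-out data: larger neighborhoods trade locality for more stable quantile estimates. A hedged sketch of that tuning loop, reusing the same assumed in-bag proximities and signed OOB residuals:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)
res = y_tr - rf.oob_prediction_          # signed OOB residuals
leaves_tr, leaves_val = rf.apply(X_tr), rf.apply(X_val)
pred_val = rf.predict(X_val)

def coverage(k, alpha=0.1):
    """Empirical fraction of validation points whose residual falls inside
    the localized (1 - alpha) interval built from k nearest neighbors."""
    hits = 0
    for t in range(len(X_val)):
        prox = np.mean(leaves_val[t][None, :] == leaves_tr, axis=1)
        lo, hi = np.quantile(res[np.argsort(prox)[-k:]],
                             [alpha / 2, 1 - alpha / 2])
        hits += lo <= y_val[t] - pred_val[t] <= hi
    return hits / len(X_val)

# Scan candidate neighborhood sizes; pick the smallest k meeting the target.
cov_by_k = {k: coverage(k) for k in (20, 50, 100)}
```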
Authors

Jake S. Rhodes · Brigham Young University (Machine Learning, Data Science)
Scott D. Brown · Department of Statistics, Brigham Young University, Provo, Utah, USA
J. Riley Wilkinson · Department of Statistics, Texas A&M University, College Station, Texas, USA