🤖 AI Summary
This paper addresses the lack of localized uncertainty quantification in random forests. We propose modeling the distribution of out-of-bag (OOB) errors using tree-based proximities: proximity weights between samples are used to construct localized error distributions, yielding sample-wise prediction intervals for regression and trust scores for classification. Our key contribution lies in reinterpreting proximity not merely as a similarity metric but as a foundation for local statistical inference, supporting coverage control and prediction-rejection mechanisms. Experiments demonstrate that our approach improves prediction interval coverage and sharpness in regression tasks, while in classification, rejecting low-trust samples substantially enhances the accuracy-rejection AUC. Overall, the method outperforms existing uncertainty quantification techniques across both regression and classification benchmarks.
📝 Abstract
In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on incorporating uncertainty measures. This paper addresses localized uncertainty quantification in random forests. While current methods often rely on quantile regression or Monte Carlo techniques, we propose a new approach using naturally occurring test sets (out-of-bag samples) and similarity measures (proximities) typically viewed as byproducts of random forests. Specifically, we form localized distributions of out-of-bag (OOB) errors around nearby points, defined using the proximities, to create prediction intervals for regression and trust scores for classification. By varying the number of nearby points, our intervals can be adjusted to achieve the desired coverage while retaining the flexibility to reflect the certainty of individual predictions. For classification, excluding points identified as unclassifiable by our method generally enhances model accuracy and yields higher accuracy-rejection AUC scores than competing methods.
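The regression side of the idea can be sketched with scikit-learn. This is a simplified illustration, not the paper's exact estimator: it uses a basic leaf-co-occurrence proximity (fraction of trees in which two points share a leaf) rather than the paper's proximity definition, and forms a prediction interval for each test point from the empirical quantiles of OOB errors of its `k` proximity-nearest training points. The dataset, `k`, and `alpha` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test = X[:250], X[250:]
y_train, y_test = y[:250], y[250:]

rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# OOB residual for each training point; with 200 trees every training
# point is out-of-bag in some tree, so oob_prediction_ has no gaps here.
oob_errors = y_train - rf.oob_prediction_

# Leaf indices per tree: shape (n_samples, n_trees).
train_leaves = rf.apply(X_train)
test_leaves = rf.apply(X_test)

def prediction_interval(i, k=50, alpha=0.1):
    """Interval for test point i from the OOB errors of its k
    proximity-nearest training points (illustrative proximity)."""
    # Proximity = fraction of trees where the test and training
    # points fall in the same leaf.
    prox = (train_leaves == test_leaves[i]).mean(axis=1)
    nearest = np.argsort(prox)[-k:]          # k most similar training points
    errs = oob_errors[nearest]               # localized OOB error distribution
    lo, hi = np.quantile(errs, [alpha / 2, 1 - alpha / 2])
    pred = rf.predict(X_test[i:i + 1])[0]
    return pred + lo, pred + hi
```

Shrinking `k` localizes the error distribution more aggressively (sharper but noisier intervals), while growing `k` moves toward a global OOB-error interval; this is the coverage/flexibility trade-off the abstract describes.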