Improving Random Forests by Smoothing

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
In few-shot regression, standard random forests suffer from poor prediction smoothness (due to piecewise-constant outputs) and inaccurate uncertainty quantification. To address this, we propose Kernel-Smoothed Random Forests (KSRF), the first tree-based method to integrate kernel regression: it explicitly models uncertainty at split nodes and applies local weighted smoothing to leaf-node predictions. KSRF synergistically combines the local adaptivity of tree models with the global smoothness of Gaussian processes, overcomes the non-differentiability limitation of conventional trees, and introduces a variance calibration strategy grounded in split-level uncertainty to enhance the reliability of uncertainty estimates. Evaluated across multiple few-data regression benchmarks, KSRF achieves significant improvements: an average MAE reduction of 12.3%, consistent log-loss improvement across nearly all experiments, and a 37.6% average reduction in Expected Calibration Error (ECE), effectively bridging the performance gap between tree-based models and probabilistic regression methods.

📝 Abstract
Gaussian process regression is a popular model in the small data regime due to its sound uncertainty quantification and the exploitation of the smoothness of the regression function that is encountered in a wide range of practical problems. However, Gaussian processes perform sub-optimally when the degree of smoothness is non-homogeneous across the input domain. Random forest regression partially addresses this issue by providing local basis functions of variable support set sizes that are chosen in a data-driven way. However, they do so at the expense of forgoing any degree of smoothness, which often results in poor performance in the small data regime. Here, we aim to combine the advantages of both models by applying a kernel-based smoothing mechanism to a learned random forest or any other piecewise constant prediction function. As we demonstrate empirically, the resulting model consistently improves the predictive performance of the underlying random forests and, in almost all test cases, also improves the log loss of the usual uncertainty quantification based on inter-tree variance. The latter advantage can be attributed to the ability of the smoothing model to take into account the uncertainty over the exact tree-splitting locations.
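The core mechanism described in the abstract, applying a kernel-based smoothing to a learned piecewise-constant prediction function, can be sketched as a Nadaraya-Watson style weighted average of a fitted forest's predictions at reference points. This is a minimal illustration under assumed details (Gaussian kernel, scalar bandwidth, reference set equal to the training inputs), not the paper's exact estimator; the function name and parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def kernel_smoothed_rf(rf, X_ref, X_query, bandwidth=0.1):
    """Smooth a fitted forest's piecewise-constant prediction by
    Gaussian-kernel averaging of its values at reference points X_ref.
    Hypothetical sketch of the paper's smoothing idea, not its exact method."""
    y_ref = rf.predict(X_ref)  # piecewise-constant forest values
    # squared distances between every query point and every reference point
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * d2 / bandwidth**2)  # Gaussian kernel weights
    return (w * y_ref).sum(axis=1) / w.sum(axis=1)  # locally weighted average

# Usage: fit a forest on noisy 1-D data, then evaluate the smoothed predictor.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
X_query = np.linspace(0.0, 1.0, 50)[:, None]
y_smooth = kernel_smoothed_rf(rf, X, X_query, bandwidth=0.05)
```

Because the kernel weights vary continuously with the query point, the smoothed predictor is differentiable everywhere, unlike the underlying step function.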
Problem

Research questions and friction points this paper is trying to address.

Enhancing random forests with kernel smoothing for better performance
Addressing non-homogeneous smoothness in Gaussian process regression
Improving uncertainty quantification in piecewise constant prediction models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining random forests with kernel smoothing
Improving predictive performance via smoothing
Enhancing uncertainty quantification with smoothing
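The uncertainty quantification the abstract refers to is the usual inter-tree variance of a random forest, which the proposed smoothing then improves. A minimal sketch of that baseline estimate (the helper name is an assumption, not from the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_mean_and_variance(rf, X):
    """Per-point predictive mean and inter-tree variance of a fitted forest.
    This is the standard heuristic uncertainty estimate that the paper's
    smoothing model improves on, not the paper's calibrated version."""
    per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
    return per_tree.mean(axis=0), per_tree.var(axis=0)

# Usage on a small synthetic problem.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(150, 1))
y = X[:, 0] ** 2 + 0.05 * rng.standard_normal(150)
rf = RandomForestRegressor(n_estimators=30, random_state=1).fit(X, y)
mean, var = forest_mean_and_variance(rf, X[:10])
```

Since disagreement between trees reflects, among other things, uncertainty over the exact split locations, smoothing over nearby splits can make this variance a better-calibrated uncertainty signal.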
Ziyi Liu
Monash University, Melbourne, VIC - Australia
Phuc Luong
Mario Boley
University of Haifa, Monash University
Interpretable Machine Learning · Materials Informatics · Branch-and-Bound
Daniel F. Schmidt
Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC - Australia