🤖 AI Summary
This work addresses the challenge of adaptive uncertainty quantification under heteroscedasticity in tabular regression with gradient-boosted trees, where existing conformal prediction methods often rely on auxiliary models or extra data splits at the cost of efficiency. The authors propose LoBoost, a model-native local conformal prediction approach that reuses the prefixes of each input's leaf-index sequence across trees to construct multi-scale calibration groups, eliminating the need for retraining or auxiliary models. By encoding leaf-node sequences, performing multi-scale prefix matching, and applying local residual quantile calibration, LoBoost achieves efficient and adaptive uncertainty quantification using only a standard train/calibration split. Experiments show that LoBoost yields high-quality prediction intervals across multiple datasets, frequently reduces test MSE, and substantially accelerates calibration.
📝 Abstract
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.