DiNo and RanBu: Lightweight Predictions from Shallow Random Forests

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Random forests achieve strong performance in tabular prediction but suffer from high latency and memory overhead due to deep-tree ensembles, hindering deployment in resource-constrained settings. To address this, we propose two lightweight methods: DiNo, which employs shallow-tree ensembles with distance-weighted prediction based on cophenetic tree distances; and RanBu, which applies kernel smoothing to Breiman's proximity measure. Both transform shallow random forests into distance-weighted predictors without retraining—only bandwidth tuning is required for adaptive performance under low- or high-noise regimes—and natively support quantile regression. Because tuning reduces to matrix-vector operations, hyperparameter search is efficient. Evaluated on 25 public datasets, RanBu matches the accuracy of full-depth random forests while reducing training and inference time by up to 95%; DiNo achieves the best bias-variance trade-off under low noise and delivers significantly faster inference.
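To make the RanBu idea concrete, here is a minimal sketch of kernel-smoothing Breiman's proximity over a shallow forest. The function name, the Gaussian kernel, and the use of scikit-learn's `apply` to read off leaf memberships are illustrative assumptions, not the authors' R/C++ implementation.

```python
# Sketch (assumed details): convert a shallow random forest into a
# distance-weighted predictor via kernel smoothing of Breiman's proximity.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def proximity_weights(forest, X_train, X_query, h=0.2):
    """Weights K(d / h) with d = 1 - proximity (share of trees where
    a query point and a training point land in the same leaf)."""
    leaves_train = forest.apply(X_train)   # (n_train, n_trees) leaf ids
    leaves_query = forest.apply(X_query)   # (n_query, n_trees) leaf ids
    # proximity[i, j] = fraction of trees where query i and train j share a leaf
    prox = (leaves_query[:, None, :] == leaves_train[None, :, :]).mean(axis=2)
    w = np.exp(-0.5 * ((1.0 - prox) / h) ** 2)  # Gaussian kernel (assumed choice)
    return w / w.sum(axis=1, keepdims=True)     # normalize rows to sum to 1

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)

# Shallow forest: few trees, depth-limited, as in the paper's setting.
forest = RandomForestRegressor(n_estimators=50, max_depth=3, random_state=0).fit(X, y)
W = proximity_weights(forest, X, X[:5], h=0.2)
y_hat = W @ y  # distance-weighted prediction for the 5 query points
```

The same weight matrix `W` also yields weighted quantiles of the training responses, which is what makes the quantile-regression extension essentially free.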

📝 Abstract
Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting deployment in latency-sensitive or resource-constrained environments. We introduce DiNo (Distance with Nodes) and RanBu (Random Bushes), two shallow-forest methods that convert a small set of depth-limited trees into efficient, distance-weighted predictors. DiNo measures cophenetic distances via the most recent common ancestor of observation pairs, while RanBu applies kernel smoothing to Breiman's classical proximity measure. Both approaches operate entirely after forest training: no additional trees are grown, and tuning of the single bandwidth parameter $h$ requires only lightweight matrix-vector operations. Across three synthetic benchmarks and 25 public datasets, RanBu matches or exceeds the accuracy of full-depth random forests, particularly in high-noise settings, while reducing training plus inference time by up to 95%. DiNo achieves the best bias-variance trade-off in low-noise regimes at a modest computational cost. Both methods extend directly to quantile regression, maintaining accuracy with substantial speed gains. The implementation is available as an open-source R/C++ package at https://github.com/tiagomendonca/dirf. We focus on structured tabular random samples (i.i.d.), leaving extensions to other modalities for future work.
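The DiNo distance can be illustrated on a single depth-limited tree: two observations' root-to-leaf paths share a prefix ending at their most recent common ancestor (MRCA), and a cophenetic-style distance counts the steps from each leaf back up to that node. This sketch uses scikit-learn's `decision_path`; the exact distance definition and aggregation across trees are assumptions for illustration.

```python
# Sketch (assumed details): cophenetic-style distance between two
# observations in one tree, measured through their MRCA.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def mrca_distance(tree, x_a, x_b):
    """Steps from each observation's leaf up to their MRCA, summed."""
    paths = tree.decision_path(np.vstack([x_a, x_b])).toarray().astype(bool)
    shared = paths[0] & paths[1]          # nodes on both root-to-leaf paths
    depth_mrca = shared.sum() - 1         # shared prefix length; root is depth 0
    depth_a = paths[0].sum() - 1
    depth_b = paths[1].sum() - 1
    return (depth_a - depth_mrca) + (depth_b - depth_mrca)

# Toy data: a step function, so the first split lands near 0.5.
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = (X.ravel() > 0.5).astype(float)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

d_same = mrca_distance(tree, X[0], X[1])    # neighbors, likely same leaf
d_far = mrca_distance(tree, X[0], X[99])    # opposite sides of the split
```

Because the trees are shallow, these path computations stay cheap, which is what keeps DiNo's distance matrix affordable compared with deep-forest proximities.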
Problem

Research questions and friction points this paper is trying to address.

Reduces high inference latency in random forests for tabular prediction
Addresses memory demands in resource-constrained deployment environments
Maintains accuracy while substantially improving computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distance-weighted predictors from shallow trees
Kernel smoothing applied to proximity measures
Lightweight parameter tuning via matrix operations
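The last point can be sketched directly: once the forest-induced distance matrix is computed, each candidate bandwidth $h$ costs only a kernel evaluation plus one matrix-vector product, with no refitting. The grid, Gaussian kernel, and validation-MSE criterion below are illustrative assumptions.

```python
# Sketch (assumed details): bandwidth tuning reduces to matrix-vector
# operations because the forest, and hence the distance matrix D, is fixed.
import numpy as np

rng = np.random.default_rng(1)
D = rng.uniform(0, 1, size=(40, 200))      # stand-in for validation-to-train distances
y_train = rng.standard_normal(200)
y_val = rng.standard_normal(40)

def predict(D, y, h):
    W = np.exp(-0.5 * (D / h) ** 2)        # kernel weights for bandwidth h
    W /= W.sum(axis=1, keepdims=True)      # normalize rows
    return W @ y                           # one matrix-vector product

grid = [0.05, 0.1, 0.2, 0.5, 1.0]
best_h = min(grid, key=lambda h: np.mean((predict(D, y_train, h) - y_val) ** 2))
```

This is why only the single parameter $h$ needs tuning: the expensive part (growing trees and computing distances) happens once, before the search.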
Tiago Mendonça dos Santos
Insper Institute of Education and Research, São Paulo, Brazil
Rafael Izbicki
Federal University of São Carlos
Statistics, Machine Learning, Nonparametric Methods, High-dimensional Inference, Data Science
Luís Gustavo Esteves
Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil