Minimax Rates and Spectral Distillation for Tree Ensembles

2026-05-12

AI Summary
Tree ensembles such as random forests and gradient boosting machines are widely used, yet their theory remains incompletely understood and they are costly to deploy. This work takes a spectral perspective: the eigenvalue decay of the kernel operator induced by a random forest governs the statistical rate, which yields minimax-optimal convergence bounds for random forest regression under mild regularity conditions on tree growth. Using leading eigenfunctions of the kernel operator (for random forests) or leading singular vectors of the smoother matrix (for gradient boosting machines), the authors propose a spectral distillation framework that produces models orders of magnitude smaller than the originals while maintaining competitive predictive performance. The methods compare favorably with state-of-the-art forest pruning and rule extraction algorithms, with applications to resource-constrained computing.
Abstract
Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive minimax-optimal convergence rates for RF regression, showing that, under mild regularity conditions on tree growth, the eigenvalue decay of the induced kernel operator governs the statistical rate. Second, we exploit this spectral viewpoint to develop compression schemes for tree ensembles. For RFs, leading eigenfunctions of the kernel operator capture the dominant predictive directions; for GBMs, leading singular vectors of the smoother matrix play an analogous role. Learning nonlinear maps for these spectral representations yields distilled models that are orders of magnitude smaller than the originals while maintaining competitive predictive performance. Our methods compare favorably to state-of-the-art algorithms for forest pruning and rule extraction, with applications to resource-constrained computing.
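To make the spectral viewpoint concrete, the sketch below builds an empirical forest kernel and inspects its eigenvalue decay. It is an illustration under stated assumptions, not the paper's procedure: the kernel is taken to be the common leaf co-occurrence (proximity) kernel, the "trees" are random axis-aligned partitions standing in for a trained forest, and the names `random_partition_leaves`, `forest_kernel`, and `spectrum` are hypothetical helpers introduced here.

```python
import numpy as np


def random_partition_leaves(X, n_trees=50, depth=3, seed=0):
    """Leaf index of each row of X under n_trees random partitions.

    Assumption: each "tree" applies `depth` random axis-aligned splits to
    all points, a stand-in for the leaves of a trained random forest.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    leaves = np.empty((n, n_trees), dtype=np.int64)
    for t in range(n_trees):
        feats = rng.integers(0, d, size=depth)          # split features
        thresholds = rng.uniform(lo[feats], hi[feats])  # split points
        bits = (X[:, feats] > thresholds).astype(np.int64)
        leaves[:, t] = bits @ (1 << np.arange(depth))   # binary leaf code
    return leaves


def forest_kernel(leaves):
    """K[i, j] = fraction of trees in which points i and j share a leaf."""
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def spectrum(K):
    """Eigenvalues (descending) and matching eigenvectors of symmetric K."""
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
leaves = random_partition_leaves(X)
K = forest_kernel(leaves)
evals, evecs = spectrum(K)

k = 10
share = evals[:k].sum() / evals.sum()
print(f"top-{k} eigenvalues carry {share:.1%} of the kernel trace")
```

If the eigenvalues decay quickly, the leading columns of `evecs` give a low-dimensional representation of the ensemble; a distilled model in the spirit of the abstract would then learn a map from inputs to those coordinates.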
Problem

Research questions and friction points this paper is trying to address.

tree ensembles
minimax rates
spectral analysis
model compression
theoretical understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

minimax rates
spectral distillation
tree ensembles
kernel operator
model compression