🤖 AI Summary
This paper characterizes the fundamental statistical–computational trade-off inherent in greedy training of recursive adaptive partitioning estimators (e.g., decision trees): when the true regression function satisfies the Merged Staircase Property (MSP), greedy algorithms achieve optimal estimation with only $O(\log d)$ samples; otherwise, they require $\exp(\Omega(d))$ samples and get trapped in poor local optima.
Method: The authors develop the first rigorous theoretical framework linking greedy partitioning to the global optimum of empirical risk minimization (ERM), integrating stochastic process analysis and coupling techniques. They further establish the first formal comparison between greedy tree learning and SGD-trained two-layer neural networks.
Contribution/Results: The analysis quantitatively identifies the origin of local optimality in greedy partitioning, provides novel criteria for model selection and algorithm design in high-dimensional sparse regression, and unifies statistical efficiency with computational tractability under structural assumptions.
📝 Abstract
Models based on recursive adaptive partitioning, such as decision trees and their ensembles, are popular for high-dimensional regression as they can potentially avoid the curse of dimensionality. Because empirical risk minimization (ERM) is computationally infeasible, these models are typically trained using greedy algorithms. Although effective in many cases, these algorithms have been empirically observed to get stuck at local optima. We explore this phenomenon in the context of learning sparse regression functions over $d$ binary features, showing that when the true regression function $f^*$ does not satisfy Abbe et al. (2022)'s Merged Staircase Property (MSP), greedy training requires $\exp(\Omega(d))$ samples to achieve low estimation error. Conversely, when $f^*$ does satisfy MSP, greedy training can attain small estimation error with only $O(\log d)$ samples. This dichotomy mirrors that of two-layer neural networks trained with stochastic gradient descent (SGD) in the mean-field regime, thereby establishing a head-to-head comparison between SGD-trained neural networks and greedy recursive partitioning estimators. Furthermore, ERM-trained recursive partitioning estimators achieve low estimation error with $O(\log d)$ samples irrespective of whether $f^*$ satisfies MSP, thereby demonstrating a statistical–computational trade-off for greedy training. Our proofs are based on a novel interpretation of greedy recursive partitioning using stochastic process theory and a coupling technique that may be of independent interest.
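To build intuition for why greedy splitting can stall on non-MSP targets, here is a minimal sketch (not the paper's construction; the helper `risk_reduction` and both target functions are illustrative choices). A greedy tree picks the coordinate whose split most reduces empirical variance. For the parity target $x_1 \oplus x_2$, every single-coordinate split yields zero variance reduction, so greedy search has no signal; for a staircase-style target $x_1 + x_1 x_2$, splitting on $x_1$ alone already reduces risk, giving the greedy algorithm a foothold.

```python
import itertools

def risk_reduction(points, labels, coord):
    """Variance (squared-loss impurity) reduction from splitting on one coordinate."""
    def var(ys):
        if not ys:
            return 0.0
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys) / len(ys)
    left = [y for x, y in zip(points, labels) if x[coord] == 0]
    right = [y for x, y in zip(points, labels) if x[coord] == 1]
    n = len(labels)
    return var(labels) - len(left) / n * var(left) - len(right) / n * var(right)

d = 3
X = list(itertools.product([0, 1], repeat=d))  # uniform binary cube

# Parity x1 XOR x2: no single split helps, so greedy search is stuck at the root.
parity = [x[0] ^ x[1] for x in X]
# Staircase x1 + x1*x2: the x1 split alone carries signal, guiding greedy search.
staircase = [x[0] + x[0] * x[1] for x in X]

print([risk_reduction(X, parity, j) for j in range(d)])     # all zero
print([risk_reduction(X, staircase, j) for j in range(d)])  # largest at coord 0
```

The staircase structure is exactly what MSP-style conditions formalize: each relevant coordinate becomes detectable once its predecessors have been split on, so a sequence of locally optimal splits reaches the global optimum.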