Revisiting Randomization in Greedy Model Search

📅 2025-06-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Greedy forward selection in sparse linear regression suffers from high computational cost and a poorly understood mechanism, while ensemble methods such as random forests lack theoretical interpretability and efficient implementations. Method: the paper proposes an ensemble of greedy forward selection estimators randomized by feature subsampling. Contribution/Results: it gives a rigorous proof that this randomized ensemble simultaneously reduces both training error and model degrees of freedom, thereby shifting the entire bias–variance trade-off curve of the base estimator. Under an orthogonal design, the ensemble rescales the OLS coefficients with a two-parameter family of logistic weights, revealing that its implicit regularization is non-shrinking in nature. A dynamic-programming implementation combined with feature subsampling makes the method computationally efficient, and it can outperform the lasso and elastic net across a wide range of benchmark settings in both predictive accuracy and model interpretability.

๐Ÿ“ Abstract
Combining randomized estimators in an ensemble, such as via random forests, has become a fundamental technique in modern data science, but can be computationally expensive. Furthermore, the mechanism by which this improves predictive performance is poorly understood. We address these issues in the context of sparse linear regression by proposing and analyzing an ensemble of greedy forward selection estimators that are randomized by feature subsampling -- at each iteration, the best feature is selected from within a random subset. We design a novel implementation based on dynamic programming that greatly improves its computational efficiency. Furthermore, we show via careful numerical experiments that our method can outperform popular methods such as lasso and elastic net across a wide range of settings. Next, contrary to prevailing belief that randomized ensembling is analogous to shrinkage, we show via numerical experiments that it can simultaneously reduce training error and degrees of freedom, thereby shifting the entire bias-variance trade-off curve of the base estimator. We prove this fact rigorously in the setting of orthogonal features, in which case, the ensemble estimator rescales the ordinary least squares coefficients with a two-parameter family of logistic weights, thereby enlarging the model search space. These results enhance our understanding of random forests and suggest that implicit regularization in general may have more complicated effects than explicit regularization.
Problem

Research questions and friction points this paper is trying to address.

Improving computational efficiency of randomized ensemble methods
Understanding mechanism behind predictive performance enhancement
Comparing performance with lasso and elastic net
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble of greedy forward selection estimators
Dynamic programming for computational efficiency
Two-parameter logistic weights for coefficients
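As a rough illustration of the core idea, here is a minimal Python/NumPy sketch of a feature-subsampled greedy forward selection ensemble: each base learner repeatedly picks, from a random subset of candidate features, the one that most reduces the residual sum of squares, and the ensemble averages the resulting predictions. All names (`forward_select`, `rgfs_ensemble_predict`) and parameters (`n_estimators`, `n_steps`, `m`) are illustrative, and this brute-force refit omits the paper's dynamic-programming speedup.

```python
import numpy as np

def forward_select(X, y, n_steps, m, rng):
    """One randomized base learner: greedy forward selection where, at each
    step, the best feature is chosen from a random subset of m candidates."""
    n, p = X.shape
    selected = []
    for _ in range(min(n_steps, p)):
        # draw a random candidate subset, excluding already-selected features
        pool = [j for j in rng.choice(p, size=min(m, p), replace=False)
                if j not in selected]
        if not pool:
            continue
        best_j, best_rss = None, np.inf
        for j in pool:
            cols = selected + [j]
            # refit OLS on the enlarged feature set and score by RSS
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            r = y - X[:, cols] @ beta
            rss = r @ r
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    # embed the fitted coefficients back into a length-p vector
    beta_full = np.zeros(p)
    beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    beta_full[selected] = beta
    return beta_full

def rgfs_ensemble_predict(X, y, X_new, n_estimators=100, n_steps=5, m=3,
                          seed=0):
    """Average the predictions of the randomized base learners."""
    rng = np.random.default_rng(seed)
    preds = [X_new @ forward_select(X, y, n_steps, m, rng)
             for _ in range(n_estimators)]
    return np.mean(preds, axis=0)
```

The randomization enters only through the candidate pool at each step, which is what distinguishes this from plain forward selection; averaging many such runs is what the paper analyzes.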
Xin Chen
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
Jason M. Klusowski
Assistant Professor, Department of Operations Research & Financial Engineering
statistics · probability · machine learning · information theory
Yan Shuo Tan
Assistant Professor, National University of Singapore
decision trees · ensembles · interpretable machine learning · causality
Chang Yu
Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA