AI Summary
Standard random forests, built upon CART trees, exhibit inherent bias in modeling linear relationships. To address this limitation, we propose RaFFLE (Random Forest Featuring Linear Extensions), the first random forest variant that integrates Piecewise Linear Organic Trees (PILOT) as base learners within the random forest framework. RaFFLE enhances ensemble diversity and generalization through node-level feature subsampling and an adjustable regularization parameter. We establish theoretical guarantees showing that RaFFLE is strongly consistent under mild regularity conditions and achieves a faster convergence rate than classical random forests on linear data. Empirical evaluation across 136 regression benchmark datasets demonstrates that RaFFLE significantly outperforms CART, conventional random forests, Lasso, Ridge regression, and XGBoost, achieving superior predictive accuracy while maintaining competitive computational efficiency.
Abstract
Random forests are widely used in regression. However, the decision trees used as base learners are poor approximators of linear relationships. To address this limitation, we propose RaFFLE (Random Forest Featuring Linear Extensions), a novel framework that integrates the recently developed PILOT trees (Piecewise Linear Organic Trees) as base learners within a random forest ensemble. PILOT trees combine the computational efficiency of traditional decision trees with the flexibility of linear model trees. To ensure sufficient diversity of the individual trees, we introduce an adjustable regularization parameter and use node-level feature sampling. These modifications improve the accuracy of the forest. We establish theoretical guarantees for the consistency of RaFFLE under weak conditions, and its faster convergence when the data are generated by a linear model. Empirical evaluations on 136 regression datasets demonstrate that RaFFLE outperforms the classical CART and random forest methods, the regularized linear methods Lasso and Ridge, and the state-of-the-art XGBoost algorithm, across both linear and nonlinear datasets. By balancing predictive accuracy and computational efficiency, RaFFLE proves to be a versatile tool for tackling a wide variety of regression problems.
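The limitation motivating RaFFLE can be seen in a small experiment. The sketch below (not code from the paper; the synthetic dataset, coefficients, and model settings are our own assumptions) fits a standard random forest and a Ridge regression to purely linear data; the piecewise-constant trees leave a systematic approximation error that the linear model does not.

```python
# Illustrative sketch: on data generated by a linear model, a CART-based
# random forest approximates the linear trend with axis-aligned steps,
# so its test error stays well above that of a linear learner.
# All dataset and hyperparameter choices here are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 5))
beta = np.array([3.0, -2.0, 1.0, 0.5, 0.0])          # true linear coefficients
y = X @ beta + rng.normal(0, 0.1, size=2000)          # linear signal + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
lin = Ridge(alpha=1.0).fit(X_tr, y_tr)

mse_rf = mean_squared_error(y_te, rf.predict(X_te))
mse_lin = mean_squared_error(y_te, lin.predict(X_te))
print(f"random forest MSE: {mse_rf:.4f}, ridge MSE: {mse_lin:.4f}")
```

On this linear data the ridge MSE approaches the noise floor while the forest's remains noticeably higher, which is the bias that linear base learners such as PILOT trees are designed to remove.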