Asymptotic Consistency and Generalization in Hybrid Models of Regularized Selection and Nonlinear Learning

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of simultaneously achieving accuracy, interpretability, and robustness in variable selection and prediction under high-dimensional noisy data, this paper proposes a hybrid modeling paradigm that synergistically integrates regularization techniques with nonlinear models. Specifically, the framework unifies sparse regularizers—including Lasso, Ridge, and Elastic Net—with tree-based ensemble methods such as Random Forest, XGBoost, and LightGBM. We systematically evaluate performance on the Friedman simulation benchmark using RMSE, Jaccard index, and recall. Theoretically, the approach guarantees asymptotic consistency and strong generalization. Empirically, it outperforms both pure regularized and pure black-box models: it improves predictive accuracy while substantially enhancing variable identification consistency and robustness—particularly as sample size increases. This work thus provides a novel, trustworthy decision-support pathway that reconciles interpretability with statistical stability.

📝 Abstract
This study explores how different types of supervised models perform when predicting and selecting relevant variables in high-dimensional settings, especially when the data are very noisy. We analyzed three approaches: regularized models (Lasso, Ridge, and Elastic Net), black-box models (Random Forest, XGBoost, LightGBM, CatBoost, and H2O GBM), and hybrid models that couple regularization with nonlinear algorithms. Based on simulations inspired by the Friedman equation, we evaluated 23 models using three complementary metrics: RMSE, the Jaccard index, and recall. The results reveal that, although black-box models excel in predictive accuracy, they lack the interpretability and simplicity that are essential in many real-world contexts. Regularized models, in turn, proved more sensitive to an excess of irrelevant variables. In this scenario, hybrid models stood out for their balance: they maintain good predictive performance, identify relevant variables more consistently, and offer greater robustness, especially as the sample size increases. We therefore recommend this hybrid framework for market applications, where results must make sense in a practical context and support decisions with confidence.
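The two-stage idea behind the hybrid models — regularized variable selection followed by a nonlinear learner — can be sketched on the same Friedman benchmark the paper simulates from. This is a minimal illustration, not the authors' exact pipeline; it assumes scikit-learn's `make_friedman1`, `LassoCV`, and `RandomForestRegressor` as stand-ins for the evaluated models:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Friedman #1 benchmark: only the first 5 of n_features drive the response;
# the remaining 15 columns are pure noise.
X, y = make_friedman1(n_samples=500, n_features=20, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: regularized selection (Lasso with a cross-validated penalty).
lasso = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_ != 0)

# Stage 2: fit the nonlinear model only on the variables Lasso kept.
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_tr[:, selected], y_tr)

rmse = mean_squared_error(y_te, rf.predict(X_te[:, selected])) ** 0.5
print(sorted(selected.tolist()), round(rmse, 3))
```

Note that Lasso approximates nonlinear effects linearly, so a purely quadratic term (such as Friedman's `(x3 - 0.5)^2`) can be missed at stage 1 — one reason the paper evaluates selection quality separately from RMSE.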
Problem

Research questions and friction points this paper is trying to address.

Evaluating hybrid models for high-dimensional noisy data prediction
Comparing performance of regularized, black-box, and hybrid models
Balancing predictive accuracy and interpretability in variable selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid models combine regularization and nonlinear learning
Evaluates performance using RMSE, Jaccard index, and recall
Balances predictive accuracy and interpretability in applications