Non-Heuristic Selection via Hybrid Regularized and Machine Learning Models for Insurance

📅 2025-06-05
🤖 AI Summary
To improve both interpretability and predictive accuracy when predicting whether customers will purchase travel insurance, this paper proposes a two-stage hybrid model, Lasso-CatBoost. First, Lasso regression performs feature selection, yielding sparse linear coefficients whose signs and magnitudes remain directly interpretable. The selected features are then fed into CatBoost to build the classifier. The approach targets the trust deficit of black-box models in insurance risk management by pairing a transparent selection step with a high-performing classifier. In the reported experiments, the hybrid model achieves an AUC of 0.861 and an F1-score of 0.808, outperforming the standalone models (e.g., XGBoost, LightGBM, Random Forest) and the pure regularization methods. It also reduces feature dimensionality by over 60% while retaining key business-relevant variables and the interpretable signs of their effects.
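
The selection step described above can be illustrated with a minimal sketch, using only the Python standard library. It is not the paper's implementation: the data are synthetic, the penalty `lam = 0.1` is an arbitrary choice, and a least-squares Lasso fitted to a 0/1 target stands in for whatever Lasso variant the authors used. The point is only to show how the L1 penalty drives irrelevant coefficients to exactly zero while the surviving coefficients keep a readable sign.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything inside [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardise(X):
    """Centre each column and scale it to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic stand-in data (assumption): only features 0 and 2 drive the outcome.
rng = random.Random(0)
n, p = 400, 6
raw = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
bought = [1.0 if 2.0 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.5) > 0 else 0.0 for r in raw]

X = standardise(raw)
mean_y = sum(bought) / n
y = [v - mean_y for v in bought]

w = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
print("selected features:", selected)
print("coefficient signs:", {j: ("+" if w[j] > 0 else "-") for j in selected})
```

In a hybrid setup of the kind summarised here, only the columns listed in `selected` would be passed on to the downstream classifier, and the coefficient signs would be kept as the interpretable part of the model.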

📝 Abstract
In this study, machine learning models were tested to predict whether a customer of an insurance company would purchase a travel insurance product. For this purpose, secondary data were used from an open-source website that compiles databases from statistical modeling competitions. The dataset contains approximately 2,700 records from an unidentified company in the tourism insurance sector. First, a feature engineering stage was carried out, in which features were selected using regularized models: Ridge, Lasso and Elastic-Net. In this phase, gains were observed not only in dimensionality reduction but also in preserved interpretability, through the coefficients obtained. Next, five classification models (Random Forests, XGBoost, H2O GBM, LightGBM and CatBoost) were evaluated both separately and in hybrid form with the preceding regularized models, with every stage using stratified k-fold cross-validation. The evaluations used traditional metrics, including AUC, precision, recall and F1 score. A very competitive hybrid model was obtained using CatBoost combined with Lasso feature selection, achieving an AUC of 0.861 and an F1 score of 0.808. These findings demonstrate the effectiveness of hybrid models as a way to obtain high predictive power while maintaining the interpretability of the estimation process.
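
The stratified k-fold cross-validation mentioned in the abstract can be sketched as follows. This is an illustrative standard-library version, not the authors' code: the function name `stratified_kfold`, the 40/60 class split, and `k = 5` are all assumptions chosen to keep the example small. It shows the defining property of the technique, namely that each held-out fold preserves the class proportions of the full dataset.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Toy labels (assumption): 40 buyers and 60 non-buyers.
labels = [1] * 40 + [0] * 60
splits = list(stratified_kfold(labels, k=5))

for fold_no, (train_idx, test_idx) in enumerate(splits):
    buyers = sum(labels[i] for i in test_idx)
    print(f"fold {fold_no}: {len(test_idx)} test rows, {buyers} buyers")
```

When a selection step such as Lasso is part of the pipeline, it is generally safer to refit it on each training fold rather than on the full dataset, so that the held-out rows do not influence which features are kept.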
Problem

Research questions and friction points this paper is trying to address.

Predicting travel insurance purchases using machine learning
Combining regularized models with machine learning classifiers for better accuracy
Maintaining interpretability while improving predictive performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid regularized and machine learning models
Feature selection via Ridge, Lasso, Elastic-Net
CatBoost combined with Lasso for high AUC
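
For readers less familiar with the metrics quoted above, here is a small standard-library sketch of how AUC, precision, recall and F1 are computed. The labels, scores, and the 0.5 decision threshold are made-up illustrations, not values from the paper. AUC is computed in its pairwise form: the fraction of (buyer, non-buyer) pairs in which the buyer receives the higher score, with ties counted as half.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

auc = roc_auc(y_true, scores)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

AUC depends only on how the scores rank the customers, while precision, recall and F1 depend on the chosen threshold, which is why the two kinds of metric are usually reported together.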