🤖 AI Summary
This work proposes BlendNet, a performance-weighted ensemble framework for loan default prediction, aimed at conditions where traditional models struggle: nonlinear relationships, class imbalance, and shifting borrower behavior. BlendNet integrates tree-based models and neural networks, tunes their hyperparameters with particle swarm optimization (PSO), and uses recursive feature elimination for feature selection. A greedy weighting mechanism assigns base-model weights according to empirical predictive performance, and a neural network meta-learner in a stacking architecture captures higher-order interactions among the model outputs. Evaluated on the Lending Club dataset, BlendNet achieves an AUC of 0.80, a macro-averaged F1-score of 0.73, and a default recall of 0.81, outperforming the individual baseline models.
📝 Abstract
Accurate prediction of loan defaults is a central challenge in credit risk management, particularly in modern financial datasets characterised by nonlinear relationships, class imbalance, and evolving borrower behaviour. Traditional statistical models and static ensemble methods often struggle to maintain reliable performance under such conditions. This study proposes an Optimised Greedy-Weighted Ensemble framework for loan default prediction that dynamically allocates model weights based on empirical predictive performance. The framework integrates multiple machine learning classifiers, with their hyperparameters first optimised using Particle Swarm Optimisation. Model predictions are then combined via a regularised greedy weighting mechanism, and a neural-network-based meta-learner is employed within a stacked ensemble to capture higher-order relationships among model outputs. Experiments conducted on the Lending Club dataset demonstrate that the proposed framework improves predictive performance compared with individual classifiers. The BlendNet ensemble achieved the strongest results, with an AUC of 0.80, a macro-average F1-score of 0.73, and a default recall of 0.81. Calibration analysis further shows that tree-based ensembles such as Extra Trees and Gradient Boosting provide the most reliable probability estimates, while the stacked ensemble offers superior ranking capability. Feature analysis using Recursive Feature Elimination identifies revolving utilisation, annual income, and debt-to-income ratio as the most influential predictors of loan default. These findings demonstrate that performance-driven ensemble weighting can improve both predictive accuracy and interpretability in credit risk modelling. The proposed framework provides a scalable, data-driven approach to support institutional credit assessment, risk monitoring, and financial decision-making.
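The abstract does not spell out how the greedy weighting is computed, so the following is only a minimal sketch of one common form of performance-driven weighting: greedy forward selection with replacement on validation predictions, where each model's weight is the share of rounds in which it was picked. The Brier score as the selection metric, the early-stopping rule, the synthetic data, and all names (`greedy_weights`, `blend`, `model_a`, and so on) are assumptions made for illustration; the paper's regularisation term is not reproduced here.

```python
import random


def brier(y_true, probs):
    """Mean squared error between 0/1 labels and predicted probabilities (lower is better)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def greedy_weights(preds, y_true, max_rounds=50):
    """Greedy forward selection with replacement (illustrative, not the paper's exact method).

    preds maps a model name to its list of validation probabilities.
    Each round adds the model whose inclusion lowers the blended Brier score the most
    and stops as soon as no model improves it. Weights are pick frequencies.
    """
    names = list(preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y_true)  # element-wise sum of the picked models' predictions
    picked = 0
    current = float("inf")
    for _ in range(max_rounds):
        best_name, best_score = None, current
        for name in names:
            candidate = [(s + p) / (picked + 1) for s, p in zip(running_sum, preds[name])]
            score = brier(y_true, candidate)
            if score < best_score:
                best_name, best_score = name, score
        if best_name is None:  # no candidate improves the blend: stop early
            break
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, preds[best_name])]
        picked += 1
        current = best_score
    return {name: counts[name] / picked for name in names}


def blend(preds, weights):
    """Weighted average of the model predictions."""
    n = len(next(iter(preds.values())))
    return [sum(weights[name] * preds[name][i] for name in weights) for i in range(n)]


# Synthetic validation set (assumption: roughly 30% defaults, three hypothetical models).
random.seed(0)
y_val = [1 if random.random() < 0.3 else 0 for _ in range(500)]


def noisy_model(noise):
    """Fake probabilities: the true label plus Gaussian noise, clipped to [0, 1]."""
    return [min(1.0, max(0.0, y + random.gauss(0.0, noise))) for y in y_val]


val_preds = {
    "model_a": noisy_model(0.3),
    "model_b": noisy_model(0.4),
    "model_c": noisy_model(0.8),
}
weights = greedy_weights(val_preds, y_val)
ensemble = blend(val_preds, weights)
print("weights:", weights)
print("ensemble Brier:", brier(y_val, ensemble))
```

Because selection stops when no candidate improves the validation score, the blended score in this sketch can never be worse than that of the best single model on the same validation data. In practice the weights would be fitted on held-out predictions and then checked on a separate test split.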