🤖 AI Summary
Credit default prediction demands both high predictive performance and model interpretability for regulatory compliance and business decision-making.
Method: This study proposes an integrated modeling framework that combines ensemble learning with SHAP (SHapley Additive exPlanations) analysis, described as the first joint application of the two in credit risk modeling. XGBoost, LightGBM, and Random Forest are systematically benchmarked against a logistic regression baseline, with SMOTE used to mitigate class imbalance.
Contribution/Results: The ensemble models improve on the logistic regression baseline by +3.2% accuracy, +5.1% precision, and +4.8% recall. SHAP-based global and local interpretability analysis identifies external credit scores as the most influential features, which is consistent with their expected domain relevance. The framework is reported to be robust and practical to deploy, combining strong predictive performance with the transparency that regulators expect in credit risk management.
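The benchmarking setup summarized above can be illustrated with a minimal sketch. It is not the study's pipeline and rests on several assumptions: the data are synthetic (`make_classification`) rather than the Home Credit dataset, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost and LightGBM, and `smote_like_oversample` is a hypothetical, simplified helper written for this example, not the SMOTE implementation from imbalanced-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def smote_like_oversample(X, y, minority_label=1, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper).

    New minority rows are interpolated between a minority row and one of
    its k nearest minority neighbours until both classes have equal size.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum()) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances among minority rows only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[chosen] - X_min[base])
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def benchmark(seed=0):
    """Fit a baseline and two ensembles; return test-set metrics per model."""
    # Synthetic, imbalanced stand-in for a credit default dataset.
    X, y = make_classification(
        n_samples=2000, n_features=10, n_informative=5,
        weights=[0.9, 0.1], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Oversample the training split only, so the test split stays untouched.
    X_bal, y_bal = smote_like_oversample(X_tr, y_tr, seed=seed)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_bal, y_bal).predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, zero_division=0),
            "recall": recall_score(y_te, pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    for name, metrics in benchmark().items():
        print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Resampling only the training split is a deliberate choice in this sketch: synthetic rows in the test split would distort the reported precision and recall.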
📝 Abstract
This study addresses credit default prediction by building a machine learning modeling framework and comparing several mainstream classification algorithms. After preprocessing and feature engineering on the Home Credit dataset, models including logistic regression, random forest, XGBoost, and LightGBM are trained and evaluated on accuracy, precision, and recall. The results show that ensemble learning methods have a clear advantage in predictive performance and are robust, particularly when handling complex nonlinear relationships between features and class imbalance. SHAP is then used to analyze feature importance and dependence; the external credit score variables are found to dominate model decisions, a finding that improves the model's interpretability and practical value. The results offer a reference and technical support for developing intelligent credit risk control systems.
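To make the idea behind the SHAP analysis concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. It is an illustration under stated assumptions, not the procedure used in the study: `exact_shapley` is a hypothetical helper, the toy linear scoring function and its three features are invented for the example, and features outside a coalition are filled in from a background sample. The cost grows as 2^n in the number of features, which is why the `shap` library relies on faster algorithms for tree ensembles in practice.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one row `x` (hypothetical helper).

    predict:    callable mapping an (m, n) array to m model outputs
    x:          1-D array with the n feature values being explained
    background: (m, n) float array used to fill in "absent" features
    """
    n = x.shape[0]

    def value(subset):
        # Coalition value: features in `subset` take their values from x,
        # the others keep their background values; average the outputs.
        samples = background.copy()
        idx = list(subset)
        if idx:
            samples[:, idx] = x[idx]
        return float(np.mean(predict(samples)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(size=(100, 3))
    weights = np.array([2.0, -1.0, 0.5])  # invented toy coefficients

    def toy_score(X):
        return X @ weights

    x = np.array([1.0, 2.0, -1.0])
    print(exact_shapley(toy_score, x, background))
```

The attributions sum to the difference between the prediction for `x` and the average prediction over the background sample, which is the additivity property that makes per-feature contributions comparable across applicants.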