Interpretable Credit Default Prediction with Ensemble Learning and SHAP

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Credit default prediction demands both high predictive performance and model interpretability for regulatory compliance and business decision-making. Method: This study proposes an integrated modeling framework that combines ensemble learning with SHAP (SHapley Additive exPlanations) analysis for credit risk modeling. It systematically benchmarks XGBoost, LightGBM, and Random Forest against a logistic regression baseline, using SMOTE to mitigate class imbalance. Contribution/Results: The ensemble models improve on the baseline by +3.2% accuracy, +5.1% precision, and +4.8% recall. SHAP-based global and local interpretability analysis identifies external credit scores as the most influential feature, empirically confirming their domain relevance. The framework pairs strong, robust predictive performance with the transparency required for deployment in regulated credit risk management.

📝 Abstract
This study addresses credit default prediction by building a machine-learning modeling framework and running comparative experiments on several mainstream classification algorithms. After preprocessing, feature engineering, and model training on the Home Credit dataset, models including logistic regression, random forest, XGBoost, and LightGBM are evaluated on accuracy, precision, and recall. The results show that ensemble learning methods hold a clear advantage in predictive performance and remain robust when handling complex nonlinear relationships between features and class imbalance. The SHAP method is then used to analyze feature importance and dependence, revealing that external credit score variables play a dominant role in model decisions, which improves the model's interpretability and practical value. The findings provide an effective reference and technical support for the intelligent development of credit risk control systems.
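The class-imbalance handling described above relies on SMOTE, which synthesizes new minority-class samples by interpolating between real ones. The paper presumably uses a standard library implementation; the numpy-only sketch below (function name and toy data are illustrative, not from the paper) shows the core idea:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating between each anchor sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest per sample
    base = rng.integers(0, n, size=n_new)      # random anchor samples
    nb = neighbours[base, rng.integers(0, min(k, n - 1), size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation factor in [0, 1)
    # each synthetic point lies on the segment between anchor and neighbour
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# toy imbalanced setting: 3 minority-class samples in 2-D
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = smote_oversample(X_min, n_new=4, k=2, rng=0)
print(X_new.shape)  # (4, 2)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE avoids the exact-duplicate problem of naive oversampling, which matters for the tree ensembles benchmarked here.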
Problem

Research questions and friction points this paper is trying to address.

Develops machine learning framework for credit default prediction
Evaluates ensemble learning performance on imbalanced data
Enhances interpretability using SHAP for feature analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble learning for credit default prediction
SHAP method enhances model interpretability
Handles data imbalance and nonlinear relationships
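The SHAP analysis above attributes each prediction to individual features via Shapley values. In practice the paper would use an optimized explainer such as shap.TreeExplainer for its tree ensembles; the brute-force sketch below (toy linear model and weights are illustrative assumptions) computes exact Shapley values by enumerating feature coalitions, with absent features replaced by a background value:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background, n_features):
    """Exact Shapley values by coalition enumeration: a feature outside
    the coalition S is masked with its background (e.g. mean) value."""
    phi = np.zeros(n_features)
    feats = list(range(n_features))
    for i in feats:
        others = [j for j in feats if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Shapley weight for a coalition of this size
                w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                     / factorial(n_features))
                x_with = background.copy()
                x_with[list(S) + [i]] = x[list(S) + [i]]
                x_without = background.copy()
                x_without[list(S)] = x[list(S)]
                phi[i] += w * (predict(x_with) - predict(x_without))
    return phi

# toy linear "credit score" model; the first weight dominating mimics the
# paper's finding that external credit scores drive the model's decisions
w = np.array([0.7, 0.2, 0.1])
predict = lambda x: float(w @ x)
x = np.array([2.0, 1.0, -1.0])   # one applicant's feature vector
bg = np.zeros(3)                 # background = feature means (here zero)
phi = shapley_values(predict, x, bg, 3)
print(phi)  # for a linear model this equals w * (x - background)
```

For a linear model the attributions reduce to w * (x - background), which makes the sketch easy to verify; tree explainers compute the same quantity efficiently over ensembles.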
Shiqi Yang
New York University, New York, USA
Ziyi Huang
Assistant Professor @ Arizona State University
Trustworthy AI for Health
Wengran Xiao
University of Michigan, New York, USA
Xinyu Shen
Georgia Institute of Technology, Atlanta, USA