Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality

📅 2024-08-03
🏛️ arXiv.org
🤖 AI Summary
Predicting in-hospital mortality for ICU patients with sepsis faces two challenges: poor model interpretability and severe class imbalance in clinical data. Method: We propose a feature selection approach that combines clinical knowledge with data-driven ranking: candidate features drawn from ICU records in the MIMIC-III database are refined using clinical input and random forest (RF) feature importance scores; preprocessing includes imputation, standardization, and SMOTE to address missing values and class imbalance. Contribution/Results: Among RF, gradient boosting machine (GBM), logistic regression (LR), support vector machine (SVM), and k-nearest neighbors (KNN), the RF model performs best, achieving 0.90 accuracy, 0.97 AUROC, and 0.92 F1-score. The framework aims to balance predictive performance with clinical interpretability, offering a transparent approach to sepsis prognosis modeling in ICU settings.
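The summary names SMOTE as the class-imbalance step. As a rough illustration of the idea only (not the authors' implementation), the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy feature vectors are assumptions made for this example; maintained implementations exist in libraries such as imbalanced-learn.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (not MIMIC-III data).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic records.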

📝 Abstract
Sepsis is a severe condition responsible for many deaths in the United States and worldwide, making accurate prediction of outcomes crucial for timely and effective treatment. Previous studies employing machine learning faced limitations in feature selection and model interpretability, reducing their clinical applicability. This research aimed to develop an interpretable and accurate machine learning model to predict in-hospital sepsis mortality, addressing these gaps. Using ICU patient records from the MIMIC-III database, we extracted relevant data through a combination of literature review, clinical input refinement, and Random Forest-based feature selection, identifying the top 35 features. Data preprocessing included cleaning, imputation, standardization, and applying the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, resulting in a dataset of 4,683 patients with 17,429 admissions. Five models (Random Forest, Gradient Boosting, Logistic Regression, Support Vector Machine, and K-Nearest Neighbor) were developed and evaluated. The Random Forest model demonstrated the best performance, achieving an accuracy of 0.90, AUROC of 0.97, precision of 0.93, recall of 0.91, and F1-score of 0.92. These findings underscore the potential of data-driven machine learning approaches to improve critical care, offering clinicians a powerful tool for predicting in-hospital sepsis mortality and enhancing patient outcomes.
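The reported precision, recall, and F1-score can be checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below does that arithmetic with the figures from the abstract; the helper name `f1_from_precision_recall` is an assumption made for this example.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures reported for the Random Forest model in the abstract.
reported_precision = 0.93
reported_recall = 0.91

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.92
```

The computed value, about 0.9199, rounds to the reported F1-score of 0.92, so the three figures are mutually consistent.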
Problem

Research questions and friction points this paper is trying to address.

Sepsis Prediction
Machine Learning
Patient Mortality Risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable Machine Learning
Sepsis Mortality Prediction
Random Forest
Arseniy Shumilov
University of Southern California, Los Angeles, USA
Yueting Zhu
University of Southern California, Los Angeles, USA
Negin Ashrafi
Graduate Student at University of Southern California
Machine Learning, Deep Learning, Optimization, NLP, Statistical Learning
Gaojie Lian
Shilong Ren
M. Pishgar
University of Southern California, Los Angeles, USA