🤖 AI Summary
Sepsis carries high mortality in intensive care units, yet early identification remains difficult because electronic health record (EHR) data are sparse, heterogeneous, and temporally complex. To address this, we propose Triplet-GCN, a graph-based model that represents each encounter as patient–feature–value triplets and builds a bipartite patient–feature graph from them, keeping measurement values on the edges to preserve fine-grained clinical semantics. We design a type-aware preprocessing pipeline: median imputation and standardization for numeric features, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes. The architecture combines a graph convolutional network (GCN) with a lightweight multilayer perceptron (MLP) for end-to-end sepsis risk stratification. Evaluated on a multi-center Chinese cohort (N = 648), Triplet-GCN outperforms conventional tabular baselines (KNN, SVM, XGBoost, and Random Forest) on discrimination and balanced error metrics and yields a more favorable sensitivity–specificity trade-off. The results suggest that propagating information over a patient–feature graph gives more informative patient representations than feature-independent models.
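The type-aware preprocessing described above can be illustrated with a small sketch. This is not the authors' code: the helper names `preprocess_numeric` and `effect_code_binary`, the example columns, and the choice of -1/+1 as the effect-coding levels are assumptions made for illustration only.

```python
import numpy as np

def preprocess_numeric(col):
    """Median-impute NaNs, then standardize to zero mean and unit variance."""
    col = np.asarray(col, dtype=float)
    median = np.nanmedian(col)                 # median over observed entries only
    filled = np.where(np.isnan(col), median, col)
    std = filled.std()
    # Guard against constant columns, which would otherwise divide by zero.
    return (filled - filled.mean()) / (std if std > 0 else 1.0)

def effect_code_binary(col):
    """Map a 0/1 feature to -1/+1 (assumed effect-coding levels)."""
    col = np.asarray(col, dtype=float)
    return 2.0 * col - 1.0

# Hypothetical columns, not taken from the paper's cohort.
lactate = [1.0, np.nan, 3.0, 2.0]
on_vasopressor = [0, 1, 1, 0]

z = preprocess_numeric(lactate)
e = effect_code_binary(on_vasopressor)
```

In a train/test setting such as the 70/30 split mentioned in the abstract, the median, mean, and standard deviation would normally be estimated on the training portion and reused on the test portion; the sketch computes them on a single array for brevity.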
📝 Abstract
In the intensive care setting, sepsis remains a major cause of morbidity and mortality, yet its timely detection is hindered by the complex, sparse, and heterogeneous nature of electronic health record (EHR) data. We propose Triplet-GCN, a single-branch graph convolutional model that represents each encounter as patient–feature–value triplets, constructs a bipartite EHR graph, and learns patient embeddings via a Graph Convolutional Network (GCN) followed by a lightweight multilayer perceptron (MLP). The pipeline applies type-specific preprocessing (median imputation and standardization for numeric variables, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes) and initializes patient nodes with summary statistics, while retaining measurement values on edges to preserve "who measured what and by how much". In a retrospective, multi-center Chinese cohort (N = 648; 70/30 train–test split) drawn from three tertiary hospitals, Triplet-GCN consistently outperforms strong tabular baselines (KNN, SVM, XGBoost, Random Forest) across discrimination and balanced error metrics, yielding a more favorable sensitivity–specificity trade-off and improved overall utility for early warning. These findings indicate that encoding EHR data as triplets and propagating information over a patient–feature graph produce more informative patient representations than feature-independent models, offering a simple, end-to-end blueprint for deployable sepsis risk stratification.
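To make the triplet-to-graph idea concrete, the following is a minimal NumPy sketch of building a bipartite patient–feature graph with values on the edges and applying one GCN-style propagation step. It is an illustration under stated assumptions, not the authors' implementation: the triplets, the random node features and weights, the `gcn_layer` helper, and the use of absolute edge weights in the degree normalization are all hypothetical choices.

```python
import numpy as np

# Hypothetical (patient, feature, value) triplets; values assumed preprocessed.
triplets = [
    ("p0", "lactate", 1.2),
    ("p0", "heart_rate", 0.4),
    ("p1", "lactate", -0.7),
    ("p2", "heart_rate", 0.9),
]

patients = sorted({p for p, _, _ in triplets})
features = sorted({f for _, f, _ in triplets})
p_idx = {p: i for i, p in enumerate(patients)}
f_idx = {f: len(patients) + j for j, f in enumerate(features)}
n = len(patients) + len(features)

# Symmetric weighted adjacency: edges only link a patient node to a feature
# node, and each edge carries the measured value.
A = np.zeros((n, n))
for p, f, v in triplets:
    A[p_idx[p], f_idx[f]] = v
    A[f_idx[f], p_idx[p]] = v

def gcn_layer(A, H, W):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    # Assumption: degrees use absolute weights so that signed edge values
    # cannot produce a zero or negative degree.
    deg = np.abs(A_hat).sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
H = rng.normal(size=(n, 4))   # placeholder node features (random, for shape only)
W = rng.normal(size=(4, 8))   # placeholder for learned layer weights
H1 = gcn_layer(A, H, W)
patient_embeddings = H1[: len(patients)]   # rows that would feed the MLP head
```

Here the random `H` stands in for the node initialization; the abstract states that patient nodes are initialized with summary statistics, which the sketch does not reproduce. In a trained model, `W` would be learned jointly with the MLP classifier.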