Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Sepsis carries high mortality in intensive care units, yet early identification remains difficult because electronic health record (EHR) data are sparse, heterogeneous, and complex. To address this, the authors propose Triplet-GCN, a graph-based model that represents each encounter as patient–feature–value triplets and builds a bipartite patient–feature graph whose edges carry the measurement values, preserving fine-grained clinical semantics. A type-aware preprocessing pipeline applies median imputation and standardization to numeric variables, effect coding to binary features, and mode imputation with low-dimensional embeddings to rare categorical attributes. The architecture combines a graph convolutional network (GCN) with a lightweight multilayer perceptron (MLP) for end-to-end sepsis risk stratification. On a retrospective multicenter Chinese cohort (N = 648), Triplet-GCN consistently outperforms tabular baselines (KNN, SVM, XGBoost, and Random Forest) on discrimination and balanced error metrics, with a more favorable sensitivity–specificity trade-off.

📝 Abstract

In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous nature of electronic health record (EHR) data. We propose Triplet-GCN, a single-branch graph convolutional model that represents each encounter as patient–feature–value triplets, constructs a bipartite EHR graph, and learns patient embeddings via a Graph Convolutional Network (GCN) followed by a lightweight multilayer perceptron (MLP). The pipeline applies type-specific preprocessing (median imputation and standardization for numeric variables, effect coding for binary features, and mode imputation with low-dimensional embeddings for rare categorical attributes) and initializes patient nodes with summary statistics, while retaining measurement values on edges to preserve "who measured what and by how much". In a retrospective, multi-center Chinese cohort (N = 648; 70/30 train–test split) drawn from three tertiary hospitals, Triplet-GCN consistently outperforms strong tabular baselines (KNN, SVM, XGBoost, Random Forest) across discrimination and balanced error metrics, yielding a more favorable sensitivity–specificity trade-off and improved overall utility for early warning. These findings indicate that encoding EHR as triplets and propagating information over a patient–feature graph produce more informative patient representations than feature-independent models, offering a simple, end-to-end blueprint for deployable sepsis risk stratification.
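
The triplet-to-graph idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy triplets, the rescaling of values to nonnegative edge weights, the use of summary statistics on feature nodes as well as patient nodes, the random untrained weights, and the single logistic output standing in for the MLP are all assumptions made for illustration. The propagation rule is the standard symmetric-normalized GCN step.

```python
import numpy as np

# Hypothetical toy triplets: (patient index, feature index, value).
# Assumption: values were rescaled to be nonnegative so they can act as edge weights.
triplets = [(0, 0, 0.9), (0, 1, 0.2), (1, 0, 0.4), (2, 1, 0.7), (2, 2, 1.0)]
n_patients, n_features = 3, 3
n_nodes = n_patients + n_features

# Symmetric weighted adjacency of the bipartite graph;
# feature nodes are offset by n_patients.
A = np.zeros((n_nodes, n_nodes))
for p, f, v in triplets:
    A[p, n_patients + f] = v
    A[n_patients + f, p] = v

# Initial node features: [mean incident edge value, number of incident edges].
# The abstract only states this kind of initialization for patient nodes;
# applying it to feature nodes too is an assumption.
H = np.zeros((n_nodes, 2))
for node in range(n_nodes):
    w = A[node][A[node] > 0]
    H[node] = [w.mean(), len(w)]


def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))    # untrained placeholder weights
w_out = rng.normal(size=4)      # stand-in for the MLP head

# Patient embeddings are the first n_patients rows of the layer output.
patient_emb = gcn_layer(A, H, W1)[:n_patients]
risk = 1.0 / (1.0 + np.exp(-(patient_emb @ w_out)))  # per-patient score in (0, 1)
```

Because the measurement value sits on the edge, two patients who share a feature but differ in its value pass different amounts of information through that feature node, which is the property the abstract describes as "who measured what and by how much".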

Problem

Research questions and friction points this paper is trying to address.

Predicting sepsis early from complex EHR data

Building patient-feature graphs for better representation learning

Outperforming traditional models in clinical risk stratification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Convolutional Network models patient-feature-value triplets

Bipartite EHR graph retains measurement values on edges

Type-specific preprocessing with median imputation and effect coding
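
The type-specific preprocessing named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the function names are hypothetical, the mapping of a missing binary value to 0 is an assumption (the source does not say how missing binary values are handled), and the embedding step that would follow mode imputation is omitted.

```python
from collections import Counter

import numpy as np


def preprocess_numeric(col):
    """Median-impute NaNs, then standardize to zero mean and unit variance."""
    col = np.asarray(col, dtype=float)
    filled = np.where(np.isnan(col), np.nanmedian(col), col)
    std = filled.std()
    return (filled - filled.mean()) / (std if std > 0 else 1.0)


def effect_code_binary(col):
    """Effect coding for a 0/1 feature: 1 -> +1, 0 -> -1.

    Assumption: a missing value (NaN) maps to 0, the midpoint of the two codes.
    """
    col = np.asarray(col, dtype=float)
    coded = np.where(col == 1, 1.0, -1.0)
    return np.where(np.isnan(col), 0.0, coded)


def mode_impute(col):
    """Replace None entries of a categorical column with the most frequent level."""
    present = [c for c in col if c is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if c is None else c for c in col]


# Toy usage with hypothetical column values.
z = preprocess_numeric([1.0, np.nan, 3.0])
e = effect_code_binary([1, 0, np.nan])
m = mode_impute(["a", None, "a", "b"])
```

In a real split, the median, mean, standard deviation, and mode would be estimated on the training portion only and then reused on the test portion to avoid leakage.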

🔎 Similar Papers

A deep graph model for the signed interaction prediction in biological network