MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU

📅 2025-10-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing ICU-based sepsis studies suffer from outdated data, non-reproducible preprocessing, and insufficient coverage of therapeutic interventions. To address these limitations, this work constructs a standardized sepsis cohort (n=35,239) from MIMIC-IV, adhering to the Sepsis-3 definition and integrating time-aligned clinical variables with treatment data covering vasopressors, fluid administration, mechanical ventilation, and antibiotics. It proposes a transparent, reproducible preprocessing pipeline featuring structured missing-value imputation and establishes three benchmark tasks: early mortality prediction, length-of-stay estimation, and shock onset classification. Experimental results show that incorporating treatment variables substantially improves model performance, particularly for Transformer architectures. This study introduces the first open-source, reproducible benchmark platform specifically designed for sequential modeling in critical care, enabling standardized, comparable sepsis prediction research.

📝 Abstract
Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage of clinical interventions. We introduce MIMIC-Sepsis, a curated cohort and benchmark framework derived from the MIMIC-IV database, designed to support reproducible modeling of sepsis trajectories. Our cohort includes 35,239 ICU patients with time-aligned clinical variables and standardized treatment data, including vasopressors, fluids, mechanical ventilation, and antibiotics. We describe a transparent preprocessing pipeline (based on Sepsis-3 criteria, structured imputation strategies, and treatment inclusion) and release it alongside benchmark tasks focused on early mortality prediction, length-of-stay estimation, and shock onset classification. Empirical results demonstrate that incorporating treatment variables substantially improves model performance, particularly for Transformer-based architectures. MIMIC-Sepsis serves as a robust platform for evaluating predictive and sequential models in critical care research.
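
To make the Sepsis-3 criterion mentioned in the abstract concrete: Sepsis-3 operationalizes organ dysfunction as an increase of at least 2 points in the total SOFA score consequent to infection. The sketch below is only an illustration of that rule, not the paper's pipeline. The function name `sepsis3_onset_hour`, the hourly SOFA list, the default baseline of 0, and the window of 48 hours before to 24 hours after suspected infection are all assumptions made for this example.

```python
from typing import Optional, Sequence


def sepsis3_onset_hour(
    sofa_by_hour: Sequence[int],
    infection_hour: int,
    baseline_sofa: int = 0,   # assumption: no known prior organ dysfunction
    hours_before: int = 48,   # assumption: illustrative window, not from the paper
    hours_after: int = 24,    # assumption: illustrative window, not from the paper
) -> Optional[int]:
    """Return the first hour in the window around suspected infection at which
    total SOFA is at least 2 points above baseline, or None if it never is."""
    start = max(0, infection_hour - hours_before)
    end = min(len(sofa_by_hour), infection_hour + hours_after + 1)
    for hour in range(start, end):
        if sofa_by_hour[hour] - baseline_sofa >= 2:
            return hour
    return None


# Toy stay (hypothetical data): hourly SOFA totals, suspected infection at hour 3.
example_sofa = [0, 1, 1, 1, 3, 4]
example_onset = sepsis3_onset_hour(example_sofa, infection_hour=3)
```

In a real cohort build, the SOFA components and the suspected-infection time would themselves have to be derived from laboratory, culture, and antibiotic records, which this sketch takes as given.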
Problem

Research questions and friction points this paper is trying to address.

Addressing outdated datasets and non-reproducible sepsis research pipelines
Providing standardized clinical intervention data for ICU sepsis trajectories
Improving predictive model performance through treatment variable integration
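
As a rough illustration of what treatment variable integration can look like in a sequence model input, the sketch below builds per-hour feature vectors with and without treatment columns, so the two variants can be compared in an ablation. The column names in `TREATMENT_KEYS`, the helper `build_features`, and the convention that an unrecorded treatment means 0.0 are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical treatment columns; the real benchmark may name or encode these differently.
TREATMENT_KEYS = ["vasopressor_dose", "fluid_ml", "ventilated", "antibiotic_given"]


def build_features(
    hourly_rows: Sequence[Dict[str, float]],
    clinical_keys: Sequence[str],
    include_treatments: bool = True,
) -> List[List[float]]:
    """Turn time-aligned hourly records into fixed-width feature vectors.

    Clinical values are assumed to be already imputed, so a missing one raises
    KeyError. A missing treatment is read as "not given in that hour" (0.0).
    """
    features = []
    for row in hourly_rows:
        vector = [float(row[key]) for key in clinical_keys]
        if include_treatments:
            vector += [float(row.get(key, 0.0)) for key in TREATMENT_KEYS]
        features.append(vector)
    return features


# Toy two-hour stay (hypothetical values).
rows = [
    {"heart_rate": 90, "map": 62, "vasopressor_dose": 0.1, "ventilated": 1},
    {"heart_rate": 85, "map": 70},
]
with_tx = build_features(rows, ["heart_rate", "map"])
without_tx = build_features(rows, ["heart_rate", "map"], include_treatments=False)
```

Training the same model on `with_tx` and `without_tx` style inputs is one simple way to measure how much the treatment columns contribute.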
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated cohort with standardized treatment data
Transparent preprocessing pipeline with structured imputation
Transformer-based models enhanced by treatment variables
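
The exact imputation rules are not spelled out in this summary, so the following is only one plausible reading of "structured imputation": carry the last observation forward for a limited number of time steps, fall back to a cohort-level value after that, and keep a mask of which values were actually observed. The function `impute_series`, the `max_carry` limit, and the use of a median as fallback are assumptions for illustration.

```python
from statistics import median
from typing import List, Optional, Tuple


def impute_series(
    values: List[Optional[float]],
    fallback: float,
    max_carry: int = 4,  # assumption: how many steps a value may be carried forward
) -> Tuple[List[float], List[int]]:
    """Forward-fill missing values for at most max_carry steps, then use fallback.

    Returns the filled series and a mask (1 = observed, 0 = imputed).
    """
    filled: List[float] = []
    mask: List[int] = []
    last: Optional[float] = None
    age = 0  # steps since the last real observation
    for value in values:
        if value is not None:
            last, age = value, 0
            filled.append(value)
            mask.append(1)
        else:
            age += 1
            if last is not None and age <= max_carry:
                filled.append(last)
            else:
                filled.append(fallback)
            mask.append(0)
    return filled, mask


# Toy hourly heart-rate series (hypothetical values) with gaps.
heart_rate = [None, 88.0, None, None, None, 92.0]
cohort_median = median([80.0, 88.0, 92.0, 76.0])
filled, observed = impute_series(heart_rate, fallback=cohort_median, max_carry=2)
```

Keeping the observation mask alongside the filled values lets a downstream model distinguish measured values from imputed ones.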
Yong Huang
Department of Computer Science, University of California, Irvine, Irvine, California
Zhongqi Yang
University of California, Irvine
Digital Health, Machine Learning, Personalization, LLMs
Amir Rahmani
NASA Jet Propulsion Laboratory