Cross-Representation Benchmarking in Time-Series Electronic Health Records for Clinical Outcome Prediction

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of standardized evaluation criteria for data representation methods in electronic health record (EHR)-based clinical prediction. We introduce the first standardized benchmark framework, systematically evaluating three representation paradigms—multivariate time series, event sequences, and LLM-oriented textual event sequences—across two clinically distinct scenarios: short-term ICU prediction and long-term care trajectory modeling. Our framework integrates diverse models—including Transformer, LSTM, Retain, CLMBR, MLP, count-based models, and 8–20B-parameter LLMs—alongside a missingness-driven feature pruning strategy. Results demonstrate that event-sequence representations achieve overall superior performance; pre-trained models excel under few-shot settings; simple models remain highly competitive with large-scale data; and sparse features are particularly critical for long-term forecasting. The study uncovers systematic “scenario–task–representation–model” alignment principles, delivering a reproducible, interpretable, and deployable methodology for EHR-driven clinical prediction.
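The three representation paradigms compared in the benchmark can be illustrated with a small sketch. This is a minimal, hypothetical example (the event codes, timestamps, and serialisation format are assumptions for illustration, not the paper's actual preprocessing of MIMIC-IV or EHRSHOT):

```python
import pandas as pd

# Hypothetical raw EHR events for one patient (codes and timestamps
# are illustrative only).
events = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 08:00", "2023-01-01 08:00",
                            "2023-01-01 12:00", "2023-01-01 12:00"]),
    "code": ["HR", "LAB:creatinine", "HR", "RX:vancomycin"],
    "value": [88.0, 1.4, 102.0, None],
})

# 1) Multivariate time series: resample onto a fixed grid,
#    one column per variable (unobserved cells stay NaN).
ts = (events.dropna(subset=["value"])
            .pivot_table(index="time", columns="code", values="value")
            .resample("4h").mean())

# 2) Event sequence: chronologically ordered (time, code, value) tokens,
#    preserving irregular timing and non-numeric events.
seq = list(events.sort_values("time").itertuples(index=False, name=None))

# 3) Textual event stream for an LLM: serialise each event as a line of text.
text = "\n".join(
    f"{t:%Y-%m-%d %H:%M} {c}" + (f" = {v}" if pd.notna(v) else "")
    for t, c, v in seq
)
```

The sketch shows why the paradigms are not interchangeable: the time-series view discards the medication event (no numeric value) and imposes a grid, while the event-sequence and textual views keep every event at its native timestamp.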

📝 Abstract
Electronic Health Records (EHRs) enable deep learning for clinical predictions, but the optimal method for representing patient data remains unclear due to inconsistent evaluation practices. We present the first systematic benchmark to compare EHR representation methods, including multivariate time series, event streams, and textual event streams for LLMs. This benchmark standardises data curation and evaluation across two distinct clinical settings: the MIMIC-IV dataset for ICU tasks (mortality, phenotyping) and the EHRSHOT dataset for longitudinal care (30-day readmission, 1-year pancreatic cancer prediction). For each paradigm, we evaluate appropriate modelling families: Transformers, MLPs, LSTMs, and Retain for time series; CLMBR and count-based models for event streams; and 8-20B LLMs for textual streams. We also analyse the impact of feature pruning based on data missingness. Our experiments reveal that event-stream models consistently deliver the strongest performance. Pre-trained models like CLMBR are highly sample-efficient in few-shot settings, though simpler count-based models can be competitive given sufficient data. Furthermore, we find that feature-selection strategies must be adapted to the clinical setting: pruning sparse features improves ICU predictions, while retaining them is critical for longitudinal tasks. Our results, enabled by a unified and reproducible pipeline, provide practical guidance for selecting EHR representations based on the clinical context and data regime.
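The missingness-driven feature pruning analysed in the abstract can be sketched as a simple threshold rule. Everything here is an assumption for illustration: the feature names, the synthetic missingness rates, and the specific thresholds are hypothetical, and the paper does not prescribe these exact cutoffs.

```python
import numpy as np
import pandas as pd

# Hypothetical feature matrix: rows = patients, columns = clinical features;
# NaN marks values never recorded for that patient.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)),
                 columns=["heart_rate", "sodium", "lactate", "troponin"])
X.loc[rng.random(100) < 0.95, "troponin"] = np.nan   # ~95% missing
X.loc[rng.random(100) < 0.40, "lactate"] = np.nan    # ~40% missing

def prune_by_missingness(df: pd.DataFrame, max_missing: float) -> pd.DataFrame:
    """Drop features whose fraction of missing values exceeds max_missing."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return df[keep]

# A strict threshold drops sparse features (the behaviour the paper finds
# helpful for ICU tasks); a permissive one retains them (the behaviour it
# finds critical for longitudinal tasks).
icu_view = prune_by_missingness(X, max_missing=0.50)
longitudinal_view = prune_by_missingness(X, max_missing=0.99)
```

The design point the benchmark makes is that `max_missing` is not a universal constant: the same pruning rule helps or hurts depending on the clinical setting, so the threshold should be treated as a tunable, task-dependent choice.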
Problem

Research questions and friction points this paper is trying to address.

Benchmarking diverse EHR representation methods for clinical prediction tasks
Evaluating time-series, event stream and text-based models across clinical settings
Analyzing how feature selection impacts performance in different healthcare contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark compares time-series, event, and text EHR representations
Event stream models deliver strongest clinical prediction performance
Feature selection strategies must adapt to specific clinical settings