Stable Prediction of Adverse Events in Medical Time-Series Data

📅 2025-10-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing early event prediction (EEP) methods for clinical time series produce unstable risk scores and temporally inconsistent risk trajectories, which undermines clinical trust. To address this, we propose CAREBench, a multimodal EEP benchmark designed around clinical reliability that combines electronic health records, electrocardiogram waveforms, and clinical notes to evaluate risk trajectories. We introduce a stability metric based on local Lipschitz constants and assess trajectory stability alongside predictive accuracy. Experiments across six prediction tasks show that existing models, especially zero-shot large language models, struggle to achieve accuracy and stability together and have notably poor recall at high-precision operating points. Our findings point to the need for models that produce evidence-aligned, stable risk trajectories.

📝 Abstract

Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. For bedside trust, risk trajectories must be accurate and temporally stable, shifting only with new, relevant evidence. However, current benchmarks (a) ignore stability of risk scores and (b) evaluate mainly on tabular inputs, leaving trajectory behavior untested. To address this gap, we introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs (tabular EHR, ECG waveforms, and clinical text) and assesses temporal stability alongside predictive accuracy. We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local Lipschitz constants. CAREBench spans six prediction tasks such as sepsis onset and compares classical learners, deep sequence models, and zero-shot LLMs. Across tasks, existing methods, especially LLMs, struggle to jointly optimize accuracy and stability, with notably poor recall at high-precision operating points. These results highlight the need for models that produce evidence-aligned, stable trajectories to earn clinician trust in continuous monitoring settings. (Code: https://github.com/SeewonChoi/CAREBench.)
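The abstract does not give the exact formula for the stability metric, so the following is only a minimal sketch of one way a local Lipschitz estimate could be computed for a single patient's risk trajectory. The function names `local_lipschitz` and `instability`, the `window` parameter, and the choice to average the per-step constants are all assumptions made for illustration, not the CAREBench definition.

```python
from typing import List, Sequence


def local_lipschitz(
    times: Sequence[float], risks: Sequence[float], window: float = 1.0
) -> List[float]:
    """Empirical local Lipschitz constant at each time step (hypothetical helper).

    For step i, this is the largest |r_i - r_j| / |t_i - t_j| over all other
    steps j whose timestamps lie within `window` of t_i.
    """
    if len(times) != len(risks):
        raise ValueError("times and risks must have the same length")
    constants = []
    for i, (t_i, r_i) in enumerate(zip(times, risks)):
        slopes = [
            abs(r_i - r_j) / abs(t_i - t_j)
            for j, (t_j, r_j) in enumerate(zip(times, risks))
            if j != i and 0 < abs(t_i - t_j) <= window
        ]
        # A step with no neighbours inside the window contributes 0.
        constants.append(max(slopes, default=0.0))
    return constants


def instability(
    times: Sequence[float], risks: Sequence[float], window: float = 1.0
) -> float:
    """Mean local Lipschitz constant; larger means a more oscillating trajectory.

    Averaging is an assumption of this sketch; a max or a quantile would
    penalize isolated spikes more strongly.
    """
    constants = local_lipschitz(times, risks, window)
    return sum(constants) / len(constants) if constants else 0.0


hours = [0.0, 1.0, 2.0, 3.0]
smooth = [0.1, 0.2, 0.3, 0.4]  # risk rises gradually
jumpy = [0.1, 0.9, 0.1, 0.9]   # risk oscillates between readings
print(instability(hours, smooth), instability(hours, jumpy))
```

Under this sketch the oscillating trajectory receives a higher score than the gradually rising one, which matches the intent of penalizing abrupt short-term swings in per-patient risk.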

Problem

Research questions and friction points this paper is trying to address.

Evaluating temporal stability in medical risk prediction models

Assessing multimodal inputs for early adverse event detection

Addressing poor recall at high-precision clinical decision points
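Recall at a high-precision operating point can be made concrete with a short sketch. The code below sweeps score thresholds and reports the best recall achievable while precision stays at or above a target. The function name `recall_at_precision` and the 0.9 default are hypothetical choices for illustration; the operating points used in the paper are not stated in this summary.

```python
from typing import Sequence


def recall_at_precision(
    labels: Sequence[int], scores: Sequence[float], min_precision: float = 0.9
) -> float:
    """Highest recall over all thresholds whose precision is >= min_precision.

    labels are 0/1 event indicators and scores are predicted risks.
    Returns 0.0 if no threshold reaches the requested precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = 0.0
    tp = fp = 0
    for rank, idx in enumerate(order):
        if labels[idx]:
            tp += 1
        else:
            fp += 1
        # Evaluate only once a block of tied scores is complete, so that a
        # threshold never splits predictions that share the same score.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[idx]:
            precision = tp / (tp + fp)
            if precision >= min_precision:
                best = max(best, tp / total_pos)
    return best


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(recall_at_precision(labels, scores, min_precision=0.9))
```

In this toy example, demanding 90% precision forces the threshold above the first false positive, so one of the three true events is missed; relaxing the target to 75% recovers all three.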

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing the CAREBench benchmark for multi-modal EEP

Proposing a stability metric based on local Lipschitz constants

Evaluating classical, deep sequence, and zero-shot LLM methods
