One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models

📅 2026-01-31
🤖 AI Summary
This work proposes ORA, a marked time-to-event pretraining objective for electronic health record (EHR) foundation models that jointly models the timing of clinical events and their associated continuous measurements. Existing EHR foundation models rely primarily on next-item prediction, which predicts only event types and therefore captures temporal dynamics and continuous-valued observations poorly. ORA instead incorporates event timestamps and numerical values into a single objective. The approach is architecture-agnostic and handles discrete clinical events and continuous physiological measurements together. Across multiple datasets, downstream tasks, and backbone architectures, the experiments report that ORA yields more generalizable representations than next-token prediction, with gains in classification, regression, and time-to-event prediction.

📝 Abstract
Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHR, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this training fails to capture the full structure of EHR. We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Across multiple datasets, downstream tasks, and model architectures, this objective consistently yields more generalizable representations than next-token prediction and pretraining losses that ignore continuous measurements. Importantly, the proposed objective yields improvements beyond traditional classification evaluation, including better regression and time-to-event prediction. Beyond introducing a new family of FMs, our results suggest a broader takeaway: pretraining objectives that account for EHR structure are critical for expanding downstream capabilities and generalizability.
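To make the idea of a marked time-to-event objective concrete, here is a minimal sketch of a per-event loss that scores the event type, the time gap, and an optional numeric value together. It is not ORA's actual formulation, which the abstract does not specify. The function name `marked_tte_nll`, its parameters, and the distributional choices (categorical type, exponential waiting time, Gaussian value) are all assumptions made for illustration.

```python
import math

def marked_tte_nll(type_logits, log_rate, value_mean, value_log_std,
                   true_type, true_gap, true_value=None):
    """Negative log-likelihood of one next event under a toy marked model.

    Illustrative assumptions (not taken from the paper):
      - event type ~ Categorical(softmax(type_logits))
      - time gap   ~ Exponential(rate = exp(log_rate))
      - value      ~ Normal(value_mean, exp(value_log_std)), when present
    """
    # Type term: -log softmax(type_logits)[true_type], via log-sum-exp.
    m = max(type_logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in type_logits))
    nll_type = log_z - type_logits[true_type]

    # Timing term: -log(rate * exp(-rate * gap)) = rate * gap - log(rate).
    nll_time = math.exp(log_rate) * true_gap - log_rate

    # Mark term: Gaussian NLL, skipped for events with no measurement.
    nll_value = 0.0
    if true_value is not None:
        std = math.exp(value_log_std)
        z = (true_value - value_mean) / std
        nll_value = 0.5 * z * z + value_log_std + 0.5 * math.log(2 * math.pi)

    return nll_type + nll_time + nll_value

# Hypothetical example: a lab result of type 1 arriving 2.0 time units later.
loss = marked_tte_nll([0.0, 0.0], 0.0, 1.0, 0.0,
                      true_type=1, true_gap=2.0, true_value=1.0)
```

Dropping the timing and mark terms leaves plain next-token cross-entropy, which is the baseline the abstract contrasts against. In a real model the four predicted quantities would come from the backbone's output at each position.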
Problem

Research questions and friction points this paper is trying to address.

Electronic Health Records
Foundation Models
Time-to-Event Prediction
Pretraining Objective
Structured Clinical Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

marked time-to-event
EHR foundation models
pretraining objective
structured clinical data
joint modeling
Zilin Jing
Department of Computer Science, Columbia University, New York, USA
Vincent Jeanselme
Department of Biomedical Informatics, Columbia University, New York, USA
Yuta Kobayashi
Department of Biomedical Informatics, Columbia University, New York, USA
Simon A. Lee
Ph.D. Student, UCLA
AI in Healthcare, Machine Learning, Foundation Models
Chao Pang
Department of Biomedical Informatics, Columbia University, New York, USA; Formation Bio
Aparajita Kashyap
Department of Biomedical Informatics, Columbia University, New York, USA
Yanwei Li
Research Scientist, ByteDance
Computer Vision, Generative AI
Xinzhuo Jiang
Department of Biomedical Informatics, Columbia University, New York, USA
Shalmali Joshi
Columbia University
Artificial Intelligence, Machine Learning, Biomedical Sciences, Clinical Informatics