Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

📅 2026-05-14
🤖 AI Summary
This work addresses incomplete patient timelines: clinical notes rarely carry precise timestamps, while structured electronic health records (EHRs) omit many clinically meaningful events. It proposes a retrieval-augmented multimodal alignment framework that combines the semantic richness of clinical text with the temporal precision of tabular EHR data. The method is a graph-based, multi-stage pipeline: it extracts key anchor events from the narrative, places the remaining events relative to those anchors, and then calibrates the resulting timeline against retrieved structured EHR rows to obtain absolute timestamps. Evaluated with instruction-tuned large language models on the i2m4 benchmark, which spans MIMIC-III and MIMIC-IV, the multimodal pipeline improves absolute timestamp accuracy (AULTC) and temporal concordance over text-only reconstruction for nearly all evaluated models, without reducing event match rates. A separate gap analysis finds that 34.8% of text-derived events are absent from the tabular records, which supports fusing both modalities to build more complete and precise patient trajectories.
📝 Abstract
Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and temporal concordance over unimodal text-only reconstruction across nearly all evaluated models, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.
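The three stages described in the abstract (anchor scaffold, relative placement, calibration against retrieved EHR rows) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses instruction-tuned large language models and a retrieval component, whereas here the text-derived offsets are hand-specified, retrieval is replaced by simple token overlap, and all names (`TextEvent`, `EhrRow`, `retrieve`, `reconstruct`, the 0.5 threshold, the example data) are hypothetical. Shifting unmatched events by their anchor's correction is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TextEvent:
    name: str
    anchor: Optional[str]   # anchor event this one is placed relative to; None for anchors
    offset_hours: float     # offset from the anchor (or from admission, for anchors)


@dataclass
class EhrRow:
    label: str
    charttime: datetime


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a stand-in for a real retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(name: str, rows: list, threshold: float = 0.5) -> Optional[EhrRow]:
    """Return the best-matching EHR row, or None if nothing clears the threshold."""
    best = max(rows, key=lambda r: token_overlap(name, r.label), default=None)
    if best is not None and token_overlap(name, best.label) >= threshold:
        return best
    return None


def reconstruct(events: list, rows: list, admission: datetime):
    # Stage 1: place anchor events relative to admission (the temporal scaffold).
    times = {e.name: admission + timedelta(hours=e.offset_hours)
             for e in events if e.anchor is None}
    # Stage 2: place non-central events relative to their anchor.
    for e in events:
        if e.anchor is not None:
            times[e.name] = times[e.anchor] + timedelta(hours=e.offset_hours)
    # Stage 3: calibrate with retrieved EHR rows as external temporal evidence.
    matched = {}
    for e in events:
        row = retrieve(e.name, rows)
        if row is not None:
            matched[e.name] = row.charttime
    shifts = {name: matched[name] - times[name] for name in matched}
    calibrated = {}
    for e in events:
        if e.name in matched:
            calibrated[e.name] = matched[e.name]              # snap to the EHR timestamp
        elif e.anchor in shifts:
            calibrated[e.name] = times[e.name] + shifts[e.anchor]  # inherit anchor correction
        else:
            calibrated[e.name] = times[e.name]                # keep the text-only estimate
    unmatched = [e.name for e in events if e.name not in matched]
    return calibrated, unmatched


# Hypothetical example data.
admission = datetime(2130, 1, 1, 8, 0)
events = [
    TextEvent("intubation", None, 2.0),
    TextEvent("started vasopressors", "intubation", 1.0),
    TextEvent("family meeting", None, 30.0),
]
rows = [
    EhrRow("intubation procedure", datetime(2130, 1, 1, 10, 45)),
    EhrRow("lactate lab", datetime(2130, 1, 1, 9, 0)),
]
calibrated, unmatched = reconstruct(events, rows, admission)
```

In this toy run the anchor is moved from its text-derived estimate to the charted time, the dependent event moves with it, and the two events with no tabular counterpart keep their text-derived placement. That last group corresponds to the kind of text-only events the gap analysis counts.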
Problem

Research questions and friction points this paper is trying to address.

clinical timeline reconstruction
temporal precision
multimodal alignment
electronic health records
clinical narratives
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented
multimodal alignment
clinical timeline reconstruction
temporal calibration
electronic health records