A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports

📅 2025-04-15

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study introduces the task of extracting relative timelines of clinical events from PubMed case reports: converting unstructured text into temporally ordered event sequences annotated with relative temporal relations, which supports patient trajectory modeling, causal reasoning, and process prediction. Methodologically, we establish the first medical-domain relative timeline annotation guideline and design a multi-LLM consistency evaluation framework to create a new benchmark. Zero-shot event identification and relative ordering are performed using large language models (e.g., O1-preview), followed by human verification and cross-model consistency analysis. On real-world case reports, O1-preview achieves 0.80 event recall and 0.95 temporal concordance among identified events, indicating that events it does identify are ordered reliably even though some events are missed. Our core contributions include: (1) formal definition of a novel NLP task in clinical text understanding; (2) construction of the first domain-specific annotation schema and evaluation framework for relative timelines; and (3) empirical validation of LLMs’ effectiveness in medical relative temporal relation extraction.

📝 Abstract

Timing of clinical events is central to characterization of patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series: structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390 respectively) of ten randomly sampled PubMed open-access (PMOA) case reports (N=152,974) and assess inter-LLM agreement (n=3,103; N=93). We find that the LLMs have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). By establishing the task, annotation, and assessment systems, and by demonstrating high concordance, this work may serve as a benchmark for leveraging the PMOA corpus for temporal analytics.
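The two headline metrics, event recall and temporal concordance, can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so the code below makes two assumptions: events are matched by exact text after lowercasing (a real evaluation would likely need fuzzy or manual matching), and concordance is the fraction of matched event pairs whose relative order agrees between the two timelines. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def event_recall(reference, predicted):
    """Fraction of reference events that also appear among the predicted events.

    Assumption: events match when their lowercased, stripped text is identical.
    """
    ref = {e.strip().lower() for e in reference}
    pred = {e.strip().lower() for e in predicted}
    if not ref:
        return 0.0
    return len(ref & pred) / len(ref)


def temporal_concordance(reference, predicted):
    """Fraction of shared event pairs ordered the same way in both timelines.

    Both arguments map event text to a relative time. Pairs tied in the
    reference are skipped; pairs tied only in the prediction count as half.
    """
    shared = sorted(set(reference) & set(predicted))
    agree = 0.0
    total = 0
    for a, b in combinations(shared, 2):
        ref_diff = reference[a] - reference[b]
        if ref_diff == 0:
            continue  # no defined order in the reference
        pred_diff = predicted[a] - predicted[b]
        total += 1
        if pred_diff == 0:
            agree += 0.5
        elif (ref_diff > 0) == (pred_diff > 0):
            agree += 1.0
    return agree / total if total else 0.0


# Hypothetical timelines: event text -> hours relative to admission.
manual = {"fever": -48, "admission": 0, "ct scan": 6, "surgery": 24, "discharge": 120}
llm = {"fever": -72, "admission": 0, "surgery": 24, "discharge": 96}

recall = event_recall(manual, llm)
concordance = temporal_concordance(manual, llm)
```

In this toy case the LLM timeline misses one of five events but orders the remaining four correctly, mirroring the pattern reported above: recall can be moderate while concordance among identified events stays high, because concordance depends only on relative order and not on the exact timestamps.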

Problem

Research questions and friction points this paper is trying to address.

Extracting relative timelines from clinical case reports

Transforming unstructured clinical events into time series data

Assessing LLM performance in temporal event annotation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM extracts events and timestamps from case reports

System transforms reports into textual time series: structured pairs of events and timestamps

Benchmark for temporal analytics using PubMed corpus
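As a rough illustration of what a textual time series might look like as a data structure, the sketch below stores each extracted item as an (event, time) pair and sorts the pairs chronologically. The time unit (hours) and the reference point (admission at time 0) are assumptions made for this example, as are the names `TimedEvent`, `to_time_series`, and `format_series`; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEvent:
    event: str    # free-text description of the clinical event
    hours: float  # assumption: hours relative to admission (admission = 0)


def to_time_series(pairs):
    """Convert (event, hours) pairs into a chronologically sorted list.

    Python's sort is stable, so events with equal times keep their input order.
    """
    events = [TimedEvent(text, float(t)) for text, t in pairs]
    return sorted(events, key=lambda ev: ev.hours)


def format_series(series):
    """Render one 'event | time' line per event."""
    return "\n".join(f"{ev.event} | {ev.hours:g}" for ev in series)


# Hypothetical extraction output, listed in the order the text mentions it.
pairs = [("admitted to hospital", 0), ("fever", -48), ("appendectomy", 24)]
series = to_time_series(pairs)
rendered = format_series(series)
```

Negative times represent events that precede the reference point, which is one simple way to keep the timeline relative rather than tied to calendar dates.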
