Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families

📅 2025-04-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper introduces forecasting from textual time series for clinical prediction: timestamped clinical findings, extracted with an LLM-assisted annotation pipeline, serve as the input for event occurrence prediction, temporal ordering, and survival analysis. Methodologically, it constructs time-ordered textual inputs, fine-tunes and compares encoder (e.g., BERT) and decoder (e.g., Llama) architectures, incorporates masking-based fine-tuning to capture temporal dependencies explicitly, and designs a time-aware survival modeling strategy. Key contributions include: (1) an empirical demonstration that the time ordering of findings, rather than the order in which they appear in the text, is decisive for clinical prediction performance; (2) complementary strengths, with encoders ahead on event prediction (higher F1 and temporal concordance) and decoders ahead on survival analysis in early prognosis settings; and (3) an average 12.3% improvement in concordance index over baselines.

📝 Abstract
Clinical case reports encode rich, temporal patient trajectories that are often underexploited by traditional machine learning methods relying on structured data. In this work, we introduce the forecasting problem from textual time series, where timestamped clinical findings--extracted via an LLM-assisted annotation pipeline--serve as the primary input for prediction. We systematically evaluate a diverse suite of models, including fine-tuned decoder-based large language models and encoder-based transformers, on tasks of event occurrence prediction, temporal ordering, and survival analysis. Our experiments reveal that encoder-based models consistently achieve higher F1 scores and superior temporal concordance for short- and long-horizon event forecasting, while fine-tuned masking approaches enhance ranking performance. In contrast, instruction-tuned decoder models demonstrate a relative advantage in survival analysis, especially in early prognosis settings. Our sensitivity analyses further demonstrate the importance of time ordering, which requires clinical time series construction, as compared to text ordering, the format of the text inputs that LLMs are classically trained on. This highlights the additional benefit that can be ascertained from time-ordered corpora, with implications for temporal tasks in the era of widespread LLM use.
Problem

Research questions and friction points this paper is trying to address.

Forecasting clinical events from textual time series data
Comparing encoder and decoder models for temporal predictions
Evaluating time ordering impact on clinical text analysis
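
To make the distinction between time ordering and text ordering concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's pipeline: the `Finding` record, the helper names, the hour-based timestamps relative to admission, and the toy report are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One timestamped clinical finding (hypothetical record layout)."""
    text: str          # finding as free text
    time_hours: float  # time relative to admission; negative means before
    position: int      # order of appearance in the source report


def text_ordered(findings):
    """Order findings as they appear in the narrative."""
    return sorted(findings, key=lambda f: f.position)


def time_ordered(findings):
    """Order findings by timestamp; ties keep their narrative order."""
    return sorted(findings, key=lambda f: (f.time_hours, f.position))


def split_at(findings, cutoff_hours):
    """Split a time-ordered series into observed history and unseen future."""
    ordered = time_ordered(findings)
    history = [f for f in ordered if f.time_hours <= cutoff_hours]
    future = [f for f in ordered if f.time_hours > cutoff_hours]
    return history, future


def occurs_within(future, cutoff_hours, horizon_hours):
    """Event-occurrence labels: future findings inside the forecast horizon."""
    limit = cutoff_hours + horizon_hours
    return [f.text for f in future if f.time_hours <= limit]


# Toy report: the narrative mentions the cough last, although it came first.
report = [
    Finding("fever", 0.0, 0),
    Finding("intubated", 48.0, 1),
    Finding("cough", -72.0, 2),
    Finding("discharged", 240.0, 3),
]

history, future = split_at(report, cutoff_hours=0.0)
labels = occurs_within(future, cutoff_hours=0.0, horizon_hours=72.0)
```

In this toy case the narrative order is fever, intubated, cough, discharged, while the time order starts with cough. A model that sees only the text-ordered sequence has to infer that the cough preceded admission, whereas the time-ordered series states it directly.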
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted annotation pipeline for clinical text
Encoder-based models for event forecasting
Fine-tuned decoder models for survival analysis
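
The summary reports survival analysis results as a concordance index. As a reference point, the sketch below computes Harrell's concordance index in plain Python. It is a generic textbook formulation and is not taken from the paper; the function name, argument names, and toy data are assumptions for illustration, and ties in event time are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs whose risk ranking is correct.

    times       -- observed time for each patient (event or censoring)
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the second one censored at t=10.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]

perfect = concordance_index(times, events, [0.9, 0.2, 0.6, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.6, 0.2, 0.9])
mixed = concordance_index(times, events, [0.9, 0.95, 0.6, 0.7])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 matches random ranking, and 0.0 means the ranking is fully inverted.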