MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of modeling heterogeneous, asynchronous, and irregularly sampled multimodal time series in electronic health records, such as laboratory tests and clinical notes, by representing the data as time-ordered XML triplets. The authors propose a two-stage fine-tuning strategy for a large language model: the first stage trains on value-redacted series so the model predicts from sampling patterns alone, and the second stage trains on the full series to jointly model sampling patterns and observed values. The approach treats the timing and channel of each observation as predictive signal in their own right, which is most useful when some values are unavailable at prediction time. Across multiple EHR datasets, the two-stage model (MILM-2S) and its single-stage counterpart (MILM-Direct) achieve the best and second-best average performance. In the value-pending evaluation, MILM-2S outperforms MILM-Direct by a larger margin than in standard evaluation, and preserving the time and channel of value-pending observations further improves in-hospital mortality prediction.
📝 Abstract
Multimodal irregular time series (MITS) consist of asynchronous and irregularly sampled observations from heterogeneous numerical and textual channels. In healthcare, for example, patients' electronic health records (EHR) include irregular lab measurements and clinical notes. The irregular timing and channel patterns of observations carry predictive signal alongside the numerical values and textual content. LLMs are natural candidates for processing such heterogeneous data, given their extensive pretrained knowledge spanning textual and numerical domains. We introduce MILM (Multimodal Irregular time series Language Model), which represents MITS as time-ordered triplets in Extensible Markup Language (XML) format and fine-tunes an LLM through a two-stage strategy for MITS classification. The first stage trains on value-redacted MITS to predict from sampling patterns alone, and the second stage trains on full MITS to jointly model sampling patterns and observed values. Our two-stage model (MILM-2S) and its single-stage counterpart (MILM-Direct) achieve the best and second-best average performance on multiple EHR datasets. Further value-redaction evaluations confirm that sampling patterns carry predictive signal and that MILM-2S learns to exploit them. In the value-pending evaluation we introduce, where some values are unavailable at prediction time, MILM-2S outperforms MILM-Direct by a larger margin than in standard evaluation. For MILM-2S, preserving the time and channel of value-pending observations as additional sampling information further improves in-hospital mortality prediction.
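To make the triplet representation concrete, here is a minimal Python sketch of how a patient's observations might be serialized as time-ordered XML triplets, with an option to redact values so that only sampling patterns remain. The abstract does not give the schema, so the tag names (`series`, `obs`, `time`, `channel`, `value`), the `Observation` class, the hour-based time unit, and the choice to redact by omitting the `value` element are all assumptions made for illustration, not the authors' format.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass
class Observation:
    """One (time, channel, value) triplet. Hypothetical structure."""
    time: float            # hours since admission (assumed unit)
    channel: str           # e.g. "glucose" or "clinical_note"
    value: Optional[str]   # numeric reading or note text; None if pending


def to_xml(observations, redact_values=False):
    """Serialize observations as time-ordered XML triplets.

    With redact_values=True the value element is dropped, leaving only
    the time and channel of each observation (the sampling pattern).
    Observations whose value is None (pending) also keep time and channel.
    """
    lines = ["<series>"]
    for obs in sorted(observations, key=lambda o: o.time):
        parts = [
            f"<time>{obs.time:g}</time>",
            f"<channel>{escape(obs.channel)}</channel>",
        ]
        if not redact_values and obs.value is not None:
            parts.append(f"<value>{escape(obs.value)}</value>")
        lines.append("  <obs>" + "".join(parts) + "</obs>")
    lines.append("</series>")
    return "\n".join(lines)


# Made-up example records, deliberately out of time order.
records = [
    Observation(5.5, "clinical_note", "Patient febrile, started antibiotics."),
    Observation(2.0, "glucose", "142"),
    Observation(7.25, "lactate", None),  # value pending at prediction time
]

full = to_xml(records)                          # stage-two style input
redacted = to_xml(records, redact_values=True)  # stage-one style input
print(full)
print(redacted)
```

In this sketch the redacted string would play the role of a stage-one input and the full string that of a stage-two input. The pending lactate observation keeps its time and channel in both, which mirrors the idea of preserving value-pending observations as additional sampling information.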
Problem

Research questions and friction points this paper is trying to address.

Multimodal Irregular Time Series
Informative Sampling
Missing Values
Electronic Health Records
Predictive Modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Irregular Time Series
Large Language Models
Informative Sampling
Two-stage Fine-tuning
Value-pending Prediction