OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data

πŸ“… 2025-10-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing large language models (LLMs) struggle to model multivariate medical time series effectively, limiting their utility in clinical decision support. To address this, the authors propose OpenTSLM, a family of Time Series Language Models that integrates time series as a native, first-class modality into pretrained LLMs. Two variants are investigated: a soft-prompt design that concatenates learnable time-series tokens with text tokens, and a Flamingo-style design that fuses the modalities via cross-attention, enabling joint reasoning over text and long-horizon sequential data while keeping memory requirements stable. Evaluated on chain-of-thought reasoning tasks, OpenTSLM reaches 69.9 F1 in sleep staging and 65.4 in human activity recognition, significantly outperforming finetuned text-only baselines. Notably, even 1B-parameter variants surpass GPT-4o on these tasks, and clinical expert reviews confirm strong medical reasoning on ECG question answering. This work establishes a scalable paradigm for temporal understanding in multimodal medical AI.

πŸ“ Abstract
LLMs have emerged as powerful tools for interpreting multimodal data. In medicine, they hold particular promise for synthesizing large volumes of clinical information into actionable insights and digital health applications. Yet a major limitation remains their inability to handle time series. To overcome this gap, we present OpenTSLM, a family of Time Series Language Models (TSLMs) created by integrating time series as a native modality into pretrained LLMs, enabling reasoning over multiple time series of any length. We investigate two architectures for OpenTSLM. The first, OpenTSLM-SoftPrompt, models time series implicitly by concatenating learnable time-series tokens with text tokens via soft prompting. Although parameter-efficient, we hypothesize that explicit time series modeling scales better and outperforms implicit approaches. We thus introduce OpenTSLM-Flamingo, which integrates time series with text via cross-attention. We benchmark both variants against baselines that treat time series as text tokens or plots, across a suite of text-time-series Chain-of-Thought (CoT) reasoning tasks. We introduce three datasets: HAR-CoT, Sleep-CoT, and ECG-QA-CoT. Across all, OpenTSLM models outperform baselines, reaching 69.9 F1 in sleep staging and 65.4 in HAR, compared to 9.05 and 52.2 for finetuned text-only models. Notably, even 1B-parameter OpenTSLM models surpass GPT-4o (15.47 and 2.95). OpenTSLM-Flamingo matches OpenTSLM-SoftPrompt in performance and outperforms it on longer sequences, while maintaining stable memory requirements. By contrast, SoftPrompt's memory footprint grows exponentially with sequence length, requiring around 110 GB of VRAM compared to 40 GB when training on ECG-QA with LLaMA-3B. Expert reviews by clinicians find that OpenTSLMs exhibit strong reasoning capabilities on ECG-QA. To facilitate further research, we provide all code, datasets, and models open-source.
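The soft-prompt variant described above can be illustrated with a minimal PyTorch sketch. This is a hypothetical reconstruction, not the paper's implementation: the patch length, number of time-series tokens, embedding size, and the attention-pooling step are all illustrative assumptions. The core idea it shows is the one the abstract names: map raw time series into learnable token embeddings in the LLM's input space, then concatenate them with the text token embeddings.

```python
import torch
import torch.nn as nn

class SoftPromptTSEncoder(nn.Module):
    """Illustrative sketch of the OpenTSLM-SoftPrompt idea: encode a raw
    time series into a fixed set of learnable tokens in the LLM embedding
    space, then concatenate them with the text token embeddings.
    All hyperparameters here are assumptions for illustration."""

    def __init__(self, patch_len=16, d_model=2048, n_ts_tokens=32):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)  # per-patch projection
        # learnable "soft prompt" query tokens
        self.ts_queries = nn.Parameter(torch.randn(n_ts_tokens, d_model) * 0.02)
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, series, text_embeds):
        # series: (batch, length) -> non-overlapping patches
        b, L = series.shape
        L = (L // self.patch_len) * self.patch_len
        patches = series[:, :L].reshape(b, -1, self.patch_len)
        patch_embeds = self.proj(patches)            # (b, n_patches, d_model)
        # the learnable queries attend over the patches, yielding a
        # fixed-size time-series token sequence regardless of input length
        q = self.ts_queries.unsqueeze(0).expand(b, -1, -1)
        ts_tokens, _ = self.pool(q, patch_embeds, patch_embeds)
        # prepend time-series tokens to the text embeddings for the LLM
        return torch.cat([ts_tokens, text_embeds], dim=1)

enc = SoftPromptTSEncoder()
fused = enc(torch.randn(2, 500), torch.randn(2, 10, 2048))
print(fused.shape)  # torch.Size([2, 42, 2048])
```

Because every time-series token participates in the LLM's self-attention alongside the text tokens, the joint sequence length (and hence memory) grows with the input, which is consistent with the abstract's observation that the soft-prompt variant becomes memory-hungry on long sequences.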
Problem

Research questions and friction points this paper is trying to address.

Enabling reasoning over multivariate medical time-series data with language models
Overcoming LLM limitations in handling time series data of varying lengths
Integrating time series as native modality for clinical Chain-of-Thought reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates time series as native modality to LLMs
Uses soft prompting for implicit time series modeling
Employs cross-attention for explicit time series integration
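The cross-attention route listed above follows the Flamingo recipe: interleave gated cross-attention blocks with the frozen LLM layers so that text hidden states attend to time-series features. The sketch below is an illustrative reconstruction under assumed dimensions, not the paper's code; the zero-initialized tanh gates are the standard Flamingo trick that makes each block an identity at the start of training, preserving the pretrained LLM.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Illustrative Flamingo-style fusion block: text hidden states (queries)
    attend to time-series features (keys/values). The tanh gates start at
    zero, so the block is initially an identity over the pretrained LLM."""

    def __init__(self, d_model=2048, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.attn_gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0
        self.ff_gate = nn.Parameter(torch.zeros(1))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, text_h, ts_feats):
        # cross-attention: text queries over time-series keys/values
        attn_out, _ = self.attn(self.norm1(text_h), ts_feats, ts_feats)
        text_h = text_h + torch.tanh(self.attn_gate) * attn_out
        text_h = text_h + torch.tanh(self.ff_gate) * self.ff(self.norm2(text_h))
        return text_h

block = GatedCrossAttentionBlock()
x = torch.randn(2, 10, 2048)               # text hidden states
out = block(x, torch.randn(2, 31, 2048))   # attend over 31 ts features
print(out.shape)  # torch.Size([2, 10, 2048])
```

Because the time series enters only through cross-attention rather than being appended to the token sequence, the LLM's self-attention cost stays fixed as the series grows, which matches the abstract's point that the Flamingo variant keeps memory requirements stable on long inputs.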
Patrick Langer
Stanford Mussallem Center for Biodesign, Stanford University
Thomas Kaar
Stanford Mussallem Center for Biodesign, Stanford University
Max Rosenblattl
Stanford Mussallem Center for Biodesign, Stanford University
Maxwell A. Xu
University of Illinois Urbana-Champaign
Winnie Chow
Stanford University
Martin Maritsch
Machine Learning Engineer, Amazon Web Services (AWS)
Generative AI, Machine Learning, Digital Health
Aradhana Verma
Division of Cardiovascular Medicine, Stanford University
Brian Han
Pediatric Cardiology, Stanford University
Daniel Seung Kim
Division of Cardiology, University of Washington
Henry Chubb
Pediatric Cardiology, Stanford University
Scott Ceresnak
Pediatric Cardiology, Stanford University
Aydin Zahedivash
Stanford Mussallem Center for Biodesign, Stanford University
Alexander Tarlochan Singh Sandhu
Division of Cardiovascular Medicine, Stanford University
Fatima Rodriguez
Assistant Professor in Cardiovascular Medicine, Stanford University
Cardiovascular Medicine, Prevention, Lipid Disorders, Cardiovascular Disease
Daniel McDuff
Google and University of Washington
Affective Computing, Deep Learning, Human-Computer Interaction, Human-Centered AI, Computer Vision
Elgar Fleisch
Professor of Information and Technology Management
Internet of Things, Information Management, Technology Management
Oliver Aalami
Stanford Mussallem Center for Biodesign, Stanford University
Filipe Barata
ETH Zurich - Centre for Digital Health Interventions
Digital Biomarkers, Machine Learning, Digital Health, Ubiquitous Computing, Artificial Intelligence
Paul Schmiedmayer
Stanford University
Digital Health, TSLM, AI, Software Engineering, Mobile Applications