Can we generate portable representations for clinical time series data using LLMs?

📅 2026-03-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the significant performance degradation of clinical machine learning models when deployed across hospitals, primarily caused by distribution shifts stemming from a lack of transferable patient representations. To overcome this, the authors propose a lightweight, no-fine-tuning-required portable representation method: irregular ICU time series are converted into natural language summaries via structured prompts using a frozen large language model (LLM), and these summaries are then encoded into fixed-length patient vectors by a frozen text embedding model for downstream prediction tasks. Evaluated on three cohorts—MIMIC-IV, HiRID, and PPICU—the approach matches state-of-the-art baselines in within-hospital performance while exhibiting smaller performance drops, lower prediction variance, and reduced leakage of sensitive attributes during cross-center deployment. It also substantially lowers engineering overhead and enhances few-shot learning efficacy.

📝 Abstract
Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shifts at the next. In this work, we study a simple question -- can large language models (LLMs) create portable patient embeddings, i.e., patient representations that enable a downstream predictor built at one hospital to be used elsewhere with minimal-to-no retraining or fine-tuning? To do so, we map irregular ICU time series onto concise natural-language summaries using a frozen LLM, then embed each summary with a frozen text embedding model to obtain a fixed-length vector that can serve as input to a variety of downstream predictors. Across three cohorts (MIMIC-IV, HiRID, PPICU), on multiple clinically grounded forecasting and classification tasks, we find that our approach is simple, easy to use, and competitive in-distribution with grid imputation, self-supervised representation learning, and time-series foundation models, while exhibiting smaller relative performance drops when transferring to new hospitals. We study the variation in performance across prompt designs, finding that structured prompts are crucial to reducing the variance of the predictive models without altering mean accuracy. We find that using these portable representations improves few-shot learning and does not increase demographic recoverability of age or sex relative to baselines, suggesting little additional privacy risk. Our work points to the potential that LLMs hold as tools to enable the scalable deployment of production-grade predictive models by reducing the engineering overhead.
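The pipeline described in the abstract has a concrete first step: turning an irregular, per-patient ICU time series into a structured prompt for the frozen LLM summarizer. A minimal sketch of that step is below; the function name, the `(hour, variable, value, unit)` record layout, and the prompt format are illustrative assumptions, not the paper's actual prompt design.

```python
# Hypothetical sketch: building a structured prompt from an irregular ICU
# time series, prior to summarization by a frozen LLM. The record layout
# and prompt wording are assumptions for illustration only.

def build_structured_prompt(measurements, demographics):
    """measurements: list of (hour, variable, value, unit) tuples,
    irregularly sampled; demographics: dict of static patient fields."""
    # Group observations by (variable, unit) so each prompt line reads
    # as a compact per-signal trajectory.
    by_var = {}
    for hour, var, value, unit in sorted(measurements):
        by_var.setdefault((var, unit), []).append((hour, value))

    lines = ["Patient summary request."]
    lines.append("Demographics: " + ", ".join(
        f"{k}={v}" for k, v in demographics.items()))
    for (var, unit), obs in sorted(by_var.items()):
        traj = "; ".join(f"h{h}: {v}" for h, v in obs)
        lines.append(f"{var} ({unit}): {traj}")
    lines.append("Summarize the clinical trajectory in plain language.")
    return "\n".join(lines)


prompt = build_structured_prompt(
    measurements=[
        (0, "heart_rate", 92, "bpm"),
        (3, "heart_rate", 118, "bpm"),
        (1, "lactate", 2.4, "mmol/L"),
    ],
    demographics={"age_band": "60-70", "sex": "F"},
)
# In the paper's pipeline, a frozen LLM would turn `prompt` into a
# natural-language summary, and a frozen text embedding model would map
# that summary to a fixed-length vector for downstream predictors.
```

Because both the LLM and the embedding model stay frozen, only this prompt-construction code is hospital-specific, which is one way the approach reduces engineering overhead at deployment time.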
Problem

Research questions and friction points this paper is trying to address.

portable representations
clinical time series
distribution shift
model transferability
LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

portable patient embeddings
large language models
clinical time series
frozen LLM
cross-hospital generalization