Benchmarking LLMs for Predictive Applications in the Intensive Care Units

📅 2025-12-23

🤖 AI Summary
Clinical shock prediction in ICU settings remains challenging because of complex temporal dynamics and class imbalance (only 87 of the 442 retained patients have an abnormal shock index, SI > 0.7). Method: This study benchmarks language models on ICU shock prediction using text data from 17,294 ICU stays in the MIMIC-III database. It compares large language models (LLMs), namely GatorTron-Base (trained on clinical data), Llama-8B, and Mistral-7B, against smaller language models (SLMs) such as BioBERT, BioClinicalBERT, and Doc2Vec. To address class imbalance, both focal loss and cross-entropy loss are used during fine-tuning. Contribution/Results: GatorTron-Base achieves the highest weighted recall (80.5%), yet overall performance is comparable between LLMs and SLMs, with no clear advantage for LLMs. The findings challenge the assumption that LLMs, despite their strong performance on text-based tasks, inherently outperform smaller models at predicting future clinical events. The authors advocate training future LLMs to predict clinical trajectories rather than focusing on simpler tasks such as named entity recognition or phenotyping.
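Weighted recall, the headline metric reported for GatorTron-Base, is conventionally computed as per-class recall averaged with weights proportional to each class's support (the convention scikit-learn uses for `average="weighted"`). The sketch below assumes that convention and contrasts it with macro-averaged recall; the function names and toy labels are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall for each class present in y_true: correct predictions / true examples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return {c: hits[c] / support[c] for c in support}


def weighted_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class recall averaged with weights proportional to class support."""
    recalls = per_class_recall(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    return sum(support[c] / n * recalls[c] for c in recalls)


def macro_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class recall."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy labels: 0 = normal SI, 1 = abnormal SI.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_recall(y_true, y_pred), 4))  # 0.6667
print(round(macro_recall(y_true, y_pred), 4))     # 0.625
```

Because each class's recall is weighted by its share of examples, the weighted average simplifies algebraically to overall accuracy for single-label classification, so under class imbalance it leans toward majority-class performance; per-class or macro-averaged recall shows performance on the minority (abnormal SI) class more directly.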

📝 Abstract
With the advent of LLMs, many tasks across natural language processing have been transformed. However, their application to predictive tasks remains comparatively under-researched. This study compares large language models (LLMs), including GatorTron-Base (trained on clinical data), Llama 8B, and Mistral 7B, against smaller language models (SLMs) such as BioBERT, DocBERT, BioClinicalBERT, Word2Vec, and Doc2Vec, setting benchmarks for predicting shock in critically ill patients. Timely prediction of shock can enable early interventions and thus improve patient outcomes. Text data from 17,294 ICU stays of patients in the MIMIC-III database were screened for length of stay > 24 hours and scored by shock index (SI), with SI > 0.7 considered abnormal, yielding 355 patients with normal SI and 87 with abnormal SI. Both focal and cross-entropy losses were used during fine-tuning to address class imbalance. Our findings indicate that while GatorTron-Base achieved the highest weighted recall of 80.5%, overall performance metrics were comparable between SLMs and LLMs. This suggests that LLMs are not inherently superior to SLMs in predicting future clinical events, despite their strong performance on text-based tasks. To achieve meaningful clinical outcomes, future efforts in training LLMs should prioritize developing models capable of predicting clinical trajectories rather than focusing on simpler tasks such as named entity recognition or phenotyping.
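The cohort labeling described in the abstract can be sketched as follows, assuming the standard definition of shock index as heart rate divided by systolic blood pressure. Only the length-of-stay cutoff (> 24 hours) and the SI > 0.7 threshold come from the abstract; which measurement (first, worst, or an aggregate) and which time window the authors used is not stated here, so the `Vitals` record and the `label_stay` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the abstract; everything else here is illustrative.
MIN_STAY_HOURS = 24.0
SI_THRESHOLD = 0.7


@dataclass
class Vitals:
    """Hypothetical single vital-sign snapshot for one ICU stay."""
    heart_rate: float   # beats per minute
    systolic_bp: float  # mmHg


def shock_index(v: Vitals) -> float:
    """Standard shock index: heart rate / systolic blood pressure."""
    if v.systolic_bp <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return v.heart_rate / v.systolic_bp


def label_stay(length_of_stay_hours: float, v: Vitals) -> Optional[int]:
    """Return 1 for abnormal SI, 0 for normal SI, or None if the stay is excluded."""
    if length_of_stay_hours <= MIN_STAY_HOURS:
        return None  # only stays longer than 24 hours are kept
    return 1 if shock_index(v) > SI_THRESHOLD else 0


print(label_stay(48, Vitals(heart_rate=110, systolic_bp=120)))  # SI ≈ 0.92, prints 1
print(label_stay(48, Vitals(heart_rate=70, systolic_bp=120)))   # SI ≈ 0.58, prints 0
print(label_stay(12, Vitals(heart_rate=110, systolic_bp=120)))  # excluded, prints None
```

The threshold comparison is strict, matching the "SI > 0.7" wording, so a stay with SI of exactly 0.7 is labeled normal.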
Problem

Benchmarking LLMs for predicting shock in ICU patients
Comparing LLMs and SLMs on clinical predictive tasks
Addressing class imbalance in clinical event prediction
Innovation

Comparing LLMs and SLMs for clinical shock prediction
Using focal and cross-entropy losses to address class imbalance
Advocating LLM training aimed at predicting clinical trajectories
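Focal loss down-weights examples the model already classifies confidently, so training concentrates on hard cases, which under imbalance are often minority-class ones. The sketch below compares the standard binary form, FL = -α_t (1 - p_t)^γ log(p_t), with cross-entropy, CE = -log(p_t), where p_t is the predicted probability of the true class. The γ and α defaults are illustrative values, not the paper's settings, and the summary does not specify exactly how the authors combined or compared the two losses during fine-tuning.

```python
import math


def _p_true(p: float, y: int) -> float:
    """Probability assigned to the true class, given P(y = 1) = p."""
    return p if y == 1 else 1.0 - p


def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy: -log(p_t), clamped to avoid log(0)."""
    return -math.log(max(_p_true(p, y), eps))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.5,
               eps: float = 1e-12) -> float:
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha weights the positive class (1 - alpha weights the negative class);
    with alpha = 0.5 and gamma = 0 this reduces to 0.5 * cross-entropy.
    """
    p_t = _p_true(p, y)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# Hypothetical predicted P(abnormal SI) for one easy and one hard positive case.
for name, p in [("easy", 0.95), ("hard", 0.30)]:
    ce, fl = cross_entropy(p, 1), focal_loss(p, 1)
    print(f"{name}: CE={ce:.4f}  focal={fl:.6f}  focal/CE={fl / ce:.4f}")
```

The focal/CE ratio equals α_t(1 - p_t)^γ, so the confidently correct example keeps only a tiny fraction of its cross-entropy while the misclassified one keeps a much larger share; γ controls how sharply this down-weighting sets in.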
Chehak Malhotra
Computer Science, Indraprastha Institute of Information Technology Delhi, Delhi, India
Mehak Gopal
Computational Biology, Indraprastha Institute of Information Technology Delhi, Delhi, India
Akshaya Devadiga
Computational Biology, Indraprastha Institute of Information Technology Delhi, Delhi, India
Pradeep Singh
Professor of Mechanical Engineering, Sant Longowal Institute of Engineering & Technology, Longowal
Ridam Pal
Computational Biology, Indraprastha Institute of Information Technology Delhi, Delhi, India
Ritwik Kashyap
Computational Biology, Indraprastha Institute of Information Technology Delhi, Delhi, India
Tavpritesh Sethi
Computational Biology, Indraprastha Institute of Information Technology Delhi, Delhi, India