Predicting Post-Traumatic Epilepsy from Clinical Records using Large Language Model Embeddings

📅 2026-04-15

🤖 AI Summary

This study addresses early prediction of post-traumatic epilepsy (PTE), which is hindered by heterogeneous clinical data, scarce positive cases, and reliance on costly neuroimaging. The authors propose an imaging-free prediction framework that uses a pretrained large language model (LLM) as a fixed feature extractor to embed routine acute-phase clinical text, combines these embeddings with structured tabular features through a modality-aware fusion strategy, and classifies the result with gradient-boosted trees. Evaluated on a curated subset of the TRACK-TBI cohort under stratified cross-validation, the fused model achieves an AUC-ROC of 0.892 and an AUPRC of 0.798, indicating that routine clinical documentation alone carries useful signal for PTE risk prediction. Key predictive factors include acute post-traumatic seizures, injury severity, neurosurgical intervention, and length of ICU stay.
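The summary reports both AUC-ROC and AUPRC. With few positive cases, AUPRC is the more demanding of the two: a random classifier scores about 0.5 on AUC-ROC but only about the positive prevalence on AUPRC. The minimal sketch below computes both from scratch so the definitions are explicit. It is illustrative only, not the authors' evaluation code, and it assumes one common AUPRC estimator, average precision (the quantity scikit-learn's `average_precision_score` reports), with tied scores broken by input order.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen positive scores above a randomly chosen negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC-ROC).
    Enumerates all positive/negative pairs, which is fine for cohort-sized data.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: mean precision at the rank of each positive.

    Cases are ranked by descending score; tied scores keep their input order
    (Python's sort is stable), so results can differ from tie-aware implementations.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Toy labels and predicted probabilities (made up for illustration).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, probs))                      # 0.75
print(round(average_precision(labels, probs), 4))  # 0.8333
```

On a rare outcome like PTE, a model can post a high AUC-ROC while most of its flagged cases are false positives; average precision penalizes that directly because it is built from precision at each positive's rank.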

📝 Abstract

Objective: Post-traumatic epilepsy (PTE) is a debilitating neurological disorder that develops after traumatic brain injury (TBI). Early prediction of PTE remains challenging because clinical data are heterogeneous, positive cases are few, and existing approaches rely on resource-intensive neuroimaging. We investigate whether routinely collected acute clinical records alone can support early PTE prediction using language model-based approaches.

Methods: Using a curated subset of the TRACK-TBI cohort, we developed an automated PTE prediction framework that uses pretrained large language models (LLMs) as fixed feature extractors to encode clinical records. Tabular features, LLM-generated embeddings, and hybrid feature representations were evaluated with gradient-boosted tree classifiers under stratified cross-validation.

Results: LLM embeddings, by capturing contextual clinical information, improved performance over tabular features alone. A modality-aware feature fusion strategy combining tabular features and LLM embeddings performed best, with an AUC-ROC of 0.892 and an AUPRC of 0.798. Acute post-traumatic seizures, injury severity, neurosurgical intervention, and ICU stay were the key contributors to predictive performance.

Significance: These findings demonstrate that routine acute clinical records contain information suitable for early PTE risk prediction when LLM embeddings are combined with gradient-boosted tree classifiers. This approach is a promising complement to imaging-based prediction.

Problem

Research questions and friction points this paper is trying to address.

Post-traumatic epilepsy

Traumatic brain injury

Clinical records

Early prediction

Large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large language models

Post-traumatic epilepsy

Clinical text embeddings