Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current vision-language models struggle to accurately interpret the involuntary, spatiotemporally dynamic pathological motor behaviors characteristic of epileptic seizures. To bridge this gap, we introduce the first multimodal benchmark dataset dedicated to seizure semiology, comprising 438 videos with over 35,000 dense annotations, and propose a seven-task hierarchical evaluation framework spanning from visual perception to diagnostic report generation. We further develop Seizure-RQI, a clinically interpretable metric for assessing report quality, and run baseline experiments on eleven open-weight multimodal large language models. Seizure-specific fine-tuning improves performance across tasks, and a two-stage neuro-symbolic hybrid framework achieves an F1 score of 0.96 on epileptic versus non-epileptic classification, significantly outperforming end-to-end methods. The baselines also expose systematic deficiencies in existing models regarding lateralization inference and temporal symptom modeling.
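For readers unfamiliar with the F1 score reported above, the following is a minimal sketch of how it is computed for a binary task such as epileptic versus non-epileptic classification. The function name `f1_score` and the confusion counts are illustrative assumptions chosen only to show the arithmetic; they are not taken from the paper or its evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT results from the paper):
# 8 epileptic videos correctly flagged, 2 false alarms, 1 missed.
print(round(f1_score(tp=8, fp=2, fn=1), 3))  # 0.842
```

Because F1 ignores true negatives, it reflects how well the positive class is detected rather than overall accuracy.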

📝 Abstract

While Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in general video understanding, their capacity to interpret involuntary, spatio-temporally evolving pathologic motor behaviors such as seizure semiology remains largely untested. To address this gap, we introduce Seizure-Semiology-Suite, a clinically grounded dataset and benchmark for fine-grained, structured seizure semiology understanding. The dataset includes 438 seizure videos annotated with over 35,000 dense labels covering 20 ILAE-defined semiological features. Building on this dataset, we propose a seven-task hierarchical benchmark that systematically evaluates MLLMs from low-level visual perception to temporal sequencing, narrative report generation, and seizure diagnosis. To enable clinically meaningful evaluation of generated reports, we further introduce the Report Quality Index for Seizure Semiology (Seizure-RQI). Extensive baselines across 11 open-weight MLLMs reveal systematic weaknesses in laterality reasoning, temporal localization, symptom sequencing, and clinically faithful reporting. We show that seizure-specific fine-tuning substantially improves performance across tasks, and that a two-stage neuro-symbolic framework achieves an F1 score of 0.96 on epileptic versus non-epileptic seizure classification. Seizure-Semiology-Suite establishes a rigorous benchmark for evaluating multimodal models in safety-critical medical video understanding and guides the development of clinically reliable, domain-adaptive multimodal intelligence.
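To make the ideas of dense labels, laterality, and symptom sequencing concrete, here is a small sketch of what a time-stamped annotation record and a sequencing step might look like. The `Annotation` class, its field names, and the feature names are hypothetical and chosen for illustration; the dataset's actual schema is not described in this abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """One dense label: a semiological feature observed over a time span."""
    feature: str                 # hypothetical feature name
    start_s: float               # onset time in seconds
    end_s: float                 # offset time in seconds
    side: Optional[str] = None   # "left", "right", "bilateral", or None


def symptom_sequence(annotations: List[Annotation]) -> List[str]:
    """Order features by first onset, keeping each feature once."""
    ordered = sorted(annotations, key=lambda a: (a.start_s, a.end_s))
    sequence: List[str] = []
    for ann in ordered:
        if ann.feature not in sequence:
            sequence.append(ann.feature)
    return sequence


# Illustrative labels for a single video (made up, not from the dataset).
example = [
    Annotation("arm_clonic", 12.0, 20.5, side="right"),
    Annotation("head_turning", 4.0, 9.0, side="left"),
    Annotation("arm_clonic", 22.0, 30.0, side="right"),
]
print(symptom_sequence(example))  # ['head_turning', 'arm_clonic']
```

A structured record of this kind is one way a model's predicted onset order and laterality could be compared against reference labels, which is the sort of check the temporal sequencing and laterality tasks imply.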

Problem

Research questions and friction points this paper is trying to address.

seizure semiology

multimodal large language models

clinical video understanding

epileptic seizure

medical AI benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

Seizure Semiology

Multimodal Large Language Models

Clinical Video Understanding