Learning Cross-Joint Attention for Generalizable Video-Based Seizure Detection

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video-based epilepsy detection methods suffer from limited generalization, susceptibility to background distractions, and reliance on subject-specific appearance cues. To address these limitations, this work proposes a joint-centric attention model that leverages only human skeletal dynamics. The approach first extracts video clips centered on body joints, tokenizes them using the Video Vision Transformer (ViViT), and incorporates a cross-joint attention mechanism to capture spatiotemporal coordination patterns among body parts. By focusing exclusively on joint motion, the method effectively eliminates background bias and achieves superior performance under cross-subject evaluation settings, significantly outperforming state-of-the-art CNNs, graph neural networks, and Transformer-based approaches in generalizing to unseen subjects.

📝 Abstract
Automated seizure detection from long-term clinical videos can substantially reduce manual review time and enable real-time monitoring. However, existing video-based methods often struggle to generalize to unseen subjects due to background bias and reliance on subject-specific appearance cues. We propose a joint-centric attention model that focuses exclusively on body dynamics to improve cross-subject generalization. For each video segment, body joints are detected and joint-centered clips are extracted, suppressing background context. These joint-centered clips are tokenized using a Video Vision Transformer (ViViT), and cross-joint attention is learned to model spatial and temporal interactions between body parts, capturing coordinated movement patterns characteristic of seizure semiology. Extensive cross-subject experiments show that the proposed method consistently outperforms state-of-the-art CNN-, graph-, and transformer-based approaches on unseen subjects.
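The pipeline described in the abstract (detect joints, extract joint-centered clips, tokenize them, then attend across joints) can be illustrated with a minimal NumPy sketch. All dimensions, the linear tokenizer (a stand-in for ViViT), and the single-head attention are hypothetical simplifications, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): J joints, T frames per clip,
# H_px x W_px pixel patches around each joint, D-dimensional tokens.
J, T, H_px, W_px, D = 13, 8, 16, 16, 32

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tokenize_joint_clips(clips, W_embed):
    """Flatten each joint-centered clip and project it to one token.
    (A linear stand-in for the paper's ViViT tokenizer.)"""
    flat = clips.reshape(clips.shape[0], -1)        # (J, T*H_px*W_px)
    return flat @ W_embed                           # (J, D)

def cross_joint_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention across joint tokens, so each joint's
    representation mixes in motion information from the other joints."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (J, J) joint-to-joint weights
    return attn @ V, attn

# Random stand-ins for joint-centered clips and learned projections.
clips = rng.standard_normal((J, T, H_px, W_px))
W_embed = rng.standard_normal((T * H_px * W_px, D)) * 0.01
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

tokens = tokenize_joint_clips(clips, W_embed)
out, attn = cross_joint_attention(tokens, Wq, Wk, Wv)

# Pool over joints and score with a stand-in classifier head.
score = out.mean(axis=0) @ rng.standard_normal(D)
print(out.shape, attn.shape)  # (13, 32) (13, 13)
```

Because the model only ever sees pixels inside the joint-centered patches, background context is suppressed by construction; the (J, J) attention matrix is where coordinated movement patterns between body parts would be captured.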
Problem

Research questions and friction points this paper is trying to address.

seizure detection
cross-subject generalization
video-based analysis
background bias
appearance cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-joint attention
joint-centric modeling
video-based seizure detection
generalizable representation
Video Vision Transformer
Omar Zamzam
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California
Takfarinas Medani
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California
Chinmay Chinara
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California
Richard M. Leahy
Leonard Silverman Chair in Electrical and Computer Engineering, University of Southern California
medical imaging, brain mapping, signal processing, image processing