🤖 AI Summary
Existing sleep staging and sleep-disorder classification methods are predominantly single-task and single-modality, failing to capture the physiological interdependencies between the two problems. To address this, we propose the first multimodal, multitask 1D-Vision Transformer (1D-ViT) framework designed for univariate physiological time-series signals, jointly performing five-stage sleep staging and sleep apnea classification from synchronized PPG, airflow, and respiratory effort signals. Our approach adapts the Vision Transformer architecture to 1D physiological sequences and incorporates attention-weight visualization to enhance model interpretability. Experimental results demonstrate a staging accuracy of 78% (quadratic weighted kappa κ = 0.66) and an apnea classification accuracy of 74% (κ = 0.58). Attention analysis further reveals that respiratory waveform peaks and troughs serve as critical discriminative features. This work establishes a new paradigm for multitask modeling of physiological time-series data.
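Adapting a Vision Transformer to 1D signals essentially replaces 2D image patches with fixed-length segments of the time series, each linearly projected to a token. A minimal numpy sketch of this patchification and embedding step (the signal length, patch length, and embedding size below are hypothetical illustrations, not values from the paper):

```python
import numpy as np

def patchify_1d(signal, patch_len):
    """Split a univariate time series into non-overlapping patches,
    the 1D analogue of ViT image patches."""
    n = (len(signal) // patch_len) * patch_len  # drop any ragged tail
    return signal[:n].reshape(-1, patch_len)

def embed_patches(patches, W, b):
    """Linear patch embedding: project each patch to a d-dimensional token."""
    return patches @ W + b

rng = np.random.default_rng(0)
sig = rng.standard_normal(3000)        # e.g. 30 s of a signal sampled at 100 Hz
patches = patchify_1d(sig, 100)        # -> (30, 100): 30 tokens of 100 samples
W = rng.standard_normal((100, 64))     # learned projection in a real model
b = np.zeros(64)
tokens = embed_patches(patches, W, b)  # -> (30, 64) token sequence
```

The resulting token sequence (plus positional encodings and a class token in a full model) is what the transformer encoder attends over, which is also what makes the attention weights interpretable at the patch level.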
📄 Abstract
Sleep is an essential component of human physiology, contributing significantly to overall health and quality of life. Accurate sleep staging and disorder detection are crucial for assessing sleep quality. Studies in the literature have proposed PSG-based approaches and machine-learning methods utilizing single-modality signals. However, existing methods often lack multimodal, multilabel frameworks and address sleep stage and sleep disorder classification separately. In this paper, we propose a 1D-Vision Transformer for the simultaneous classification of sleep stages and sleep disorders. Our method exploits the correlation of sleep disorders with specific sleep stage patterns and identifies a sleep stage and a sleep disorder simultaneously. The model is trained and tested on multimodal, multilabel sensory data, including photoplethysmogram, respiratory flow, and respiratory effort signals. The proposed method achieves an overall accuracy (Cohen's kappa) of 78% (0.66) for five-stage sleep classification and 74% (0.58) for sleep apnea classification. Moreover, we analyze the encoder attention weights to explain the model's predictions and investigate the influence of different features on its outputs. The results show that identified patterns, such as respiratory troughs and peaks, contribute most strongly to the final classification.
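The reported kappa values use the quadratic weighting commonly applied to ordinal labels such as sleep stages, where confusing adjacent stages is penalized less than confusing distant ones. A self-contained sketch of the standard definition (this is the generic metric, not code from the paper):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels."""
    # observed confusion matrix
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # quadratic disagreement weights: 0 on the diagonal, growing with distance
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2
    # expected confusion matrix under chance agreement (outer product of marginals)
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()

perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)  # 1.0
chance = quadratic_weighted_kappa([0, 0, 2, 2], [0, 2, 0, 2], 3)         # 0.0
```

`scikit-learn` provides the same metric as `cohen_kappa_score(y_true, y_pred, weights="quadratic")`.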