Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automated sleep staging methods rely heavily on handcrafted polysomnography (PSG) features and domain-specific models, suffering from poor interpretability and high data requirements. To address these limitations, we propose a multimodal foundation modeling paradigm: raw PSG time-series signals are losslessly transformed into 2D waveform images to preserve temporal structure and emulate clinical visual inspection; for the first time, a general-purpose multimodal large language model is adapted to sleep staging via end-to-end fine-tuning, enabling cross-modal feature fusion and attention-driven interpretability. Evaluated on three large-scale public benchmarks—ISRUC, MASS, and SHHS—our method achieves significant improvements over state-of-the-art approaches in accuracy, robustness, and generalizability. Results demonstrate strong clinical applicability and highlight the paradigm’s potential for broader biomedical signal analysis.

Technology Category

Application Category

📝 Abstract

Sleep staging is essential for diagnosing sleep disorders and assessing neurological health. Existing automatic methods typically extract features from complex polysomnography (PSG) signals and train domain-specific models, which often lack intuitiveness and require large, specialized datasets. To overcome these limitations, we introduce a new paradigm for sleep staging that leverages large multimodal general-purpose models to emulate clinical diagnostic practices. Specifically, we convert raw one-dimensional PSG time-series into intuitive two-dimensional waveform images and then fine-tune a multimodal large model to learn from these representations. Experiments on three public datasets (ISRUC, MASS, SHHS) demonstrate that our approach enables general-purpose models, without prior exposure to sleep data, to acquire robust staging capabilities. Moreover, explanation analysis reveals our model learned to mimic the visual diagnostic workflow of human experts for sleep staging by PSG images. The proposed method consistently outperforms state-of-the-art baselines in accuracy and robustness, highlighting its efficiency and practical value for medical applications. The code for the signal-to-image pipeline and the PSG image dataset will be released.

Problem

Research questions and friction points this paper is trying to address.

Transforming 1D sleep signals into 2D visual representations

Fine-tuning general models for robust sleep staging without prior data

Mimicking clinical visual workflows to improve diagnostic accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts 1D PSG signals into 2D waveform images

Fine-tunes multimodal general-purpose models for sleep staging

Mimics visual diagnostic workflow using PSG image representations

🔎 Similar Papers

No similar papers found.

Authors to Follow