Introducing Multimodal Paradigm for Learning Sleep Staging PSG via General-Purpose Model

📅 2025-09-26
🤖 AI Summary
Existing automated sleep staging methods rely on features extracted from polysomnography (PSG) signals and on domain-specific models, which limits interpretability and demands large, specialized datasets. To address these limitations, the paper proposes a multimodal paradigm: raw PSG time-series signals are rendered as 2D waveform images that preserve temporal structure and emulate clinical visual inspection, and a general-purpose multimodal large model is then fine-tuned on these images for sleep staging. On three public datasets (ISRUC, MASS, and SHHS), the method outperforms state-of-the-art baselines in accuracy and robustness. The results point to clinical applicability and suggest the paradigm may extend to broader biomedical signal analysis.

📝 Abstract
Sleep staging is essential for diagnosing sleep disorders and assessing neurological health. Existing automatic methods typically extract features from complex polysomnography (PSG) signals and train domain-specific models, which are often unintuitive and require large, specialized datasets. To overcome these limitations, we introduce a new paradigm for sleep staging that leverages large multimodal general-purpose models to emulate clinical diagnostic practice. Specifically, we convert raw one-dimensional PSG time series into intuitive two-dimensional waveform images and then fine-tune a multimodal large model to learn from these representations. Experiments on three public datasets (ISRUC, MASS, SHHS) demonstrate that our approach enables general-purpose models, without prior exposure to sleep data, to acquire robust staging capabilities. Moreover, explanation analysis reveals that our model learned to mimic the visual diagnostic workflow that human experts follow when staging sleep from PSG images. The proposed method consistently outperforms state-of-the-art baselines in accuracy and robustness, highlighting its efficiency and practical value for medical applications. The code for the signal-to-image pipeline and the PSG image dataset will be released.
Problem

Research questions and friction points this paper is trying to address.

Transforming 1D sleep signals into 2D visual representations
Fine-tuning general-purpose models that have no prior exposure to sleep data for robust sleep staging
Mimicking clinical visual workflows to improve diagnostic accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts 1D PSG signals into 2D waveform images
Fine-tunes multimodal general-purpose models for sleep staging
Mimics visual diagnostic workflow using PSG image representations
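
The signal-to-image step listed above can be illustrated with a minimal sketch. It is not the authors' pipeline (their code is announced but not shown here): the helper name `render_waveform`, the image size, and the synthetic 100 Hz, 30-second sine-wave epoch are all assumptions chosen for illustration. The sketch rasterizes a one-dimensional signal into a small binary image using only the Python standard library.

```python
import math


def render_waveform(signal, width=64, height=16):
    """Rasterize a 1D signal into a height x width binary image (list of rows).

    Hypothetical helper for illustration only; not the authors' method.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for flat signals
    image = [[0] * width for _ in range(height)]
    prev_row = None
    for col in range(width):
        # Pick the sample that corresponds to this pixel column.
        idx = min(len(signal) - 1, col * len(signal) // width)
        # Map amplitude to a row; row 0 is the top of the image.
        row = round((hi - signal[idx]) / span * (height - 1))
        # Fill the vertical gap to the previous column so the trace is connected.
        start, end = (row, row) if prev_row is None else sorted((prev_row, row))
        for r in range(start, end + 1):
            image[r][col] = 1
        prev_row = row
    return image


# Synthetic stand-in for one PSG channel (assumed values): 30 s of a 2 Hz sine at 100 Hz.
fs, seconds = 100, 30
epoch = [math.sin(2 * math.pi * 2 * t / fs) for t in range(fs * seconds)]
img = render_waveform(epoch, width=64, height=16)

# Print the image as text so the waveform shape is visible in a terminal.
for row in img:
    print("".join("#" if px else "." for px in row))
```

Note that this sketch keeps only one sample per pixel column, so it discards detail whenever the signal has more samples than the image has columns. A real pipeline would more likely plot each channel at a resolution high enough to keep the features a clinician would look for, and would stack several PSG channels in one image.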
Jianheng Zhou
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Chenyu Liu
College of Computing and Data Science, Nanyang Technological University, Singapore
Jinan Zhou
Nutanix, CA, USA
Yi Ding
College of Computing and Data Science, Nanyang Technological University, Singapore
Yang Liu
Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, Finland
Haoran Luo
Nanyang Technological University
Knowledge Graph, Large Language Models, Graph Neural Networks
Ziyu Jia
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Xinliang Zhou
Nanyang Technological University, Singapore
Brain Computer Interfaces, Foundation Models, xAI