Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the lack of transparency and explicit adherence to clinical guidelines in existing automatic sleep staging models, despite their near-expert performance. The authors present the first complete formalization of the American Academy of Sleep Medicine (AASM) scoring rules into an executable program, yielding a deterministic, rule-driven system that produces clinically compliant and interpretable sleep stage assignments. The approach generates natural language explanations grounded in the AASM rules for each staging decision and enables auditing and debugging of deep learning models. Evaluated on 50 polysomnography recordings against a 10-scorer majority-vote reference, the system achieves an overall accuracy of 60.5% (Cohen’s κ = 0.42), with a recall of 83.5% for N2 sleep and 68.7% for rapid eye movement (REM) sleep, while recall for Wake and N1 is low. The results demonstrate the feasibility and traceability of formalized clinical rule implementation.
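To make the idea of a deterministic rule with an attached explanation concrete, here is a minimal Python sketch. It is not the authors' program: the feature names, the rule order, and the thresholds are illustrative assumptions loosely modeled on well-known AASM criteria, and a real system would have to derive such features from the EEG, EOG, and EMG signals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpochFeatures:
    """Hypothetical per-epoch features (names are assumptions, not the paper's)."""
    alpha_fraction: float        # share of the epoch showing alpha rhythm
    slow_wave_fraction: float    # share of the epoch showing slow wave activity
    spindle_or_k_complex: bool   # sleep spindle or K complex detected
    rapid_eye_movements: bool    # rapid eye movements detected in the EOG
    low_chin_emg: bool           # chin EMG tone is low


@dataclass
class Decision:
    stage: str
    trace: List[str] = field(default_factory=list)


def stage_epoch(f: EpochFeatures) -> Decision:
    """Apply toy rules in a fixed order and record why each one fired or not."""
    trace: List[str] = []

    if f.alpha_fraction > 0.5:
        trace.append(f"alpha rhythm covers {f.alpha_fraction:.0%} of the epoch (> 50%): score W")
        return Decision("W", trace)
    trace.append(f"alpha rhythm covers {f.alpha_fraction:.0%} of the epoch (not > 50%): not W")

    if f.rapid_eye_movements and f.low_chin_emg:
        trace.append("rapid eye movements with low chin EMG tone: score R")
        return Decision("R", trace)
    trace.append("rapid eye movements and low chin EMG tone not both present: not R")

    if f.slow_wave_fraction >= 0.2:
        trace.append(f"slow wave activity covers {f.slow_wave_fraction:.0%} of the epoch (>= 20%): score N3")
        return Decision("N3", trace)
    trace.append(f"slow wave activity covers {f.slow_wave_fraction:.0%} of the epoch (< 20%): not N3")

    if f.spindle_or_k_complex:
        trace.append("sleep spindle or K complex present: score N2")
        return Decision("N2", trace)
    trace.append("no sleep spindle or K complex: not N2")

    trace.append("no other rule fired: score N1")
    return Decision("N1", trace)


# Example epoch: little alpha, little slow wave activity, one spindle or K complex.
decision = stage_epoch(EpochFeatures(0.1, 0.05, True, False, False))
print(decision.stage)
print(" -> ".join(decision.trace))
```

Because every branch appends to the trace before returning, the same input always yields the same stage and the same justification, which is the property that makes such a system usable for auditing.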
📝 Abstract
Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine's (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs ($\kappa = 0.42$), with substantially higher agreement on a dataset used during development (77.1%, $\kappa = 0.61$). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep learning-based sleep staging.
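The evaluation quantities named in the abstract (majority-vote consensus, epoch-level agreement, Cohen's κ, per-stage recall) can be computed as in the sketch below. The toy labels, the tie-breaking order, and all function names are assumptions made for illustration; the abstract does not say how ties among the 10 scorers were resolved.

```python
from collections import Counter

STAGES = ["W", "N1", "N2", "N3", "R"]


def majority_vote(scorer_labels):
    """Per-epoch consensus over several scorers.

    Ties are broken by position in STAGES, which is an assumption of this sketch.
    """
    consensus = []
    for epoch_votes in zip(*scorer_labels):
        counts = Counter(epoch_votes)
        best = max(STAGES, key=lambda s: (counts[s], -STAGES.index(s)))
        consensus.append(best)
    return consensus


def accuracy(pred, ref):
    """Fraction of epochs where the prediction equals the reference."""
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def cohen_kappa(pred, ref):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ref)
    p_o = accuracy(pred, ref)
    pred_counts, ref_counts = Counter(pred), Counter(ref)
    p_e = sum(pred_counts[s] * ref_counts[s] for s in STAGES) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when chance agreement is already perfect
    return (p_o - p_e) / (1 - p_e)


def recall(pred, ref, stage):
    """Share of reference epochs of `stage` that were predicted as `stage`."""
    idx = [i for i, r in enumerate(ref) if r == stage]
    if not idx:
        return None  # stage absent from the reference
    return sum(pred[i] == stage for i in idx) / len(idx)


# Toy data: three scorers and one rule-based prediction over six epochs.
scorers = [
    ["W", "N1", "N2", "N2", "R", "N3"],
    ["W", "N2", "N2", "N2", "R", "N3"],
    ["N1", "N1", "N2", "N3", "R", "N3"],
]
predicted = ["W", "N2", "N2", "N2", "R", "N2"]

reference = majority_vote(scorers)
print("reference:", reference)
print("accuracy:", round(accuracy(predicted, reference), 3))
print("kappa:", round(cohen_kappa(predicted, reference), 3))
print("recall N2:", recall(predicted, reference, "N2"))
print("recall N1:", recall(predicted, reference, "N1"))
```

Reporting κ next to raw agreement matters here because N2 usually dominates a night of sleep, so a method biased toward N2 can reach a high agreement rate partly by chance.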
Problem

Research questions and friction points this paper is trying to address.

sleep staging
AASM scoring rules
interpretability
clinical guidelines
automated sleep classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

rule-based sleep staging
AASM scoring rules
explainable AI
deterministic decision
natural language justification
Emil Hardarson
Department of Computer Science, Reykjavik University, Reykjavik, Iceland; Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland
Konstantin Popov
Engelhardt Institute of Molecular Biology Russian Academy of Sciences
cell biology
Sigridur Sigurdardottir
Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland
Anna Sigridur Islind
Professor, Reykjavik University
Information Systems, Co-Design, Digital Platforms, Data-driven Healthcare, Digital Health
Erna Sif Arnardóttir
Department of Computer Science, Reykjavik University, Reykjavik, Iceland; Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland; Department of Engineering, Reykjavik University, Reykjavik, Iceland
María Óskarsdóttir
Associate Professor, University of Southampton and Reykjavík University
Data Science, Social Network Analytics, Business Analytics, Machine Learning, Network Science