Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules

📅 2026-05-19

🤖 AI Summary

Existing automatic sleep staging models reach near-expert performance, but their decisions lack transparency and do not explicitly follow clinical guidelines. The authors formalize the American Academy of Sleep Medicine (AASM) scoring rules as an executable program, yielding a deterministic, rule-driven system that produces interpretable sleep stage assignments. Each staging decision comes with a natural-language explanation grounded in the AASM rules, which makes the system usable for auditing and debugging deep learning models. Evaluated on 50 polysomnography recordings against a 10-scorer majority-vote consensus, the system achieves an overall accuracy of 60.5% (Cohen’s κ = 0.42), with a recall of 83.5% for N2 sleep and 68.7% for rapid eye movement (REM) sleep, while recall for Wake and N1 is low. The results demonstrate that a formalized, traceable implementation of the clinical rules is feasible.
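To make the idea of a deterministic rule with an attached explanation concrete, here is a minimal Python sketch. It is not the authors' implementation: the function `score_epoch`, the features `alpha_fraction` and `has_spindle_or_k_complex`, and the three-way cascade are hypothetical simplifications, loosely modeled on the kind of criteria the AASM manual describes.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """A stage label plus the rule checks that led to it."""
    stage: str
    trace: list = field(default_factory=list)


def score_epoch(alpha_fraction: float, has_spindle_or_k_complex: bool) -> Decision:
    """Toy rule cascade (assumption: features were extracted upstream).

    The thresholds and feature names are illustrative only and do not
    reproduce the full AASM rule set or the paper's code.
    """
    trace = []

    # Hypothetical wake rule: alpha rhythm in more than half of the epoch.
    if alpha_fraction > 0.5:
        trace.append(f"alpha rhythm covers {alpha_fraction:.0%} of the epoch (> 50%)")
        return Decision("W", trace)
    trace.append(f"alpha rhythm covers {alpha_fraction:.0%} of the epoch (<= 50%)")

    # Hypothetical N2 rule: a sleep spindle or K complex is present.
    if has_spindle_or_k_complex:
        trace.append("sleep spindle or K complex detected")
        return Decision("N2", trace)
    trace.append("no sleep spindle or K complex detected")

    # Fallback in this toy cascade.
    return Decision("N1", trace)


decision = score_epoch(alpha_fraction=0.2, has_spindle_or_k_complex=True)
explanation = f"Scored {decision.stage} because: " + "; ".join(decision.trace)
```

Because every branch appends to `trace` before returning, the same inputs always yield the same stage and the same explanation, which is the property that makes such a system auditable.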

📝 Abstract

Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human-level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine's (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs ($\kappa = 0.42$), with substantially higher agreement on a dataset used during development (77.1%, $\kappa = 0.61$). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural-language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep-learning-based sleep staging.
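The evaluation setup described here (a majority-vote consensus across scorers, then percent agreement and Cohen's κ against it) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's evaluation code: the helper names are hypothetical, the toy labels are made up, and ties in the vote are broken by whichever label appears first, since the abstract does not state a tie-breaking rule.

```python
from collections import Counter


def majority_vote(scorer_labels):
    """Per-epoch consensus from several scorers' label sequences.

    Assumption: ties go to the label encountered first among the scorers.
    """
    return [Counter(epoch).most_common(1)[0][0] for epoch in zip(*scorer_labels)]


def agreement(pred, ref):
    """Fraction of epochs where prediction and reference match."""
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def cohen_kappa(pred, ref):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(ref)
    p_o = agreement(pred, ref)
    pred_counts, ref_counts = Counter(pred), Counter(ref)
    p_e = sum(pred_counts[k] * ref_counts[k] for k in pred_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy data: three scorers, four epochs (values are invented for illustration).
scorers = [
    ["W", "N2", "N2", "R"],
    ["W", "N2", "N1", "R"],
    ["N1", "N2", "N2", "R"],
]
reference = majority_vote(scorers)       # consensus hypnogram
predicted = ["W", "N2", "N1", "R"]       # output of some staging method
acc = agreement(predicted, reference)
kappa = cohen_kappa(predicted, reference)
```

κ corrects the raw agreement for the agreement expected by chance given each rater's stage distribution, which is why it is reported alongside the percentage.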

Problem

Research questions and friction points this paper is trying to address.

sleep staging

AASM scoring rules

interpretability

clinical guidelines

automated sleep classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

rule-based sleep staging

AASM scoring rules

explainable AI