🤖 AI Summary
Frequent road accidents caused by driver fatigue are exacerbated by coarse temporal granularity in existing video-level annotated datasets—where only whole video clips are labeled as containing yawning—introducing significant temporal noise and severely limiting yawning recognition accuracy. To address this, we introduce YawDD+, the first frame-level fine-grained yawning dataset, built via a human-in-the-loop semi-automatic annotation pipeline enabling precise start/end-frame labeling. Leveraging this high-quality data, we design a lightweight MNasNet-based classifier and a YOLOv11-based detector, both optimized for deployment on the NVIDIA Jetson Nano edge platform. Experiments demonstrate state-of-the-art performance: 99.34% frame-level classification accuracy and 95.69% mAP for yawning detection—improving upon video-level supervision by +6.0% accuracy and +5.0% mAP—while achieving real-time inference at 59.8 FPS. These results empirically validate that enhanced annotation granularity is decisive for robust, low-latency driver fatigue monitoring.
📝 Abstract
Driver fatigue remains a leading cause of road accidents, with 24% of crashes involving drowsy drivers. While yawning serves as an early behavioral indicator of fatigue, existing machine learning approaches face significant challenges due to video-annotated datasets that introduce systematic noise from coarse temporal annotations. We develop a semi-automated labeling pipeline with human-in-the-loop verification, which we apply to YawDD, enabling more accurate model training. Training the established MNasNet classifier and YOLOv11 detector architectures on YawDD+ improves frame accuracy by up to 6% and mAP by 5% over video-level supervision, achieving 99.34% classification accuracy and 95.69% detection mAP. The resulting approach deliver up to 59.8 FPS on edge AI hardware (NVIDIA Jetson Nano), confirming that enhanced data quality alone supports on-device yawning monitoring without server-side computation.