🤖 AI Summary
To address the low accuracy of driver fatigue detection when facial data in real-world driving scenarios is scattered, heterogeneous, and privacy-sensitive, this paper proposes a lightweight framework that combines spatial self-attention with federated learning. The method has three parts: (1) a spatial self-attention module that strengthens the local feature representation of critical facial regions (e.g., eyelids, mouth); (2) a gradient-similarity-based client selection mechanism that improves the quality of global model aggregation; (3) an LSTM that models temporal ocular dynamics, supported by frame sampling, face alignment, and multi-strategy data augmentation. In the federated learning setting, the proposed method achieves 89.9% accuracy, outperforming the existing methods it is compared against, while remaining robust to heterogeneous data and preserving user privacy, which supports its feasibility for in-vehicle deployment.
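The gradient-similarity-based client selection in item (2) can be illustrated with a minimal sketch. The summary does not give the exact rule, so everything below is an assumption made for illustration: each client update is a flat list of floats, the reference direction is the mean of all updates, similarity is cosine similarity, and the `keep` highest-scoring clients are averaged. The function names (`select_clients`, `aggregate`) are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_clients(updates, keep):
    """Return the ids of the `keep` clients whose update best aligns with the mean update.

    Assumption: the mean of all client updates serves as the reference direction.
    The paper may use a different reference or a threshold instead of top-k.
    """
    reference = mean_vector(list(updates.values()))
    scores = {cid: cosine_similarity(vec, reference) for cid, vec in updates.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def aggregate(updates, selected):
    """Unweighted average of the selected clients' updates (FedAvg-style, equal weights)."""
    return mean_vector([updates[cid] for cid in selected])


# Toy example: two clients roughly agree, one points the opposite way.
client_updates = {
    "client_a": [0.9, 1.1, 1.0],
    "client_b": [1.0, 0.9, 1.2],
    "client_c": [-1.0, -0.8, -1.1],
}
selected = select_clients(client_updates, keep=2)
global_update = aggregate(client_updates, selected)
```

In this toy example the outlying `client_c` is excluded before averaging, which is the intuition behind filtering clients by gradient similarity; a real system would operate on full model gradients and would likely weight clients by their data size.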
📝 Abstract
Driver drowsiness is a leading cause of road accidents and traffic-related fatalities. Detecting it accurately remains challenging, especially in real-world settings where facial data from different individuals is decentralized and highly diverse. In this paper, we propose a novel drowsiness detection framework designed to work effectively with heterogeneous and decentralized data. Our approach introduces a new Spatial Self-Attention (SSA) mechanism integrated with a Long Short-Term Memory (LSTM) network to better extract key facial features and improve detection performance. To support federated learning, we employ a Gradient Similarity Comparison (GSC) step that selects the most relevant trained models from different operators before aggregation, which improves the accuracy and robustness of the global model while preserving user privacy. We also develop a customized tool that automatically processes video data by extracting frames, detecting and cropping faces, and applying data augmentation techniques such as rotation, flipping, brightness adjustment, and zooming. Experimental results show that our framework achieves a detection accuracy of 89.9% in the federated learning setting, outperforming existing methods under various deployment scenarios. The results demonstrate the effectiveness of our approach in handling real-world data variability and highlight its potential for deployment in intelligent transportation systems, where early and reliable drowsiness detection can enhance road safety.
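The four augmentation techniques named in the abstract (rotation, flipping, brightness adjustment, zooming) can be sketched as follows. This is not the authors' tool: it is a dependency-free illustration on small grayscale images stored as nested lists, using nearest-neighbour sampling, and the parameter ranges in `augment` (±15 degrees, brightness 0.7 to 1.3, zoom 1.0 to 1.2) are assumptions chosen only for the example. A real pipeline would more likely use an image library on the cropped face frames.

```python
import math
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img, factor):
    """Scale pixel values by `factor` and clamp to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def zoom_center(img, scale):
    """Zoom in about the centre by `scale` (>= 1); output keeps the input size."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for y in range(h):
        sy = min(h - 1, max(0, round(cy + (y - cy) / scale)))
        row = []
        for x in range(w):
            sx = min(w - 1, max(0, round(cx + (x - cx) / scale)))
            row.append(img[sy][sx])
        out.append(row)
    return out


def rotate(img, degrees, fill=0):
    """Rotate about the centre; pixels mapped from outside the image get `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_t = math.cos(math.radians(degrees))
    sin_t = math.sin(math.radians(degrees))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = round(cx + (x - cx) * cos_t + (y - cy) * sin_t)
            sy = round(cy - (x - cx) * sin_t + (y - cy) * cos_t)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img, rng):
    """Apply all four augmentations with randomly drawn (assumed) parameters."""
    out = rotate(img, rng.uniform(-15, 15))
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    out = adjust_brightness(out, rng.uniform(0.7, 1.3))
    return zoom_center(out, rng.uniform(1.0, 1.2))


# Toy 4x4 "face crop" with a fixed seed so the run is repeatable.
face = [[10 * (r * 4 + c) for c in range(4)] for r in range(4)]
augmented = augment(face, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which helps when comparing models trained on different clients.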