Driver-Net: Multi-Camera Fusion for Assessing Driver Take-Over Readiness in Automated Vehicles

📅 2025-07-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inaccurate and delayed assessment of driver takeover readiness in autonomous driving, this paper proposes a non-intrusive, multi-view fusion method leveraging three synchronized cameras. The approach captures concurrent video streams of head, hand, and full-body pose, and introduces a novel dual-path spatiotemporal network to jointly model whole-body dynamic behavior for the first time. It incorporates context-aware modeling and feature enhancement modules, integrated with a cross-modal fusion strategy to enable end-to-end learning. Evaluated on the University of Leeds driving simulator dataset, the method achieves 95.8% accuracy in classifying takeover readiness—significantly outperforming existing single-modality approaches. It demonstrates real-time inference capability and robustness under varying conditions, fulfilling emerging functional safety requirements—particularly those specified in ISO 21448 (SOTIF)—regarding reliable human–machine handover.

📝 Abstract
Ensuring safe transition of control in automated vehicles requires an accurate and timely assessment of driver readiness. This paper introduces Driver-Net, a novel deep learning framework that fuses multi-camera inputs to estimate driver take-over readiness. Unlike conventional vision-based driver monitoring systems that focus on head pose or eye gaze, Driver-Net captures synchronised visual cues from the driver's head, hands, and body posture through a triple-camera setup. The model integrates spatio-temporal data using a dual-path architecture, comprising a Context Block and a Feature Block, followed by a cross-modal fusion strategy to enhance prediction accuracy. Evaluated on a diverse dataset collected from the University of Leeds Driving Simulator, the proposed method achieves an accuracy of up to 95.8% in driver readiness classification. This performance significantly surpasses existing approaches and highlights the importance of multimodal and multi-view fusion. As a real-time, non-intrusive solution, Driver-Net contributes meaningfully to the development of safer and more reliable automated vehicles and aligns with new regulatory mandates and upcoming safety standards.
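To make the fusion idea concrete, here is a minimal NumPy sketch (not the authors' code) of the late-fusion pattern the abstract describes: each camera stream (head, hands, body) is reduced to a per-stream embedding, the three embeddings are concatenated, and a linear head classifies take-over readiness. All layer names, sizes, random weights, and the concatenation-based fusion choice are illustrative assumptions standing in for the paper's learned spatio-temporal encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 16, 32, 32           # frames per clip, frame height/width
FEAT, CLASSES = 32, 2          # per-stream embedding width; not-ready / ready

# Random stand-ins for learned weights: one projection per camera stream,
# plus a shared classification head over the fused embedding.
streams = ["head", "hands", "body"]
proj = {s: rng.standard_normal((T * H * W, FEAT)) * 0.01 for s in streams}
head_w = rng.standard_normal((len(streams) * FEAT, CLASSES)) * 0.1

def stream_features(clip, w):
    """Flatten a (T, H, W) clip and project it to a FEAT-dim embedding.
    Stands in for a per-stream spatio-temporal encoder."""
    return np.tanh(clip.reshape(-1) @ w)

def fuse_and_classify(clips):
    """Concatenate the three stream embeddings (late fusion) and classify."""
    fused = np.concatenate([stream_features(clips[s], proj[s]) for s in streams])
    logits = fused @ head_w
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()     # softmax over {not ready, ready}

clips = {s: rng.standard_normal((T, H, W)) for s in streams}
probs = fuse_and_classify(clips)  # probability for each readiness class
```

Concatenation is the simplest cross-modal fusion baseline; the paper's actual fusion strategy is more elaborate, but the input/output shapes of the pipeline are as shown.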
Problem

Research questions and friction points this paper is trying to address.

Assessing driver readiness for safe control transition in automated vehicles
Fusing multi-camera inputs to estimate driver take-over readiness accurately
Enhancing prediction accuracy using spatio-temporal and cross-modal fusion techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning fuses multi-camera driver inputs
Triple-camera captures head, hands, body cues
Dual-path architecture enhances spatio-temporal fusion
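The dual-path idea above can be sketched as two complementary transforms of the same clip: a "context" path that summarises the whole frame sequence and a "feature" path that keeps finer per-frame detail, combined by a gate. The operations and names below are illustrative assumptions, not the paper's actual Context Block and Feature Block.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 16, 128                 # frames per clip, per-frame descriptor size
clip = rng.standard_normal((T, D))
w = rng.standard_normal((D, D)) * 0.05

def context_path(x):
    """Global summary: average over time, broadcast back to every frame."""
    return np.broadcast_to(x.mean(axis=0), x.shape)

def feature_path(x, w):
    """Local detail: per-frame projection that keeps the temporal axis."""
    return np.tanh(x @ w)

# A per-frame sigmoid gate stands in for the paper's fusion of the two paths.
gate = 1.0 / (1.0 + np.exp(-(clip @ w).mean(axis=1, keepdims=True)))  # (T, 1)
combined = gate * feature_path(clip, w) + (1.0 - gate) * context_path(clip)
```

The output keeps the clip's (T, D) shape, so the fused representation can feed any downstream temporal classifier.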
Mahdi Rezaei
Associate Professor, University of Leeds
AI, Computer Vision, Machine Learning, Autonomous Vehicles, Large Language Models
Mohsen Azarmi
Institute for Transport Studies, Computer Vision and Machine Learning Group, University of Leeds, LS2 9JT, United Kingdom