🤖 AI Summary
This study addresses the need for natural, low-latency human–robot interaction for individuals with special communication needs, such as autistic children, by enabling high-fidelity, real-time biomimetic imitation of head motion, blinking, and facial expressions on the NAO robot.
Method: We propose a novel dual-modal (head pose–emotion) real-time feedback framework—the first to integrate MediaPipe (for 6D head pose estimation) and DeepFace (for dynamic emotion recognition) on NAO—combined with PID-based closed-loop control and SDK-level optimizations for millisecond-scale synchronization and online calibration.
Contribution/Results: Experiments demonstrate R² scores of 96.3% (pitch) and 98.9% (yaw) in head pose tracking, end-to-end latency <120 ms, and >92% accuracy in blink detection and basic emotion recognition. The framework significantly enhances interaction naturalness and accessibility, establishing a reusable technical paradigm for special-education robotics.
📝 Abstract
This paper introduces a novel approach for enabling real-time imitation of human head motion by a NAO robot, with a primary focus on enriching human-robot interaction. By leveraging the robust capabilities of MediaPipe, a computer vision library, and DeepFace, an emotion recognition library, this research captures the subtleties of human head motion, including blink actions and emotional expressions, and seamlessly incorporates these cues into the robot’s responses. The result is a comprehensive framework that facilitates precise head imitation within human-robot interactions, using a closed-loop approach that gathers real-time feedback on the robot’s imitation performance. This feedback loop ensures a high degree of accuracy in modeling head motion, as evidenced by R² scores of 96.3% for pitch and 98.9% for yaw. Notably, the proposed approach holds promise for improving communication for children with autism, offering them a valuable tool for more effective interaction. In essence, the proposed work explores the integration of real-time head imitation and real-time emotion recognition to enhance human-robot interaction, with potential benefits for individuals with unique communication needs.
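The R² figures reported above are the coefficient of determination between the human's head angles and the angles the robot actually reached. A minimal sketch of that metric (the sample trajectories below are made up for illustration, not the paper's measurements):

```python
# Coefficient of determination (R²) for evaluating head-pose tracking:
# how well the robot's measured yaw follows the human's yaw over time.
# The two trajectories are illustrative, not data from the paper.

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


human_yaw = [0.0, 5.0, 10.0, 15.0, 10.0, 5.0, 0.0, -5.0]   # degrees
robot_yaw = [0.2, 4.6, 9.8, 14.5, 10.3, 5.4, 0.1, -4.7]    # degrees

print(f"{r2_score(human_yaw, robot_yaw):.3f}")  # close to 1 for good tracking
```

A score of 1.0 means the robot's trajectory explains all the variance in the human's motion; the 0.963 (pitch) and 0.989 (yaw) results correspond to near-perfect tracking.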