🤖 AI Summary
This study addresses the challenge that current social robots struggle to naturally emulate human gaze behavior in complex social scenarios, particularly in responding to non-human stimuli such as falling objects or opening doors. Moving beyond prior work focused solely on interpersonal interactions, this research presents the first systematic model of integrated human gaze responses to both social and non-social events. Using Unity-generated 3D animations and 360° real-world videos within a virtual reality environment, eye-tracking data were collected to train LSTM and Transformer models for predicting gaze direction. The models achieved prediction accuracies of 70.4% and 72% in animated and real-world scenes, respectively, significantly outperforming existing approaches. When deployed on a NAO robot, the system received high user ratings from 275 participants, demonstrating a marked improvement in the naturalness of human-robot interaction.
📝 Abstract
Nonverbal behaviors, particularly gaze direction, play a crucial role in enhancing effective communication in social interactions. As social robots increasingly participate in these interactions, they must adapt their gaze based on human activities and remain receptive to all cues, whether human-generated or not, to ensure seamless and effective communication. This study aims to increase the similarity between robot and human gaze behavior across various social situations, including both human and non-human stimuli (e.g., conversations, pointing, door openings, and object drops). A key innovation of this study is the investigation of gaze responses to non-human stimuli, a critical yet underexplored area in prior research. These scenarios were simulated in Unity as a 3D animation and a 360-degree real-world video. Gaze-direction data from 41 participants were collected via virtual reality (VR) glasses. The preprocessed data were used to train two neural networks, an LSTM and a Transformer, to build predictive models of individuals' gaze patterns. In the animated scenario, the LSTM and Transformer models achieved prediction accuracies of 67.6% and 70.4%, respectively; in the real-world scenario, they achieved accuracies of 72% and 71.6%, respectively. Despite gaze-pattern differences among individuals, our models outperform existing approaches in accuracy while uniquely considering non-human stimuli, offering a significant advantage over previous literature. Furthermore, the system was deployed on a NAO robot and evaluated by 275 participants via a comprehensive questionnaire, with results demonstrating high satisfaction during interactions. This work advances social robotics by enabling robots to dynamically mimic human gaze behavior in complex social contexts.
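The abstract names the two model families but not their exact architectures or input encoding. As a rough illustration of the sequence-to-gaze-direction formulation, the PyTorch sketch below classifies a window of per-frame scene features into a discrete gaze target; the feature dimension, number of classes, window length, and all hyperparameters are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch (not the authors' implementation): hyperparameters,
# feature encoding, and gaze-target classes are illustrative assumptions.
import torch
import torch.nn as nn

class GazeLSTM(nn.Module):
    """Maps a window of per-frame scene/event features to a discrete
    gaze-direction class (e.g., speaker, pointed-at object, door)."""
    def __init__(self, feature_dim=32, hidden_dim=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=2,
                            batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feature_dim) -- one feature vector per video frame
        out, _ = self.lstm(x)
        # Classify from the hidden state at the final time step.
        return self.head(out[:, -1, :])

# Minimal training step on dummy data.
model = GazeLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 60, 32)      # 8 clips, 60 frames, 32 features per frame
y = torch.randint(0, 6, (8,))   # ground-truth gaze-target labels
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

A Transformer variant under the same assumptions would replace the recurrent layer with `nn.TransformerEncoder` plus positional encoding and pool over the time dimension before the classification head.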