🤖 AI Summary
Robots struggle to perceive subtle nonverbal feedback—such as facial micro-expressions and head movements—from bystanders reacting to their social errors, limiting adaptability in real-world social interactions. To address this, we propose a novel paradigm leveraging a neck-worn camera to capture dynamic facial cues from the chin region. We introduce NeckNet-18, the first 3D facial reconstruction model specifically designed for the chin region, which jointly estimates 3D facial landmarks, models head motion trajectories, and decodes affective expressions to enable real-time detection of robot social errors. Compared to OpenFace and conventional video-based methods, NeckNet-18 achieves significantly higher intra-subject detection accuracy and superior cross-context generalization. This work provides the first empirical validation that bystander chin-view feedback serves as a reliable implicit signal for social error correction, thereby establishing a new implicit perception channel for human–robot interaction.
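The joint design described above, where one chin-view input yields 3D landmarks, head-motion estimates, and expression probabilities, can be pictured as a shared encoder feeding three task heads. The sketch below is an illustrative assumption only: the dimensions, layer choices, and names (`FEAT_DIM`, `forward`, etc.) are invented for exposition and are not the actual NeckNet-18 architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
IN_DIM = 256        # stand-in for flattened chin-view frame features
FEAT_DIM = 128      # shared embedding size
N_LANDMARKS = 68    # 3D facial landmarks -> 68 * 3 coordinates
POSE_DIM = 6        # head motion: 3 rotation + 3 translation parameters
N_EXPR = 7          # basic affective expression classes

# Shared encoder weights plus one linear head per task
W_enc = rng.standard_normal((FEAT_DIM, IN_DIM)) * 0.01
W_lmk = rng.standard_normal((N_LANDMARKS * 3, FEAT_DIM)) * 0.01
W_pose = rng.standard_normal((POSE_DIM, FEAT_DIM)) * 0.01
W_expr = rng.standard_normal((N_EXPR, FEAT_DIM)) * 0.01

def forward(x):
    """Map one chin-view feature vector to the three task outputs."""
    h = np.tanh(W_enc @ x)                  # shared representation
    landmarks = (W_lmk @ h).reshape(N_LANDMARKS, 3)
    pose = W_pose @ h
    expr_logits = W_expr @ h
    expr_probs = np.exp(expr_logits) / np.exp(expr_logits).sum()
    return landmarks, pose, expr_probs

frame = rng.standard_normal(IN_DIM)         # dummy input frame
lmk, pose, probs = forward(frame)
print(lmk.shape, pose.shape)
```

The point of the sketch is only the multi-task shape of the problem: a single shared representation from the neck-worn camera is decoded into geometry (landmarks), dynamics (head pose), and affect (expression), which downstream error detection can then consume together.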
📝 Abstract
How do humans recognize and rectify social missteps? We achieve social competence by looking to our peers, decoding subtle cues from bystanders - a raised eyebrow, a laugh - to evaluate the environment and our own actions. Robots, however, struggle to perceive and make use of these nuanced reactions. Using a novel neck-mounted device that records facial expressions from the chin region, we explore this previously untapped data source for capturing and interpreting human responses to robot errors. First, we develop NeckNet-18, a 3D facial reconstruction model that maps the reactions captured by the chin camera onto facial landmarks and head motion. We then use these facial responses to build a robot error detection model that outperforms standard approaches such as OpenFace features or raw video data, generalizing especially well on within-participant data. Through this work, we argue for expanding human-in-the-loop robot sensing to foster a more seamless integration of robots into diverse human environments, pushing the boundaries of social cue detection and opening new avenues for adaptable robotics.