🤖 AI Summary
This study addresses the challenge of enabling robots to generate realistic, diverse, and responsive emotional expressions in real time during human–robot interaction. The proposed method integrates mixed reality (MR) with flow-matching generative modeling. MR-based first-person teleoperation by human experts, combined with motion capture, records facial, head, and upper-body emotional behaviors, which are then mapped onto robotic actuators (e.g., eyes, ears, neck, arms). A conditional flow-matching model synthesizes coherent, multimodal emotional behaviors in real time, guided by both the affective state and dynamic environmental cues (e.g., moving objects). To the authors' knowledge, this is the first work to jointly leverage MR-enabled demonstration learning and flow-matching generation, improving both expressiveness and responsiveness. Preliminary experiments validate the approach's effectiveness in terms of real-time performance, behavioral diversity, and contextual adaptability.
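At its core, conditional flow matching trains a network to regress the velocity of a straight probability path from noise to demonstration data, here conditioned on an emotional state and environmental features. The sketch below illustrates this standard objective; the architecture, dimensions, and names (`VelocityNet`, `cond`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a conditional flow-matching training step (assumed
# setup; the paper's actual network and conditioning are not specified here).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the flow velocity for a vector of actuator commands,
    conditioned on an emotion embedding plus environmental cues."""
    def __init__(self, act_dim, cond_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, x_t, t, cond):
        # x_t: point on the noise-to-data path, t: flow time in [0, 1]
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_loss(model, x1, cond):
    """Standard conditional flow-matching objective: regress the constant
    velocity of the straight path from noise x0 to demonstration data x1."""
    x0 = torch.randn_like(x1)          # noise endpoint
    t = torch.rand(x1.shape[0], 1)     # flow time ~ U(0, 1)
    x_t = (1 - t) * x0 + t * x1        # linear interpolation path
    v_target = x1 - x0                 # target velocity along that path
    v_pred = model(x_t, t, cond)
    return ((v_pred - v_target) ** 2).mean()
```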
📝 Abstract
Expressive behaviors in robots are critical for effectively conveying their emotional states during interactions with humans. In this work, we present a framework that autonomously generates realistic and diverse robotic emotional expressions from expert human demonstrations captured in Mixed Reality (MR). Our system enables experts to teleoperate a virtual robot from a first-person perspective, capturing their facial expressions, head movements, and upper-body gestures and mapping these behaviors onto corresponding robotic components, including eyes, ears, neck, and arms. Leveraging a flow-matching-based generative process, our model learns to produce coherent and varied behaviors in real time in response to moving objects, conditioned explicitly on given emotional states. A preliminary test validated the effectiveness of our approach for generating autonomous expressions.
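At inference time, a flow-matching model produces a sample by integrating the learned velocity field from noise (t = 0) toward data (t = 1), which is what makes few-step, real-time generation feasible. A minimal sketch follows, reusing the assumed `VelocityNet` interface above; the Euler step count and the conditioning features are illustrative choices, not the paper's reported settings.

```python
# Hypothetical inference sketch: a few explicit Euler steps along the learned
# flow yield one window of actuator commands (eyes, ears, neck, arms).
import torch

@torch.no_grad()
def generate_behavior(model, cond, act_dim, n_steps=8):
    """Integrate dx/dt = v_theta(x, t, cond) from noise at t=0 to t=1."""
    x = torch.randn(1, act_dim)          # start from Gaussian noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1, 1), i * dt)
        x = x + dt * model(x, t, cond)   # explicit Euler update
    return x                             # commanded actuator targets

# Illustrative usage: emotion_emb and obj_feat stand in for the emotion
# condition and moving-object cues from the perception pipeline.
# cond = torch.cat([emotion_emb, obj_feat], dim=-1)
# action = generate_behavior(model, cond, act_dim=24)
```

Because the trained path is (approximately) straight, a small number of integration steps can suffice, which is one reason flow matching suits the real-time responsiveness the framework targets.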