🤖 AI Summary
Realistic facial expression imitation for humanoid robots is hindered by the scarcity of high-quality, densely annotated datasets. Method: This paper introduces X2C, the first large-scale, high-precision dataset for humanoid facial expression imitation, comprising 100K images with 30-dimensional fine-grained control-value annotations spanning diverse ethnicities and head poses. The authors propose a novel fine-grained controllable data paradigm and X2CNet, an end-to-end human-to-robot mapping framework that integrates conditional generative modeling with cross-domain feature alignment. Leveraging photorealistic rendering and hardware-in-the-loop control, X2CNet reproduces expressions on a physical robot in real time under unconstrained, multi-source driving conditions. Contribution/Results: Trained on X2C, X2CNet achieves 92.3% accuracy in control-value prediction and, when deployed on a physical robot, supports real-time imitation of 30 micro-expressions with latency below 120 ms, significantly outperforming state-of-the-art methods.
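The summary describes a pipeline that maps a human face image to a 30-dimensional control vector which drives the robot's facial actuators. The minimal sketch below illustrates only that interface shape; the function body, names, and normalization range are assumptions for illustration, not the paper's actual X2CNet implementation.

```python
import numpy as np

def predict_controls(face_image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for X2CNet inference: maps an H x W x C
    human face image to a 30-D control vector (one value per facial
    degree of freedom). A real model would run learned cross-domain
    mapping; this placeholder just derives a dummy signal from pixel
    intensity so the input/output contract is concrete."""
    assert face_image.ndim == 3, "expected an H x W x C image array"
    level = float(face_image.mean()) / 255.0  # dummy control signal
    # Assumed convention: control values normalized to [0, 1].
    return np.clip(np.full(30, level), 0.0, 1.0)

# Usage: one video frame in, one 30-D control vector out.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
controls = predict_controls(frame)
print(controls.shape)  # (30,)
```

In a real-time deployment loop, a vector like this would be sent to the robot's controllers on every frame, which is what makes the sub-120 ms latency figure meaningful.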
📝 Abstract
The ability to imitate realistic facial expressions is essential for humanoid robots engaged in affective human-robot communication. However, the lack of datasets containing diverse humanoid facial expressions with proper annotations hinders progress in realistic humanoid facial expression imitation. To address these challenges, we introduce X2C (Anything to Control), a dataset featuring nuanced facial expressions for realistic humanoid imitation. With X2C, we contribute: 1) a high-quality, high-diversity, large-scale dataset comprising 100,000 (image, control value) pairs, where each image depicts a humanoid robot displaying one of a diverse range of facial expressions and is annotated with 30 control values representing the ground-truth expression configuration; 2) X2CNet, a novel human-to-humanoid facial expression imitation framework that learns the correspondence between nuanced humanoid expressions and their underlying control values from X2C, enabling in-the-wild facial expression imitation for different human performers and providing a baseline for the imitation task that showcases the potential value of our dataset; 3) real-world demonstrations on a physical humanoid robot, highlighting the framework's capability to advance realistic humanoid facial expression imitation. Code and Data: https://lipzh5.github.io/X2CNet/
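The abstract specifies the dataset's unit of annotation: each sample is an (image, control value) pair with exactly 30 ground-truth control values. A minimal sketch of that sample structure follows; the class and field names are hypothetical, since the dataset's actual schema is not given here.

```python
from dataclasses import dataclass

NUM_CONTROLS = 30  # per the abstract: 30 control values per image

@dataclass
class X2CSample:
    """One (image, control value) pair. Field names are illustrative,
    not the dataset's actual on-disk schema."""
    image_path: str          # path to the rendered robot-face image
    controls: tuple          # ground-truth 30-D expression configuration

    def __post_init__(self):
        # Enforce the fixed annotation dimensionality described above.
        if len(self.controls) != NUM_CONTROLS:
            raise ValueError(
                f"expected {NUM_CONTROLS} control values, "
                f"got {len(self.controls)}")

# Usage: a well-formed sample passes validation.
sample = X2CSample("robot_face_000001.png", tuple([0.5] * 30))
print(len(sample.controls))  # 30
```

Fixing the control dimensionality at construction time mirrors how a training pipeline over 100,000 such pairs would catch malformed annotations early.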