AI Summary
This work addresses the challenge that existing robotic facial actuation methods struggle to meet the demands of continuous emotional expression and facial coherence in singing performances. To overcome this limitation, the authors propose a virtual-agent-driven framework for robotic facial singing: a portrait video generation model infused with human priors first synthesizes emotionally expressive singing avatars, and semantic-guided mapping then transfers the avatars' facial expressions to the robot (a hypothetical sketch of such a mapping appears below). Beyond using virtual agents as an intermediate medium, the study introduces an "Emotion Dynamic Range" metric to quantify the breadth of singing-related affect within the Valence-Arousal space, revealing the critical role of a broad emotional spectrum in expressive performance. Experiments show that the proposed approach significantly outperforms existing methods in both emotional expressiveness and lip-sync accuracy.
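The summary does not specify how the semantic-guided mapping is implemented. The following Python sketch assumes a common setup in which avatar expressions arrive as blendshape weights and each semantic unit is transferred to a robot actuator through a calibrated linear map; all names, servo IDs, and angle ranges (`ServoMap`, `SEMANTIC_MAPS`, `map_frame`) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a semantic-oriented mapping layer: avatar blendshape
# coefficients (0..1) are transferred to robot servo angles. All names,
# ranges, and the linear-mapping assumption are illustrative; the paper's
# actual mapping functions are not specified in the summary.

from dataclasses import dataclass


@dataclass
class ServoMap:
    """Calibrated linear map from one semantic expression unit to one servo."""
    servo_id: int
    min_deg: float   # servo angle at blendshape weight 0.0
    max_deg: float   # servo angle at blendshape weight 1.0

    def __call__(self, weight: float) -> float:
        w = min(max(weight, 0.0), 1.0)  # clamp to the valid weight range
        return self.min_deg + w * (self.max_deg - self.min_deg)


# One mapping function per semantic unit (names and ranges are hypothetical).
SEMANTIC_MAPS = {
    "jaw_open":    ServoMap(servo_id=0, min_deg=0.0,   max_deg=35.0),
    "mouth_smile": ServoMap(servo_id=1, min_deg=-10.0, max_deg=20.0),
    "brow_raise":  ServoMap(servo_id=2, min_deg=0.0,   max_deg=15.0),
}


def map_frame(blendshapes: dict[str, float]) -> dict[int, float]:
    """Map one frame of avatar blendshape weights to servo angle commands."""
    return {m.servo_id: m(blendshapes.get(name, 0.0))
            for name, m in SEMANTIC_MAPS.items()}


# Example: a frame where the avatar opens its jaw halfway and smiles slightly.
print(map_frame({"jaw_open": 0.5, "mouth_smile": 0.2}))
# -> {0: 17.5, 1: -4.0, 2: 0.0}
```

A per-unit map of this kind keeps the transfer interpretable: each semantic channel can be retargeted or recalibrated independently of the others.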
Abstract
Equipping robotic faces with singing capabilities is crucial for empathetic Human-Robot Interaction. However, existing research on driving robotic faces primarily focuses on conversation or the mimicry of static expressions, and struggles to meet singing's high demands for continuous emotional expression and coherence. To address this, we propose a novel avatar-driven framework for appealing robotic singing. We first leverage portrait video generation models embedded with extensive human priors to synthesize vivid singing avatars, providing reliable expression and emotion guidance. These facial features are then transferred to the robot via semantic-oriented mapping functions that span a wide expression space. Furthermore, to quantitatively evaluate the emotional richness of robotic singing, we propose the Emotion Dynamic Range metric, which measures emotional breadth within the Valence-Arousal space and reveals that a broad emotional spectrum is crucial for appealing performances. Comprehensive experiments demonstrate that our method achieves rich emotional expression while maintaining lip-audio synchronization, significantly outperforming existing approaches.
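The abstract defines the Emotion Dynamic Range only as a measure of emotional breadth within the Valence-Arousal space. As a rough illustration of how such a metric could be computed, the sketch below scores a performance by the convex-hull area of its per-frame (valence, arousal) trajectory; this formulation and the function name `emotion_dynamic_range` are assumptions, not the authors' definition.

```python
# Illustrative sketch of an "Emotion Dynamic Range"-style measurement. The
# abstract only states that the metric measures emotional breadth in
# Valence-Arousal space; the convex-hull-area formulation below is an
# assumption, not the paper's formula.

import numpy as np
from scipy.spatial import ConvexHull


def emotion_dynamic_range(va_trajectory: np.ndarray) -> float:
    """Breadth of a performance's emotion trajectory.

    va_trajectory: (T, 2) array of per-frame (valence, arousal) estimates,
    e.g. from an emotion recognizer run on the robot's face video.
    Returns the area of the convex hull spanned by the trajectory; a flat,
    monotone performance collapses toward 0.
    """
    points = np.asarray(va_trajectory, dtype=float)
    if len(np.unique(points, axis=0)) < 3:
        return 0.0  # degenerate: too few distinct emotional states
    return ConvexHull(points).volume  # in 2-D, .volume is the hull area


# Example: a narrow trajectory vs. a wide one.
flat = np.array([[0.10, 0.10], [0.12, 0.11], [0.11, 0.12], [0.10, 0.12]])
wide = np.array([[-0.6, -0.4], [0.7, 0.5], [0.6, -0.5], [-0.5, 0.6]])
print(emotion_dynamic_range(flat), emotion_dynamic_range(wide))
```

Under this reading, a performance that sweeps through many valence-arousal states scores high, while one that stays near a single expression scores near zero, which matches the abstract's claim that a broad emotional spectrum underpins appealing performances.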