ExFace: Expressive Facial Control for Humanoid Robots with Diffusion Transformers and Bootstrap Training

📅 2025-04-20

🤖 AI Summary

This work addresses the challenge of achieving high-precision, real-time mapping from human facial expressions to facial motor control in humanoid robots, aiming to enhance expression naturalness, motion fluency, and interactive responsiveness. We propose the first diffusion Transformer architecture specifically designed for robot facial control, integrating bootstrapped training and a novel blendshape-to-motor mapping model. Additionally, we introduce ExFace, the first benchmark dataset dedicated to face-driven robotic facial motion. Our method achieves state-of-the-art performance in expression reconstruction accuracy, real-time inference speed (>30 FPS), and end-to-end latency (<80 ms), significantly advancing real-time anthropomorphic expressivity. The framework has been successfully deployed on multiple humanoid robot platforms, enabling natural expressive performances and high-fidelity human-robot interaction.

📝 Abstract

This paper presents a novel Expressive Facial Control (ExFace) method based on Diffusion Transformers, which achieves precise mapping from human facial blendshapes to bionic robot motor control. By incorporating an innovative model bootstrap training strategy, our approach not only generates high-quality facial expressions but also significantly improves accuracy and smoothness. Experimental results demonstrate that the proposed method outperforms previous methods in terms of accuracy, frames per second (FPS), and response time. Furthermore, we develop the ExFace dataset driven by human facial data. ExFace shows excellent real-time performance and natural expression rendering in applications such as robot performances and human-robot interactions, offering a new solution for bionic robot interaction.
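
To make the input/output contract of a blendshape-to-motor mapping concrete, here is a minimal sketch. It is not the paper's Diffusion Transformer: it substitutes a hand-written linear map plus exponential smoothing purely for illustration. The blendshape names, motor names, weights, and the [0, 1] motor range are all assumptions and do not come from the source.

```python
from typing import Dict

# Hypothetical type: motor name -> {blendshape name -> weight}.
# The source does not specify motors, ranges, or weights.
MotorWeights = Dict[str, Dict[str, float]]


def blendshapes_to_motors(
    blendshapes: Dict[str, float],
    weights: MotorWeights,
    lo: float = 0.0,
    hi: float = 1.0,
) -> Dict[str, float]:
    """Linear stand-in for a learned mapping.

    Each motor target is a weighted sum of blendshape coefficients,
    clamped to the assumed motor range [lo, hi].
    """
    targets = {}
    for motor, row in weights.items():
        raw = sum(w * blendshapes.get(name, 0.0) for name, w in row.items())
        targets[motor] = min(hi, max(lo, raw))
    return targets


def smooth(
    prev: Dict[str, float], new: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Exponential smoothing between consecutive frames.

    A motor with no previous value starts at its new target.
    """
    return {m: alpha * new[m] + (1 - alpha) * prev.get(m, new[m]) for m in new}


# Illustrative values only.
weights = {
    "jaw": {"jawOpen": 1.0},
    "brow_left": {"browInnerUp": 0.5, "browDownLeft": -0.5},
}
frame = {"jawOpen": 1.4, "browInnerUp": 0.8}
targets = blendshapes_to_motors(frame, weights)
command = smooth({"jaw": 0.0}, targets)
```

In a learned system the linear map would be replaced by the trained model, and the per-frame smoothing filter is only a crude proxy for the smoothness the abstract attributes to bootstrap training.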

Problem

Research questions and friction points this paper is trying to address.

Precise mapping from human to robot facial expressions

Improving expression accuracy and smoothness via bootstrap training

Enhancing real-time performance in human-robot interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformers for facial control

Bootstrap training enhances expression quality

ExFace dataset from human facial data
