🤖 AI Summary
Existing facial expression reconstruction methods suffer from limited environmental robustness, privacy concerns, and high power consumption. To address these challenges, this paper proposes IMUFace, a covert 3D facial expression reconstruction system based on ear-worn inertial measurement units (IMUs). The system infers facial muscle activity from subtle ear motions, which removes the need for cameras and preserves privacy and wearing comfort. The key contributions are: (1) the first high-accuracy decoding paradigm mapping ear-worn IMU signals to facial motion; (2) IMUTwinTrans, a lightweight transformer-based model integrating temporal modeling and twin attention mechanisms; and (3) support for 5-minute personalized calibration, 30 Hz on-device real-time reconstruction, and a low power consumption of 58 mW. Evaluated in a 12-subject user study, the system achieves a mean landmark error of 2.21 mm and drives low-latency 3D facial animation, demonstrating feasibility for embedded deployment.
📝 Abstract
Facial expression reconstruction has significant potential in fields such as human-computer interaction, affective computing, and virtual reality. Recent studies have proposed ear-worn devices for facial expression reconstruction to address the environmental limitations and privacy concerns of traditional camera-based methods. However, these approaches still need improvement in aesthetics and power consumption. This paper introduces IMUFace, a system that uses inertial measurement units (IMUs) embedded in wireless earphones to detect the subtle ear movements caused by facial muscle activity, allowing covert, low-power facial reconstruction. We propose a deep learning model, IMUTwinTrans, and evaluate the system in a user study with 12 participants. The results show that IMUFace predicts users' facial landmarks with a mean error of 2.21 mm using only five minutes of training data. The predicted landmarks can then be used to reconstruct a three-dimensional facial model. IMUFace operates at a sampling rate of 30 Hz with a relatively low power consumption of 58 mW. These findings demonstrate the real-world applicability of IMUFace and highlight directions for further research toward its practical adoption.
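The abstract reports landmark accuracy in millimetres but does not define the metric. As a rough illustration only, the sketch below assumes it is the mean Euclidean distance between predicted and reference 3D landmarks, averaged over all landmarks and frames. The function name `mean_landmark_error` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_landmark_error(predicted, reference):
    """Mean Euclidean distance between matching landmarks.

    Both arguments are lists of frames; each frame is a list of (x, y, z)
    points. The result is in the same unit as the inputs (assumed mm here).
    """
    if len(predicted) != len(reference):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("landmark counts differ")
        for p, r in zip(pred_frame, ref_frame):
            total += math.dist(p, r)  # straight-line distance in 3D
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


# Toy data (hypothetical): two frames with two landmarks each, in mm.
reference = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
]
predicted = [
    [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (10.0, 2.0, 0.0)],
]
error_mm = mean_landmark_error(predicted, reference)
print(error_mm)
```

Averaging per-landmark distances, rather than per-coordinate differences, keeps the result in physical units that can be compared directly with a figure such as 2.21 mm, provided the landmark coordinates are expressed in millimetres.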