🤖 AI Summary
This work proposes a real-time blendshape prediction method for high-fidelity, low-latency facial animation in low-power VR scenarios using only a standard webcam. The approach extracts geometric features through affine transformation and facial region segmentation, followed by regression-based estimation of blendshape coefficients. Temporal consistency is enhanced via smoothing filters and nonlinear post-processing. The system achieves prediction accuracy comparable to ARKit 6 while maintaining minimal computational overhead, thereby fulfilling real-time performance and visual smoothness requirements. By eliminating the need for specialized facial motion capture hardware, this method significantly lowers the barrier to deploying expressive avatars in resource-constrained virtual reality environments.
📝 Abstract
Real-time facial avatar animation is widely used in entertainment, office collaboration, and other fields, where blendshapes have become a common animation technique. We independently developed an accurate blendshape prediction system for low-power VR applications that requires only a standard webcam. Feature vectors are extracted through affine transformation and facial region segmentation; further transformation and regression analysis are then used to build statistical models with significant predictive power. Post-processing, including smoothing filters and nonlinear transformations, further improves response stability. The system achieves prediction accuracy comparable to ARKit 6.
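The pipeline described above (affine alignment of facial landmarks, feature extraction, regression to blendshape coefficients, then smoothing and a nonlinear clamp) can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: the function names (`extract_features`, `BlendshapeRegressor`, `postprocess`), the choice of ridge regression, and the parameters `lam` and `alpha` are all assumptions for demonstration.

```python
import numpy as np

# Hypothetical sketch of the described pipeline; the paper's actual feature
# design, regression model, and filter parameters are not specified here.

def extract_features(landmarks, ref_landmarks):
    """Align 2D landmarks to a neutral reference via a least-squares affine
    fit, then use the residual displacements as the feature vector."""
    n = landmarks.shape[0]
    A = np.hstack([ref_landmarks, np.ones((n, 1))])    # homogeneous coords
    M, *_ = np.linalg.lstsq(A, landmarks, rcond=None)  # best affine ref->obs
    residual = landmarks - A @ M                       # expression-driven motion
    return residual.ravel()

class BlendshapeRegressor:
    """Linear (ridge) regression from features to blendshape coefficients
    (one possible 'statistical model with predictive power')."""
    def __init__(self, lam=1e-3):
        self.lam = lam
        self.W = None

    def fit(self, X, Y):
        d = X.shape[1]
        self.W = np.linalg.solve(X.T @ X + self.lam * np.eye(d), X.T @ Y)

    def predict(self, x):
        return x @ self.W

def postprocess(coeffs, prev, alpha=0.6):
    """Exponential smoothing for temporal stability, followed by a
    nonlinear clamp of each coefficient to the valid [0, 1] range."""
    smoothed = alpha * coeffs + (1 - alpha) * prev
    return np.clip(smoothed, 0.0, 1.0)
```

In a real system, `fit` would run offline on captured training pairs, while `extract_features`, `predict`, and `postprocess` would run per frame, which is consistent with the low-latency, webcam-only setting the paper targets.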