3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy

📅 2024-09-17
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
Existing speech-driven 3D facial animation methods typically generate mesh vertices frame-by-frame, resulting in discontinuous motion, rigid expressions, and distorted emotional articulation. To address this, we propose the first end-to-end vertex trajectory prediction framework based on Diffusion Policy, wherein a diffusion model is formulated as a temporal policy network that directly maps audio features (Wav2Vec 2.0 embeddings) to long-term, variable-length, and emotionally coherent 3D vertex trajectories. Unlike conventional frame-wise approaches, our method explicitly models spatiotemporal continuity and semantic consistency of vertex dynamics, eliminating independent per-frame predictions. Evaluated on multiple benchmarks, our approach significantly improves motion diversity, temporal coherence, and emotional fidelity, yielding more natural and expressive speech-driven 3D facial animation.
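The core idea in the summary, refining a whole window of vertex motion jointly rather than predicting each frame independently, can be sketched as an iterative denoising loop conditioned on audio features. This is a minimal toy illustration, not the paper's model: the linear "denoiser" stands in for the learned network, and all dimensions and step counts are made-up small values (Wav2Vec 2.0 embeddings are actually 768-dimensional).

```python
# Toy sketch of diffusion-policy-style trajectory generation: a full window
# of vertex offsets is denoised jointly, conditioned on per-frame audio
# features, instead of being predicted frame by frame.
import numpy as np

T, V = 8, 12           # frames per window, flattened vertex dims (toy sizes)
D_AUDIO = 16           # audio feature dim (illustrative; not Wav2Vec's 768)
rng = np.random.default_rng(0)

W = rng.normal(scale=0.1, size=(D_AUDIO, V))  # stand-in for learned weights

def denoise_step(traj, audio, t):
    """One reverse step: nudge the noisy trajectory toward an
    audio-conditioned prediction computed over the whole window."""
    pred = audio @ W                  # (T, V) target from audio features
    alpha = 1.0 / (t + 1)             # toy step-size schedule
    return (1 - alpha) * traj + alpha * pred

def sample_trajectory(audio, steps=10):
    traj = rng.normal(size=(T, V))    # start from Gaussian noise
    for t in reversed(range(steps)):
        traj = denoise_step(traj, audio, t)
    return traj

audio = rng.normal(size=(T, D_AUDIO))
traj = sample_trajectory(audio)
print(traj.shape)  # (8, 12): one coherent window of vertex motion
```

Because every frame in the window is updated together at each step, temporal continuity is built into the sampling process, which is the contrast the summary draws with per-frame generation.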

📝 Abstract
Audio-driven 3D facial animation has made impressive progress in both research and applications. The newest approaches are Transformer-based and diffusion-based methods; however, a gap in vividness and emotional expression remains between generated animations and real human faces. To tackle this limitation, we propose 3DFacePolicy, a diffusion policy model for 3D facial animation prediction. This method generates variable and realistic human facial movements by predicting the 3D vertex trajectory on a 3D facial template with a diffusion policy, instead of generating the face for every frame. It takes audio and vertex states as observations when predicting the vertex trajectory and imitating real human facial expressions, which preserves the continuous and natural flow of human emotions. Experiments show that our approach is effective in synthesizing variable and dynamic facial motion.
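The abstract frames the method as a policy: observations are audio plus the current vertex state, and the output is a chunk of vertex-offset actions applied in a closed loop. The following toy rollout illustrates only that interface; the `policy` function is a hypothetical placeholder, not the paper's trained diffusion policy, and all sizes are illustrative.

```python
# Toy closed-loop rollout: at each control step the policy observes the
# current vertex state plus an audio window and emits a short chunk of
# vertex-offset actions, which are applied sequentially to the mesh state.
import numpy as np

V, CHUNK = 12, 4                      # vertex dims, actions per chunk (toy)
rng = np.random.default_rng(1)

def policy(vertex_state, audio_window):
    # Placeholder policy: a real diffusion policy would denoise this chunk.
    drive = np.tanh(audio_window.mean())            # scalar audio "energy"
    return drive * 0.05 * np.ones((CHUNK, V)) - 0.01 * vertex_state

def rollout(audio, steps=3):
    state = np.zeros(V)               # neutral face template
    frames = []
    for i in range(steps):
        actions = policy(state, audio[i])           # chunk of offsets
        for a in actions:                           # apply chunk in order
            state = state + a
            frames.append(state.copy())
    return np.stack(frames)

audio = rng.normal(size=(3, 16))      # 3 observation windows of features
frames = rollout(audio)
print(frames.shape)  # (12, 12): steps * CHUNK frames of V vertex values
```

Feeding the evolving vertex state back into the policy is what the abstract means by taking "audio and vertex states as observations": each new action chunk depends on where the face currently is, not just on the audio.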
Problem

Research questions and friction points this paper is trying to address.

Generates natural 3D facial animation from audio
Overcomes frame-by-frame vertex movement limitations
Uses action-based control for smooth facial dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-based control for vertex trajectory
Diffusion policy for audio-driven animation
Robotic mechanism for smooth facial movements