3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation

📅 2025-09-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing speech-driven 3D facial animation methods struggle to simultaneously achieve precise control, natural head motion, and efficient editing, and they fail to model the diversity of plausible lip and head movements for identical speech input. This paper proposes 3DiFACE, a fully-convolutional diffusion-based framework that combines a sparsely-guided motion diffusion mechanism with viseme-level conditioning to explicitly model viseme-level motion diversity. The approach supports personalized speaking-style generation and localized re-synthesis, and further enables keyframe specification and interpolation-based editing for fine-grained animation control. Quantitative and qualitative evaluations show that the method outperforms state-of-the-art approaches in animation naturalness, motion diversity, and editing flexibility, unifying high-fidelity synthesis with intuitive, granular controllability for editable 3D facial animation.
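
As a rough illustration of how sparse keyframe guidance can be imposed on a motion diffusion sampler, the sketch below uses inpainting-style replacement at each denoising step. All names (`denoiser`, `betas`, `keyframe_mask`, etc.) are assumptions for illustration; this is not 3DiFACE's code, and the paper's sparsely-guided mechanism may differ in detail.

```python
# Illustrative sketch (not the paper's implementation): inpainting-style keyframe
# guidance during DDPM-style diffusion sampling of a facial-motion sequence.
import torch

@torch.no_grad()
def sample_with_keyframes(denoiser, audio_feats, keyframes, keyframe_mask,
                          betas, num_steps):
    """Generate a motion sequence [T, D] anchored at sparse keyframes.

    keyframes:     [T, D] tensor, meaningful only where keyframe_mask == 1
    keyframe_mask: [T, 1] binary tensor marking frames that must stay fixed
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(keyframes)  # start from pure noise
    for t in reversed(range(num_steps)):
        # Replace keyframe frames with a noised version of the known keyframes,
        # so the model inpaints the remaining frames consistently around them.
        noise = torch.randn_like(keyframes)
        x_known = torch.sqrt(alpha_bars[t]) * keyframes + \
                  torch.sqrt(1.0 - alpha_bars[t]) * noise
        x = keyframe_mask * x_known + (1.0 - keyframe_mask) * x

        # One reverse-diffusion step: predict and remove noise (DDPM posterior mean).
        eps = denoiser(x, t, audio_feats)
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / \
               torch.sqrt(alphas[t])
        x = mean + (torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else 0.0)

    # Hard-set keyframes at the end so edits exactly match the user constraints.
    return keyframe_mask * keyframes + (1.0 - keyframe_mask) * x
```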

📝 Abstract
Creating personalized 3D animations with precise control and realistic head motions remains challenging for current speech-driven 3D facial animation methods. Editing these animations is especially complex and time-consuming; it requires precise control and is typically handled by highly skilled animators. Most existing works focus on controlling the style or emotion of the synthesized animation and cannot edit/regenerate parts of an input animation. They also overlook the fact that multiple plausible lip and head movements can match the same audio input. To address these challenges, we present 3DiFACE, a novel method for holistic speech-driven 3D facial animation. Our approach produces diverse plausible lip and head motions for a single audio input and allows for editing via keyframing and interpolation. Specifically, we propose a fully-convolutional diffusion model that can leverage the viseme-level diversity in our training corpus. Additionally, we employ a speaking-style personalization and a novel sparsely-guided motion diffusion to enable precise control and editing. Through quantitative and qualitative evaluations, we demonstrate that our method is capable of generating and editing diverse holistic 3D facial animations given a single audio input, with control between high fidelity and diversity. Code and models are available here: https://balamuruganthambiraja.github.io/3DiFACE
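
To make the keyframing-and-editing workflow concrete, here is a minimal, hypothetical setup for localized re-synthesis: an existing animation is kept fixed except for one segment that is regenerated. The tensor shapes and the reference to `sample_with_keyframes` (from the sketch above) are assumptions for illustration, not the paper's API.

```python
# Illustrative editing setup (assumed shapes and names, not the paper's API):
# regenerate only frames 40..80 of an existing animation, keeping the rest fixed.
import torch

T, D = 120, 70                       # sequence length and per-frame motion dims (example values)
existing_motion = torch.zeros(T, D)  # previously synthesized or captured animation
keyframe_mask = torch.ones(T, 1)     # 1 = keep frame fixed, 0 = re-synthesize
keyframe_mask[40:80] = 0.0           # segment marked for editing / regeneration

# Passing existing_motion and keyframe_mask to a sparsely-guided sampler such as
# `sample_with_keyframes` above would fill frames 40..80 so they blend smoothly
# with the fixed surrounding context.
```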
Problem

Research questions and friction points this paper is trying to address.

Creating personalized 3D animations with precise control
Editing animations is complex and requires skilled animators
Generating diverse plausible lip and head motions from audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully-convolutional diffusion model for viseme-level diversity (see the architecture sketch after this list)
Speaking-style personalization for precise control
Sparsely-guided motion diffusion to enable editing
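
A fully-convolutional temporal denoiser of the kind listed above could, in rough outline, look like the following sketch. Layer sizes, the audio-feature dimensionality, and the dilated-convolution stack are assumptions for illustration, not the paper's actual architecture.

```python
# Illustrative sketch of a fully-convolutional temporal denoiser (an assumption
# about the general architecture class; 3DiFACE's exact layers are not reproduced here).
import torch
import torch.nn as nn

class ConvDenoiser(nn.Module):
    def __init__(self, motion_dim=70, audio_dim=768, hidden=256, layers=4):
        super().__init__()
        self.in_proj = nn.Conv1d(motion_dim + audio_dim + 1, hidden, kernel_size=1)
        # Stacked dilated 1D convolutions keep the model fully convolutional in time,
        # so it can train on short clips and run on sequences of arbitrary length.
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(hidden, hidden, kernel_size=3, padding=2**i, dilation=2**i),
                nn.SiLU(),
            )
            for i in range(layers)
        ])
        self.out_proj = nn.Conv1d(hidden, motion_dim, kernel_size=1)

    def forward(self, x_t, t, audio_feats):
        """x_t: [B, T, motion_dim], t: [B], audio_feats: [B, T, audio_dim]."""
        # Broadcast the diffusion timestep over all frames as an extra channel.
        t_feat = t.float().view(-1, 1, 1).expand(-1, x_t.shape[1], 1)
        h = torch.cat([x_t, audio_feats, t_feat], dim=-1).transpose(1, 2)  # [B, C, T]
        h = self.in_proj(h)
        for block in self.blocks:
            h = h + block(h)  # residual connections stabilize training
        return self.out_proj(h).transpose(1, 2)  # predicted noise, [B, T, motion_dim]
```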