MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance

📅 2025-04-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing video face reenactment methods suffer from limited shape consistency and motion controllability. To address this, we propose the first framework that embeds the FLAME 3D facial parametric model as a motion prior into a latent diffusion model (LDM), leveraging multi-modal geometric guidance—namely depth maps, normal maps, and rendered images—to achieve high-fidelity, temporally coherent generation. We introduce a novel multi-level facial motion fusion module and a parameterized identity-action alignment mechanism, enabling explicit disentanglement and joint modeling of identity, expression, and pose. Our method achieves state-of-the-art performance across multiple benchmarks: it supports fine-grained expression control, accurate head pose manipulation, and exhibits strong cross-domain generalization. The source code is publicly available.

Technology Category

Application Category

📝 Abstract
In this paper, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion control in existing video-based face generation approaches. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unified framework for modeling face expressions and head pose. This enables precise extraction of detailed face geometry and motion features from driving videos. Specifically, we enhance the latent diffusion model with rich 3D expression and detailed pose information by incorporating depth maps, normal maps, and rendering maps derived from FLAME sequences. A multi-layer face movements fusion module with integrated self-attention mechanisms is used to combine identity and motion latent features within the spatial domain. By utilizing the 3D face parametric model as motion guidance, our method enables parametric alignment of face identity between the reference image and the motion captured from the driving video. Experimental results on benchmark datasets show that our method excels at generating high-quality face animations with precise expression and head pose variation modeling. In addition, it demonstrates strong generalization performance on out-of-domain images. Code is publicly available at https://github.com/weimengting/MagicPortrait.
Problem

Research questions and friction points this paper is trying to address.

Improving shape consistency in video face reenactment
Enhancing motion control via 3D geometric guidance
Achieving precise expression and pose variation modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates 3D face model into diffusion framework
Uses FLAME model for detailed geometry extraction
Enhances diffusion with depth and normal maps
🔎 Similar Papers
No similar papers found.