🤖 AI Summary
This work addresses the challenge of generating high-fidelity, temporally coherent portrait animations from a single reference portrait image and a driving video. The proposed method builds on a stable video diffusion model with two key innovations: (1) an implicit motion representation that disentangles identity and motion information in the driving video; and (2) an attention-injection adapter that enables fine-grained spatiotemporal control during the diffusion process. By integrating pretrained encoders, implicit neural representations, and a motion-identity disentanglement architecture, the approach faithfully reconstructs facial expressions and head poses without fine-tuning or additional supervision. Extensive experiments demonstrate significant improvements over state-of-the-art methods in temporal consistency, motion controllability, and cross-style generalization, enabling high-quality portrait animation across diverse artistic styles.
📝 Abstract
We introduce HunyuanPortrait, a diffusion-based condition control method that employs implicit representations for highly controllable and lifelike portrait animation. Given a single portrait image as an appearance reference and video clips as driving templates, HunyuanPortrait animates the character in the reference image with the facial expressions and head poses of the driving videos. In our framework, we utilize pre-trained encoders to decouple portrait motion information from identity in videos. To this end, an implicit representation is adopted to encode motion information and is employed as the control signal in the animation phase. Leveraging stable video diffusion as the main building block, we carefully design adapter layers that inject the control signals into the denoising UNet through attention mechanisms, bringing spatial richness of detail and temporal consistency. HunyuanPortrait also exhibits strong generalization, effectively disentangling appearance and motion across different image styles. Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability. Our project is available at https://kkakkkka.github.io/HunyuanPortrait.
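The abstract describes adapter layers that inject implicit motion tokens into the denoising UNet via attention. The sketch below is not the paper's implementation; it is a minimal numpy illustration of the general idea, assuming a standard cross-attention adapter in which queries come from the UNet's spatial features and keys/values come from the motion representation, with a residual injection. All names (`cross_attention_inject`, the token shapes, the single-head projections) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_inject(hidden, motion_tokens, Wq, Wk, Wv):
    """Illustrative cross-attention adapter (single head, hypothetical).

    hidden:        (N, d) spatial feature tokens from a denoising UNet layer
    motion_tokens: (M, d) implicit motion representation of the driving video
    Returns features with motion information injected via a residual branch.
    """
    q = hidden @ Wq            # queries from the denoiser's features
    k = motion_tokens @ Wk     # keys from the motion tokens
    v = motion_tokens @ Wv     # values from the motion tokens
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, M) attention weights
    return hidden + attn @ v   # residual injection keeps the base features intact

# Toy shapes: 16 spatial tokens, 4 motion tokens, dimension 8.
rng = np.random.default_rng(0)
d = 8
hidden = rng.standard_normal((16, d))
motion = rng.standard_normal((4, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = cross_attention_inject(hidden, motion, Wq, Wk, Wv)
print(out.shape)  # (16, 8)
```

In practice such adapters run inside every attention block of the video diffusion backbone and use multi-head attention with learned projections; this toy version only shows how motion tokens can condition spatial features without replacing them.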