Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses high-fidelity character animation and character replacement from a single character image and a reference video. It proposes a unified framework, built on the Wan model, with three components: (1) a modified input paradigm that separates reference conditions from regions to be generated, so that animation and replacement share one symbolic representation; (2) spatially aligned skeleton signals that drive body motion, combined with implicit facial features extracted from source images to reenact expressions; and (3) an auxiliary Relighting LoRA that preserves the character's appearance while matching the target scene's lighting and color tone. The authors report state-of-the-art performance and have committed to open-sourcing the model weights and source code.

📝 Abstract

We introduce Wan-Animate, a unified framework for character animation and replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone to achieve seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm to differentiate between reference conditions and regions for generation. This design unifies multiple tasks into a common symbolic representation. We use spatially-aligned skeleton signals to replicate body motion and implicit facial features extracted from source images to reenact expressions, enabling the generation of character videos with high controllability and expressiveness. Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module preserves the character's appearance consistency while applying the appropriate environmental lighting and color tone. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and its source code.
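
The abstract says only that a modified input paradigm differentiates reference conditions from regions for generation; it does not give the encoding. As a rough illustration, one way to express both tasks with a single representation is a per-frame binary mask. Everything in this sketch is an assumption made for illustration (the function name `build_generation_mask`, the mode strings, and the 1 = generate / 0 = keep convention), not the authors' implementation.

```python
def build_generation_mask(mode, character_region):
    """Return a 2D mask for one frame (hypothetical convention).

    1 marks a pixel the model should generate.
    0 marks a pixel kept from the reference video as a condition.

    mode: "animation" or "replacement"
    character_region: 2D list of 0/1, with 1 where the original
        character appears in the reference-video frame
    """
    if mode == "animation":
        # Animation: the whole frame is generated.
        return [[1] * len(row) for row in character_region]
    if mode == "replacement":
        # Replacement: only the character region is generated,
        # the surrounding scene is preserved.
        return [[1 if v else 0 for v in row] for row in character_region]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy 3x4 frame with the character occupying the centre block.
frame_region = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

animation_mask = build_generation_mask("animation", frame_region)
replacement_mask = build_generation_mask("replacement", frame_region)
```

Under this assumed convention, the two tasks differ only in the mask passed to the model, which is one plausible reading of how a shared input format could unify them.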

Problem

Research questions and friction points this paper is trying to address.

Animate characters by replicating expressions and movements from videos

Replace characters in videos while integrating environmental lighting

Unify animation and replacement tasks into a single framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for character animation and replacement

Spatially-aligned skeleton signals for motion

Auxiliary Relighting LoRA for environmental integration
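
For readers unfamiliar with LoRA, the sketch below shows the generic low-rank adaptation update, W' = W + (alpha / r) · B · A, in plain Python. It illustrates the general technique only. The matrix sizes, rank, and `alpha` value are hypothetical, and nothing here reflects which layers or hyperparameters the Relighting LoRA actually uses.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the generic LoRA update.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    alpha: scaling hyperparameter
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical toy sizes, chosen only for illustration.
d_out, d_in, rank = 4, 6, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
# B is conventionally initialised to zero, so the adapter starts as a no-op.
b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w, a, b, alpha=2.0)
```

Because only the small matrices A and B are trained while W stays frozen, an adapter of this kind can be attached as an auxiliary module, which fits the summary's description of relighting as an add-on that leaves the base model's character appearance intact.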

🔎 Similar Papers

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation