🤖 AI Summary
This paper addresses high-fidelity character animation and character replacement from a single reference image and a driving video. Methodologically, it proposes a unified framework featuring: (1) a shared symbolic input representation that covers both animation and character replacement; (2) spatially aligned skeleton sequences that drive body motion, combined with implicit facial features extracted from source images for expressive facial reenactment; and (3) an auxiliary relighting LoRA module that adapts the inserted character to the target scene's illumination and color tone. Compared with prior work, the approach is novel in its joint modeling of pose and expression, its control over environment-consistent rendering, and its single architecture for both tasks. Quantitative and qualitative evaluations report state-of-the-art performance in visual fidelity, motion naturalness, lighting coherence, and controllability. The code and pretrained models will be publicly released.
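The relighting LoRA in item (3) is described only at the summary level. As a rough illustration of what a LoRA adapter does in general, the sketch below wraps a frozen linear map W with a trainable low-rank update, y = Wx + (alpha / r) · B(Ax). The class name `LoRALinear`, the rank and scaling values, the deterministic initialization of A, and the `enabled` switch (standing in for applying the adapter only during replacement) are all assumptions for illustration, not details taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear map W plus a low-rank update scaled by alpha / rank.

    Hypothetical illustration: which layers the relighting LoRA adapts, its
    rank, and its scaling are not specified in the summary or abstract.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float) -> None:
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weight, shape (d_out, d_in)
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A: (rank, d_in); a deterministic stand-in for the usual random init.
        self.A: Matrix = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        # B: (d_out, rank); zero-initialized so the adapter starts as a no-op.
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x: Vector, enabled: bool = True) -> Vector:
        base = matvec(self.weight, x)
        if not enabled:  # hypothetical switch: adapter off, base model only
            return base
        delta = matvec(self.B, matvec(self.A, x))  # B(Ax): rank-limited correction
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=1.0)
print(layer([2.0, 3.0]))  # [2.0, 3.0]: B is zero, so the output equals W x
```

Because B starts at zero, the adapter initially reproduces the frozen layer exactly, and only the small matrices A and B are trained. This is the usual motivation for using a LoRA to add a narrow capability on top of a pretrained model while leaving its base weights untouched.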
📝 Abstract
We introduce Wan-Animate, a unified framework for character animation and replacement. Given a character image and a reference video, Wan-Animate can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. Alternatively, it can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone to achieve seamless environmental integration. Wan-Animate is built upon the Wan model. To adapt it for character animation tasks, we employ a modified input paradigm to differentiate between reference conditions and regions for generation. This design unifies multiple tasks into a common symbolic representation. We use spatially-aligned skeleton signals to replicate body motion and implicit facial features extracted from source images to reenact expressions, enabling the generation of character videos with high controllability and expressiveness. Furthermore, to enhance environmental integration during character replacement, we develop an auxiliary Relighting LoRA. This module preserves the character's appearance consistency while applying the appropriate environmental lighting and color tone. Experimental results demonstrate that Wan-Animate achieves state-of-the-art performance. We are committed to open-sourcing the model weights and its source code.
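The abstract does not spell out the modified input paradigm. One common way to let a single model serve both tasks is to pair every conditioning frame with a binary mask that marks which pixels are given and which must be generated. The sketch below assumes that layout. The function `build_conditions`, the use of pixel-space frames as stand-ins for latents, and the mask convention (1.0 = generate) are hypothetical. The skeleton and facial-feature signals that carry the motion are omitted.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # H x W grayscale frame, a stand-in for a video latent


def zeros_like(frame: Frame) -> Frame:
    return [[0.0] * len(row) for row in frame]


def ones_like(frame: Frame) -> Frame:
    return [[1.0] * len(row) for row in frame]


def build_conditions(
    mode: str,
    reference: Frame,
    driving_frames: List[Frame],
    character_masks: Optional[List[Frame]] = None,
) -> List[Tuple[Frame, Frame]]:
    """Return (conditioning frame, generation mask) pairs; mask 1.0 = generate.

    Hypothetical shared input layout for animation and replacement.
    """
    if mode not in ("animation", "replacement"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "replacement" and (
        character_masks is None or len(character_masks) != len(driving_frames)
    ):
        raise ValueError("replacement mode needs one character mask per driving frame")

    # The reference image is always fully given, so nothing in it is generated.
    pairs = [(reference, zeros_like(reference))]
    for i, frame in enumerate(driving_frames):
        if mode == "animation":
            # No pixels are copied from the driving frame; the whole frame is
            # generated (motion would come from separate skeleton/face signals).
            pairs.append((zeros_like(frame), ones_like(frame)))
        else:
            mask = character_masks[i]
            # Keep the driving video's background and blank the character region...
            kept = [[0.0 if m else p for p, m in zip(prow, mrow)]
                    for prow, mrow in zip(frame, mask)]
            # ...which is exactly the region marked for generation.
            gen = [[1.0 if m else 0.0 for m in mrow] for mrow in mask]
            pairs.append((kept, gen))
    return pairs
```

Under this assumed layout, the two tasks differ only in what the driving frames contribute: no pixels in animation mode, and the background plus a region to regenerate in replacement mode. The same network input format can therefore serve both.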