AI Summary
This work addresses the limitations of existing character animation methods: explicit motion representations struggle with spatial misalignment and scale variations, while implicit approaches often suffer from identity leakage due to entanglement between motion and appearance. To overcome these challenges, we propose a novel implicit motion representation framework that compresses per-frame motion into compact 1D motion tokens, thereby relaxing 2D spatial constraints and effectively disentangling identity information. Additionally, we introduce a mask token-based, temporally consistent retargeting module to enhance motion coherence during animation transfer. Integrated with a three-stage training strategy and a video diffusion model, our method achieves state-of-the-art or comparable performance across multiple metrics, enabling high-fidelity, identity-disentangled, and temporally coherent character animation generation.
Abstract
Recent progress in video diffusion models has markedly advanced character animation, which synthesizes videos by animating a static identity image according to a driving video. Explicit methods represent motion using skeletons, DWPose, or other explicitly structured signals, but struggle to handle spatial mismatches and varying body scales. Implicit methods, on the other hand, capture high-level implicit motion semantics directly from the driving video, but suffer from identity leakage and entanglement between motion and appearance. To address these challenges, we propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens. This design relaxes the strict spatial constraints inherent in 2D representations and effectively prevents identity information from leaking out of the motion video. Furthermore, we design a temporally consistent mask token-based retargeting module that enforces a temporal training bottleneck, mitigating interference from the source image's motion and improving retargeting consistency. Our method employs a three-stage training strategy to improve training efficiency and ensure high fidelity. Extensive experiments demonstrate that our implicit motion representation and the proposed IM-Animation achieve superior or competitive performance compared with state-of-the-art methods.
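The abstract does not specify how the 1D motion tokens are produced. As a rough illustration only, the PyTorch sketch below shows one plausible way to compress a per-frame 2D feature map into a small set of 1D tokens via cross-attention from learnable queries; the class name `MotionTokenizer`, the token count, and the use of ViT-style patch features are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MotionTokenizer(nn.Module):
    """Hypothetical sketch: compress a per-frame 2D feature map into a
    small set of 1D motion tokens via cross-attention from learnable
    queries. Dropping the 2D spatial layout is what would relax the
    strict spatial alignment between driving and source frames."""

    def __init__(self, feat_dim=768, num_tokens=16, num_heads=8):
        super().__init__()
        # Learnable 1D queries; each one pools motion evidence from the
        # whole frame rather than from a fixed spatial location.
        self.queries = nn.Parameter(torch.randn(num_tokens, feat_dim) * 0.02)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, frame_feats):
        # frame_feats: (B, H*W, C) patch features of one driving frame.
        B = frame_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)      # (B, N, C)
        tokens, _ = self.attn(q, frame_feats, frame_feats)   # (B, N, C)
        return self.norm(tokens)  # compact 1D motion tokens

# Usage: 16 tokens per frame instead of a 2D pose map.
feats = torch.randn(2, 14 * 14, 768)   # e.g. ViT patch features
print(MotionTokenizer()(feats).shape)  # torch.Size([2, 16, 768])
```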
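Similarly, the "temporal training bottleneck" of the retargeting module is only named, not described. One way to realize such a bottleneck, shown in the sketch below purely as a guess at the mechanism, is to randomly replace whole frames of motion tokens with a learnable mask token during training and require a temporal transformer to fill them back in from neighboring frames; all names and the masking scheme here are our assumptions.

```python
import torch
import torch.nn as nn

class MaskTokenRetargeter(nn.Module):
    """Hypothetical sketch of a mask-token temporal bottleneck: randomly
    replace per-frame motion tokens with a shared learnable [MASK] token,
    then let a temporal transformer reconstruct them from neighboring
    frames. Recovering masked frames purely from temporal context would
    discourage copying source-image motion and encourage temporally
    consistent retargeting. (A real implementation would also add
    temporal positional embeddings, omitted here for brevity.)"""

    def __init__(self, dim=768, depth=2, num_heads=8, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            dim, num_heads, dim * 4, batch_first=True, norm_first=True)
        self.temporal = nn.TransformerEncoder(layer, depth)

    def forward(self, motion_tokens):
        # motion_tokens: (B, T, N, C) = frames x tokens-per-frame.
        B, T, N, C = motion_tokens.shape
        if self.training:
            # Mask entire frames so the bottleneck is temporal, not spatial.
            keep = torch.rand(B, T, 1, 1,
                              device=motion_tokens.device) > self.mask_ratio
            motion_tokens = torch.where(keep, motion_tokens, self.mask_token)
        # Attend along time independently for each token index.
        x = motion_tokens.permute(0, 2, 1, 3).reshape(B * N, T, C)
        x = self.temporal(x)
        return x.reshape(B, N, T, C).permute(0, 2, 1, 3)
```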