IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation

📅 2026-02-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing character animation methods: explicit motion representations struggle with spatial misalignment and scale variations, while implicit approaches often suffer from identity leakage due to entanglement between motion and appearance. To overcome these challenges, we propose a novel implicit motion representation framework that compresses per-frame motion into compact 1D motion tokens, thereby relaxing 2D spatial constraints and effectively disentangling identity information. Additionally, we introduce a mask-token-based retargeting module that enforces temporal consistency during animation transfer. Integrated with a three-stage training strategy and a video diffusion model, our method achieves state-of-the-art or comparable performance across multiple metrics, enabling high-fidelity, identity-disentangled, and temporally coherent character animation generation.

πŸ“ Abstract
Recent progress in video diffusion models has markedly advanced character animation, which synthesizes motion videos by animating a static identity image according to a driving video. Explicit methods represent motion using skeletons, DWPose, or other explicit structured signals, but struggle to handle spatial mismatches and varying body scales. Implicit methods, on the other hand, capture high-level implicit motion semantics directly from the driving video, but suffer from identity leakage and entanglement between motion and appearance. To address these challenges, we propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens. This design relaxes the strict spatial constraints inherent in 2D representations and effectively prevents identity information from leaking out of the motion video. Furthermore, we design a temporally consistent mask-token-based retargeting module that enforces a temporal training bottleneck, mitigating interference from the source image's motion and improving retargeting consistency. Our method employs a three-stage training strategy to enhance training efficiency and ensure high fidelity. Extensive experiments demonstrate that our implicit motion representation and the proposed IM-Animation achieve superior or competitive performance compared with state-of-the-art methods.
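The paper does not publish implementation details for the token compression, but the core idea of squeezing a 2D per-frame feature map into a few 1D motion tokens can be sketched as cross-attention pooling with learnable queries. The sketch below is a minimal numpy illustration under that assumption; all names (`compress_to_motion_tokens`, the projection matrices, the dimensions) are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_to_motion_tokens(frame_feats, queries, w_k, w_v):
    """Cross-attention pooling: a small set of learnable 1D query tokens
    attends over the flattened 2D feature map of one frame, discarding the
    spatial grid (hypothetical sketch, not the paper's actual architecture).

    frame_feats: (H*W, C) flattened spatial features of one driving frame
    queries:     (N, C)   learnable motion-token queries, with N << H*W
    w_k, w_v:    (C, C)   key/value projection matrices
    """
    k = frame_feats @ w_k                                          # (H*W, C)
    v = frame_feats @ w_v                                          # (H*W, C)
    attn = softmax(queries @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (N, H*W)
    return attn @ v                                                # (N, C)

# Toy usage: one 16x16 feature map with 64 channels -> 8 motion tokens.
rng = np.random.default_rng(0)
C, HW, N = 64, 16 * 16, 8
feats = rng.standard_normal((HW, C))
q = rng.standard_normal((N, C))
wk = rng.standard_normal((C, C)) / np.sqrt(C)
wv = rng.standard_normal((C, C)) / np.sqrt(C)
tokens = compress_to_motion_tokens(feats, q, wk, wv)
print(tokens.shape)  # (8, 64): per-frame motion compressed into N 1D tokens
```

Because the tokens carry no spatial index, the 2D layout of the driving frame (and with it, much of the appearance detail tied to that layout) is not preserved, which is the intuition behind relaxing spatial constraints and reducing identity leakage.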
Problem

Research questions and friction points this paper is trying to address.

character animation
identity leakage
motion representation
spatial mismatch
appearance-motion entanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

implicit motion representation
1D motion tokens
identity-decoupled animation
temporal consistency
motion retargeting
🔎 Similar Papers
No similar papers found.
Zhufeng Xu
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xuan Gao
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Feng-Lin Liu
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Haoxian Zhang
Kling Team, Kuaishou Technology
Zhixue Fang
Kling Team, Kuaishou Technology
Yu-Kun Lai
Professor, Cardiff University
Geometric Modeling, Geometry Processing, Computer Graphics, Image Processing, Computer Vision
Xiaoqiang Liu
Kling Team, Kuaishou Technology
Pengfei Wan
Head of Kling Video Generation Models, Kuaishou Technology
Generative Models, Computer Vision, Multimodal AI, Computer Graphics
Lin Gao
Professor, University of Chinese Academy of Sciences
Computer Graphics, Geometry Processing, Geometry Learning