Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling

📅 2024-06-05

📈 Citations: 8

✨ Influential: 2

career value

193K/year

🤖 AI Summary

To address poor animation stability and challenging motion disentanglement—particularly under complex backgrounds and multi-character interactions with severe occlusions—this paper proposes a multi-condition-guided implicit disentanglement framework. Our method introduces three novel modules: (i) an optical flow-guided module to separate static and dynamic background components; (ii) a depth-order-guided module to model spatial hierarchical relationships among characters; and (iii) a reference pose graph-guided module to enforce intra-character texture-pose disentanglement. Built upon optical flow estimation, depth-order mapping, implicit neural representations, and differentiable multi-condition modulation, the framework is trained end-to-end. We further construct the first 4,000-frame benchmark for multi-character animation evaluation. Extensive experiments demonstrate significant improvements over state-of-the-art methods in animation stability, occlusion-region reconstruction accuracy, and spatiotemporal consistency, achieving new SOTA performance both quantitatively and qualitatively.

Technology Category

Application Category

📝 Abstract

Controllable character image animation has a wide range of applications. Although existing studies have consistently improved performance, challenges persist in the field of character image animation, particularly concerning stability in complex backgrounds and tasks involving multiple characters. To address these challenges, we propose a novel multi-condition guided framework for character image animation, employing several well-designed input modules to enhance the implicit decoupling capability of the model. First, the optical flow guider calculates the background optical flow map as guidance information, which enables the model to implicitly learn to decouple the background motion into background constants and background momentum during training, and generate a stable background by setting zero background momentum during inference. Second, the depth order guider calculates the order map of the characters, which transforms the depth information into the positional information of multiple characters. This facilitates the implicit learning of decoupling different characters, especially in accurately separating the occluded body parts of multiple characters. Third, the reference pose map is input to enhance the ability to decouple character texture and pose information in the reference image. Furthermore, to fill the gap of fair evaluation of multi-character image animation, we propose a new benchmark comprising about 4,000 frames. Extensive qualitative and quantitative evaluations demonstrate that our method excels in generating high-quality character animations, especially in scenarios of complex backgrounds and multiple characters.

Problem

Research questions and friction points this paper is trying to address.

Enhance stability in complex backgrounds for character animation

Improve decoupling of multiple characters in image animation

Develop a benchmark for evaluating multi-character animation quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optical flow guider for stable background generation

Depth order guider for character separation

Reference pose map for texture-pose decoupling

🔎 Similar Papers

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation