🤖 AI Summary
Hand-drawn character animation faces a fundamental trade-off between geometric consistency and realistic secondary motion: skeletal animation struggles to model non-rigid deformations (e.g., flowing hair or fluttering skirts), while video diffusion models often introduce geometric distortions in stylized illustrations. To address this, we propose two complementary strategies: Secondary Dynamics Injection (SDI), which embeds human motion priors into the denoising process, and Hair Layering Modeling (HLM), which semantically decouples the hair layer from the body for finer control of its dynamics. We first generate geometrically coherent coarse frames via skeleton retargeting, then refine them with a domain-adapted video diffusion model that enhances texture and synthesizes secondary motion in masked regions. Experiments show that our method outperforms state-of-the-art approaches in both quantitative and qualitative evaluations, producing hand-drawn character animations with consistent artistic style, geometric stability, and natural secondary dynamics.
📝 Abstract
Hand-drawn character animation is a vibrant field in computer graphics, where the central challenge is to preserve geometric consistency while conveying expressive motion. Traditional skeletal animation methods maintain geometric consistency but struggle with complex non-rigid elements such as flowing hair and skirts, leading to unnatural deformation. Conversely, video diffusion models synthesize realistic dynamics but often create geometric distortions in stylized drawings due to domain gaps. This work proposes a hybrid animation system that combines skeletal animation and video diffusion. First, coarse images are generated from characters retargeted with skeletal animations, providing geometric guidance. These images are then enhanced in texture and secondary dynamics using video diffusion priors, with the enhancement framed as an inpainting task: a domain-adapted diffusion model refines the user-masked regions that need improvement, especially those with secondary dynamics. To further enhance motion realism, we introduce a Secondary Dynamics Injection (SDI) strategy in the denoising process, incorporating features from a pre-trained diffusion model enriched with human motion priors. Additionally, to tackle the unnatural deformations caused by low-poly single-mesh character modeling, we present a Hair Layering Modeling (HLM) technique that uses segmentation maps to separate hair from the body, allowing for more natural animation of long-haired characters. Extensive experiments show that our system outperforms state-of-the-art methods in both quantitative and qualitative evaluations.
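The two mask-based ideas in the abstract, separating hair from the body with a segmentation map and refining only user-masked regions, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `HAIR_LABEL` id, and the simple per-pixel blend are assumptions made for illustration, and the abstract does not specify how the diffusion model itself consumes the mask.

```python
import numpy as np

HAIR_LABEL = 1  # hypothetical label id for "hair" in the segmentation map


def split_hair_layer(image, seg_map, hair_label=HAIR_LABEL):
    """Split an (H, W, C) image into hair and body layers.

    seg_map is an (H, W) integer array; pixels equal to hair_label
    go to the hair layer, all other pixels go to the body layer.
    """
    hair_mask = (seg_map == hair_label).astype(image.dtype)[..., None]
    hair_layer = image * hair_mask
    body_layer = image * (1 - hair_mask)
    return hair_layer, body_layer, hair_mask


def composite_refined(coarse, refined, mask):
    """Take refined pixels inside the mask and coarse pixels outside it.

    This is the generic inpainting-style blend, assumed here as a
    stand-in for the mask-guided refinement step.
    """
    mask = mask.astype(coarse.dtype)
    if mask.ndim == coarse.ndim - 1:
        mask = mask[..., None]  # broadcast an (H, W) mask over channels
    return mask * refined + (1 - mask) * coarse


# Toy usage: a 2x2 frame where only the top-left pixel is masked.
coarse = np.zeros((2, 2, 3), dtype=np.float32)
refined = np.ones((2, 2, 3), dtype=np.float32)
mask = np.array([[1, 0], [0, 0]])
frame = composite_refined(coarse, refined, mask)
```

In this sketch the coarse frame stays untouched outside the mask, which mirrors the stated goal of keeping the geometry from skeletal animation while letting the diffusion model change only the regions that need secondary motion.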