How to Train Your Dragon: Automatic Diffusion-Based Rigging for Characters with Diverse Topologies

📅 2025-03-19
🤖 AI Summary
This work extends diffusion-based image animation from human subjects to characters with diverse skeletal topologies, including non-humanoid ones. The method has three parts: (1) a novel skeleton representation that is not tied to a human-specific body pose; (2) a procedural data generation pipeline that samples training data with diverse topologies on the fly; and (3) a fine-tuning stage that infers a character-specific rig from only 3–5 example frames with skeletal annotations and then renders images for new skeleton poses. The authors also introduce what they describe as the first 2D video dataset with per-frame keypoint annotations covering both humanoid and non-humanoid subjects, and they report results of superior quality on both realistic and stylized cartoon characters. By dropping human-specific pose assumptions, the framework adapts to unseen characters from very little data.

📝 Abstract
Recent diffusion-based methods have achieved impressive results on animating images of human subjects. However, most of that success has built on human-specific body pose representations and extensive training with labeled real videos. In this work, we extend the ability of such models to animate images of characters with more diverse skeletal topologies. Given a small number (3-5) of example frames showing the character in different poses with corresponding skeletal information, our model quickly infers a rig for that character that can generate images corresponding to new skeleton poses. We propose a procedural data generation pipeline that efficiently samples training data with diverse topologies on the fly. We use it, along with a novel skeleton representation, to train our model on articulated shapes spanning a large space of textures and topologies. Then during fine-tuning, our model rapidly adapts to unseen target characters and generalizes well to rendering new poses, both for realistic and more stylized cartoon appearances. To better evaluate performance on this novel and challenging task, we create the first 2D video dataset that contains both humanoid and non-humanoid subjects with per-frame keypoint annotations. With extensive experiments, we demonstrate the superior quality of our results. Project page: https://traindragondiffusion.github.io/
Problem

Research questions and friction points this paper is trying to address.

Extend diffusion-based animation to diverse skeletal topologies
Infer character rigs from few example frames with skeletal data
Create a 2D video dataset with per-frame keypoint annotations, covering both humanoid and non-humanoid subjects, to evaluate the task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Procedural data generation for diverse topologies
Novel skeleton representation for articulated shapes
Rapid adaptation to unseen characters and poses
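
To make the idea of sampling diverse topologies on the fly more concrete, here is a minimal sketch. It is not the authors' pipeline: the function names (`sample_skeleton`, `sample_pose`, `forward_kinematics`), the uniform random-tree sampling, and the bone-length range are all assumptions chosen for illustration. It draws a random tree-shaped skeleton, samples a pose as one relative angle per joint, and converts that pose into 2D keypoints.

```python
import math
import random


def sample_skeleton(num_joints, rng):
    """Sample a random tree-shaped skeleton (hypothetical sampler).

    parents[i] is the index of joint i's parent; the root has parent -1.
    lengths[i] is the length of the bone connecting joint i to its parent.
    """
    parents = [-1]
    lengths = [0.0]
    for i in range(1, num_joints):
        parents.append(rng.randrange(i))       # attach to any earlier joint
        lengths.append(rng.uniform(0.2, 1.0))  # assumed bone-length range
    return parents, lengths


def sample_pose(num_joints, rng):
    """Sample one relative rotation (in radians) per joint."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(num_joints)]


def forward_kinematics(parents, lengths, angles, root=(0.0, 0.0)):
    """Convert relative joint angles into 2D keypoint positions."""
    positions = [None] * len(parents)
    world_angles = [0.0] * len(parents)
    # Parents always have a smaller index than their children,
    # so a single pass in index order visits parents first.
    for i, p in enumerate(parents):
        if p < 0:
            positions[i] = root
            world_angles[i] = angles[i]
        else:
            world_angles[i] = world_angles[p] + angles[i]
            px, py = positions[p]
            positions[i] = (
                px + lengths[i] * math.cos(world_angles[i]),
                py + lengths[i] * math.sin(world_angles[i]),
            )
    return positions


if __name__ == "__main__":
    rng = random.Random(0)
    parents, lengths = sample_skeleton(8, rng)
    keypoints = forward_kinematics(parents, lengths, sample_pose(8, rng))
    print(parents)
    print(keypoints)
```

Each call with a fresh random state yields a different topology and pose, so a training loop could draw new skeletons every step rather than reading from a fixed dataset. How the paper textures and renders the resulting shapes is not covered by this sketch.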