AI Summary
This work addresses the challenge of jointly ensuring musical synchronization and choreographic structure in animal dance video generation. Methodologically, it formulates animal dance as a beat-aligned, choreography-aware graph optimization problem, integrating automatic beat detection, mirror-pose image generation (to explicitly model bilateral symmetry), and text-to-keyframe synthesis driven by GPT-4o, requiring only six sparse textual prompts or keyframes to construct a structured pose sequence. High-fidelity intermediate frames are then synthesized via a video diffusion model. To our knowledge, this is the first approach enabling choreography-aware modeling and end-to-end music-aligned generation for animal dance. It robustly produces 30-second, high-fidelity dance videos across diverse species and musical genres, significantly lowering the barrier to professional-grade animal dance content creation.
Abstract
We present a keyframe-based framework for generating music-synchronized, choreography-aware animal dance videos. Starting from a few keyframes representing distinct animal poses -- generated via text-to-image prompting or GPT-4o -- we formulate dance synthesis as a graph optimization problem: find the optimal keyframe structure that satisfies a specified choreography pattern of beats, which can be automatically estimated from a reference dance video. We also introduce an approach for mirrored-pose image generation, essential for capturing symmetry in dance. In-between frames are synthesized using a video diffusion model. With as few as six input keyframes, our method can produce dance videos of up to 30 seconds across a wide range of animals and music tracks.
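The core scheduling idea described above -- aligning a repeating choreography pattern of poses (including mirrored counterparts) to detected musical beats -- can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the label convention (a primed label such as `"A'"` standing for the mirrored version of pose `"A"`), and the hard-coded beat times are all assumptions for illustration; in practice the beat timestamps would come from an automatic beat tracker run on the music track.

```python
from itertools import cycle

def schedule_keyframes(beat_times, pattern):
    """Assign one pose label to each detected beat.

    beat_times: beat timestamps in seconds (assumed to come from a
        beat tracker applied to the music track).
    pattern: a choreography pattern of pose labels, cycled over the
        beats; a primed label like "A'" could denote the mirrored
        counterpart of pose "A".
    Returns a list of (time, pose_label) pairs -- the keyframe plan
    that an interpolation model would then fill in.
    """
    labels = cycle(pattern)
    return [(t, next(labels)) for t in beat_times]

# Hypothetical beats at 120 BPM (one beat every 0.5 s) and a simple
# alternating pattern with mirrored poses:
beats = [0.5 * i for i in range(8)]
plan = schedule_keyframes(beats, ["A", "B", "A'", "B'"])
# plan[0] is (0.0, "A"); the pattern repeats from plan[4] onward.
```

The paper's graph optimization would replace the naive cyclic assignment here with a search over keyframe structures satisfying the choreography pattern, but the beat-to-pose alignment it produces has this same shape.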