InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing 3D dance generation methods to unseen music, which often results in structurally disordered or physically implausible motions. To tackle this issue, the authors construct a large-scale, high-quality 3D dance dataset comprising 100.69 hours of multimodal data and propose the Foot Restoration Diffusion Model to enhance the physical plausibility of generated movements. Furthermore, they introduce ChoreoLLaMA, a novel architecture that integrates retrieval-augmented generation (RAG) with a rhythm-aware mixture-of-experts (MoE) mechanism, significantly improving adaptability to novel musical inputs. Experimental results demonstrate that the proposed approach generates more natural, diverse, and physically realistic dance sequences across various styles, outperforming current state-of-the-art methods in both qualitative and quantitative evaluations.
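The retrieval-augmented generation step described above can be pictured as a nearest-neighbor lookup: given features of the input music, find the dance clip in a reference bank whose music is most similar, and inject that clip as a motion prompt. The sketch below is a minimal illustration of this idea under assumed representations (cosine similarity over fixed-size music feature vectors); it is not the paper's retrieval module, and all names (`retrieve_reference`, the toy bank) are hypothetical.

```python
import numpy as np

def retrieve_reference(query_feat, bank_feats, bank_motions, k=1):
    # Hypothetical RAG step: rank bank clips by cosine similarity
    # between their music features and the query music features,
    # and return the top-k motions to use as a dance prompt.
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    sims = b @ q
    top = np.argsort(-sims)[:k]
    return [bank_motions[i] for i in top], sims[top]

# Toy reference bank: 3 clips with 4-dim music features.
bank_feats = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.9, 0.1, 0.0, 0.0]])
bank_motions = ["clip_a", "clip_b", "clip_c"]

refs, scores = retrieve_reference(np.array([1.0, 0.05, 0.0, 0.0]),
                                  bank_feats, bank_motions)
print(refs)  # ['clip_a'] — closest music by cosine similarity
```

In a full system, the retrieved motion would be tokenized and prepended to the model's input sequence rather than returned as a string label.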

📝 Abstract
Although existing 3D dance generation methods perform well in controlled scenarios, they often struggle to generalize in the wild. When conditioned on unseen music, existing methods often produce unstructured or physically implausible dance, largely due to limited music-to-dance data and restricted model capacity. This work aims to push the frontier of generalizable 3D dance generation by scaling up both data and model design. (1) On the data side, we develop a fully automated pipeline that reconstructs high-fidelity 3D dance motions from monocular videos. To eliminate the physical artifacts prevalent in existing reconstruction methods, we introduce a Foot Restoration Diffusion Model (FRDM) guided by foot-contact and geometric constraints that enforce physical plausibility while preserving kinematic smoothness and expressiveness, resulting in a diverse, high-quality multimodal 3D dance dataset totaling 100.69 hours. (2) On model design, we propose Choreographic LLaMA (ChoreoLLaMA), a scalable LLaMA-based architecture. To enhance robustness under unfamiliar music conditions, we integrate a retrieval-augmented generation (RAG) module that injects reference dance as a prompt. Additionally, we design a slow/fast-cadence Mixture-of-Experts (MoE) module that enables ChoreoLLaMA to smoothly adapt motion rhythms across varying music tempos. Extensive experiments across diverse dance genres show that our approach surpasses existing methods in both qualitative and quantitative evaluations, marking a step toward scalable, real-world 3D dance generation. Code, models, and data will be released.
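The slow/fast-cadence Mixture-of-Experts described in the abstract can be sketched as tempo-conditioned soft gating over two expert networks: a scalar tempo feature drives a gate that blends a "slow" expert and a "fast" expert, so the motion rhythm shifts smoothly as the music tempo changes. The code below is a minimal toy illustration under assumed interfaces (a sigmoid gate over a normalized tempo in [0, 1], tiny MLP experts), not the architecture used in ChoreoLLaMA; all names and sizes are assumptions.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Tiny two-layer expert: ReLU hidden layer, linear output.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

class TempoGatedMoE:
    """Hypothetical slow/fast-cadence mixture of two experts.

    A normalized tempo feature (0 = slowest music, 1 = fastest)
    drives a sigmoid gate that blends the outputs of a 'slow'
    expert and a 'fast' expert. Illustrative sketch only.
    """
    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        make = lambda: (rng.normal(0.0, 0.1, (dim, hidden)),
                        np.zeros(hidden),
                        rng.normal(0.0, 0.1, (hidden, dim)),
                        np.zeros(dim))
        self.slow = make()
        self.fast = make()

    def __call__(self, x, tempo):
        # Soft gate: near 0 for slow tempos, near 1 for fast ones.
        g = 1.0 / (1.0 + np.exp(-10.0 * (tempo - 0.5)))
        return (1.0 - g) * mlp(x, *self.slow) + g * mlp(x, *self.fast)

moe = TempoGatedMoE(dim=8, hidden=16)
frame = np.ones(8)
out_slow = moe(frame, tempo=0.1)
out_fast = moe(frame, tempo=0.9)
print(out_slow.shape, out_fast.shape)  # both (8,)
```

Because the gate is continuous rather than a hard top-1 routing decision, intermediate tempos interpolate between the two experts, which matches the abstract's goal of smoothly adapting motion rhythm across tempos.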
Problem

Research questions and friction points this paper is trying to address.

3D dance generation
in-the-wild generalization
music-to-dance
physical plausibility
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D dance generation
retrieval-augmented generation
Mixture-of-Experts
diffusion model
in-the-wild generalization