DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses text-driven controllable dance generation, which is held back by the scarcity of high-quality data and by the inherent complexity of dance motion, in particular its spatial dynamics, strong directionality, and highly decoupled movements across body parts. To overcome these limitations, the authors propose a theoretical framework termed "Choreographic Syntax," which draws on dance studies, human anatomy, and biomechanics and comes with a tailored annotation system. They introduce DanceFlow, a dataset that combines professional dance archives with high-fidelity motion capture, comprising 41 hours of motion paired with 6.34 million words of fine-grained textual descriptions, and develop DanceCrafter, a motion transformer built upon the Momentum Human Rig. The model incorporates a continuous manifold motion representation, a hybrid normalization strategy, and an anatomy-aware loss. Quantitative evaluations and user studies indicate that the approach outperforms existing methods in motion quality, fine-grained controllability, and generation naturalness.

📝 Abstract

Text-driven controllable dance generation remains under-explored, primarily due to the severe scarcity of high-quality datasets and the inherent difficulty of articulating complex choreographies. Characterizing dance is particularly challenging owing to its intricate spatial dynamics, strong directionality, and the highly decoupled movements of distinct body parts. To overcome these bottlenecks, we bridge principles from dance studies, human anatomy, and biomechanics to propose Choreographic Syntax, a novel theoretical framework with a tailored annotation system. Grounded in this syntax, we combine professional dance archives with high-fidelity motion capture data to construct DanceFlow, the most fine-grained dance dataset to date. It encompasses 41 hours of high-quality motions paired with 6.34 million words of detailed descriptions. At the model level, we introduce DanceCrafter, a tailored motion transformer built upon the Momentum Human Rig. To circumvent optimization instabilities, we construct a continuous manifold motion representation paired with a hybrid normalization strategy. Furthermore, we design an anatomy-aware loss to explicitly regulate the decoupled nature of body parts. Together, these adaptations empower DanceCrafter to achieve high-fidelity and stable generation of complex dance sequences. Extensive evaluations and user studies demonstrate our state-of-the-art performance in motion quality, fine-grained controllability, and generation naturalness.

Problem

Research questions and friction points this paper is trying to address.

text-driven dance generation

controllable motion synthesis

choreographic representation

fine-grained dance dataset

complex choreography modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Choreographic Syntax

Fine-Grained Dance Generation

Text-Driven Motion Synthesis