TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses three key challenges in music-driven group dance generation: inter-dancer collisions, foot sliding in single-dancer motion, and abrupt positional discontinuities in long sequences. To tackle these, the authors propose TCDiff++, an end-to-end trajectory-controllable framework based on diffusion models. Methodologically: (1) a dancer positioning embedding and a distance-consistency loss explicitly enforce spatial constraints and suppress collisions; (2) a swap mode embedding and a Footwork Adaptor improve the physical plausibility of foot trajectories; and (3) a long group diffusion sampling strategy coupled with a Sequence Decoder layer enhances spatiotemporal coherence across frames. Extensive experiments show that the approach significantly outperforms existing methods in long-duration group dance synthesis, achieving state-of-the-art motion fluency, group coordination, and trajectory controllability.
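The summary above mentions a long-sequence sampling strategy that injects positional information into the noisy input to keep stitched windows consistent. The paper's exact algorithm is not given here; the sketch below is one possible reading (the function name, motion-tensor layout with root xyz in the first three channels, and the overwrite-at-every-step scheme are all assumptions), shown with a dummy denoiser in place of a real diffusion model:

```python
import numpy as np

def sample_with_position_injection(denoise, x, anchor_pos, anchor_frames, n_steps):
    """Illustrative sketch: at every denoising step, overwrite the root-position
    channels of known anchor frames with fixed positions before denoising, so
    consecutive generation windows stay positionally consistent.

    x:             (T, N, D) noisy motion for T frames, N dancers, D features
                   (assumed layout: first 3 feature dims are root xyz).
    anchor_pos:    (len(anchor_frames), N, 3) known root positions to inject.
    anchor_frames: list of frame indices whose positions are known.
    denoise:       callable (x, t) -> x, one reverse-diffusion step.
    """
    for t in range(n_steps, 0, -1):
        x = x.copy()
        x[anchor_frames, :, :3] = anchor_pos  # inject known positions into noisy input
        x = denoise(x, t)
    x = x.copy()
    x[anchor_frames, :, :3] = anchor_pos      # pin anchors in the final output
    return x
```

With a trained model, `denoise` would be the reverse-diffusion step conditioned on music; the injection keeps the anchored frames' trajectories fixed across steps while the rest of the sequence is denoised freely.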

📝 Abstract
Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. During the group dance generation process, however, most existing methods still face three primary issues: multi-dancer collisions, single-dancer foot sliding, and abrupt dancer swapping in long group dance generation. In this paper, we propose TCDiff++, a music-driven end-to-end framework designed to generate harmonious group dance. Specifically, to mitigate multi-dancer collisions, we utilize a dancer positioning embedding to better maintain the relative positioning among dancers. Additionally, we incorporate a distance-consistency loss to ensure that inter-dancer distances remain within plausible ranges. To address the issue of single-dancer foot sliding, we introduce a swap mode embedding to indicate dancer swapping patterns and design a Footwork Adaptor to refine raw motion, thereby minimizing foot sliding. For long group dance generation, we present a long group diffusion sampling strategy that reduces abrupt position shifts by injecting positional information into the noisy input. Furthermore, we integrate a Sequence Decoder layer to enhance the model's ability to selectively process long sequences. Extensive experiments demonstrate that our TCDiff++ achieves state-of-the-art performance, particularly in long-duration scenarios, ensuring high-quality and coherent group dance generation.
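The abstract describes a distance-consistency loss that keeps inter-dancer distances within plausible ranges but gives no formula. As a rough sketch only (the hinge-style form, the threshold values, and all names below are assumptions, not the paper's actual loss), such a penalty on pairwise root distances might look like:

```python
import numpy as np

def distance_consistency_loss(positions, d_min=0.5, d_max=3.0):
    """Hinge-style penalty on pairwise dancer distances outside [d_min, d_max].

    positions: (T, N, 3) root trajectories for N dancers over T frames.
    Zero when all pairs stay within the plausible range; grows linearly
    when dancers get too close (collision) or too far apart (drift).
    """
    T, N, _ = positions.shape
    # Pairwise difference vectors per frame: (T, N, N, 3)
    diff = positions[:, :, None, :] - positions[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # (T, N, N)
    # Upper triangle: count each pair once, skip self-distances
    iu = np.triu_indices(N, k=1)
    pair_dist = dist[:, iu[0], iu[1]]                    # (T, N*(N-1)/2)
    too_close = np.maximum(d_min - pair_dist, 0.0)       # collision penalty
    too_far = np.maximum(pair_dist - d_max, 0.0)         # drifting-apart penalty
    return float(np.mean(too_close + too_far))
```

In training, a term like this would be weighted against the diffusion reconstruction loss; the hinge shape leaves the model unconstrained whenever dancers already keep sensible spacing.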
Problem

Research questions and friction points this paper is trying to address.

Prevent multi-dancer collisions in group choreography
Reduce single-dancer foot sliding in dance motions
Minimize abrupt position shifts in long group dances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dancer positioning embedding prevents collisions
Footwork Adaptor reduces single-dancer foot sliding
Long group diffusion sampling minimizes abrupt shifts
Yuqin Dai
Tsinghua University
LLM · AI4Science · Avatar · Generative Model
Wanlu Zhu
PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Ronghui Li
Tsinghua University
Human Interaction · Motion Generation · Digital Human · Computer Vision
Xiu Li
Bytedance Seed
Computer Vision · Computer Graphics · 3D Vision
Zhenyu Zhang
Nanjing University, Suzhou, China.
Jun Li
PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Jian Yang
PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.