Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model

πŸ“… 2026-03-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing dance generation methods struggle to effectively model the temporal dynamics of dance motions, their rhythmic characteristics, and their synchronization with music. To address this, this work proposes MambaDance, a two-stage diffusion-based generative framework built on the Mamba state space model, marking the first application of Mamba to dance generation. Replacing the Transformer backbone with Mamba overcomes the computational and memory bottlenecks of Transformers in long-sequence modeling, and a Gaussian beat representation is introduced to explicitly guide rhythm alignment. Evaluated on the AIST++ and FineDance datasets, MambaDance generates rhythmically coherent and structurally consistent dance sequences ranging from short to long durations, outperforming current state-of-the-art methods.

πŸ“ Abstract
Dance is a form of human motion characterized by emotional expression and communication, playing a role in various fields such as music, virtual reality, and content creation. Existing methods for dance generation often fail to adequately capture the inherently sequential, rhythmical, and music-synchronized characteristics of dance. In this paper, we propose \emph{MambaDance}, a new dance generation approach that leverages a Mamba-based diffusion model. Mamba, well-suited to handling long and autoregressive sequences, is integrated into our two-stage diffusion architecture, substituting the off-the-shelf Transformer. Additionally, considering the critical role of musical beats in dance choreography, we propose a Gaussian-based beat representation to explicitly guide the decoding of dance sequences. Experiments on the AIST++ and FineDance datasets across varying sequence lengths show that our proposed method generates plausible dance movements that reflect these essential characteristics, consistently from short to long dances, compared to previous methods. Additional qualitative results and demo videos are available at \small{https://vision3d-lab.github.io/mambadance}.
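The abstract's Gaussian-based beat representation can be pictured as turning sparse beat timestamps into a soft per-frame rhythm signal. The sketch below is an assumption about how such a representation might look (the paper does not specify its exact form): each beat contributes a Gaussian bump centered at its timestamp, with `sigma` and `fps` as hypothetical parameters.

```python
import math

def gaussian_beat_signal(beat_times, num_frames, fps=30.0, sigma=0.1):
    """Hypothetical sketch of a Gaussian beat representation.

    Each musical beat contributes a Gaussian bump centered at its
    timestamp (in seconds), yielding a smooth per-frame conditioning
    signal rather than a sparse binary beat indicator.
    sigma (in seconds) and fps are illustrative assumptions.
    """
    signal = [0.0] * num_frames
    for f in range(num_frames):
        t = f / fps  # frame index -> time in seconds
        for b in beat_times:
            signal[f] += math.exp(-((t - b) ** 2) / (2 * sigma ** 2))
    # Clip so closely spaced beats keep the signal bounded in [0, 1].
    return [min(s, 1.0) for s in signal]

beats = [0.5, 1.0, 1.5]  # beat timestamps in seconds
sig = gaussian_beat_signal(beats, num_frames=60)
```

A signal like this peaks at frames that coincide with beats (e.g. frame 15 at 0.5 s for 30 fps) and decays smoothly in between, which is what would let a decoder be guided toward beat-aligned motion rather than matching exact frames.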
Problem

Research questions and friction points this paper is trying to address.

dance generation
sequential modeling
rhythm
music synchronization
human motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba
diffusion model
beat representation
dance generation
music-conditioned motion
Sangjune Park
Ulsan National Institute of Science and Technology, South Korea
Inhyeok Choi
Ulsan National Institute of Science and Technology, South Korea
Donghyeon Soon
Daegu Gyeongbuk Institute of Science and Technology, South Korea
Youngwoo Jeon
Ulsan National Institute of Science and Technology, South Korea
Kyungdon Joo
Associate Professor, UNIST
3D Computer Vision · Robot Vision