PF-D2M: A Pose-free Diffusion Model for Universal Dance-to-Music Generation

πŸ“… 2026-01-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing approaches to dance-to-music generation rely heavily on single human pose features and are constrained by small-scale datasets, limiting their generalization to complex scenarios such as multiple dancers or non-human performers. This work proposes a pose-free diffusion model that directly extracts visual features from dance videos, bypassing the need for explicit pose estimation. To enhance data efficiency and generalization, the method incorporates a progressive training strategy. By operating directly on raw visual inputs, the model accommodates an arbitrary number and type of dancers without requiring pose annotations. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance in both objective metrics and subjective evaluations, setting new benchmarks for dance–music alignment and audio generation quality.
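The core idea, conditioning a diffusion-based music generator directly on visual features instead of estimated poses, can be illustrated with a toy sketch. Everything below is hypothetical: the `extract_visual_features` pooling and the `generate_music_latent` loop are stand-ins for the paper's pretrained visual encoder and learned denoising network, not its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_visual_features(video_frames):
    """Stand-in for a pretrained visual encoder (hypothetical).

    video_frames: (T, H, W) array of grayscale frames.
    Returns a (T, 1) per-frame feature sequence, so the pipeline needs
    no pose annotations and no assumption about who or what is dancing.
    """
    return video_frames.reshape(video_frames.shape[0], -1).mean(axis=1, keepdims=True)

def generate_music_latent(cond, steps=50, step_size=0.2):
    """Toy reverse-diffusion loop: start from Gaussian noise and
    repeatedly nudge the sample toward the visual conditioning signal.
    A real model would use a learned denoiser at each step instead."""
    x = rng.standard_normal(cond.shape)
    for _ in range(steps):
        x = x + step_size * (cond - x)  # pull the noisy latent toward the condition
    return x

# Toy "dance video": 16 frames of 8x8 pixels.
frames = rng.random((16, 8, 8))
cond = extract_visual_features(frames)   # (16, 1) frame-aligned feature sequence
latent = generate_music_latent(cond)     # (16, 1) generated "music latent"
```

Because the conditioning is a per-frame feature sequence rather than a skeleton, the same sketch applies unchanged to videos with multiple dancers or non-human performers.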

πŸ“ Abstract
Dance-to-music generation aims to generate music that is aligned with dance movements. Existing approaches typically rely on body motion features extracted from a single human dancer and limited dance-to-music datasets, which restrict their performance and applicability to real-world scenarios involving multiple dancers and non-human dancers. In this paper, we propose PF-D2M, a universal diffusion-based dance-to-music generation model that incorporates visual features extracted from dance videos. PF-D2M is trained with a progressive training strategy that effectively addresses data scarcity and generalization challenges. Both objective and subjective evaluations show that PF-D2M achieves state-of-the-art performance in dance-music alignment and music quality.
Problem

Research questions and friction points this paper is trying to address.

dance-to-music generation
multiple dancers
non-human dancers
data scarcity
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

pose-free
diffusion model
dance-to-music generation
visual features
progressive training
Jaekwon Im
KAIST
Music Information Retrieval · Machine Learning
Natalia Polouliakh
Sony Computer Science Laboratories, Tokyo, Japan
Taketo Akama
Sony Computer Science Laboratories, Tokyo, Japan