🤖 AI Summary
Existing approaches to dance-to-music generation rely heavily on single human pose features and are constrained by small-scale datasets, limiting their generalization to complex scenarios such as multiple dancers or non-human performers. This work proposes a pose-free diffusion model that directly extracts visual features from dance videos, bypassing the need for explicit pose estimation. To enhance data efficiency and generalization, the method incorporates a progressive training strategy. By operating directly on raw visual inputs, the model accommodates an arbitrary number and type of dancers without requiring pose annotations. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance in both objective metrics and subjective evaluations, setting new benchmarks for dance-music alignment and audio generation quality.
📄 Abstract
Dance-to-music generation aims to produce music that is aligned with dance movements. Existing approaches typically rely on body motion features extracted from a single human dancer and on limited dance-to-music datasets, which restricts their performance and their applicability to real-world scenarios involving multiple or non-human dancers. In this paper, we propose PF-D2M, a universal diffusion-based dance-to-music generation model that incorporates visual features extracted directly from dance videos. PF-D2M is trained with a progressive training strategy that effectively addresses data scarcity and generalization challenges. Both objective and subjective evaluations show that PF-D2M achieves state-of-the-art performance in dance-music alignment and music quality.
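To make the pose-free conditioning idea concrete, the sketch below shows one way such a pipeline could look: a frame encoder turns raw video frames into visual features (no pose estimation), and a diffusion denoiser over audio latents attends to those features during an epsilon-prediction training step. All module names, dimensions, and the noise schedule here are illustrative assumptions, not the actual PF-D2M architecture or training code.

```python
# Hedged sketch of pose-free, video-conditioned music diffusion.
# Assumptions: toy frame encoder, cross-attention conditioning, cosine noise schedule.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualConditioner(nn.Module):
    """Encodes raw video frames into per-frame visual features (no pose annotations)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        # Stand-in encoder; a real system would use a pretrained visual backbone.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )

    def forward(self, frames):                              # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1))    # (B*T, D)
        return feats.view(b, t, -1)                         # (B, T, D)

class ConditionedDenoiser(nn.Module):
    """Predicts noise on audio latents, conditioned on visual features via cross-attention."""
    def __init__(self, latent_dim=128, feat_dim=512, n_heads=4):
        super().__init__()
        self.in_proj = nn.Linear(latent_dim, feat_dim)
        self.cross_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.out_proj = nn.Linear(feat_dim, latent_dim)

    def forward(self, noisy_latents, t, visual_feats):
        # noisy_latents: (B, L, latent_dim); visual_feats: (B, T, feat_dim)
        h = self.in_proj(noisy_latents) + t.float().view(-1, 1, 1)  # crude timestep injection
        h, _ = self.cross_attn(h, visual_feats, visual_feats)
        return self.out_proj(h)

def diffusion_loss(denoiser, conditioner, frames, audio_latents, num_steps=1000):
    """One epsilon-prediction training step conditioned on dance video frames."""
    b = audio_latents.size(0)
    t = torch.randint(0, num_steps, (b,))
    alpha_bar = torch.cos(t.float() / num_steps * torch.pi / 2).view(-1, 1, 1) ** 2
    noise = torch.randn_like(audio_latents)
    noisy = alpha_bar.sqrt() * audio_latents + (1 - alpha_bar).sqrt() * noise
    pred = denoiser(noisy, t, conditioner(frames))
    return F.mse_loss(pred, noise)
```

Because the conditioning signal is just a sequence of per-frame features, nothing in this setup assumes a single human skeleton, which is what allows an arbitrary number or type of dancers.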