🤖 AI Summary
This work addresses the challenge of directly identifying semantically similar dance choreographies, referred to as dance fingerprinting, from raw videos. Existing approaches rely on continuous embeddings that are difficult to index and interpret. To overcome this, the authors propose DANCEMATCH, an end-to-end framework that integrates Skeleton Motion Quantization (SMQ) with a Spatio-Temporal Transformer (STT) to generate compact, discrete, and interpretable structured motion signatures. Built upon these signatures, the Dance Retrieval Engine (DRE) performs sublinear search by combining histogram-based indexing with a re-ranking mechanism for efficient large-scale retrieval. The method demonstrates strong generalization across diverse dance styles, and the authors introduce the DANCETYPESBENCHMARK dataset, annotated with quantized motion tokens, to foster reproducible research in this domain.
📝 Abstract
We present DANCEMATCH, an end-to-end framework for motion-based dance retrieval, the task of identifying semantically similar choreographies directly from raw video, which we define as DANCE FINGERPRINTING. While existing motion analysis and retrieval methods can compare pose sequences, they rely on continuous embeddings that are difficult to index, interpret, or scale. In contrast, DANCEMATCH constructs compact, discrete motion signatures that capture the spatio-temporal structure of dance while enabling efficient large-scale retrieval. Our system integrates Skeleton Motion Quantisation (SMQ) with Spatio-Temporal Transformers (STT) to encode human poses, extracted via Apple CoMotion, into a structured motion vocabulary. We further design the DANCE RETRIEVAL ENGINE (DRE), which performs sub-linear retrieval using a histogram-based index followed by re-ranking for refined matching. To facilitate reproducible research, we release DANCETYPESBENCHMARK, a pose-aligned dataset annotated with quantised motion tokens. Experiments demonstrate robust retrieval across diverse dance styles and strong generalisation to unseen choreographies, establishing a foundation for scalable motion fingerprinting and quantitative choreographic analysis.
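To make the coarse-to-fine retrieval idea concrete, here is a minimal sketch of histogram-based indexing followed by re-ranking over quantised motion tokens. This is an illustrative assumption, not the authors' DRE implementation: token vocabularies, the cosine similarity for the coarse stage, the edit-distance re-ranker, and all function names are hypothetical.

```python
from collections import Counter
import math

def histogram(tokens):
    """Bag-of-tokens histogram for a quantised motion sequence (hypothetical)."""
    return Counter(tokens)

def cosine(h1, h2):
    """Cosine similarity between two sparse token histograms."""
    dot = sum(h1[t] * h2[t] for t in h1 if t in h2)
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def edit_distance(a, b):
    """Levenshtein distance between two token sequences, used for re-ranking."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def retrieve(query_tokens, database, k=10, shortlist=50):
    """Coarse histogram search over the index, then re-rank a shortlist
    by sequence-level edit distance for order-aware matching."""
    qh = histogram(query_tokens)
    coarse = sorted(database, key=lambda seq: -cosine(qh, histogram(seq)))[:shortlist]
    return sorted(coarse, key=lambda seq: edit_distance(query_tokens, seq))[:k]
```

The coarse stage ignores token order (histograms are permutation-invariant), which is what makes it cheap to index; the re-ranking stage restores temporal sensitivity on the small shortlist only, keeping overall cost sublinear in practice.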