🤖 AI Summary
This work addresses the challenge of modeling high-dimensional full-body interaction in partnered ballroom dancing, which typically requires large datasets and complex synthesis pipelines. The authors show that the three-point skeletal trajectory captured by a VR device from the leader suffices as input: a lightweight MLP directly predicts the follower's corresponding three-point trajectory, and a carefully planned autoregressive procedure then deterministically reconstructs full-body motion without generative models. The study demonstrates for the first time that three-point trajectories are sufficient to characterize ballroom dance movements and to enable high-quality synthesis. The resulting framework is compact, explicit, and data-efficient; it performs robustly not only on structured ballroom data but also on the larger, more diverse LaFAN dataset, substantially reducing computational and data requirements and advancing immersive co-dancing applications.
📝 Abstract
Ballroom dancing is a structured yet expressive motion category. Its highly diverse movements and the complex interactions between leader and follower make understanding and synthesis challenging. We demonstrate that the three-point trajectory available from a virtual reality (VR) device can effectively serve as a dancer's motion descriptor, reducing the modeling and synthesis of the interplay between dancers' full-body motions to sparse trajectories. Thanks to this low dimensionality, we can employ an efficient MLP network to predict the follower's three-point trajectory directly from the leader's three-point input for certain types of ballroom dancing, addressing the challenge of modeling high-dimensional full-body interaction. The compact yet explicit representation also prevents our method from overfitting. By leveraging the inherent structure of the movements and carefully planning the autoregressive procedure, we show that a deterministic neural network can translate three-point trajectories into a fully embodied virtual avatar, a task typically considered under-constrained and addressed with generative models for general motions. In addition, we demonstrate that this deterministic approach generalizes beyond small, structured datasets like ballroom dancing and performs robustly on larger, more diverse datasets such as LaFAN. Our method provides a computationally- and data-efficient solution, opening new possibilities for immersive paired dancing applications. Code and pre-trained models for this paper are available at https://peizhuoli.github.io/dancing-points.
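The abstract's core pipeline (a small MLP mapping the leader's sparse three-point input to the follower's three-point trajectory, rolled out autoregressively) can be sketched as below. This is a minimal illustration, not the authors' implementation: all dimensions, window sizes, and the untrained two-layer network are assumptions, and the real method presumably also encodes orientations and trains on motion-capture data.

```python
import numpy as np

# Assumed dimensions: each "three-point" frame (headset + two controllers)
# flattened to a 9-D position vector; the paper's representation may differ.
POINT_DIM = 9   # 3 tracked points x 3D position (assumption)
WINDOW = 4      # leader context frames per prediction (assumption)

rng = np.random.default_rng(0)

# Untrained two-layer MLP with hypothetical sizes, standing in for the
# lightweight network described in the abstract.
IN_DIM = WINDOW * POINT_DIM + POINT_DIM  # leader window + previous follower frame
W1 = 0.1 * rng.standard_normal((IN_DIM, 64)); b1 = np.zeros(64)
W2 = 0.1 * rng.standard_normal((64, POINT_DIM)); b2 = np.zeros(POINT_DIM)

def predict_follower_frame(leader_window, prev_follower):
    """One MLP step: leader context + last follower frame -> next follower frame."""
    x = np.concatenate([leader_window.ravel(), prev_follower])
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2

def rollout(leader_traj):
    """Autoregressive synthesis: each prediction is fed back as the next input."""
    prev = np.zeros(POINT_DIM)  # neutral initial follower pose (assumption)
    frames = []
    for t in range(WINDOW, len(leader_traj)):
        prev = predict_follower_frame(leader_traj[t - WINDOW:t], prev)
        frames.append(prev)
    return np.stack(frames)

leader = rng.standard_normal((20, POINT_DIM))  # 20 synthetic leader frames
follower = rollout(leader)
print(follower.shape)  # (16, 9): one follower frame per step after the context window
```

The deterministic feed-forward step is what keeps the model compact and cheap; in the paper's setting, a further stage would lift the predicted three-point trajectory to a full-body avatar.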