🤖 AI Summary
This work addresses the challenge of synthesizing piano-playing hand motions that are both positionally accurate and natural. The authors propose a four-stage cascaded framework that explicitly models the hierarchical structure of hand movement. It starts from fingertip positions, which are largely determined by key geometry and fingering, and then successively refines fingertip trajectories, estimates wrist poses, and synthesizes full hand poses. The method combines statistics-based fingertip positioning, FiLM-conditioned trajectory refinement, wrist estimation, and STGCN-based pose synthesis. The authors also contribute expert fingering annotations for the FürElise dataset, covering 153 pieces (approximately 10 hours). Experiments report an F1 score of 0.910, substantially outperforming a diffusion-based baseline (F1 = 0.121), and a user study (N = 41) rates motion quality as approaching that of motion capture. Professional pianists (N = 5) identify anticipatory motion as the key remaining gap and a concrete direction for future improvement.
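FiLM (feature-wise linear modulation) scales and shifts each feature channel by a factor γ and an offset β that are predicted from a conditioning vector. The plain-Python sketch below shows only that operation. It assumes the conditioning vector is some encoding of the target keys and fingering, and that γ and β come from single linear layers; the function names, layer sizes, and weights are hypothetical and do not describe the authors' actual network.

```python
from typing import List, Sequence


def linear(x: Sequence[float], weight: List[List[float]], bias: Sequence[float]) -> List[float]:
    """Dense layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(
    features: Sequence[float],
    condition: Sequence[float],
    w_gamma: List[List[float]],
    b_gamma: Sequence[float],
    w_beta: List[List[float]],
    b_beta: Sequence[float],
) -> List[float]:
    """Feature-wise linear modulation: out_i = gamma_i * h_i + beta_i.

    gamma and beta are predicted from the (hypothetical) conditioning vector,
    e.g. an encoding of target keys and fingering for the current frame.
    """
    gamma = linear(condition, w_gamma, b_gamma)  # per-channel scale
    beta = linear(condition, w_beta, b_beta)     # per-channel shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Toy example: two feature channels, a one-hot condition (hypothetical finger id).
features = [1.0, 2.0]
condition = [1.0, 0.0]
out = film(
    features,
    condition,
    w_gamma=[[2.0, 0.0], [0.5, 0.0]], b_gamma=[0.0, 0.0],  # gamma = [2.0, 0.5]
    w_beta=[[0.0, 1.0], [1.0, 0.0]], b_beta=[0.1, 0.0],    # beta  = [0.1, 1.0]
)
print(out)  # approximately [2.1, 2.0]
```

In a trajectory-refinement setting, the same modulation would typically be applied frame by frame inside a temporal network, so the conditioning signal can reshape intermediate features without being concatenated to the input.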
📝 Abstract
Synthesizing realistic piano hand motions requires both precision and naturalness. Physics-based methods achieve precision but produce stiff motions; data-driven models learn natural dynamics but struggle with positional accuracy. Piano motion exhibits a natural hierarchy: fingertip positions are nearly deterministic given piano geometry and fingering, while the wrist and intermediate joints offer stylistic freedom. We present [OURS], a four-stage framework exploiting this hierarchy: (1) statistics-based fingertip positioning, (2) FiLM-conditioned trajectory refinement, (3) wrist estimation, and (4) STGCN-based pose synthesis. We contribute expert-annotated fingerings for the FürElise dataset (153 pieces, ~10 hours). Experiments demonstrate F1 = 0.910, substantially outperforming diffusion baselines (F1 = 0.121), with a user study (N = 41) confirming quality approaching motion capture. Expert evaluation by professional pianists (N = 5) identified anticipatory motion as the key remaining gap, providing concrete directions for future improvement.
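To illustrate why fingertip positions are "nearly deterministic given piano geometry and fingering," the sketch below computes an approximate lateral key center from a MIDI pitch and then adds a per-finger mean offset estimated from observed samples. Everything here is an assumption for illustration: the 23.5 mm nominal white-key width, the simplification that each black key sits on the boundary between its neighboring white keys, the per-finger `(dx, depth)` offset samples, and the function names `key_center_x` and `fingertip_target` are hypothetical and are not the paper's statistics.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Pitch classes (semitones above C) that are black keys: C#, D#, F#, G#, A#.
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}
WHITE_KEY_WIDTH_MM = 23.5  # assumed nominal white-key width


def key_center_x(midi_pitch: int) -> float:
    """Approximate lateral key center in mm, measured from the left edge of A0 (MIDI 21)."""
    if not 21 <= midi_pitch <= 108:
        raise ValueError("an 88-key piano spans MIDI pitches 21..108")
    # Number of white keys strictly to the left of this key.
    whites_left = sum(1 for p in range(21, midi_pitch) if p % 12 not in BLACK_PITCH_CLASSES)
    if midi_pitch % 12 in BLACK_PITCH_CLASSES:
        # Simplification: black key centered on the boundary between its white neighbors.
        return whites_left * WHITE_KEY_WIDTH_MM
    return (whites_left + 0.5) * WHITE_KEY_WIDTH_MM


def fingertip_target(
    midi_pitch: int,
    finger: int,
    offset_samples: Dict[int, List[Tuple[float, float]]],
) -> Tuple[float, float]:
    """Key center plus the mean (dx, depth) offset observed for this finger (1 = thumb .. 5)."""
    samples = offset_samples[finger]
    dx = mean(s[0] for s in samples)
    depth = mean(s[1] for s in samples)
    return key_center_x(midi_pitch) + dx, depth


# Hypothetical offset statistics for the thumb, in mm.
stats = {1: [(-2.0, 10.0), (2.0, 20.0)]}
print(key_center_x(60))                # middle C -> 552.25
print(fingertip_target(60, 1, stats))  # -> (552.25, 15.0)
```

Because the geometric term fixes where a key can be struck, the learned or statistical part only has to model a small residual, which is the intuition behind placing fingertip positioning first in the cascade.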