PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination

📅 2026-04-14

🤖 AI Summary
This work addresses three challenges in audio-driven piano motion generation: inaccurate modeling of musical structure, rigid hand coordination mechanisms, and the difficulty of generating long sequences in real time. It introduces PianoFlow, a flow-matching framework that uses MIDI as a privileged modality during training and distills those structured musical priors into the model, so that inference needs only audio. An asymmetric role-gated interaction module models dynamic bimanual coordination explicitly through role-aware attention and temporal gating. An autoregressive flow continuation scheme enables streaming generation of arbitrarily long sequences while keeping consecutive chunks temporally coherent. On the PianoMotion10M dataset, PianoFlow reports superior quantitative and qualitative performance and accelerates inference by over 9× compared to previous methods.

📝 Abstract
Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling these structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
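
The streaming idea in the abstract (generate motion chunk by chunk, with each chunk conditioned on the end of the previous one) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify the network, the conditioning, or the continuation rule, so `toy_velocity`, `sample_chunk`, `stream_motion`, the chunk length, and the overlap are hypothetical stand-ins, and the learned velocity field is replaced by a simple analytic one. The only parts taken from the general flow-matching recipe are starting from Gaussian noise and integrating an ODE from t = 0 to t = 1.

```python
import numpy as np


def toy_velocity(x, t, audio_chunk, context):
    """Hypothetical stand-in for a learned velocity field v(x, t | audio, context).

    It pulls the sample toward a target built from the audio features and the
    last frame of the previous chunk. A real model would be a neural network.
    """
    target = audio_chunk + context[-1]  # context[-1] broadcasts over frames
    return target - x


def sample_chunk(audio_chunk, context, steps, rng):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    x = rng.standard_normal(audio_chunk.shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * toy_velocity(x, i * dt, audio_chunk, context)
    return x


def stream_motion(audio, chunk_len=4, overlap=1, steps=8, seed=0):
    """Generate motion chunk by chunk, carrying the last frames forward.

    audio: array of shape (num_frames, feature_dim), assumed non-empty.
    Returns an array of the same shape (toy assumption: motion dim == audio dim).
    """
    rng = np.random.default_rng(seed)
    context = np.zeros((overlap, audio.shape[1]))  # no history before chunk 0
    chunks = []
    for start in range(0, audio.shape[0], chunk_len):
        chunk = sample_chunk(audio[start:start + chunk_len], context, steps, rng)
        chunks.append(chunk)
        context = chunk[-overlap:]  # condition the next chunk on this tail
    return np.concatenate(chunks, axis=0)


if __name__ == "__main__":
    audio_features = np.ones((10, 3))  # 10 frames, 3 toy feature dimensions
    motion = stream_motion(audio_features)
    print(motion.shape)
```

Because each chunk only needs the current audio window and a short tail of already generated motion, memory and latency per chunk stay constant regardless of the total sequence length, which is the property a streaming scheme of this kind relies on.
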
Problem

Research questions and friction points this paper is trying to address.

bimanual piano motion generation
audio-driven animation
music-aware synthesis
long-sequence generation
cross-hand coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-matching
bimanual coordination
MIDI priors
role-gated attention
streaming generation