🤖 AI Summary
To address inaccurate MIDI note velocity estimation in Automatic Music Transcription (AMT), this paper proposes a score-guided, incremental BiLSTM-based correction method. Rather than reconstructing temporal structure, the approach jointly models raw audio features and reference MIDI scores to explicitly estimate and correct velocity errors. Unlike approaches that re-estimate velocity from scratch, it is the first work to incorporate a BiLSTM into the AMT post-processing stage, enabling plug-and-play enhancement of mainstream systems such as HPT. Experiments on the High-Resolution Piano Transcription (HPT) system demonstrate a significant improvement in velocity estimation accuracy, reducing Mean Absolute Error (MAE) by 18.7%, while maintaining compatibility with existing pipelines. Although the method does not surpass current state-of-the-art (SOTA) performance, it validates the effectiveness and generalizability of the “score-aware + sequential modeling” correction paradigm for AMT.
📝 Abstract
MIDI is a modern standard for storing music that records how musical notes are played. Many piano performances have corresponding MIDI scores available online. Some of these are created by the original performer, who records on an electric piano alongside the audio, while others are produced through manual transcription. In recent years, automatic music transcription (AMT) has advanced rapidly, enabling machines to transcribe MIDI from audio. However, these transcriptions often require further correction. Assuming perfect timing correction, we focus on loudness correction in terms of MIDI velocity (the MIDI parameter that controls loudness). This task can be approached through score-informed MIDI velocity estimation, which has undergone several developments. While previous approaches introduced purpose-built models that re-estimate MIDI velocity and thereby replace the AMT estimates, we propose a BiLSTM correction module that refines the AMT-estimated velocity. Although we did not reach state-of-the-art performance, we validated our method on a well-known AMT system, High-Resolution Piano Transcription (HPT), and achieved significant improvements.
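The difference between refining an AMT velocity estimate and replacing it can be illustrated with a small sketch. The abstract does not describe the module's internals, so the additive per-note residual below is an assumption made for illustration, not the authors' formulation. The function names, the hand-written `residuals` list standing in for a model's output, and the toy numbers are all hypothetical. The only fixed fact is that MIDI velocity is a 7-bit value, with 0 conventionally treated as note-off.

```python
from typing import List, Sequence

MIDI_VELOCITY_MIN = 1    # note-on velocities; 0 is conventionally treated as note-off
MIDI_VELOCITY_MAX = 127  # MIDI velocity is a 7-bit value


def apply_correction(amt_velocities: Sequence[int],
                     residuals: Sequence[float]) -> List[int]:
    """Add a per-note residual to each AMT velocity and clamp to the MIDI range.

    The additive form is an assumption of this sketch, not taken from the paper.
    """
    if len(amt_velocities) != len(residuals):
        raise ValueError("expected one residual per note")
    corrected = []
    for velocity, residual in zip(amt_velocities, residuals):
        value = round(velocity + residual)
        corrected.append(max(MIDI_VELOCITY_MIN, min(MIDI_VELOCITY_MAX, value)))
    return corrected


def velocity_mae(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error between estimated and reference note velocities."""
    if len(estimated) != len(reference) or len(estimated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Toy numbers invented for illustration only; they are not results from the paper.
reference = [64, 80, 50, 100]        # velocities from the reference MIDI
amt = [70, 72, 58, 90]               # velocities estimated by an AMT system
residuals = [-4.0, 5.0, -6.0, 8.0]   # hypothetical output of a correction model

corrected = apply_correction(amt, residuals)
print(velocity_mae(amt, reference))        # 8.0
print(velocity_mae(corrected, reference))  # 2.25
```

Because the correction starts from the AMT estimate, a residual of zero leaves the original transcription unchanged, which is one way to read the claim that the module can be attached to an existing system.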