🤖 AI Summary
Optical Music Recognition (OMR) for handwritten jazz lead sheets, which contain both melodic lines and chord symbols, remains challenging because existing systems do not handle chord recognition and because handwritten images vary widely and are often of poor quality. Method: We introduce the first publicly available dataset of real handwritten jazz lead sheets (293 lead sheets), each annotated with precisely aligned Humdrum **kern and MusicXML representations, alongside accompanying synthetic images. We propose a dedicated OMR model and tokenization strategy designed for the chord–melody coupled structure, integrating pretrained vision models, synthetic data augmentation, and transfer learning for end-to-end image-to-structured-music transcription. Contribution/Results: Our approach achieves significant improvements over baselines in chord recognition accuracy and overall OMR performance. All code, data, and models are open-sourced, establishing a foundational resource for music information retrieval (MIR) and intelligent sheet music generation.
📝 Abstract
In this paper, we address the challenge of Optical Music Recognition (OMR) for handwritten jazz lead sheets, a widely used musical score type that encodes melody and chords. The task is challenging due to the presence of chords, a score component not handled by existing OMR systems, and the high variability and quality issues associated with handwritten images. Our contribution is two-fold. We present a novel dataset consisting of 293 handwritten jazz lead sheets of 163 unique pieces, amounting to 2021 total staves aligned with Humdrum **kern and MusicXML ground truth scores. We also supply synthetic score images generated from the ground truth. The second contribution is the development of an OMR model for jazz lead sheets. We discuss specific tokenisation choices related to our kind of data, and the advantages of using synthetic scores and pretrained models. We publicly release all code, data, and models.