🤖 AI Summary
Severe scarcity of annotated handwritten historical music scores critically limits optical music recognition (OMR) performance. Method: We propose a content-conditioned generative adversarial network (cGAN) that enables, for the first time, semantically controllable synthesis of handwritten musical symbols. The model takes structured score information (note class, duration, and spatial position) as conditional inputs to generate high-fidelity symbol images; the Smashcima toolkit then places these symbols consistently with the staff layout and renders the full score, forming an end-to-end synthesis pipeline. Contribution/Results: Generated symbols exhibit significantly improved visual fidelity and contextual consistency over prior methods. When used to augment training data, they substantially boost OMR accuracy on real handwritten scores. This work establishes a scalable, interpretable, and high-quality synthetic data paradigm for low-resource OMR tasks.
📝 Abstract
The field of Optical Music Recognition (OMR) is currently hindered by the scarcity of real annotated data, particularly for handwritten historical musical scores. In similar fields, such as Handwritten Text Recognition, it has been shown that synthetic examples produced with image generation techniques can help train better-performing recognition architectures. This study explores the generation of realistic, handwritten-looking scores by implementing a Generative Adversarial Network (GAN) that operates at the level of individual music symbols and assembling its output into a full score using the Smashcima engraving software. We systematically evaluated the visual fidelity of the generated samples and conclude that the generated symbols exhibit a high degree of realism, marking significant progress in synthetic score generation.