Direct content-based retrieval from music scores images

📅 2026-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a limitation of existing music score image retrieval, which relies predominantly on metadata and lacks effective content-driven methods. It investigates which characteristics of a score are most relevant for search and defines a systematic method for building query datasets from any annotated music score corpus. It compares three retrieval paradigms: transcription-based approaches built on Optical Music Recognition (OMR), a transcription-free Transformer model trained to recognize queries directly from score images, and a text-prompted Large Language Model. Experiments on four music score corpora, which differ in size, image quality, and typesetting, show that OMR-based pipelines achieve higher in-domain retrieval performance, whereas transcription-free models handle domain variability more effectively.
📝 Abstract
The digitization of musical scores plays a crucial role in their preservation and accessibility, yet information retrieval still depends mainly on metadata searches, such as by title or composer. Content-based search in music score images remains underexplored compared to text documents, despite its potential value for musicians, musicologists, and educators. This work contributes to the field by first studying which characteristics of a score are most relevant for search and by defining a systematic method to build query datasets from any annotated corpus. We also consider diverse methods for content-based search on music score images, ranging from transcription-based approaches relying on Optical Music Recognition (OMR), to a transcription-free Transformer model trained to recognize queries directly from score images, and a text-prompted Large Language Model. Our experiments evaluate these models on four corpora exhibiting diverse characteristics in terms of dataset size, image quality, and typesetting mechanisms. Overall, each method excels under different conditions: OMR-based pipelines achieve higher in-domain retrieval, whereas transcription-free models handle domain variability more effectively.
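To make the transcription-based setting concrete, here is a minimal sketch and not the paper's actual method. It assumes each score page is annotated as a flat list of pitch tokens, builds queries as n-grams from those annotations, and retrieves pages by exact matching against (possibly erroneous) OMR output. The function names `build_queries`, `retrieve`, and `precision_recall`, the token format, and the toy data are all hypothetical.

```python
def build_queries(corpus, n=3, max_pages=2):
    """Build queries from an annotated corpus (hypothetical scheme).

    corpus maps page_id -> list of symbol tokens (ground-truth annotation).
    Returns a dict mapping each n-gram query to the set of pages containing it,
    keeping only n-grams that occur on at most max_pages pages.
    """
    occurrences = {}
    for page_id, tokens in corpus.items():
        for i in range(len(tokens) - n + 1):
            occurrences.setdefault(tuple(tokens[i:i + n]), set()).add(page_id)
    return {q: pages for q, pages in occurrences.items() if len(pages) <= max_pages}


def retrieve(query, transcriptions):
    """Return the pages whose transcription contains the query contiguously."""
    n = len(query)
    hits = set()
    for page_id, tokens in transcriptions.items():
        if any(tuple(tokens[i:i + n]) == query for i in range(len(tokens) - n + 1)):
            hits.add(page_id)
    return hits


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved page set against the relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (assumed): ground-truth annotations and an OMR output with one error.
corpus = {"p1": ["C4", "D4", "E4", "F4"], "p2": ["E4", "F4", "G4", "A4"]}
omr = {"p1": ["C4", "D4", "E4", "F4"], "p2": ["E4", "F4", "G4", "B4"]}

queries = build_queries(corpus, n=3)
for query, relevant in queries.items():
    print(query, precision_recall(retrieve(query, omr), relevant))
```

In this toy setup a single OMR error on page `p2` makes the query `("F4", "G4", "A4")` unretrievable, which illustrates why a transcription-based pipeline depends on recognition quality.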
Problem

Research questions and friction points this paper is trying to address.

content-based retrieval
music score images
Optical Music Recognition
digital music archives
musical information retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

content-based retrieval
music score images
Optical Music Recognition
transcription-free Transformer
query dataset construction