Music Transcription with (Almost) No Supervision

📅 2026-05-22
🤖 AI Summary
High-quality music transcription is hindered by the scarcity of paired audio–score datasets. This work proposes a cycle-consistency-based cross-modal translation framework that leverages abundant unpaired audio and score data, using only a small number of paired examples as anchors for low-supervision training. It systematically examines the role of unpaired data in transcription and finds that unpaired audio contributes more than unpaired scores. Adding unlabeled audio from a new instrument during training also improves transcription for that instrument without any paired supervision. By substantially improving transcription under low-resource conditions, the method offers a practical path toward transcribing music from rare or underrepresented instruments.
📝 Abstract
Competitive music transcription models require large amounts of paired audio-score data, which is scarce due to collection costs, alignment difficulty, and copyright restrictions. Meanwhile, vast quantities of unpaired audio recordings and symbolic scores are freely available but have gone unused. We adopt a cycle-consistent translation framework in which a small amount of paired data acts as a minimal anchor, unlocking the full potential of the unpaired pool. We find that (1) unpaired data yields surprisingly large gains, especially under limited supervision; (2) unpaired audio contributes more than unpaired scores; and (3) incorporating unlabeled audio from a new instrument during training improves transcription for that instrument without any paired supervision. Together, these results suggest that scaling unpaired data offers a practical path toward high-quality transcription for instruments where labeled data remains scarce.
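The abstract does not spell out the training losses, so the following is only a minimal sketch of how a generic cycle-consistent objective with a paired anchor is usually assembled, not the authors' method. It assumes a hypothetical transcriber `T` (audio to score) and synthesizer `S` (score to audio), modeled here as toy linear maps over fixed-size feature vectors, with mean squared error standing in for whatever losses a real system would use. The names `cycle_objective`, `lam_audio`, and `lam_score` are illustrative assumptions.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def cycle_objective(T, S, paired_audio, paired_score,
                    unpaired_audio, unpaired_score,
                    lam_audio=1.0, lam_score=1.0):
    """Hypothetical combined loss: supervised anchor plus two cycle terms.

    T: toy transcriber matrix, audio features (d_a) -> score features (d_s).
    S: toy synthesizer matrix, score features (d_s) -> audio features (d_a).
    Each data array holds one example per row.
    """
    # Supervised anchor: the few paired examples tie both directions
    # to the true audio-score correspondence.
    sup = mse(paired_audio @ T, paired_score) + mse(paired_score @ S, paired_audio)
    # Audio cycle: audio -> transcribed score -> re-synthesized audio
    # should reproduce the original unpaired audio.
    cyc_audio = mse(unpaired_audio @ T @ S, unpaired_audio)
    # Score cycle: score -> synthesized audio -> re-transcribed score
    # should reproduce the original unpaired score.
    cyc_score = mse(unpaired_score @ S @ T, unpaired_score)
    return sup + lam_audio * cyc_audio + lam_score * cyc_score


# Toy usage with made-up data: 4-dimensional audio and score features.
rng = np.random.default_rng(0)
T_true = 2.0 * np.eye(4) + 0.1 * rng.normal(size=(4, 4))  # invertible toy transcriber
S_true = np.linalg.inv(T_true)                             # its exact inverse as synthesizer

paired_audio = rng.normal(size=(8, 4))      # small paired anchor set
paired_score = paired_audio @ T_true
unpaired_audio = rng.normal(size=(100, 4))  # larger unpaired pools
unpaired_score = rng.normal(size=(100, 4))

loss_perfect = cycle_objective(T_true, S_true, paired_audio, paired_score,
                               unpaired_audio, unpaired_score)
loss_bad = cycle_objective(T_true, np.zeros((4, 4)), paired_audio, paired_score,
                           unpaired_audio, unpaired_score)
print(loss_perfect, loss_bad)
```

In this toy setup an exactly inverse pair drives every term to (numerically) zero, while a degenerate synthesizer is penalized by both the anchor term and the cycle terms. The unpaired terms need no labels at all, which is why a large unpaired pool can contribute training signal alongside a small paired set. A real transcription system would replace the matrices with sequence models and the MSE with losses suited to symbolic output; the loss weights here are assumptions, not values from the paper.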
Problem

Research questions and friction points this paper is trying to address.

music transcription
paired audio-score data
data scarcity
unpaired data
instrument-specific transcription
Innovation

Methods, ideas, or system contributions that make the work stand out.

music transcription
low-supervision learning
cycle-consistent translation
unpaired data
zero-shot instrument adaptation