🤖 AI Summary
Automatic music transcription (AMT) for violin has long suffered from a severe scarcity of annotated data, leading most prior work to rely on transfer learning from piano-pretrained models. Method: To assess whether training on violin-specific data can replace cross-instrument transfer, we train a standard piano AMT architecture from scratch, with no pretraining and no architectural changes, exclusively on the medium-scale MOSA violin dataset, and evaluate the resulting model on URMP and Bach10. Contribution/Results: Our violin-only model outperforms fine-tuned piano models, with absolute F-measure improvements of 3.2–5.7 percentage points. This challenges the prevailing assumption that cross-instrument transfer is necessary for high-performance violin AMT. The results indicate that instrument-specific data alone can be enough to reach strong performance, which suggests a viable path for AMT on other low-resource instruments.
📝 Abstract
Automatic music transcription (AMT) has achieved remarkable progress for instruments such as the piano, largely due to the availability of large-scale, high-quality datasets. In contrast, violin AMT remains underexplored due to limited annotated data. A common approach is to fine-tune models pretrained on other instruments, typically the piano, but the effectiveness of such transfer remains unclear given the timbral and articulatory differences between instruments. In this work, we investigate whether training from scratch on a medium-scale violin dataset can match the performance of fine-tuned piano-pretrained models. We adopt a piano transcription architecture without modification and train it on the MOSA dataset, which contains about 30 hours of aligned violin recordings. Our experiments on URMP and Bach10 show that models trained from scratch achieve competitive or even superior performance compared to their fine-tuned counterparts. These findings suggest that strong violin AMT is possible without relying on pretrained piano representations, highlighting the importance of instrument-specific data collection and augmentation strategies.
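The comparison between from-scratch and fine-tuned models is reported as an F-measure. As a rough illustration of how a note-level F-measure can be computed, the sketch below matches estimated notes to reference notes by pitch and onset time. The 50 ms onset tolerance, the greedy matching, and the name `note_f_measure` are assumptions made for this example and are not taken from the paper; evaluation toolkits commonly use an optimal one-to-one matching instead, and the paper's exact protocol may differ.

```python
from typing import List, Tuple

# A note is (onset time in seconds, MIDI pitch). Offsets are ignored here.
Note = Tuple[float, int]


def note_f_measure(reference: List[Note],
                   estimated: List[Note],
                   onset_tolerance: float = 0.05) -> float:
    """Note-level F-measure with greedy one-to-one matching (illustrative)."""
    unmatched = list(reference)
    true_positives = 0
    for onset, pitch in sorted(estimated):
        # Pair each estimated note with the first still-unmatched reference
        # note that has the same pitch and an onset within the tolerance.
        for i, (ref_onset, ref_pitch) in enumerate(unmatched):
            if ref_pitch == pitch and abs(ref_onset - onset) <= onset_tolerance:
                true_positives += 1
                del unmatched[i]
                break
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two of four estimated notes match two of three reference notes.
reference_notes = [(0.00, 60), (0.50, 62), (1.00, 64)]
estimated_notes = [(0.01, 60), (0.52, 62), (1.20, 64), (1.50, 65)]
print(round(note_f_measure(reference_notes, estimated_notes), 3))
```

In this toy case precision is 2/4 and recall is 2/3, so the F-measure is 4/7. An improvement quoted in percentage points is the absolute difference between two such scores expressed as percentages.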