Is Transfer Learning Necessary for Violin Transcription?

📅 2025-08-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Automatic music transcription (AMT) for violin has long suffered from severe scarcity of annotated data, leading most prior work to rely on transfer learning from piano-pretrained models. Method: To assess whether end-to-end training on violin-specific data can supplant cross-instrument transfer, we train a standard piano AMT architecture—without pretraining—exclusively on the medium-scale violin annotation dataset (MOSA), and evaluate the resulting model on URMP and Bach10. Contribution/Results: Our violin-only model substantially outperforms fine-tuned piano models, achieving absolute improvements of 3.2–5.7 percentage points in F-measure. This challenges the prevailing assumption that cross-instrument transfer is necessary for high-performance violin AMT. We demonstrate that instrument-specific data, even with an unmodified piano transcription architecture, suffices to achieve state-of-the-art performance—establishing a new paradigm for AMT in low-resource instruments.
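Concretely, the no-pretraining condition means the network weights are randomly initialized and optimized only on MOSA. Below is a minimal, hypothetical sketch of what such scratch training could look like for an onsets-and-frames-style transcriber in PyTorch; the layer sizes, feature dimensions, loss formulation, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative onsets-and-frames-style model: a shared conv stack feeding two
# heads (onset and frame activations over 88 pitch bins). Sizes are assumptions
# for this sketch, not the architecture used in the paper.
class ViolinAMT(nn.Module):
    def __init__(self, n_mels: int = 229, n_pitches: int = 88):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),              # pool along the mel axis only
        )
        feat_dim = 32 * (n_mels // 2)
        self.onset_head = nn.Linear(feat_dim, n_pitches)
        self.frame_head = nn.Linear(feat_dim, n_pitches)

    def forward(self, mel):                                 # mel: (batch, time, n_mels)
        x = self.conv(mel.unsqueeze(1))                     # -> (batch, 32, time, n_mels // 2)
        x = x.permute(0, 2, 1, 3).flatten(2)                # -> (batch, time, feat_dim)
        return self.onset_head(x), self.frame_head(x)

def train_from_scratch(model, loader, epochs: int = 10, lr: float = 1e-4):
    """Train with randomly initialized weights (no pretrained checkpoint loaded)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for mel, onset_target, frame_target in loader:      # targets: binary piano rolls
            onset_logits, frame_logits = model(mel)
            loss = bce(onset_logits, onset_target) + bce(frame_logits, frame_target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

The key point of the sketch is simply that no piano checkpoint is loaded before optimization; everything else (feature front end, heads, optimizer) follows standard AMT practice.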

📝 Abstract
Automatic music transcription (AMT) has achieved remarkable progress for instruments such as the piano, largely due to the availability of large-scale, high-quality datasets. In contrast, violin AMT remains underexplored due to limited annotated data. A common approach is to fine-tune models pretrained on data-rich instruments such as the piano, but the effectiveness of such transfer remains unclear in the presence of timbral and articulatory differences. In this work, we investigate whether training from scratch on a medium-scale violin dataset can match the performance of fine-tuned piano-pretrained models. We adopt a piano transcription architecture without modification and train it on the MOSA dataset, which contains about 30 hours of aligned violin recordings. Our experiments on URMP and Bach10 show that models trained from scratch achieve competitive or even superior performance compared to their fine-tuned counterparts. These findings suggest that strong violin AMT is possible without relying on pretrained piano representations, highlighting the importance of instrument-specific data collection and augmentation strategies.
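Results on URMP and Bach10 are typically reported as note-level precision, recall, and F-measure. The sketch below shows how such scores are commonly computed with mir_eval; the (onset, offset, pitch-in-Hz) note format and the onset-only matching setting are assumptions for illustration and may differ from the paper's exact protocol.

```python
import numpy as np
import mir_eval

def note_f_measure(ref_notes, est_notes):
    """Note-level precision/recall/F-measure for one piece.

    ref_notes, est_notes: lists of (onset_sec, offset_sec, pitch_hz) tuples,
    e.g. parsed from URMP or Bach10 annotations (format assumed for this sketch).
    """
    ref = np.array(ref_notes)
    est = np.array(est_notes)
    precision, recall, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
        ref_intervals=ref[:, :2], ref_pitches=ref[:, 2],
        est_intervals=est[:, :2], est_pitches=est[:, 2],
        onset_tolerance=0.05,      # 50 ms onset tolerance (mir_eval default)
        offset_ratio=None,         # ignore offsets, i.e. onset-only matching
    )
    return precision, recall, f
```

Averaging these per-piece scores over a test set gives the dataset-level F-measure that scratch-trained and fine-tuned models are compared on.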
Problem

Research questions and friction points this paper is trying to address.

Investigating violin transcription without transfer learning from piano
Evaluating scratch-trained models versus fine-tuned piano-pretrained systems
Assessing violin AMT performance on URMP and Bach10 datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training from scratch on violin data
Using unmodified piano transcription architecture
Achieving competitive performance without transfer learning
Yueh-Po Peng
Sony Computer Science Laboratories, Tokyo, Japan; Original Creation Center, Taipei, Taiwan
Ting-Kang Wang
National Taiwan University, Taiwan; Sony Computer Science Laboratories, Tokyo, Japan
Li Su
Institute of Information Science, Academia Sinica
Music information retrieval, signal processing, machine learning, computational musicology
Vincent K. M. Cheung
Sony Computer Science Laboratories, Tokyo, Japan