VioPTT: Violin Technique-Aware Transcription from Synthetic Data Augmentation

📅 2025-09-28
🤖 AI Summary
Problem: Existing automatic music transcription methods for the violin do not jointly model pitch and playing technique, rely heavily on manual annotations, and generalize poorly. Method: This paper proposes a lightweight end-to-end multi-task model that, within a unified framework, detects note onsets and offsets, estimates pitch, and classifies six canonical playing techniques (e.g., vibrato, bow change, harmonics). To address the scarcity of real-world labeled data, the authors introduce MOSA-VPT, a high-fidelity synthetic dataset, and design a physics-informed data augmentation strategy to generate audio with precise technique annotations. Contribution/Results: The model is optimized via joint multi-task training and achieves state-of-the-art performance on real recordings, including an 89.3% F1-score for technique classification that significantly outperforms prior approaches, while requiring no manual annotation.
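The summary reports technique performance as an F1-score. The sketch below shows one common way a note-level F1 that also checks technique labels could be computed. The matching rule (same pitch, same technique, onsets within 50 ms, offsets ignored), the greedy one-to-one matching, the `Note` class, and the technique label strings are all assumptions for illustration, not the paper's documented evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # seconds
    offset: float    # seconds (ignored by this onset-only metric)
    pitch: int       # MIDI note number
    technique: str   # hypothetical label, e.g. "vibrato"


def technique_f1(reference, estimated, onset_tol=0.05):
    """Note-level F1: a hit needs equal pitch, equal technique, and an onset
    difference of at most onset_tol seconds. Greedy one-to-one matching is a
    simplification of the optimal bipartite matching used by mir_eval."""
    unmatched = list(reference)
    hits = 0
    for est in sorted(estimated, key=lambda n: n.onset):
        for ref in unmatched:
            if (ref.pitch == est.pitch and ref.technique == est.technique
                    and abs(ref.onset - est.onset) <= onset_tol):
                unmatched.remove(ref)  # each reference note can match only once
                hits += 1
                break
    if hits == 0:
        return 0.0  # also avoids dividing by zero for empty inputs
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = [Note(0.00, 0.50, 67, "vibrato"),
             Note(0.50, 1.00, 69, "detache"),
             Note(1.00, 1.50, 71, "pizzicato")]
estimated = [Note(0.02, 0.50, 67, "vibrato"),
             Note(0.51, 1.00, 69, "staccato"),   # wrong technique: not a hit
             Note(1.01, 1.40, 71, "pizzicato")]
print(round(technique_f1(reference, estimated), 3))  # → 0.667
```

Here two of three estimated notes match, so precision and recall are both 2/3. Requiring the technique to match makes the score stricter than plain pitch/onset F1, which is the point of a technique-aware metric.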


📝 Abstract
While automatic music transcription is well-established in music information retrieval, most models are limited to transcribing pitch and timing information from audio, and thus omit crucial expressive and instrument-specific nuances. One example is playing technique on the violin, which affords its distinct palette of timbres for maximal emotional impact. Here, we propose VioPTT (Violin Playing Technique-aware Transcription), a lightweight, end-to-end model that directly transcribes violin playing technique in addition to pitch onset and offset. Furthermore, we release MOSA-VPT, a novel, high-quality synthetic violin playing technique dataset to circumvent the need for manually labeled annotations. Leveraging this dataset, our model demonstrated strong generalization to real-world note-level violin technique recordings in addition to achieving state-of-the-art transcription performance. To our knowledge, VioPTT is the first to jointly combine violin transcription and playing technique prediction within a unified framework.
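The abstract says VioPTT transcribes playing technique in addition to pitch onset and offset. The following is a minimal sketch, assuming a hypothetical output format in the style of frame-based transcription models (per-frame onset and frame activations for each pitch bin, plus one predicted technique label per frame), of how such outputs could be decoded into note events that carry a technique label chosen by majority vote over each note's frames. The function name, threshold, hop size, and label names are illustrative assumptions; the paper's actual output heads and decoding may differ.

```python
from collections import Counter


def decode_notes(onset_prob, frame_prob, technique_per_frame,
                 hop_seconds=0.01, threshold=0.5):
    """Decode frame-level activations into (onset_s, offset_s, pitch_bin, technique).

    onset_prob, frame_prob: [n_frames][n_pitches] probabilities (hypothetical heads).
    technique_per_frame: one technique label per frame (argmax already taken);
    a single label per frame is a simplification for mostly monophonic violin.
    """
    notes = []
    n_frames = len(frame_prob)
    n_pitches = len(frame_prob[0]) if n_frames else 0
    for pitch in range(n_pitches):
        t = 0
        while t < n_frames:
            if onset_prob[t][pitch] < threshold:
                t += 1
                continue
            start = t
            t += 1
            # Extend the note while the pitch stays active and no new onset fires.
            while (t < n_frames and frame_prob[t][pitch] >= threshold
                   and onset_prob[t][pitch] < threshold):
                t += 1
            # Majority vote of the per-frame technique labels inside the note.
            technique = Counter(technique_per_frame[start:t]).most_common(1)[0][0]
            notes.append((start * hop_seconds, t * hop_seconds, pitch, technique))
    return sorted(notes)


# Toy example: 6 frames, 2 pitch bins.
onset_prob = [[0.9, 0.0], [0.1, 0.0], [0.0, 0.0], [0.0, 0.8], [0.0, 0.1], [0.0, 0.0]]
frame_prob = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.1, 0.9], [0.0, 0.9], [0.0, 0.2]]
technique_per_frame = ["vibrato", "vibrato", "detache", "detache", "detache", "detache"]
print(decode_notes(onset_prob, frame_prob, technique_per_frame))
```

In the toy input, pitch bin 0 sounds for frames 0 to 2 and is labeled "vibrato" (two of its three frames), while bin 1 sounds for frames 3 to 4 and is labeled "detache". Voting over a note's frames is one simple way to turn frame-wise technique predictions into the note-level labels the abstract refers to.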
Problem

Transcribing violin playing techniques beyond pitch
Overcoming limited manually labeled training data
Unifying violin transcription and technique prediction
Innovation

Lightweight end-to-end model for violin technique transcription
Synthetic dataset to replace manual annotation requirements
Unified framework combining pitch and technique prediction
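The contributions above describe a single framework that predicts pitch and technique together. A common way to train such a multi-head model is to minimize a weighted sum of per-head losses. The sketch below assumes binary cross-entropy for onset, offset, and frame activity on a single pitch track and categorical cross-entropy for a per-frame technique class, with equal default weights. The head names, loss choices, and weights are assumptions for illustration, not VioPTT's documented training objective.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability, clipped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def cross_entropy(class_probs, target_index, eps=1e-7):
    """Categorical cross-entropy for one frame's class-probability vector."""
    return -math.log(max(class_probs[target_index], eps))


def joint_loss(outputs, targets, weights=None):
    """Weighted sum of per-head mean losses over frames (one pitch track).

    outputs/targets use hypothetical keys: 'onset', 'offset', 'frame' hold
    per-frame probabilities / 0-1 labels; 'technique' holds per-frame class
    probability lists / class indices.
    """
    weights = weights or {"onset": 1.0, "offset": 1.0, "frame": 1.0, "technique": 1.0}
    losses = {}
    for head in ("onset", "offset", "frame"):
        per_frame = [bce(p, y) for p, y in zip(outputs[head], targets[head])]
        losses[head] = sum(per_frame) / len(per_frame)
    per_frame = [cross_entropy(p, y)
                 for p, y in zip(outputs["technique"], targets["technique"])]
    losses["technique"] = sum(per_frame) / len(per_frame)
    total = sum(weights[head] * loss for head, loss in losses.items())
    return total, losses


targets = {"onset": [1, 0], "offset": [0, 1], "frame": [1, 1], "technique": [0, 2]}
outputs = {"onset": [0.9, 0.1], "offset": [0.1, 0.9], "frame": [0.9, 0.9],
           "technique": [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]}
total, per_head = joint_loss(outputs, targets)
print(round(total, 4))  # → 0.5392
```

Because every head shares one scalar objective, gradients from technique errors and from pitch/timing errors update the same backbone; the per-head weights are the usual knob for balancing them.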
Ting-Kang Wang
Graduate Institute of Communication Engineering, National Taiwan University, Taiwan
Yueh-Po Peng
Original Creation Center, Gamania Inc., Taipei, Taiwan
Li Su
Institute of Information Science, Academia Sinica
Music information retrieval, signal processing, machine learning, computational musicology
Vincent K. M. Cheung
Sony Computer Science Laboratories, Inc., Tokyo, Japan