ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning

📅 2024-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of expressive F0 contour modeling, particularly of pitch bends, in polyphonic violin synthesis, this paper proposes a two-stage diffusion framework that explicitly models pitch bend. In Stage I, a fine-grained pitch bend curve is estimated from the MIDI input; in Stage II, an expressive mel spectrogram is generated conditioned on this curve. The method jointly leverages time-frequency representations and music-informed priors to achieve precise and controllable modeling of dynamic F0 trajectories. Quantitative evaluation demonstrates significant improvements over baselines in STOI, ESTOI, and F0 RMSE. Subjective listening tests further confirm that the synthesized audio exhibits more natural timbre and stronger performer-specific expressivity. To our knowledge, this is the first work to successfully integrate explicit pitch bend modeling into polyphonic string synthesis, advancing both fidelity and musicality in generative orchestral modeling.

📝 Abstract
Modeling the natural contour of fundamental frequency (F0) plays a critical role in music audio synthesis. However, transcribing and managing multiple F0 contours in polyphonic music is challenging, and explicit F0 contour modeling has not yet been explored for polyphonic instrumental synthesis. In this paper, we present ViolinDiff, a two-stage diffusion-based synthesis framework. For a given violin MIDI file, the first stage estimates the F0 contour as pitch bend information, and the second stage generates a mel spectrogram incorporating these expressive details. The quantitative metrics and listening test results show that the proposed model generates more realistic violin sounds than the model without explicit pitch bend modeling. Audio samples are available online: daewoung.github.io/ViolinDiff-Demo.
Problem

Research questions and friction points this paper is trying to address.

Enhances violin synthesis realism
Models F0 contour for polyphonic music
Incorporates pitch bend in synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage diffusion-based synthesis
Pitch bend conditioning modeling
Mel spectrogram generation with expressive details
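
To make the relationship between a MIDI note, a pitch bend curve, and the resulting F0 contour concrete, here is a minimal Python sketch. It is not the paper's implementation: it assumes the pitch bend is expressed as a per-frame offset in semitones and uses the standard equal-temperament mapping (A4 = MIDI note 69 = 440 Hz). The helper names and the vibrato parameters (6 Hz rate, 0.3 semitone depth) are hypothetical and chosen only for illustration.

```python
import math
from typing import List

A4_HZ = 440.0   # standard tuning reference
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note: float) -> float:
    """Equal-temperament mapping from a (possibly fractional) MIDI note to Hz."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def apply_pitch_bend(note: int, bend_semitones: List[float]) -> List[float]:
    """Turn one MIDI note plus a per-frame bend curve into an F0 contour in Hz.

    Assumption: the bend is an additive offset in semitones for each frame.
    """
    return [midi_to_hz(note + bend) for bend in bend_semitones]


def vibrato_bend(n_frames: int, frame_rate_hz: float,
                 rate_hz: float = 6.0, depth_semitones: float = 0.3) -> List[float]:
    """Hypothetical sinusoidal bend curve standing in for a Stage I output."""
    return [
        depth_semitones * math.sin(2.0 * math.pi * rate_hz * i / frame_rate_hz)
        for i in range(n_frames)
    ]


if __name__ == "__main__":
    bend = vibrato_bend(n_frames=5, frame_rate_hz=100.0)
    contour = apply_pitch_bend(69, bend)
    for b, f in zip(bend, contour):
        print(f"bend={b:+.3f} st  f0={f:.2f} Hz")
```

In a two-stage setup like the one summarized here, a curve of this kind would be the conditioning signal passed from the first stage to the second; in polyphonic music each sounding note would carry its own curve.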