SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation

📅 2024-09-17
🏛️ IEEE Open Journal of Signal Processing
📈 Citations: 0
Influential: 0

🤖 AI Summary
Symphonic recordings suffer from severe acoustic overlap among instruments and lack clean, multi-track annotations, hindering progress in source separation research. To address this, we introduce SynthSOD—the first heterogeneous synthetic multi-track dataset specifically designed for symphonic music—along with a novel orchestral synthesis paradigm integrating MIDI-driven virtual instrument rendering, physics-informed reverberation and spatial field modeling, and dynamic hierarchical composition. Crucially, our approach explicitly models musically meaningful semantics, including performance dynamics, tempo, stylistic articulation, and conditional attributes. SynthSOD fills a critical gap by providing a high-fidelity, semantically rich benchmark for symphonic source separation. Leveraging SynthSOD for transfer learning, we fine-tune models such as Demucs, achieving significant performance gains over the EnsembleSet baseline on both synthetic and real-world symphonic recordings. These results demonstrate that semantically controllable synthetic data substantially enhances model effectiveness and generalization in complex orchestral separation tasks.

📝 Abstract
Music source separation has progressed significantly in recent years, particularly in isolating vocals, drums, and bass from mixed tracks. These developments owe much to the creation and use of large-scale multitrack datasets dedicated to these specific components. However, the challenge of extracting similar-sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic, musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. The dataset is rendered with high-quality digital libraries that define virtual instrument sounds for MIDI playback (a.k.a. soundfonts). Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset, compared with the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.
Problem

Research questions and friction points this paper is trying to address.

Developing orchestra music source separation
Creating clean multitrack datasets
Evaluating separation model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed novel multitrack dataset SynthSOD
Used high-quality soundfonts for realism
Applied baseline music separation model
Jaime Garcia-Martinez
Telecommunication Engineering Department, University of Jaen, Linares, Spain
David Diaz-Guerra
Audio Research Group, Tampere University, Finland
A. Politis
Audio Research Group, Tampere University, Finland
Tuomas Virtanen
Tampere University
machine listening, audio signal processing, audio
J. Carabias-Orti
Telecommunication Engineering Department, University of Jaen, Linares, Spain
P. Vera-Candeas
Telecommunication Engineering Department, University of Jaen, Linares, Spain