SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation

📅 2024-09-17

🏛️ IEEE Open Journal of Signal Processing

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Orchestral recordings suffer from strong acoustic overlap between similar-sounding instruments and lack clean (bleed-free) multitrack annotations, which has hindered progress in source separation research. To address this, we introduce SynthSOD, a heterogeneous synthetic multitrack dataset designed specifically for orchestra music. It is built with a set of simulation techniques that render MIDI through high-quality virtual instrument libraries (soundfonts) and vary musically meaningful attributes, including performance dynamics, natural tempo changes, playing styles, and recording conditions. SynthSOD fills a gap by providing a realistic, bleed-free benchmark for orchestral source separation. We train a widely used baseline separation model on SynthSOD, compare it with the same model trained on EnsembleSet, and evaluate both on synthetic and real-world orchestral recordings. These experiments assess whether musically controlled synthetic data improves separation quality and generalization on complex orchestral material.
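Dynamics and natural tempo changes have to be encoded in the MIDI before a soundfont can render it. The sketch below shows one hypothetical way to do this, and it is not SynthSOD's actual pipeline. A per-beat tempo follows a slow, clipped random walk. Beat positions are then converted to seconds, and dynamic markings are mapped to MIDI velocities. The helper names (`drifting_tempo`, `beat_to_seconds`, `to_timed_events`), the drift parameters, and the velocity table are all assumptions made for illustration.

```python
import random

# Illustrative dynamic-to-velocity mapping (assumed values, not taken from SynthSOD).
DYNAMIC_TO_VELOCITY = {"pp": 33, "p": 49, "mf": 64, "f": 80, "ff": 96}


def drifting_tempo(n_beats, base_bpm, max_drift=0.05, step=0.01, seed=0):
    """Hypothetical helper: one tempo value (BPM) per beat, following a slow
    random walk around base_bpm whose relative deviation is clipped to +/- max_drift."""
    rng = random.Random(seed)
    deviation = 0.0
    bpms = []
    for _ in range(n_beats):
        deviation += rng.uniform(-step, step)
        deviation = max(-max_drift, min(max_drift, deviation))
        bpms.append(base_bpm * (1.0 + deviation))
    return bpms


def beat_to_seconds(beat, bpms):
    """Convert a (possibly fractional) beat position to seconds, assuming the
    tempo is constant within each beat i (duration 60 / bpms[i] seconds)."""
    whole = int(beat)
    seconds = sum(60.0 / bpm for bpm in bpms[:whole])
    if beat > whole:
        seconds += (beat - whole) * 60.0 / bpms[whole]
    return seconds


def to_timed_events(notes, bpms):
    """Turn (start_beat, duration_beats, midi_pitch, dynamic) tuples into
    (onset_s, offset_s, midi_pitch, velocity) tuples ready for rendering."""
    events = []
    for start, duration, pitch, dynamic in notes:
        onset = beat_to_seconds(start, bpms)
        offset = beat_to_seconds(start + duration, bpms)
        events.append((onset, offset, pitch, DYNAMIC_TO_VELOCITY[dynamic]))
    return events
```

For example, `to_timed_events([(0, 1, 60, "p"), (1, 2, 64, "f")], drifting_tempo(16, 96.0))` yields note events whose timing drifts slightly from a strict 96 BPM grid. A real pipeline would then write these events to a MIDI file for soundfont rendering. Using a clipped random walk rather than independent per-beat noise keeps the tempo changes gradual, which is closer to how ensembles actually speed up and slow down.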


📝 Abstract

Music source separation has advanced significantly in recent years, particularly in isolating vocals, drums, and bass from mixed tracks. These developments owe much to the creation and use of large-scale multitrack datasets dedicated to these specific components. However, extracting similar-sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD. It was developed using a set of simulation techniques to create a realistic, musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. To do so, it employs high-quality digital libraries that define virtual instrument sounds for MIDI playback (a.k.a. soundfonts). Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset, compare it with the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.
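A bleed-free multitrack dataset gives every training mixture exact per-instrument targets, since the mixture can be formed directly from the clean stems. The minimal sketch below assumes plain summation, with no gains, mastering, or bleed. It pairs that mixture with a simple signal-to-distortion ratio (SDR) in dB. The abstract does not name its evaluation metric, so this plain SDR is an assumption and is not the BSS Eval v4 variant commonly reported with tools such as museval. The stem names in the usage note are hypothetical.

```python
import math


def mix_stems(stems):
    """Sum equal-length, bleed-free stems sample by sample to form the mixture."""
    length = len(next(iter(stems.values())))
    if any(len(s) != length for s in stems.values()):
        raise ValueError("all stems must have the same length")
    return [sum(s[i] for s in stems.values()) for i in range(length)]


def sdr(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB:
    10 * log10(||s||^2 / ||s - s_hat||^2), with eps guarding against division by zero.
    This is the simple SDR, not the BSS Eval v4 definition."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))
```

Usage: given `stems = {"violin": [...], "cello": [...]}`, `mix_stems(stems)` produces the model input, and each stem is its training target. A model's estimate for `"violin"` is then scored with `sdr(stems["violin"], estimate)`. Feeding the unprocessed mixture in as the "estimate" gives a useful lower-bound reference for any separation model.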

Problem

Research questions and friction points this paper is trying to address.

Developing orchestra music source separation

Creating clean multitrack datasets

Evaluating separation model performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed novel multitrack dataset SynthSOD

Used high-quality soundfonts for realism

Applied baseline music separation model

🔎 Similar Papers

COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

2024-04-25 · arXiv.org · Citations: 2
