The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

📅 2025-11-26
🤖 AI Summary
Classical music source separation and music information retrieval suffer from a lack of high-quality, professionally recorded multitrack datasets. Method: We introduce The Spheres, the first professional multitrack orchestral dataset, comprising complete movements, chromatic scales, and solo excerpts totaling over one hour, recorded with a 23-channel microphone array (close spot, main, and ambient microphones) that simultaneously captures multichannel audio and room impulse responses, enabling controllable crosstalk modeling. The dataset provides high-fidelity isolated stems and spatial acoustic characterization. Contribution/Results: Using this dataset, we establish baselines for orchestral-family source separation and microphone crosstalk suppression with the X-UMX model, demonstrating both the feasibility and the challenges of source separation in highly reverberant classical music recordings. This work establishes a reproducible benchmark and opens a new research direction for classical music signal processing.

📝 Abstract
This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks in the classical music domain. The dataset comprises over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
Problem

Research questions and friction points this paper is trying to address.

Advancing music source separation in classical orchestral recordings
Providing multitrack data for machine learning in music information retrieval
Addressing separation challenges in complex acoustic orchestral environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multitrack orchestral recordings captured with a 23-microphone setup (close spot, main, and ambient microphones)
Isolated stems enabling supervised training of source separation models
Room impulse responses for each instrument position, characterizing the room acoustics
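The controlled-bleeding idea described above (isolated stems plus a room impulse response per instrument position) can be sketched as a simple convolve-and-sum simulation. This is a minimal illustration, not the dataset's actual API: the function name, stem/RIR dictionaries, and gain value are all hypothetical, and real RIR-based bleed modeling would use per-microphone RIRs from the dataset.

```python
import numpy as np

def spot_mic_with_bleed(stems, rirs_to_mic, target, bleed_gain=0.1):
    """Simulate one spot microphone: the target stem filtered by its RIR,
    plus attenuated, RIR-filtered bleed from every other instrument.

    stems       : dict name -> 1-D dry stem waveform (same sample rate)
    rirs_to_mic : dict name -> RIR from that instrument's position to this mic
    """
    out = np.convolve(stems[target], rirs_to_mic[target])
    for name, dry in stems.items():
        if name == target:
            continue
        bleed = np.convolve(dry, rirs_to_mic[name])
        n = min(len(out), len(bleed))  # align lengths before summing
        out = out[:n] + bleed_gain * bleed[:n]
    return out

# Toy signals standing in for dry stems and estimated RIRs.
rng = np.random.default_rng(0)
fs = 48000  # one second of audio per stem
stems = {"violin": rng.standard_normal(fs), "cello": rng.standard_normal(fs)}
rirs = {
    "violin": np.concatenate(([1.0], np.zeros(99))),   # near-direct path
    "cello": 0.05 * rng.standard_normal(100),          # diffuse leakage path
}
spot_violin = spot_mic_with_bleed(stems, rirs, target="violin", bleed_gain=0.2)
```

Sweeping `bleed_gain` from 0 (perfectly isolated stems) toward realistic levels is one way such a dataset supports controlled debleeding experiments.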
Authors
Jaime Garcia-Martinez (Universidad de Jaen, Spain)
David Diaz-Guerra (Tampere University, Finland)
John Anderson (Distinguished Scientist, Google)
Ricardo Falcon-Perez (Tampere University, Finland)
Pablo Cabañas-Molero (Universidad de Jaen, Spain)
Tuomas Virtanen (Tampere University)
Julio J. Carabias-Orti (Universidad de Jaen, Spain)
Pedro Vera-Candeas (Universidad de Jaen, Spain)