The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

📅 2025-11-26
🤖 AI Summary
Classical music source separation and music information retrieval suffer from a lack of high-quality, professionally recorded multitrack datasets. Method: We introduce The Spheres, the first professional multitrack orchestral dataset, comprising complete movements, chromatic scales, and solo excerpts totaling over one hour, recorded with a 23-channel microphone array (close spot, main, and ambient microphones) that simultaneously captures multichannel audio and room impulse responses, enabling controllable crosstalk modeling. The dataset provides high-fidelity isolated stems and spatial acoustic characterization. Contribution/Results: Using this dataset, we establish baselines for orchestral-family source separation and microphone crosstalk suppression with the X-UMX model, demonstrating both the feasibility and the challenges of source separation in highly reverberant classical music recordings. This work establishes a reproducible benchmark and opens a new research direction for classical music signal processing.

📝 Abstract
This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks in the classical music domain. The dataset comprises over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
Problem

Research questions and friction points this paper is trying to address.

Advancing music source separation in classical orchestral recordings
Providing multitrack data for machine learning in music information retrieval
Addressing separation challenges in complex acoustic orchestral environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multitrack orchestral recordings captured with a 23-microphone setup (close spot, main, and ambient microphones)
Isolated stems enabling supervised training of source separation models
Room impulse responses for each instrument position, characterizing the room acoustics
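The controlled-bleeding idea described above (isolated stems plus a room impulse response per instrument position) can be sketched as a simple convolve-and-sum simulation. This is a minimal illustration, not the dataset's actual API: the function name, stem/RIR dictionaries, and gain value are all hypothetical, and real RIR-based bleed modeling would use per-microphone RIRs from the dataset.

```python
import numpy as np

def spot_mic_with_bleed(stems, rirs_to_mic, target, bleed_gain=0.1):
    """Simulate one spot microphone: the target stem filtered by its RIR,
    plus attenuated, RIR-filtered bleed from every other instrument.

    stems       : dict name -> 1-D dry stem waveform (same sample rate)
    rirs_to_mic : dict name -> RIR from that instrument's position to this mic
    """
    out = np.convolve(stems[target], rirs_to_mic[target])
    for name, dry in stems.items():
        if name == target:
            continue
        bleed = np.convolve(dry, rirs_to_mic[name])
        n = min(len(out), len(bleed))  # align lengths before summing
        out = out[:n] + bleed_gain * bleed[:n]
    return out

# Toy signals standing in for dry stems and estimated RIRs.
rng = np.random.default_rng(0)
fs = 48000  # one second of audio per stem
stems = {"violin": rng.standard_normal(fs), "cello": rng.standard_normal(fs)}
rirs = {
    "violin": np.concatenate(([1.0], np.zeros(99))),   # near-direct path
    "cello": 0.05 * rng.standard_normal(100),          # diffuse leakage path
}
spot_violin = spot_mic_with_bleed(stems, rirs, target="violin", bleed_gain=0.2)
```

Sweeping `bleed_gain` from 0 (perfectly isolated stems) toward realistic levels is one way such a dataset supports controlled debleeding experiments.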
Authors
Jaime Garcia-Martinez (Universidad de Jaen, Spain)
David Diaz-Guerra (Tampere University, Finland)
John Anderson (Distinguished Scientist, Google)
Ricardo Falcon-Perez (Tampere University, Finland)
Pablo Cabañas-Molero (Universidad de Jaen, Spain)
Tuomas Virtanen (Tampere University)
Julio J. Carabias-Orti (Universidad de Jaen, Spain)
Pedro Vera-Candeas (Universidad de Jaen, Spain)