🤖 AI Summary
Existing music source restoration (MSR) benchmarks suffer from two fundamental limitations: synthetic mixtures exhibit unrealistic distortions, while real-world recordings lack clean reference signals, preventing faithful evaluation of restoration fidelity. To address this, we introduce MSRBench, the first benchmark explicitly designed for MSR evaluation. It comprises high-fidelity mixtures produced by professional mixing engineers from original dry recordings of eight instrument classes, augmented with twelve realistic degradations, including analog hardware distortion, acoustic reverberation, and lossy compression. Crucially, MSRBench provides authentic dry–mix pairs with ground-truth clean references, bridging the gap between synthetic benchmarks and real-world MSR applications. Baseline evaluations with U-Net and BSRNN reveal severe performance bottlenecks: SI-SNR reaches only −37.8 dB and −23.4 dB, respectively, while FAD and CLAP scores cluster narrowly around 0.7–0.8. These results underscore the benchmark's role in advancing MSR research and evaluation.
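The summary above describes augmenting clean mixtures with realistic degradations. As a loose illustration only (the actual MSRBench pipeline uses twelve professionally designed degradations whose exact parameters are not specified here), two of the named categories can be sketched with basic DSP: analog-style distortion via a soft clipper, and acoustic reverberation via convolution with a synthetic decaying-noise impulse response. The function name and all parameters below are hypothetical.

```python
import numpy as np

def apply_degradations(mix, sr=44100, rng=None):
    """Illustrative degradation chain (NOT the MSRBench pipeline):
    tanh soft clipping (analog-style distortion) followed by
    convolution with a synthetic decaying-noise impulse response
    (simplified room reverberation)."""
    rng = rng or np.random.default_rng(0)
    # Analog-style distortion: soft clipper with a fixed drive of 3.0
    degraded = np.tanh(3.0 * mix)
    # Reverberation: exponentially decaying white-noise IR (~0.3 s)
    n_ir = int(0.3 * sr)
    ir = rng.standard_normal(n_ir) * np.exp(-np.linspace(0.0, 8.0, n_ir))
    degraded = np.convolve(degraded, ir, mode="full")[: len(mix)]
    # Peak-normalize so the degraded mix stays in [-1, 1]
    return degraded / (np.max(np.abs(degraded)) + 1e-8)
```

Lossy-codec artifacts, the third category named in the summary, would require an actual encoder round-trip and are omitted from this sketch.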
📝 Abstract
Music Source Restoration (MSR) extends source separation to realistic settings where signals undergo production effects (equalization, compression, reverb) and real-world degradations, with the goal of recovering the original unprocessed sources. Existing benchmarks cannot measure restoration fidelity: synthetic datasets use unprocessed stems but unrealistic mixtures, while real production datasets provide only already-processed stems without clean references. We present MSRBench, the first benchmark explicitly designed for MSR evaluation. MSRBench contains raw stem–mixture pairs across eight instrument classes, where mixtures are produced by professional mixing engineers. These raw–processed pairs enable direct evaluation of both separation accuracy and restoration fidelity. Beyond controlled studio conditions, the mixtures are augmented with twelve real-world degradations spanning analog artifacts, acoustic environments, and lossy codecs. Baseline experiments with U-Net and BSRNN achieve SI-SNR of −37.8 dB and −23.4 dB, respectively, with perceptual quality metrics (FAD, CLAP) around 0.7–0.8, demonstrating substantial room for improvement and the need for restoration-specific architectures.
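The baseline figures above are reported in SI-SNR (scale-invariant signal-to-noise ratio), which scores an estimate against the clean reference after projecting out any overall gain difference. A minimal sketch of the standard definition, assuming 1-D NumPy arrays (this is the common formulation of the metric, not code from the paper):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (higher is better).

    Projects the (zero-mean) estimate onto the target; the residual
    after projection is treated as noise, so rescaling the estimate
    does not change the score."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # Scaled reference: projection of the estimate onto the target
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10.0 * np.log10(
        (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    )
```

Negative values such as the −37.8 dB and −23.4 dB reported here mean the residual energy exceeds the energy of the matched target component, i.e. the baselines fail to recover the dry sources.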