The SJTU X-LANCE Lab System for MSR Challenge 2025

📅 2026-02-04
🤖 AI Summary
This work addresses multi-instrument separation, denoising, and dereverberation in music source restoration (MSR) with a cascade of BS-RoFormer models, each handling one of the three tasks, covering the eight instrument classes of the challenge. The approach starts from pretrained checkpoints from the MSS community and fine-tunes them with several training schemes: dataset mixing and cleaning, random mixing of music pieces for data augmentation, and longer training audio. In the MSR Challenge 2025, the system ranked first on all three subjective and all three objective evaluation metrics, with an MMSNR of 4.4623 and an FAD of 0.1988.
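The cascaded design can be pictured as plain function composition. The sketch below is illustrative only: the stage order (separation, then denoising, then dereverberation) is assumed from the order in which the tasks are listed, and `cascade`, `separate`, `denoise`, and `dereverb` are hypothetical placeholders, not names from the released code. A real system would call a trained BS-RoFormer inside each stage.

```python
from typing import Callable, List, Sequence

Audio = List[float]                 # mono waveform as a plain list of samples
Stage = Callable[[Audio], Audio]    # one restoration step: audio in, audio out


def cascade(stages: Sequence[Stage]) -> Stage:
    """Compose stages so each one consumes the previous stage's output."""
    def run(audio: Audio) -> Audio:
        for stage in stages:
            audio = stage(audio)
        return audio
    return run


# Hypothetical stand-ins for the three models (assumptions, not the real networks).
def separate(audio: Audio) -> Audio:
    # Placeholder for source separation: simply attenuates the input.
    return [0.5 * x for x in audio]


def denoise(audio: Audio) -> Audio:
    # Placeholder for denoising: a crude gate that zeroes very quiet samples.
    return [x if abs(x) >= 0.01 else 0.0 for x in audio]


def dereverb(audio: Audio) -> Audio:
    # Placeholder for dereverberation: identity.
    return list(audio)


restore = cascade([separate, denoise, dereverb])
restored = restore([0.5, 0.01, -0.25, 0.0])
```

Keeping each stage behind the same audio-in, audio-out interface means a single stage can be retrained or swapped without touching the others.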

📝 Abstract
This report describes the system submitted to the Music Source Restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each dealing with a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments given in the task, we use pretrained checkpoints from the MSS community and fine-tune the MSS model with several training schemes, including (1) mixing and cleaning of datasets; (2) random mixture of music pieces for data augmentation; (3) scale-up of audio length. Our system achieved first rank in all three subjective and three objective evaluation metrics, including an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at https://github.com/ModistAndrew/xlance-msr.
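Training scheme (2), random mixture of music pieces, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give segment lengths, gain ranges, or sampling rules, so the function names, the 0.5 to 1.0 gain range, and the toy stems are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Audio = List[float]  # mono waveform as a plain list of samples


def random_segment(track: Audio, length: int, rng: random.Random) -> Audio:
    """Cut a random fixed-length segment, zero-padding tracks that are too short."""
    if len(track) <= length:
        return track + [0.0] * (length - len(track))
    start = rng.randrange(len(track) - length + 1)
    return track[start:start + length]


def random_mixture(
    stems: Dict[str, List[Audio]],
    target: str,
    length: int,
    rng: random.Random,
) -> Tuple[Audio, Audio]:
    """Build one (mixture, target) pair from independently sampled stems.

    One track per instrument is drawn at random, possibly from different
    songs, so the mixture is a combination that never occurred in the data.
    """
    mixture = [0.0] * length
    target_segment = [0.0] * length
    for instrument, tracks in stems.items():
        segment = random_segment(rng.choice(tracks), length, rng)
        gain = rng.uniform(0.5, 1.0)  # assumed gain range, for illustration only
        segment = [gain * x for x in segment]
        if instrument == target:
            target_segment = segment
        mixture = [m + s for m, s in zip(mixture, segment)]
    return mixture, target_segment


# Toy stems standing in for real recordings.
toy_stems = {
    "vocals": [[1.0] * 10, [0.5] * 3],
    "drums": [[0.25] * 8],
}
mix, tgt = random_mixture(toy_stems, "vocals", 4, random.Random(0))
```

Each call yields a new training pair, so the model sees far more instrument combinations than the original songs contain.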
Problem

Research questions and friction points this paper is trying to address.

music source restoration
music source separation
denoise
dereverb
multi-instrument separation
Innovation

Methods, ideas, or system contributions that make the work stand out.

BS-RoFormer
music source restoration
source separation
data augmentation
multi-instrument processing
Jinxuan Zhu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, Shanghai Jiao Tong University
Hao Qiu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, Shanghai Jiao Tong University
Haina Zhu
Shanghai Jiao Tong University
Music Generation, Self-Supervised Learning, Deep Reinforcement Learning
Jianwei Yu
Tencent AI lab
ASR
Kai Yu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, Jiangsu Key Lab of Language Computing, Shanghai Jiao Tong University
Xie Chen
Shanghai Jiao Tong University <- Microsoft <- Cambridge University
Machine Learning, Speech Recognition, Speech Synthesis, Speech & Audio Processing