🤖 AI Summary
This work addresses multi-instrument separation, denoising, and dereverberation in music source restoration (MSR) with a cascade of BS-RoFormers, each handling one of the three tasks in sequence, covering the eight instruments specified by the challenge. The approach starts from pretrained checkpoints released by the music source separation (MSS) community and fine-tunes the separation model with three training schemes: mixing and cleaning of datasets, random mixing of music pieces for data augmentation, and scaling up the audio length. In the MSR Challenge 2025, the system ranked first in all three subjective and all three objective evaluation metrics, with an MMSNR of 4.4623 and an FAD of 0.1988.
📝 Abstract
This report describes the system submitted to the Music Source Restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each handling a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments specified in the task, we use pretrained checkpoints from the MSS community and fine-tune the MSS model with several training schemes: (1) mixing and cleaning of datasets; (2) random mixing of music pieces for data augmentation; (3) scaling up the audio length. Our system ranked first in all three subjective and all three objective evaluation metrics, including an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at https://github.com/ModistAndrew/xlance-msr.
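Training scheme (2), random mixing of music pieces, can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository linked above); it only shows the general idea of building a training pair by summing stems drawn from unrelated songs. The function name `random_mix`, its parameters, the gain range, and the instrument names in the usage note are all hypothetical, and waveforms are plain Python lists of floats to keep the example dependency-free.

```python
import random


def random_mix(stems_by_instrument, target, segment_len, rng=None,
               gain_range=(0.5, 1.0)):
    """Build one (mixture, target) training pair from randomly chosen stems.

    stems_by_instrument maps an instrument name to a list of mono clips
    (lists of floats). Assumption: every clip has at least segment_len samples.
    """
    if rng is None:
        rng = random.Random()
    if target not in stems_by_instrument:
        raise KeyError(f"no stems available for target instrument {target!r}")

    mixture = [0.0] * segment_len
    target_segment = None
    for instrument, clips in stems_by_instrument.items():
        clip = rng.choice(clips)                          # clip may come from any song
        start = rng.randint(0, len(clip) - segment_len)   # random crop offset
        gain = rng.uniform(*gain_range)                   # hypothetical gain range
        segment = [gain * x for x in clip[start:start + segment_len]]
        mixture = [m + s for m, s in zip(mixture, segment)]
        if instrument == target:
            target_segment = segment                      # clean reference for the loss
    return mixture, target_segment
```

A call such as `random_mix({"vocals": vocal_clips, "drums": drum_clips}, "vocals", segment_len=4)` returns a mixture and the matching clean target segment. Because each stem is sampled independently, one pool of stems yields many distinct mixtures, which is the usual motivation for this kind of augmentation.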