🤖 AI Summary
This work proposes a two-stage music source restoration method to address the nonlinear mixing artifacts present in full-mix audio, which arise from production effects and distribution-induced distortions. The first stage uses a single BandSplit-RoFormer separator to estimate eight instrument stems plus an auxiliary "other" stem. It is trained with a three-stage curriculum that moves from LoRA-based warm-start fine-tuning of a 4-stem model to an 8-stem model via head expansion. The second stage restores each separated stem with a HiFi++ GAN waveform restorer, which is first trained as a universal (generalist) model and then specialized into eight instrument-specific experts. By moving beyond the conventional linear-mixture assumption, the method targets both stem separation quality and restoration fidelity. It is the CP-JKU team's entry to the MSR ICASSP Challenge 2025.
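The two-stage structure can be sketched as a composition of one separator and a set of per-stem restorers. This is a minimal sketch under stated assumptions, not the team's implementation. `restore_mixture`, the stem names, and the toy separator and restorers are hypothetical stand-ins for the trained BandSplit-RoFormer and HiFi++ models, and audio is simplified to a mono list of floats. Falling back to the generalist restorer for a stem without an expert (here, the auxiliary "other" stem) is also an assumption; the summary does not say how that stem is handled.

```python
from typing import Callable, Dict, List

Audio = List[float]  # simplification: a mono waveform as a list of samples
Separator = Callable[[Audio], Dict[str, Audio]]
Restorer = Callable[[Audio], Audio]


def restore_mixture(
    mixture: Audio,
    separator: Separator,
    generalist: Restorer,
    experts: Dict[str, Restorer],
) -> Dict[str, Audio]:
    """Hypothetical two-stage MSR inference: separate, then restore each stem."""
    # Stage 1: a single separator maps the full mix to named stem estimates.
    stems = separator(mixture)
    restored: Dict[str, Audio] = {}
    for name, stem in stems.items():
        # Stage 2: use the instrument-specific expert when one exists;
        # falling back to the generalist is an assumption of this sketch.
        restorer = experts.get(name, generalist)
        restored[name] = restorer(stem)
    return restored


# Toy stand-ins for the trained models (hypothetical stem names and behavior).
def toy_separator(x: Audio) -> Dict[str, Audio]:
    return {
        "vocals": [0.5 * s for s in x],
        "bass": [0.5 * s for s in x],
        "other": [0.0 for _ in x],
    }


def identity(x: Audio) -> Audio:
    return list(x)


experts: Dict[str, Restorer] = {"vocals": lambda x: [2.0 * s for s in x], "bass": identity}

print(restore_mixture([1.0, 2.0], toy_separator, identity, experts))
# {'vocals': [1.0, 2.0], 'bass': [0.5, 1.0], 'other': [0.0, 0.0]}
```

Keeping the separator and restorers as plain callables means the expert table can grow from one generalist to eight specialists without changing the inference loop.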
📝 Abstract
Music Source Restoration (MSR) aims to recover the original, unprocessed instrument stems from fully mixed and mastered audio, where production effects and distribution artifacts violate the common linear-mixture assumption. This technical report presents the CP-JKU team's system for the MSR ICASSP Challenge 2025. Our approach decomposes MSR into two stages: separation and restoration. First, a single BandSplit-RoFormer separator predicts eight stems plus an auxiliary "other" stem. It is trained with a three-stage curriculum that progresses from 4-stem warm-start fine-tuning (with LoRA) to an 8-stem extension via head expansion. Second, we apply a HiFi++ GAN waveform restorer that is first trained as a generalist and then specialized into eight instrument-specific experts.
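LoRA keeps a pretrained weight $W_0$ frozen and learns a low-rank update, $W = W_0 + \frac{\alpha}{r} B A$. Because $B$ is zero-initialized, fine-tuning starts exactly from the pretrained model while training far fewer parameters. The plain-Python sketch below illustrates both properties, together with one hypothetical form of head expansion that keeps the trained 4-stem output rows and appends freshly initialized rows for four new stems. The names `LoRALinear` and `expand_heads`, the matrix sizes, the rank, and the head layout (equal row blocks per stem) are illustrative assumptions. The abstract does not describe where the adapters are placed or how the new heads are initialized.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def randn(rows: int, cols: int, rng: random.Random, scale: float = 0.02) -> Matrix:
    """Small Gaussian-initialized matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen weight W0 (d_out x d_in) plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, w0: Matrix, rank: int, alpha: float, rng: random.Random):
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0                                    # pretrained, kept frozen
        self.a = randn(rank, d_in, rng)                 # trainable, small random init
        self.b = [[0.0] * rank for _ in range(d_out)]   # trainable, zero init
        self.scale = alpha / rank

    def effective_weight(self) -> Matrix:
        """W0 + scale * B @ A; equals W0 until B is updated."""
        delta = matmul(self.b, self.a)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.w0, delta)
        ]

    def trainable_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def expand_heads(head: Matrix, n_old: int, n_new: int, rng: random.Random) -> Matrix:
    """HYPOTHETICAL head expansion: `head` holds n_old equal row blocks, one per
    stem; keep them unchanged and append blocks for n_new - n_old new stems."""
    rows_per_stem = len(head) // n_old
    extra = randn(rows_per_stem * (n_new - n_old), len(head[0]), rng)
    return [row[:] for row in head] + extra


rng = random.Random(0)
w0 = randn(8, 16, rng)
layer = LoRALinear(w0, rank=2, alpha=4.0, rng=rng)
print(layer.effective_weight() == w0)       # True: B starts at zero
print(layer.trainable_params(), 8 * 16)     # 48 128
head4 = randn(4 * 3, 16, rng)               # toy sizes: 4 stems x 3 rows each
head8 = expand_heads(head4, n_old=4, n_new=8, rng=rng)
print(len(head8))                           # 24
```

Preserving the already-trained rows means the four original stems keep their warm-start behavior at the moment the model is extended, and only the new rows start from scratch.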