🤖 AI Summary
This work addresses two coupled problems in mixed and mastered music recordings: interference between overlapping sources, and signal degradation caused by production effects such as compression and reverberation. To this end, we propose a novel generative adversarial network (GAN) architecture that, for the first time, integrates a Rotary Position Embedding (RoPE) Transformer into the music source restoration task. The Transformer is combined with a lightweight dual-path band-split RNN, so that the model captures long-range temporal dependencies while reconstructing the spectrum at multiple resolutions. With only 7.1 million parameters, our method placed third on the objective leaderboard and fourth on the subjective leaderboard of the ICASSP 2026 Music Source Restoration (MSR) Challenge, demonstrating a strong balance among generation fidelity, semantic consistency, and model efficiency.
📝 Abstract
Music source restoration (MSR) aims to recover unprocessed stems from mixed and mastered recordings. The challenge lies both in separating overlapping sources and in reconstructing signals degraded by production effects such as compression and reverberation. We therefore propose DTT-BSR, a hybrid generative adversarial network (GAN) that combines a rotary positional embedding (RoPE) transformer for long-term temporal modeling with a dual-path band-split recurrent neural network (RNN) for multi-resolution spectral processing. Our model achieved 3rd place on the objective leaderboard and 4th place on the subjective leaderboard of the ICASSP 2026 MSR Challenge, demonstrating exceptional generation fidelity and semantic alignment at a compact size of 7.1M parameters.
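For readers unfamiliar with rotary positional embeddings, the following is a minimal sketch of the general RoPE idea, not the DTT-BSR implementation. The function names `rope` and `dot`, the base of 10000, and the toy vectors are all assumptions chosen for illustration. It shows the property that makes RoPE attractive for long-term temporal modeling: after rotation, the query-key dot product depends only on the relative offset between positions.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative only: `base` and the pairing scheme follow the common
    RoPE formulation, not necessarily the one used in DTT-BSR.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; lower pairs rotate faster.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy query and key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions have the same relative offset (3),
# so the two attention scores agree up to floating-point error.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
```

Because each pair of features is only rotated, the vector norm is preserved, and shifting both positions by the same amount leaves the score unchanged.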