DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration

📅 2026-02-23
🤖 AI Summary
This work addresses the joint challenges of source interference and signal degradation—such as compression artifacts and reverberation—in mixed and mastered music recordings. To this end, we propose a novel generative adversarial network (GAN) architecture that, for the first time, integrates Rotary Position Embedding (RoPE) Transformers into the music source restoration task, combined with a lightweight dual-path band-split RNN to simultaneously capture long-range temporal dependencies and enable multi-resolution spectral reconstruction. With only 7.1 million parameters, our method achieved third place in objective metrics and fourth place in subjective evaluation at the ICASSP 2026 Music Source Restoration (MSR) Challenge, demonstrating a strong balance among generation fidelity, semantic consistency, and model efficiency.

📝 Abstract
Music source restoration (MSR) aims to recover unprocessed stems from mixed and mastered recordings. The challenge lies in both separating overlapping sources and reconstructing signals degraded by production effects such as compression and reverberation. We therefore propose DTT-BSR, a hybrid generative adversarial network (GAN) that combines a rotary positional embedding (RoPE) transformer for long-term temporal modeling with a dual-path band-split recurrent neural network (RNN) for multi-resolution spectral processing. Our model achieved 3rd place on the objective leaderboard and 4th place on the subjective leaderboard of the ICASSP 2026 MSR Challenge, demonstrating exceptional generation fidelity and semantic alignment with a compact size of 7.1M parameters.
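As a rough illustration of why rotary positional embeddings suit long-term temporal modeling, the sketch below applies the standard RoPE rotation to a query and a key and shows that their attention score depends only on the distance between frames, not on where they sit in the track. This is not the authors' implementation: the function names `rope` and `attention_score`, the example vectors, and the frequency base of 10000 (the commonly used default) are assumptions made for illustration, and the code uses only the Python standard library.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even number of channels")
    out = [0.0] * dim
    for i in range(0, dim, 2):
        # Pair (i, i+1) rotates at frequency base^(-i/dim):
        # early pairs spin fast, later pairs spin slowly.
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

def attention_score(q, k, q_pos, k_pos):
    """Dot product of a rotated query and a rotated key (hypothetical helper)."""
    rq, rk = rope(q, q_pos), rope(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))

# Illustrative 4-channel query and key vectors (made-up values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# The same 2-frame offset at very different absolute positions.
near = attention_score(q, k, 3, 1)
far = attention_score(q, k, 103, 101)
print(abs(near - far) < 1e-9)
```

Because each rotation is orthogonal, the vectors keep their length and only their relative angle changes, which is what makes the score a function of the offset between frames.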
Problem

Research questions and friction points this paper is trying to address.

Music Source Restoration
Source Separation
Signal Degradation
Audio Restoration
Mastered Recordings
Innovation

Methods, ideas, or system contributions that make the work stand out.

RoPE Transformer
Band-Split RNN
Music Source Restoration
Generative Adversarial Network
Multi-resolution Spectral Processing
Shihong Tan
School of Electronic Information, Wuhan University, China
Haoyu Wang
Renmin University of China
Responsible (Interpretable) Information Retrieval & Large Language Model
Youran Ni
School of Electronic Information, Wuhan University, China
Yingzhao Hou
School of Electronic Information, Wuhan University, China
Jiayue Luo
School of Electronic Information, Wuhan University, China
Zipei Hu
School of Electronic Information, Wuhan University, China
Han Dou
School of Electronic Information, Wuhan University, China
Zerui Han
Xiaomi Corporation
sound reproduction, sound analysis, sound recording
Ningning Pan
Assistant Professor, Southwestern University of Finance and Economics
Speech Enhancement, binaural hearing, deep learning
Yuzhu Wang
Tampere University, Finland
Gongping Huang
Professor, Wuhan University, Wuhan, China
Acoustic Signal Processing, Microphone Arrays, Speech Enhancement, Noise Reduction