MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation

📅 2024-10-28
🏛️ ACM Multimedia
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address two obstacles in jointly modeling music source separation and pitch estimation, namely scarce labeled data and the difficulty of joint optimization, this paper proposes MAJL, a model-agnostic joint learning framework. It combines a two-stage training method, which targets the lack of labeled data, with a dynamic weighting method called Dynamic Weights on Hard Samples (DWHS), which targets joint optimization. Because the framework does not depend on a particular architecture, different separation and pitch estimation models can be used for each task. On public music datasets, MAJL outperforms state-of-the-art methods on both tasks, improving signal-to-distortion ratio (SDR) by 0.92 dB and raw pitch accuracy (RPA) by 2.71%.

📝 Abstract
Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Existing methods have therefore tried to perform the two tasks simultaneously, so as to leverage the mutually beneficial relationship between them. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use various models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which address the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.
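For readers unfamiliar with the two metrics quoted in the abstract, the sketch below computes a basic SDR in dB and a raw pitch accuracy on toy data. It is only an illustration: the function names are hypothetical, the SDR is the plain energy-ratio form rather than whatever evaluation toolkit the paper used, and the 50-cent tolerance and the "0 Hz means unvoiced" convention are common choices assumed here, not details taken from the abstract.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: benchmark toolkits may use a more elaborate definition.
    """
    signal = sum(r * r for r in reference)
    distortion = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (distortion + eps))


def raw_pitch_accuracy(ref_hz, est_hz, tolerance_cents=50.0):
    """Fraction of voiced reference frames whose estimate is within tolerance.

    Assumption: a frequency of 0 Hz marks an unvoiced frame.
    """
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hits = 0
    for r, e in voiced:
        # Distance between estimate and reference in cents (1200 per octave).
        if e > 0 and abs(1200.0 * math.log2(e / r)) <= tolerance_cents:
            hits += 1
    return hits / len(voiced)


# Toy inputs, made up for illustration only.
toy_sdr = sdr_db([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0])
toy_rpa = raw_pitch_accuracy([440.0, 440.0, 0.0, 220.0], [441.0, 466.16, 100.0, 0.0])
```

Under these definitions, a higher SDR means less distortion in the separated signal, and RPA counts only frames where the reference contains a pitch.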
Problem

Research questions and friction points this paper is trying to address.

Music Source Separation
Pitch Estimation
Data Annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

MAJL Framework
Dynamic Weights on Hard Samples (DWHS)
Joint Optimization of Music Source Separation and Pitch Estimation
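This page does not give the DWHS formula, so the sketch below is not the paper's method. It only illustrates the general idea of giving harder samples more weight in a joint loss. The softmax-over-losses weighting, the `alpha` trade-off between tasks, the `temperature` parameter, and both function names are assumptions introduced for illustration.

```python
import math


def hard_sample_weights(losses, temperature=1.0):
    """Hypothetical weighting: softmax over per-sample losses, scaled to mean 1.

    Samples with a larger loss ("hard" samples) receive a larger weight.
    """
    peak = max(losses)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    n = len(losses)
    return [n * e / total for e in exps]


def joint_loss(sep_losses, pitch_losses, alpha=1.0, temperature=1.0):
    """Weighted mean of per-sample separation + alpha * pitch losses."""
    per_sample = [s + alpha * p for s, p in zip(sep_losses, pitch_losses)]
    weights = hard_sample_weights(per_sample, temperature)
    return sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)


# Toy per-sample losses, made up for illustration only.
weights = hard_sample_weights([0.1, 0.1, 2.0])
uniform = joint_loss([1.0, 1.0], [1.0, 1.0])
```

When all samples are equally hard, the weights are all 1 and the result reduces to the ordinary mean loss. In a gradient-based training loop, such weights would typically be treated as constants rather than differentiated through.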
Haojie Wei
Ph.D. @ Renmin University of China
Music Understanding
Jun Yuan
Huawei Noah’s Ark Lab, Shenzhen, China
Rui Zhang
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
Quanyu Dai
Huawei Noah’s Ark Lab, Shenzhen, China
Yueguo Chen
School of Information, Renmin University of China, Beijing, China