🤖 AI Summary
This work addresses automatic drum transcription with a partially fixed non-negative matrix factorization (NMF) of the magnitude spectrogram of a music signal. It compares two ways to optimize this factorization: the conventional multiplicative update rule and projected gradient descent with momentum, which preserves non-negativity by projecting each iterate onto the non-negative orthant. The paper summarizes both methods and their time complexities, and its theoretical analysis shows that projected gradient descent with momentum satisfies stronger convergence guarantees. Experiments on the ENST-Drums dataset and on an original recording from the author's band, scored against ground-truth drum annotations, show that for a fixed runtime projected gradient descent with momentum achieves higher transcription accuracy than the multiplicative update rule. The main takeaway is that momentum-based optimization offers a better balance of computational efficiency, transcription accuracy, and theoretical guarantees within this simple, interpretable spectrogram-factorization framework.
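The momentum-based projected gradient descent described above can be illustrated with a minimal NumPy sketch. It rests on assumptions that the summary does not state: the loss is half the squared Frobenius error, the fixed part of the dictionary is a matrix of drum templates `W_drum`, each block uses step size 1/L (L being that block's Lipschitz constant), and an adaptive restart drops the momentum whenever an extrapolated step would raise the loss. The names `pgd_momentum_nmf`, `momentum_step`, `k_free`, and `beta` are hypothetical and do not come from the paper.

```python
import numpy as np


def frob_loss(V, W, H):
    """Half the squared Frobenius error, 0.5 * ||V - W H||_F^2 (assumed loss)."""
    R = V - W @ H
    return 0.5 * float(np.sum(R * R))


def momentum_step(X, X_prev, step, loss, beta):
    """Extrapolate, take a projected gradient step, and restart if the loss rises.

    Returns (new iterate, iterate to extrapolate from on the next call).
    """
    X_ext = step(X + beta * (X - X_prev))
    if loss(X_ext) <= loss(X):
        return X_ext, X
    X_plain = step(X)  # restart: plain projected gradient step, zero momentum next time
    return X_plain, X_plain


def pgd_momentum_nmf(V, W_drum, k_free, n_iter=200, beta=0.9, seed=0):
    """Partially fixed NMF, V ~= [W_drum, W_free] @ H, with W_drum never updated."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    k_drum = W_drum.shape[1]
    W_free = rng.random((F, k_free))
    H = rng.random((k_drum + k_free, T))
    W_free_prev, H_prev = W_free, H  # zero momentum on the first iteration
    losses = [frob_loss(V, np.hstack([W_drum, W_free]), H)]

    for _ in range(n_iter):
        # H block: gradient W^T (W H - V), Lipschitz constant ||W^T W||_2.
        W = np.hstack([W_drum, W_free])
        WtW, WtV = W.T @ W, W.T @ V
        L_H = np.linalg.norm(WtW, 2) + 1e-12
        H, H_prev = momentum_step(
            H, H_prev,
            step=lambda Y: np.maximum(Y - (WtW @ Y - WtV) / L_H, 0.0),
            loss=lambda Y: frob_loss(V, W, Y),
            beta=beta,
        )

        # W_free block: gradient W_free Hf Hf^T - (V - W_drum Hd) Hf^T,
        # Lipschitz constant ||Hf Hf^T||_2. W_drum itself is never touched.
        Hd, Hf = H[:k_drum], H[k_drum:]
        HfHfT = Hf @ Hf.T
        C = V @ Hf.T - W_drum @ (Hd @ Hf.T)
        L_W = np.linalg.norm(HfHfT, 2) + 1e-12
        W_free, W_free_prev = momentum_step(
            W_free, W_free_prev,
            step=lambda Y: np.maximum(Y - (Y @ HfHfT - C) / L_W, 0.0),
            loss=lambda Y: frob_loss(V, np.hstack([W_drum, Y]), H),
            beta=beta,
        )
        losses.append(frob_loss(V, np.hstack([W_drum, W_free]), H))
    return W_free, H, losses
```

Holding `W_drum` fixed is what makes the factorization partially fixed: in this sketch only the activations and the free templates move, so the rows `H[:k_drum]` play the role of drum activation curves. The restart test keeps the loss non-increasing at the price of a few extra loss evaluations per block; dropping it would make each iteration cheaper but give up that monotonicity.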
📝 Abstract
A simple, interpretable way to perform automatic drum transcription is to factor the magnitude spectrogram of a recorded musical piece using a partially fixed nonnegative matrix factorization. There are two natural ways to optimize the nonnegative matrix factorization: a multiplicative update rule and projected gradient descent with momentum. The methods differ in their empirical accuracies and theoretical convergence guarantees. This paper summarizes the methods and their time complexities, and it applies the methods to the ENST-Drums data set and an original recording from the author's band, evaluating the empirical accuracy with respect to ground-truth drum annotations. The results indicate that projected gradient descent with momentum yields higher accuracy for a fixed runtime and satisfies stronger convergence guarantees.
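For comparison, the multiplicative update rule for the same partially fixed factorization can be sketched as follows. This is a minimal illustration assuming the Euclidean (squared Frobenius) loss and the classic Lee-Seung updates, with a drum-template matrix `W_drum` that is never updated; the paper may use a different divergence, and `mu_nmf`, `k_free`, and `eps` are hypothetical names.

```python
import numpy as np


def mu_nmf(V, W_drum, k_free, n_iter=200, eps=1e-12, seed=0):
    """Partially fixed NMF via Lee-Seung multiplicative updates (Euclidean loss assumed).

    Model: V ~= [W_drum, W_free] @ H, where W_drum is held fixed.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    k_drum = W_drum.shape[1]
    W_free = rng.random((F, k_free))
    H = rng.random((k_drum + k_free, T))

    def loss():
        # Half the squared Frobenius error for the current W_free and H.
        R = V - np.hstack([W_drum, W_free]) @ H
        return 0.5 * float(np.sum(R * R))

    losses = [loss()]
    for _ in range(n_iter):
        W = np.hstack([W_drum, W_free])
        # H update: ratio of the negative and positive parts of the gradient W^T (W H - V).
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        # W_free update: only the free templates change; W still equals [W_drum, W_free] here.
        Hf = H[k_drum:]
        W_free = W_free * (V @ Hf.T) / (W @ (H @ Hf.T) + eps)
        losses.append(loss())
    return W_free, H, losses
```

Each iteration is dominated by dense matrix products such as `W.T @ V` and `V @ Hf.T`, so it costs O(F·T·K) arithmetic for an F×T spectrogram and K = k_drum + k_free templates when K is small relative to F and T. Non-negativity needs no projection here: every update multiplies by a ratio of non-negative quantities.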