Keep the beat going: Automatic drum transcription with momentum

📅 2025-07-16

🤖 AI Summary

This paper studies automatic drum transcription based on a partially fixed nonnegative matrix factorization (NMF) of the magnitude spectrogram of a recording. It compares two optimizers for the factorization: the conventional multiplicative update rule and projected gradient descent with momentum, which also preserves nonnegativity. The paper summarizes both methods and their time complexities, then evaluates them against ground-truth drum annotations on the ENST-Drums data set and on an original recording from the author's band. For a fixed runtime, projected gradient descent with momentum achieves higher accuracy, and it satisfies stronger convergence guarantees than the multiplicative update rule.

📝 Abstract

A simple, interpretable way to perform automatic drum transcription is by factoring the magnitude spectrogram of a recorded musical piece using a partially fixed nonnegative matrix factorization. There are two natural ways to optimize the nonnegative matrix factorization: a multiplicative update rule and projected gradient descent with momentum. The methods differ in their empirical accuracies and theoretical convergence guarantees. This paper summarizes the methods and their time complexities, and it applies the methods to the ENST-Drums data set and an original recording from the author's band, evaluating the empirical accuracy with respect to ground-truth drum annotations. The results indicate that projected gradient descent with momentum leads to higher accuracy for a fixed runtime, and it satisfies stronger convergence guarantees.
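To make the partially fixed factorization concrete, here is a minimal sketch of the multiplicative update variant. It assumes a squared Frobenius loss and the standard Lee-Seung updates; the abstract does not state the loss or the exact update, so those choices, the function name `partially_fixed_nmf_mu`, and all parameters are illustrative assumptions rather than the paper's implementation. The spectrogram `V` is approximated by `W @ H`, where the first columns of `W` are drum templates that stay fixed and the remaining columns are free to absorb non-drum content.

```python
import numpy as np

def partially_fixed_nmf_mu(V, W_fixed, n_free=2, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical sketch: partially fixed NMF with multiplicative updates.

    V        : nonnegative magnitude spectrogram, shape (n_freq, n_frames)
    W_fixed  : nonnegative drum templates, shape (n_freq, k_fixed), never updated
    n_free   : number of extra templates that are learned (assumption)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_fixed = W_fixed.shape[1]

    # Stack fixed templates with randomly initialised free templates.
    W = np.hstack([W_fixed, rng.random((n_freq, n_free))])
    H = rng.random((k_fixed + n_free, n_frames))

    errors = [np.linalg.norm(V - W @ H)]
    for _ in range(n_iter):
        # Activation update (all rows of H are free).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Template update, applied to the free columns only.
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W[:, k_fixed:] = W_new[:, k_fixed:]
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors
```

In a transcription setting, the rows of `H` that belong to the fixed drum templates would then be inspected for onsets, for example by peak picking; that step is not shown here.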

Problem

Research questions and friction points this paper is trying to address.

Develops automatic drum transcription via spectrogram factorization

Compares optimization methods for nonnegative matrix factorization

Evaluates accuracy using ENST-Drums and original band recordings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Partially fixed nonnegative matrix factorization

Multiplicative update rule optimization

Projected gradient descent with momentum
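The third item can be illustrated with a short sketch of projected gradient descent with momentum applied to the same partially fixed model. This is one simple heavy-ball variant under stated assumptions: a loss of 0.5 times the squared Frobenius norm of `V - W @ H`, alternating steps for the activations and the free templates, and step sizes set to the inverse Lipschitz constant of each block. The paper's exact momentum scheme, step sizes, and loss may differ, and the function name `partially_fixed_nmf_pgd_momentum` and its parameters are hypothetical.

```python
import numpy as np

def partially_fixed_nmf_pgd_momentum(V, W_fixed, n_free=2, n_iter=300,
                                     beta=0.5, seed=0):
    """Hypothetical sketch: partially fixed NMF via projected gradient
    descent with heavy-ball momentum (beta is the momentum weight)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_fixed = W_fixed.shape[1]

    W = np.hstack([W_fixed, rng.random((n_freq, n_free))])
    H = rng.random((k_fixed + n_free, n_frames))
    vel_H = np.zeros_like(H)                # momentum buffer for activations
    vel_W = np.zeros((n_freq, n_free))      # momentum buffer for free templates

    errors = [np.linalg.norm(V - W @ H)]
    for _ in range(n_iter):
        # Activation step: gradient of 0.5 * ||V - W H||_F^2 with respect to H.
        step_H = 1.0 / (np.linalg.norm(W.T @ W, 2) + 1e-12)
        grad_H = W.T @ (W @ H - V)
        vel_H = beta * vel_H - step_H * grad_H
        H = np.maximum(H + vel_H, 0.0)      # projection onto the nonnegative orthant

        # Template step on the free columns only; fixed columns are untouched.
        step_W = 1.0 / (np.linalg.norm(H @ H.T, 2) + 1e-12)
        grad_W = (W @ H - V) @ H.T
        vel_W = beta * vel_W - step_W * grad_W[:, k_fixed:]
        W[:, k_fixed:] = np.maximum(W[:, k_fixed:] + vel_W, 0.0)

        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors
```

Clipping at zero after each step is what keeps the factors nonnegative; unlike the multiplicative rule, an entry that reaches zero can become positive again on a later step.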
