S-KEY: Self-supervised Learning of Major and Minor Keys from Audio

📅 2025-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
STONE, the existing self-supervised method for tonality estimation in music signals, cannot distinguish relative keys such as C major and A minor. Method: S-KEY extends the neural network architecture and learning objective of STONE with an auxiliary pretext task that uses transposition-invariant chroma features as a source of pseudo-labels, enabling self-supervised learning of major and minor keys. It requires no human annotation, and its training set is expanded to a million songs. Contribution/Results: On FMAKv2 and GTZAN, S-KEY matches the supervised state of the art in tonality estimation with the same parameter budget as STONE. The million-song experiment shows the potential of large-scale self-supervised learning in music information retrieval.
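To make "transposition-invariant" concrete, here is a minimal sketch. It assumes a 12-bin chroma vector indexed from C, so that transposing by k semitones is a circular shift by k bins. The helper names `transpose` and `circular_autocorrelation` are hypothetical, and circular autocorrelation is only one simple example of a shift-invariant function. It is not claimed to be the feature used in S-KEY.

```python
def transpose(chroma, semitones):
    """Transpose a 12-bin chroma vector upward by `semitones` (circular shift)."""
    k = semitones % len(chroma)
    return chroma[-k:] + chroma[:-k] if k else list(chroma)


def circular_autocorrelation(chroma):
    """Circular autocorrelation: unchanged by any circular shift of the input."""
    n = len(chroma)
    return [
        sum(chroma[i] * chroma[(i + lag) % n] for i in range(n))
        for lag in range(n)
    ]


# Toy chroma vectors (bins: C, C#, D, ..., B); real chroma would be continuous.
c_major_triad = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # C, E, G
d_major_triad = transpose(c_major_triad, 2)           # D, F#, A

# The raw chroma changes under transposition, the autocorrelation does not.
print(c_major_triad == d_major_triad)
print(circular_autocorrelation(c_major_triad)
      == circular_autocorrelation(d_major_triad))
```

This particular invariant is too coarse to separate a major triad from a minor triad, since both contain the same intervals, so it illustrates the invariance property only.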

📝 Abstract
STONE, the current method in self-supervised learning for tonality estimation in music signals, cannot distinguish relative keys, such as C major versus A minor. In this article, we extend the neural network architecture and learning objective of STONE to perform self-supervised learning of major and minor keys (S-KEY). Our main contribution is an auxiliary pretext task to STONE, formulated using transposition-invariant chroma features as a source of pseudo-labels. S-KEY matches the supervised state of the art in tonality estimation on FMAKv2 and GTZAN datasets while requiring no human annotation and having the same parameter budget as STONE. We build upon this result and expand the training set of S-KEY to a million songs, thus showing the potential of large-scale self-supervised learning in music information retrieval.
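The difficulty with relative keys can be seen in a short sketch: C major and A natural minor use exactly the same seven pitch classes, so a representation that only captures which pitch classes are present cannot tell them apart. The scale tables and the helper `scale_pitch_classes` below are illustrative and are not taken from the paper.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the tonic.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
MINOR_STEPS = [0, 2, 3, 5, 7, 8, 10]  # natural minor


def scale_pitch_classes(tonic, steps):
    """Return the set of pitch-class indices (0 = C) in the scale on `tonic`."""
    root = PITCH_CLASSES.index(tonic)
    return {(root + step) % 12 for step in steps}


c_major = scale_pitch_classes("C", MAJOR_STEPS)
a_minor = scale_pitch_classes("A", MINOR_STEPS)
g_major = scale_pitch_classes("G", MAJOR_STEPS)

print(c_major == a_minor)  # True: relative keys share all pitch classes
print(c_major == g_major)  # False: G major has F# instead of F
```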
Problem

Research questions and friction points this paper is trying to address.

Self-supervised Learning
Musical Key Recognition
Major and Minor Identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

S-KEY
Self-supervised Learning
Chroma Features