OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction

📅 2025-07-04
🤖 AI Summary
To address the lack of general-purpose, open-source foundation models for music audio understanding, this paper introduces the Multi-Feature Masked Token Prediction (MF-MTP) framework, which models several musical representations, including spectrograms, pitch, and rhythm. The model is pretrained with self-supervised masked token classification on over 330,000 hours of music audio, combining multi-granularity feature encoding with vector quantization. Evaluated on six core tasks (music tagging, pitch estimation, chord recognition, beat tracking, structural segmentation, and difficulty estimation), it achieves state-of-the-art performance among open self-supervised models. All model weights, training code, and evaluation pipelines are open-sourced, providing a reproducible and extensible foundation model for music information retrieval research.
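
The summary says input features are vector-quantized into discrete targets. As a hedged illustration, the sketch below assumes a BEST-RQ-style random-projection quantizer: a frozen random projection followed by a nearest-neighbour lookup in a frozen, L2-normalized random codebook. The "RQ" in the model name hints at this, but this page does not confirm it. The class name, dimensions, and seeds are hypothetical, and the real model may normalize or stack its input frames differently.

```python
import math
import random


def l2_normalize(v):
    # Guard against a zero vector so normalization never divides by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class RandomProjectionQuantizer:
    """Hypothetical frozen quantizer: random projection + random codebook, never trained."""

    def __init__(self, feat_dim, code_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        # Projection matrix with code_dim rows of feat_dim Gaussian weights.
        self.proj = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                     for _ in range(code_dim)]
        # Codebook entries are unit-length random vectors.
        self.codebook = [l2_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                         for _ in range(codebook_size)]

    def tokenize(self, frame):
        # Project one feature frame and normalize it to unit length.
        projected = l2_normalize([sum(w * x for w, x in zip(row, frame))
                                  for row in self.proj])
        # The target token is the index of the closest codeword (squared Euclidean distance).
        dists = [sum((p - c) ** 2 for p, c in zip(projected, code))
                 for code in self.codebook]
        return min(range(len(dists)), key=dists.__getitem__)


quantizer = RandomProjectionQuantizer(feat_dim=8, code_dim=4, codebook_size=16, seed=42)
frame_rng = random.Random(7)
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
tokens = [quantizer.tokenize(f) for f in frames]
print(tokens)  # five integer targets, each in range(16)
```

Because projections and codewords are both unit-normalized, the Euclidean nearest neighbour is also the codeword with the highest cosine similarity, so multiplying a frame by a positive constant does not change its token. The quantizer has no trainable parameters, so the targets stay fixed while the encoder learns to predict them.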

📝 Abstract
Developing open-source foundation models is essential for advancing research in music audio understanding and ensuring access to powerful, multipurpose representations for music information retrieval. We present OMAR-RQ, a model trained with self-supervision via masked token classification methodologies using a large-scale dataset with over 330,000 hours of music audio. We experiment with different input features and quantization options, and achieve state-of-the-art performance in music tagging, pitch estimation, chord recognition, beat tracking, segmentation, and difficulty estimation among open self-supervised models. We open-source our training and evaluation pipelines and model weights, available at https://github.com/mtg/omar-rq.
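
Masked token classification, as described in the abstract, hides part of the input and trains the encoder to classify the discrete token of each hidden frame. A minimal sketch of the masking and the loss, assuming span masking with Gaussian-noise replacement (as in BEST-RQ) and a cross-entropy computed only over masked frames. The mask probability, span length, noise level, and function names are hypothetical, and the encoder that produces the logits is omitted.

```python
import math
import random


def sample_mask(num_frames, mask_prob, span, rng):
    """Start a span at each frame with probability mask_prob; mask `span` frames from there."""
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


def apply_mask(frames, mask, rng, noise_std=0.1):
    # Masked frames are replaced by small Gaussian noise so the encoder cannot see them.
    return [[rng.gauss(0.0, noise_std) for _ in f] if m else list(f)
            for f, m in zip(frames, mask)]


def log_softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def masked_cross_entropy(logits_per_frame, targets, mask):
    """Mean cross-entropy over masked frames only; unmasked frames contribute nothing."""
    losses = [-log_softmax(logits)[target]
              for logits, target, m in zip(logits_per_frame, targets, mask) if m]
    return sum(losses) / len(losses) if losses else 0.0


rng = random.Random(0)
frames = [[1.0] * 4 for _ in range(20)]
mask = sample_mask(num_frames=20, mask_prob=0.1, span=4, rng=rng)
encoder_input = apply_mask(frames, mask, rng)
targets = [rng.randrange(8) for _ in range(20)]  # would come from the quantizer
stand_in_logits = [[0.0] * 8 for _ in range(20)]  # placeholder for the encoder's output
loss = masked_cross_entropy(stand_in_logits, targets, mask)
```

Restricting the loss to masked frames is what makes the task non-trivial: the unmasked frames still reach the encoder as context, but the model is only scored on tokens it could not observe directly.
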
Problem

Develop an open-source music audio representation model
Improve performance across multiple music information retrieval tasks
Provide a large-scale self-supervised training methodology
Innovation

Self-supervised training via masked token classification
Experiments with multiple input features and quantization options
Open-source model weights with training and evaluation pipelines
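
One way to read the multi-feature setup is as predicting several target token streams at once, one per input representation, each assumed to have its own quantizer and classification head. A hedged sketch of combining the per-feature masked losses into one objective follows. The feature names ("mel", "cqt"), the shared mask, and the weighted-mean combination are assumptions, since this page does not say which features OMAR-RQ uses as targets or how their losses are combined.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over one frame's logits.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def multi_feature_loss(logits_by_feature, targets_by_feature, mask, weights=None):
    """Weighted mean of per-feature masked cross-entropies.

    logits_by_feature: {name: per-frame logits from that feature's head} (hypothetical layout)
    targets_by_feature: {name: per-frame token ids from that feature's quantizer}
    mask: one list of booleans shared by all features; only masked frames count.
    """
    weights = weights or {name: 1.0 for name in targets_by_feature}
    masked = [t for t, m in enumerate(mask) if m]
    if not masked:
        return 0.0, {}
    per_feature = {}
    for name, targets in targets_by_feature.items():
        logits = logits_by_feature[name]
        per_feature[name] = sum(-log_softmax(logits[t])[targets[t]]
                                for t in masked) / len(masked)
    total = (sum(weights[n] * loss for n, loss in per_feature.items())
             / sum(weights[n] for n in per_feature))
    return total, per_feature


mask = [True, False, True, False]
targets = {"mel": [0, 1, 2, 3], "cqt": [3, 2, 1, 0]}
logits = {
    "mel": [[0.0] * 4 for _ in range(4)],  # uninformative head
    "cqt": [[10.0 if k == tok else 0.0 for k in range(4)] for tok in targets["cqt"]],  # confident head
}
total, per_feature = multi_feature_loss(logits, targets, mask)
```

Returning the per-feature losses alongside the total makes it easy to see which target stream the encoder finds hardest; here the uninformative "mel" head sits near log(4) while the confident "cqt" head is close to zero.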