Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

📅 2025-06-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limited representation-learning capability of models for symbolic piano music. We propose a two-stage self-supervised learning framework: (1) pretraining an autoregressive Transformer on large-scale solo MIDI data; and (2) introducing SimCLR-style contrastive learning, specifically adapted to symbolic music, to refine universal MIDI embeddings. The method significantly improves structural coherence in piano music continuation and achieves state-of-the-art (SOTA) performance on multiple MIR classification tasks using only linear probing. Moreover, it generalizes effectively to downstream tasks with minimal labeled data. Our core contribution is the first systematic integration of contrastive learning into autoregressive modeling for symbolic music, enabling simultaneous gains in generation quality, classification accuracy, and embedding generality.
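The paper does not ship code with this summary; as a rough illustration of the SimCLR-style objective it describes, an NT-Xent loss over paired embeddings of two augmented views of the same MIDI excerpts might look like the sketch below. The function name, batch layout, and temperature value are assumptions, not details from the paper:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss (hypothetical sketch).

    z1, z2: (batch, dim) embeddings of two augmented views of the same
    MIDI excerpts (e.g. pitch-shifted or time-cropped variants).
    """
    # project onto the unit sphere so similarities are cosines
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # (2B, dim)
    sim = z @ z.T / temperature                   # scaled cosine similarities
    n = sim.shape[0]
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    # the positive for row i is its counterpart in the other view
    targets = np.concatenate([np.arange(n // 2) + n // 2,
                              np.arange(n // 2)])
    # row-wise softmax cross-entropy against the positive index
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), targets].mean()
```

Matching views (near-identical embeddings) should yield a much lower loss than unrelated pairs, which is what pulls the MIDI embedding space toward augmentation invariance.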

๐Ÿ“ Abstract
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a few hundred labeled examples to specialize to downstream tasks.
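The linear-probe evaluation mentioned in the abstract keeps the pretrained encoder frozen and fits only a linear classifier on its embeddings. A minimal stand-in, using closed-form ridge regression onto one-hot labels rather than whatever probe optimizer the paper actually used, could look like this:

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, n_classes, l2=1e-3):
    """Linear probe on frozen embeddings (hypothetical sketch).

    Fits a single linear layer in closed form (ridge regression onto
    one-hot labels); the feature extractor itself is never updated.
    """
    X = np.hstack([train_feats, np.ones((len(train_feats), 1))])  # add bias
    Y = np.eye(n_classes)[train_labels]                           # one-hot
    W = np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ Y)
    Xt = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    return (Xt @ W).argmax(axis=1)                                # class ids
```

Because only the linear head is trained, probe accuracy directly measures how linearly separable the frozen contrastive embeddings already are for a given MIR task.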
Problem

Research questions and friction points this paper is trying to address.

Scaling self-supervised learning for symbolic piano performance
Generating coherent musical continuations from piano transcriptions
Producing general-purpose MIDI embeddings for classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative autoregressive transformer models for piano
SimCLR framework adapted for symbolic music embeddings
Pretraining and finetuning with large music datasets
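The autoregressive pretraining listed above reduces to teacher-forced next-token prediction over a MIDI event vocabulary. A bare-bones version of that objective, with the tokenization and vocabulary assumed rather than taken from the paper, is:

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Teacher-forced autoregressive loss (hypothetical sketch).

    logits: (T, V) model outputs over an assumed MIDI event vocabulary.
    tokens: (T,) integer event ids; position t predicts token t+1.
    """
    preds, targets = logits[:-1], tokens[1:]               # shift by one
    shifted = preds - preds.max(axis=1, keepdims=True)     # stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()
```

At initialization (uniform logits) this loss equals log V, and pretraining drives it down as the model learns the statistics of solo-piano event sequences.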