RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection

📅 2025-07-16
🤖 AI Summary
Score-performance alignment, score-informed transcription, and mistake detection are usually handled by separate models. Aligning scores that contain repeat symbols has also typically required manually unfolded score-MIDI data with pre-specified repeat structures.
Method: We propose RUMAA, a near end-to-end joint framework. It takes human-readable MusicXML scores, repeat symbols included, together with full-length performance audio, so no manual unfolding is needed. It combines pre-trained score and audio encoders with a Transformer-based tri-stream decoder and uses proxy tasks to capture the interdependencies among the three tasks.
Contribution/Results: On a public piano music dataset, RUMAA matches state-of-the-art alignment methods on scores without repeats and outperforms them on scores with repeats. It also delivers promising transcription and mistake detection results.
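The summary stresses that the three tasks depend on one another. A naive rule-based illustration makes that dependency concrete. It is not RUMAA's learned tri-stream decoder. Once score notes have aligned onset times and the performance has been transcribed, missed and extra notes can be read off by matching pitches near the expected times. The `Note` type, the greedy matching, and the 0.1 s tolerance are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int    # MIDI pitch number (hypothetical representation)
    onset: float  # onset time in seconds


def label_mistakes(expected: list[Note], played: list[Note], tol: float = 0.1):
    """Greedily match score notes (with aligned onsets) to transcribed notes.

    A score note counts as played if an unused transcribed note has the same
    pitch within +/- tol seconds of its aligned onset. Returns a tuple
    (missed score notes, extra transcribed notes).
    """
    unused = list(played)
    missed = []
    for note in expected:
        # First unused transcribed note with the right pitch near the expected time.
        match = next(
            (p for p in unused
             if p.pitch == note.pitch and abs(p.onset - note.onset) <= tol),
            None,
        )
        if match is None:
            missed.append(note)
        else:
            unused.remove(match)  # each transcribed note can match only once
    return missed, unused


expected = [Note(60, 0.0), Note(62, 0.5), Note(64, 1.0)]
played = [Note(60, 0.02), Note(63, 0.51), Note(64, 0.98)]
missed, extra = label_mistakes(expected, played)
```

In this toy rule, a wrong pitch surfaces as one missed score note plus one extra played note, as in the usage example where 62 is missed and 63 is extra. The rule also depends entirely on the alignment and transcription being correct, which is the kind of coupling a joint model can exploit.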

📝 Abstract
This study introduces RUMAA, a transformer-based framework for music performance analysis that unifies score-to-performance alignment, score-informed transcription, and mistake detection in a near end-to-end manner. Unlike prior methods that address these tasks separately, RUMAA integrates them using pre-trained score and audio encoders and a novel tri-stream decoder that captures task interdependencies through proxy tasks. It aligns human-readable MusicXML scores containing repeat symbols to full-length performance audio. This avoids the reliance of traditional MIDI-based methods on manually unfolded score-MIDI data with pre-specified repeat structures. RUMAA matches state-of-the-art alignment methods on non-repeated scores and outperforms them on scores with repeats in a public piano music dataset, while also delivering promising transcription and mistake detection results.
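The abstract contrasts RUMAA with MIDI-based pipelines that need the score unfolded before alignment. The Python below is a minimal sketch of what that unfolding involves: it expands simple repeat barlines into the sequence of measures actually performed. The `Measure` class and its `repeat_start`/`repeat_end` flags are hypothetical stand-ins for MusicXML forward/backward repeat barlines. Volta endings and D.C./D.S. jumps are deliberately ignored. The code illustrates the preprocessing step RUMAA is described as avoiding, not RUMAA itself.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    number: int
    repeat_start: bool = False  # hypothetical flag: forward repeat barline at the start
    repeat_end: bool = False    # hypothetical flag: backward repeat barline at the end


def unfold(measures: list[Measure]) -> list[int]:
    """Expand simple repeats so each repeated section is played twice.

    A backward repeat with no matching forward repeat jumps back to the start
    of the piece or to just after the previous backward repeat. Voltas,
    D.C./D.S. and nested repeats are not handled.
    """
    performed: list[int] = []
    section_start = 0  # index where the current repeat section begins
    for i, m in enumerate(measures):
        if m.repeat_start:
            section_start = i
        performed.append(m.number)
        if m.repeat_end:
            # Play the section from section_start through measure i a second time.
            performed.extend(x.number for x in measures[section_start : i + 1])
            section_start = i + 1
    return performed


score = [Measure(1), Measure(2), Measure(3, repeat_start=True),
         Measure(4, repeat_end=True), Measure(5), Measure(6)]
performed_order = unfold(score)  # measures 3-4 appear twice
```

A pipeline built on such an unfolded sequence fixes the repeat structure in advance. If a performer skips a repeat, the unfolded score no longer matches the audio. Taking the MusicXML repeat symbols directly, as RUMAA does, sidesteps that preprocessing step.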
Problem

Research questions and friction points this paper is trying to address.

Unifies score-performance alignment, transcription, and mistake detection
Aligns MusicXML scores with repeat symbols to audio
Improves performance on scores with repeats compared to traditional methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based unified music analysis framework
Tri-stream decoder capturing task interdependencies through proxy tasks
Aligns MusicXML scores with repeat symbols