🤖 AI Summary
Existing online piano transcription methods exhibit latencies of 128–320 ms, far exceeding the sub-30 ms threshold required for real-time musical interaction. This work presents the first systematic adaptation of a state-of-the-art online transcription model to ultra-low-latency regimes. We propose a causally constrained, lightweight architecture that eliminates all non-causal operations: it employs a shared-parameter causal convolutional backbone, efficient real-time preprocessing, and compact label encoding, while explicitly optimizing the inference latency–accuracy trade-off. Evaluated on the MAESTRO dataset, our system achieves end-to-end latency below 30 ms, at the cost of a drop in transcription accuracy due to strictly causal processing. We further quantify the intrinsic trade-off between preprocessing latency and transcription quality. To foster reproducibility, we release a fully open-source, benchmarked implementation as a practical foundation for low-latency, interactive music applications.
📝 Abstract
Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128–320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing and reduce the computational load by sharing computations across core model components and varying the model size. Additionally, we explore different pre- and postprocessing strategies and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating these adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing, as well as a trade-off between preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models for minimum-latency real-time transcription.
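For readers unfamiliar with the causal constraint the abstract refers to, the core idea can be sketched as a 1-D convolution that pads only on the left, so each output frame depends solely on the current and past input frames. This is a hypothetical toy illustration of the principle, not the paper's actual architecture; `causal_conv1d` and its zero-padding scheme are assumptions made for the example:

```python
# Minimal sketch of a causal 1-D convolution: output[t] uses only
# inputs x[t - k + 1 .. t], so no future audio frames are required.
# Illustrative only; the paper's model is a deep causal CNN backbone.

def causal_conv1d(x, kernel):
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)  # left-pad only: strictly causal
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]

# Each output frame t is available as soon as input frame t arrives,
# so the convolution itself adds no look-ahead latency.
signal = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(signal, [0.5, 0.5]))  # → [0.5, 1.5, 2.5, 3.5]
```

A non-causal ("same"-padded) convolution would instead pad on both sides, making output frame t depend on frame t+1 and forcing the system to wait for future audio, which is exactly what the adaptation eliminates.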