🤖 AI Summary
Traditional frame-level metrics fail to capture musically salient features of piano sustain pedal depth estimation, such as direction-change boundaries and contour morphology, so the resulting evaluations lack semantic interpretability. To address this, we propose the first multi-granularity evaluation framework, integrating action-level assessment (direction-switch point detection) with gesture-level assessment (contour alignment and shape similarity) to overcome the limitations of frame-wise accuracy alone. Methodologically, we introduce segmented state detection coupled with dynamic time warping (DTW)-guided contour alignment, enabling unified comparison across audio-only baselines, MIDI-augmented models, and their binarized variants. Experiments demonstrate that the MIDI-augmented model achieves significant gains at both the action and gesture levels while improving only marginally in frame-level accuracy, validating that our framework is semantically sensitive, highly interpretable, and able to reveal substantive performance improvements that conventional metrics overlook.
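To make the action-level idea concrete, the sketch below illustrates one plausible reading of segmented state detection and direction-switch matching in Python. The dead-band threshold `eps`, the matching tolerance `tol`, and the greedy one-to-one matching scheme are illustrative assumptions, not the paper's actual parameters or procedure.

```python
import numpy as np

def segment_states(depth, eps=0.01):
    """Label each frame as press (+1), hold (0), or release (-1)
    from the sign of the first difference of the depth curve.
    `eps` is a hypothetical dead-band threshold."""
    depth = np.asarray(depth, dtype=float)
    d = np.diff(depth, prepend=depth[0])
    states = np.zeros(len(d), dtype=int)
    states[d > eps] = 1      # depth increasing -> press
    states[d < -eps] = -1    # depth decreasing -> release
    return states

def switch_points(states):
    """Frame indices where the press/hold/release state changes,
    i.e. candidate direction-switch boundaries."""
    return np.flatnonzero(np.diff(states) != 0) + 1

def action_f1(ref_depth, est_depth, tol=5, eps=0.01):
    """Greedily match estimated switch points to reference ones
    within `tol` frames and report precision/recall/F1."""
    ref = switch_points(segment_states(ref_depth, eps))
    est = switch_points(segment_states(est_depth, eps))
    matched, tp = set(), 0
    for e in est:
        hits = [r for r in ref if abs(r - e) <= tol and r not in matched]
        if hits:
            matched.add(min(hits, key=lambda r: abs(r - e)))
            tp += 1
    prec = tp / max(len(est), 1)
    rec = tp / max(len(ref), 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-9)
    return prec, rec, f1
```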
📝 Abstract
Evaluation of continuous piano pedal depth estimation remains incomplete when it relies only on conventional frame-level metrics, which overlook musically important features such as direction-change boundaries and pedal curve contours. To provide more interpretable and musically meaningful insights, we propose an evaluation framework that augments standard frame-level metrics with an action-level assessment, which measures direction and timing over segments of press/hold/release states, and a gesture-level analysis, which evaluates the contour similarity of each press-release cycle. We apply this framework to compare an audio-only baseline with two variants, one incorporating symbolic information from MIDI and another trained in a binary-valued setting, all within a unified architecture. Results show that the MIDI-informed model significantly outperforms the others at the action and gesture levels despite only modest frame-level gains. These findings demonstrate that our framework captures musically relevant improvements that traditional metrics cannot discern, offering a more practical and effective approach to evaluating pedal depth estimation models.
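The gesture-level analysis compares the contour of each press-release cycle after alignment; below is a minimal DTW sketch of such a comparison. The length normalization and the mapping of DTW cost to a (0, 1] similarity score are illustrative choices, not the paper's definitions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D contours,
    using absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gesture_similarity(ref_cycle, est_cycle):
    """Length-normalized DTW cost mapped to a (0, 1] similarity,
    where 1.0 means identical contours."""
    cost = dtw_distance(np.asarray(ref_cycle, dtype=float),
                        np.asarray(est_cycle, dtype=float))
    norm = cost / (len(ref_cycle) + len(est_cycle))
    return 1.0 / (1.0 + norm)

# Usage: compare one reference press-release cycle to an estimate.
ref = [0.0, 0.3, 0.7, 1.0, 0.8, 0.4, 0.0]
est = [0.0, 0.2, 0.6, 0.9, 0.9, 0.5, 0.1, 0.0]
print(gesture_similarity(ref, est))
```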