🤖 AI Summary
Estimating dynamic levels in piano performance remains a fundamental challenge in computational music analysis. This paper introduces a multi-task, multi-scale neural network that, for the first time, jointly models four interrelated musical attributes end to end, directly from audio: dynamic level (e.g., *p*, *f*), dynamic change points, beat positions, and downbeats. By pairing Bark-scale loudness features with a shared latent representation, the model shrinks the parameter count from 14.7M (with a log-Mel front end) to just 0.5M while preserving temporal modeling capacity, allowing it to process 60-second audio sequences. Evaluated on the MazurkaBL dataset, the approach achieves state-of-the-art performance across all four tasks, establishing a compact and efficient benchmark model for piano expressivity analysis.
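The shared-representation design can be made concrete with a minimal PyTorch sketch: one encoder feeds four lightweight framewise heads. Note the backbone below is a plain convolutional stand-in, not the paper's multi-scale network, and every layer size, head name, and the five-class dynamic vocabulary are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskDynamicsNet(nn.Module):
    """Minimal sketch: one shared encoder, four framewise prediction heads.

    Input: Bark-band loudness features of shape (batch, frames, bands).
    All dimensions below are illustrative, not the paper's values.
    """

    def __init__(self, n_bands: int = 24, hidden: int = 64, n_dyn_levels: int = 5):
        super().__init__()
        # Shared latent representation computed along the time axis.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_bands, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # One lightweight head per task, all reading the same latent.
        self.dyn_level = nn.Conv1d(hidden, n_dyn_levels, 1)  # e.g., pp/p/mf/f/ff logits
        self.dyn_change = nn.Conv1d(hidden, 1, 1)            # change-point activation
        self.beat = nn.Conv1d(hidden, 1, 1)                  # beat activation
        self.downbeat = nn.Conv1d(hidden, 1, 1)              # downbeat activation

    def forward(self, x: torch.Tensor) -> dict:
        z = self.encoder(x.transpose(1, 2))  # (batch, hidden, frames)
        return {
            "dyn_level": self.dyn_level(z),   # per-frame class logits
            "dyn_change": self.dyn_change(z),
            "beat": self.beat(z),
            "downbeat": self.downbeat(z),
        }

# Example: a 60-second clip at an assumed 20 feature frames per second.
model = MultiTaskDynamicsNet()
feats = torch.randn(1, 60 * 20, 24)  # (batch, frames, Bark bands)
outputs = model(feats)
print({k: tuple(v.shape) for k, v in outputs.items()})
```

Because the four heads are single 1x1 convolutions over a shared latent, nearly all parameters are reused across tasks, which is one plausible reading of how joint prediction stays within a small parameter budget.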
📝 Abstract
Estimating piano dynamics from audio recordings is a fundamental challenge in computational music analysis. In this paper, we propose an efficient multi-task network that jointly predicts dynamic levels, dynamic change points, beats, and downbeats from a shared latent representation. Together, these four targets tie dynamics to the metrical structure of the music score. Inspired by recent research on vocal dynamics, we use a multi-scale network as the backbone, with Bark-scale specific loudness as the input feature. Compared to a log-Mel input, this reduces the model size from 14.7 M to 0.5 M parameters and enables long sequential input: we segment audio into 60-second excerpts, double the length commonly used in beat tracking. Evaluated on the public MazurkaBL dataset, our model achieves state-of-the-art results across all tasks. This work sets a new benchmark for piano dynamic estimation and delivers a powerful yet compact tool, paving the way for large-scale, resource-efficient analysis of musical expression.
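The abstract names Bark-scale specific loudness as the input feature but does not spell out the front end. Below is a minimal sketch of one plausible variant, assuming the Zwicker and Terhardt Hz-to-Bark approximation, 24 critical bands, and a 0.23 power-law compression; the hop size, band count, and exponent are all assumptions, not the paper's settings.

```python
import numpy as np
import librosa

def bark_loudness(y: np.ndarray, sr: int, n_fft: int = 2048,
                  hop_length: int = 1024, n_bands: int = 24) -> np.ndarray:
    """Rough Bark-band specific-loudness features, shape (frames, n_bands).

    Assumptions (not from the paper): Zwicker and Terhardt's Bark formula
    for band edges, and a 0.23 power-law compression of band energy.
    """
    # Power spectrogram.
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    # Zwicker and Terhardt approximation: Hz -> Bark.
    bark = 13.0 * np.arctan(0.00076 * freqs) + 3.5 * np.arctan((freqs / 7500.0) ** 2)
    feats = np.zeros((S.shape[1], n_bands))
    for b in range(n_bands):
        # Sum spectral power falling into this critical band.
        mask = (bark >= b) & (bark < b + 1)
        if mask.any():
            feats[:, b] = S[mask].sum(axis=0)
    # Power-law compression approximating specific loudness.
    return feats ** 0.23

sr = 22050
y = np.random.randn(sr * 60).astype(np.float32)  # stand-in for a 60-second excerpt
print(bark_loudness(y, sr).shape)  # (frames, 24)
```

With roughly 24 Bark bands versus the 80 or more bins typical of log-Mel front ends, the input is several times narrower, which is consistent with the reported drop in model size and the feasibility of 60-second inputs.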