Filling MIDI Velocity using U-Net Image Colorizer

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
MIDI files often lack note velocity information, resulting in diminished dynamic expressivity and reduced perceptual naturalness of synthesized performances. To address this, we propose an image-based modeling approach to MIDI velocity prediction: MIDI sequences are encoded as sparse 2D piano-roll-style images; a U-Net architecture, novel in this domain, performs pixel-wise velocity estimation; a windowed attention mechanism captures local rhythmic and dynamic correlations; and a custom loss function is tailored to the sparsity and structural characteristics of MIDI images. Experiments on the MAESTRO v3 and SMD datasets show that the method significantly outperforms existing baselines on quantitative metrics, including MSE and Pearson correlation, as well as in subjective listening evaluations. The predicted velocities restore expressive performance dynamics, markedly enhancing the naturalness and musical expressivity of the rendered MIDI.
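The core idea of treating MIDI as an image can be illustrated with a minimal piano-roll encoder. This is a sketch under assumed conventions (a note given as an `(onset_tick, offset_tick, pitch, velocity)` tuple, pitch rows × time-frame columns, velocity scaled to [0, 1] as pixel intensity); the paper's exact encoding is not specified here, and `midi_to_piano_roll` is a hypothetical helper name.

```python
import numpy as np

def midi_to_piano_roll(notes, ticks_per_frame=10, n_pitches=128):
    """Render (onset_tick, offset_tick, pitch, velocity) tuples as a sparse
    2D image: rows are MIDI pitches, columns are time frames. Velocity
    (0-127) becomes the pixel intensity in [0, 1] that the model predicts.
    NOTE: illustrative sketch, not the paper's actual preprocessing."""
    if not notes:
        return np.zeros((n_pitches, 1), dtype=np.float32)
    n_frames = max(off for _, off, _, _ in notes) // ticks_per_frame + 1
    roll = np.zeros((n_pitches, n_frames), dtype=np.float32)
    for onset, offset, pitch, velocity in notes:
        start = onset // ticks_per_frame
        end = max(start + 1, offset // ticks_per_frame)  # at least one frame
        roll[pitch, start:end] = velocity / 127.0
    return roll
```

In this framing, velocity prediction is directly analogous to image colorization: the binary note layout (which pixels are "on") is the grayscale input, and the velocity intensities are the colors to be filled in.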

📝 Abstract
Modern music producers commonly use MIDI (Musical Instrument Digital Interface) to store their musical compositions. However, MIDI files created with digital software may lack the expressive characteristics of human performances, leaving the velocity parameter (a control for note loudness) undefined so that it defaults to a flat value. The task of filling in MIDI velocity is termed MIDI velocity prediction, which uses regression models to enhance musical expressiveness by adjusting only this parameter. In this paper, we introduce the U-Net, a widely adopted architecture in image colorization, to this task. By conceptualizing MIDI data as images, we adopt window attention and develop a custom loss function to address the sparsity of MIDI-converted images. Current dataset availability restricts our experiments to piano data. Evaluated on the MAESTRO v3 and SMD datasets, our proposed method for filling MIDI velocity outperforms previous approaches in both quantitative metrics and qualitative listening tests.
Problem

Research questions and friction points this paper is trying to address.

Predicting MIDI velocity to enhance music expressiveness
Using U-Net for sparse MIDI-converted image data
Restoring expressive dynamics to piano performance data from the MAESTRO and SMD datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

U-Net image colorizer for MIDI velocity
Window attention for sparse MIDI images
Custom loss function for enhanced expressiveness
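A key difficulty the custom loss addresses is that MIDI-converted images are overwhelmingly empty, so a naive pixel-wise MSE is dominated by zero pixels. One common way to handle this, shown below as a simplified stand-in (the paper's actual loss formulation is not given here), is to mask the error to note pixels only:

```python
import numpy as np

def masked_velocity_loss(pred, target, eps=1e-8):
    """MSE computed only over pixels where a note is present (target > 0),
    so the many empty pixels of a sparse piano-roll image do not dominate.
    NOTE: simplified illustration, not the paper's exact loss function."""
    mask = (target > 0).astype(np.float32)
    sq_err = (pred - target) ** 2
    return float((sq_err * mask).sum() / (mask.sum() + eps))
```

A trained model could then minimize this over predicted and ground-truth velocity rolls, penalizing errors only where notes actually sound.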