Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

📅 2026-05-11

🤖 AI Summary

This study addresses the challenge of generating high-fidelity, expressive drum audio from symbolic drum representations that include fine-grained timing and velocity information. The work proposes an end-to-end approach that, for the first time, applies discrete token prediction with neural audio codecs (EnCodec, DAC, and X-Codec) to drum grid-to-audio synthesis. Specifically, a Transformer model maps expressive MIDI representations to sequences of codec tokens, which the pretrained codec's decoder then converts into waveforms. Experiments on E-GMD show that the proposed method significantly outperforms baseline systems in both audio fidelity and musical alignment, supporting discrete codec token prediction as a viable and effective paradigm for expressive drum audio synthesis.
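
The summary does not spell out how the drum grid is laid out. As a minimal sketch, assume a fixed set of drum classes, sixteenth-note steps, and three values per cell: an onset flag, a velocity scaled to [0, 1], and a microtiming offset measured in fractions of a step. The names `DRUM_CLASSES`, `DrumHit`, and `build_drum_grid`, as well as the nine-class vocabulary, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical drum vocabulary; the paper's actual instrument mapping is not given here.
DRUM_CLASSES = ["kick", "snare", "closed_hihat", "open_hihat",
                "low_tom", "mid_tom", "high_tom", "crash", "ride"]


@dataclass
class DrumHit:
    time_sec: float  # onset time in seconds from the start of the clip
    drum: str        # one of DRUM_CLASSES
    velocity: int    # MIDI velocity, 0-127


def build_drum_grid(hits, bpm, n_steps, steps_per_beat=4):
    """Quantize hits to a step grid while keeping microtiming and velocity.

    Returns three n_steps x n_drums matrices (lists of lists):
      onset[t][d]    1 if drum d has a hit nearest to step t, else 0
      velocity[t][d] MIDI velocity scaled to [0, 1]
      offset[t][d]   signed deviation from step t, in steps, within [-0.5, 0.5]
    """
    step_sec = 60.0 / (bpm * steps_per_beat)  # duration of one grid step
    n_drums = len(DRUM_CLASSES)
    onset = [[0] * n_drums for _ in range(n_steps)]
    velocity = [[0.0] * n_drums for _ in range(n_steps)]
    offset = [[0.0] * n_drums for _ in range(n_steps)]
    for hit in hits:
        pos = hit.time_sec / step_sec  # position in fractional steps
        t = int(round(pos))            # nearest grid step
        if not 0 <= t < n_steps:
            continue                   # drop hits that fall outside the grid
        d = DRUM_CLASSES.index(hit.drum)  # raises ValueError for unknown drums
        vel = hit.velocity / 127.0
        # If two hits of the same drum snap to one step, keep the louder one.
        if onset[t][d] and velocity[t][d] >= vel:
            continue
        onset[t][d] = 1
        velocity[t][d] = vel
        offset[t][d] = pos - t
    return onset, velocity, offset
```

For example, at 120 BPM a kick at 0.13 s lands on step 1 with an offset of about +0.04 steps, and a snare at 0.49 s lands on step 4 with an offset of about -0.08 steps. Keeping the offset separate from the quantized step lets the grid carry human timing feel while staying a fixed-size input for a sequence model.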

📝 Abstract

Generating realistic drum audio directly from symbolic representations is a challenging task at the intersection of music perception and machine learning. We propose a system that transforms an expressive drum grid, a time-aligned MIDI representation with microtiming and velocity information, into drum audio by predicting discrete codes of a neural audio codec. Our approach uses a Transformer-based model to map the drum grid input to a sequence of codec tokens, which are then converted to waveform audio via a pre-trained codec decoder. We experiment with multiple state-of-the-art neural codecs, namely EnCodec, DAC, and X-Codec, to assess how the choice of audio representation impacts the quality of the generated drums. The system is trained and evaluated on the Expanded Groove MIDI Dataset (E-GMD), a large collection of human drum performances with paired MIDI and audio. We evaluate the fidelity and musical alignment of the generated audio using objective metrics. Overall, our results establish codec-token prediction as an effective route for drum grid-to-audio generation and provide practical insights into selecting audio tokenizers for percussive synthesis.
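
Codecs such as EnCodec and DAC quantize each audio frame with residual vector quantization, so a clip becomes a codebooks-by-frames array of integer codes rather than a single token stream. The abstract does not say how the Transformer handles the multiple codebooks; the sketch below assumes the simplest option, frame-by-frame interleaving with a separate id range per codebook (alternatives such as delay patterns or per-codebook output heads also exist). `flatten_codes`, `unflatten_tokens`, and `sequence_length` are hypothetical helpers, not part of any codec library.

```python
import math


def flatten_codes(codes, codebook_size):
    """Interleave RVQ codes frame by frame into one token sequence.

    codes: list of n_codebooks lists, each holding n_frames integer codes
    in [0, codebook_size). The code for codebook k at frame t becomes token
    k * codebook_size + code, giving each codebook its own id range in a
    shared vocabulary of n_codebooks * codebook_size entries.
    """
    if not codes:
        return []
    n_frames = len(codes[0])
    if any(len(row) != n_frames for row in codes):
        raise ValueError("all codebooks must have the same number of frames")
    tokens = []
    for t in range(n_frames):
        for k, row in enumerate(codes):
            c = row[t]
            if not 0 <= c < codebook_size:
                raise ValueError(f"code {c} out of range for codebook {k}")
            tokens.append(k * codebook_size + c)
    return tokens


def unflatten_tokens(tokens, n_codebooks, codebook_size):
    """Invert flatten_codes, recovering the codebook x frame array a decoder expects."""
    if len(tokens) % n_codebooks:
        raise ValueError("token count must be a multiple of n_codebooks")
    n_frames = len(tokens) // n_codebooks
    codes = [[0] * n_frames for _ in range(n_codebooks)]
    for i, tok in enumerate(tokens):
        t, k = divmod(i, n_codebooks)          # frame index, expected codebook
        k_tok, c = divmod(tok, codebook_size)  # codebook the token belongs to, code
        if k_tok != k:
            raise ValueError(f"token {tok} at position {i} is not from codebook {k}")
        codes[k][t] = c
    return codes


def sequence_length(duration_sec, frame_rate_hz, n_codebooks):
    """Flattened tokens needed for a clip, ignoring any padding the codec adds."""
    return math.ceil(duration_sec * frame_rate_hz) * n_codebooks
```

The length helper shows one practical axis of the codec comparison: sequence length grows with frame rate times codebook count. EnCodec's 24 kHz model at 6 kbps, for instance, emits 8 codebooks at 75 frames per second, so a 2-second clip would need 1,200 flattened tokens; the codec settings actually used in the paper are not stated in this abstract.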

Problem

Research questions and challenges this paper addresses.

Drum Synthesis

Expressive Drum Grids

Neural Audio Codecs

Symbolic-to-Audio Generation

Percussive Synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Audio Codecs

Drum Synthesis

Expressive Drum Grids