Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the challenge of generating high-fidelity, expressive drum audio from symbolic drum notation that includes fine-grained timing and velocity information. The work proposes an end-to-end approach that, for the first time, integrates discrete token prediction from neural audio codecs (EnCodec, DAC, and X-Codec) into the task of drum grid-to-audio synthesis. Specifically, a Transformer model maps an expressive drum grid to a sequence of codec tokens, which a pretrained codec decoder then converts into a waveform. Experiments on the E-GMD dataset demonstrate that the proposed method significantly outperforms baseline systems in both audio fidelity and musical alignment, validating discrete codec token prediction as a viable and effective paradigm for expressive drum audio synthesis.
📝 Abstract
Generating realistic drum audio directly from symbolic representations is a challenging task at the intersection of music perception and machine learning. We propose a system that transforms an expressive drum grid, a time-aligned MIDI representation with microtiming and velocity information, into drum audio by predicting discrete codes of a neural audio codec. Our approach uses a Transformer-based model to map the drum grid input to a sequence of codec tokens, which are then converted to waveform audio via a pre-trained codec decoder. We experiment with multiple state-of-the-art neural codecs, namely EnCodec, DAC, and X-Codec, to assess how the choice of audio representation impacts the quality of the generated drums. The system is trained and evaluated on the Expanded Groove MIDI Dataset (E-GMD), a large collection of human drum performances with paired MIDI and audio. We evaluate the fidelity and musical alignment of the generated audio using objective metrics. Overall, our results establish codec-token prediction as an effective route for drum grid-to-audio generation and provide practical insights into selecting audio tokenizers for percussive synthesis.
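To make the idea of an "expressive drum grid" concrete, the following is a minimal sketch of how note events might be quantized onto a grid while keeping velocity and microtiming. The abstract does not specify the paper's grid layout, so everything here is an assumption for illustration: the 16th-note resolution, the three-drum mapping, the `DrumHit` type, and the `to_expressive_grid` helper are hypothetical names, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class DrumHit:
    time_sec: float  # onset time in seconds
    drum: int        # drum class index (hypothetical mapping: 0=kick, 1=snare, 2=hi-hat)
    velocity: int    # MIDI velocity, 1..127


def to_expressive_grid(hits, bpm, steps_per_beat=4, n_steps=16, n_drums=3):
    """Quantize hits onto a step grid, keeping velocity and microtiming.

    Each cell is a tuple (hit, velocity, offset):
      hit      -- 1 if a drum sounds at this step, else 0
      velocity -- MIDI velocity scaled to 0..1
      offset   -- deviation from the grid line, in fractions of a step,
                  in the range [-0.5, 0.5)
    """
    step_sec = 60.0 / bpm / steps_per_beat
    grid = [[(0, 0.0, 0.0)] * n_drums for _ in range(n_steps)]
    for h in hits:
        pos = h.time_sec / step_sec
        step = math.floor(pos + 0.5)  # nearest grid line
        if not (0 <= step < n_steps and 0 <= h.drum < n_drums):
            continue  # outside this grid; skipped in this sketch
        vel = h.velocity / 127.0
        # If two hits land on the same cell, keep the louder one.
        if vel > grid[step][h.drum][1]:
            grid[step][h.drum] = (1, vel, pos - step)
    return grid


# One bar at 120 BPM: a 16th-note step lasts 0.125 s.
hits = [
    DrumHit(time_sec=0.00, drum=0, velocity=127),  # kick exactly on the grid
    DrumHit(time_sec=0.24, drum=2, velocity=64),   # hi-hat slightly early
    DrumHit(time_sec=0.51, drum=1, velocity=100),  # snare slightly late
]
grid = to_expressive_grid(hits, bpm=120)
```

In a pipeline like the one the abstract describes, a grid of this kind would be the input sequence to the Transformer, and the targets would be the codec tokens of the paired audio. How the paper actually encodes the grid and aligns it with codec frames is not stated here.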
Problem

Research questions and friction points this paper is trying to address.

Drum Synthesis
Expressive Drum Grids
Neural Audio Codecs
Symbolic-to-Audio Generation
Percussive Synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural audio codecs
drum synthesis
expressive drum grids
Transformer-based modeling
audio tokenization
Konstantinos Soiledis
Dept. of Music Technology and Acoustics, Hellenic Mediterranean University, Rethymno & Athens, Greece; Athena RC, Athens, Greece
Maximos Kaliakatsos-Papakostas
Hellenic Mediterranean University
AI in Music
Dimos Makris
Machine Learning Researcher at Music Tribe, Sweden
Music Information Retrieval, Automated Music Generation, Machine Learning, Automated Mixing
Konstantinos Tsamis
Dept. of Music Technology and Acoustics, Hellenic Mediterranean University, Rethymno & Athens, Greece; Athena RC, Athens, Greece