🤖 AI Summary
Existing drum audio generation methods struggle to achieve high fidelity and fine-grained control over rhythm and timbre at the same time, and no symbolic-to-audio synthesis framework is tailored to polyphonic percussion. This work fine-tunes a pre-trained text-to-audio model, adding a content encoder and a hybrid conditioning mechanism, to enable, for the first time, controllable drum audio generation guided jointly by high-resolution drum MIDI and a reference audio clip. To support this, we construct a paired target-reference drum audio dataset. Experimental results show that the generated outputs perform strongly in audio quality, rhythmic alignment, and beat continuity, addressing the research gap in symbolic-to-audio synthesis for polyphonic percussion and offering music producers an efficient, controllable creative tool.
📝 Abstract
Current methods for creating drum loop audio in digital music production, such as using one-shot samples or resampling, often demand non-trivial effort from creators. While recent generative models achieve high fidelity and adhere to text prompts, they lack the specific control needed for this task. Existing symbolic-to-audio research often focuses on single, tonal instruments, leaving the challenge of polyphonic, percussive drum synthesis unaddressed. We address this gap by introducing "Break-the-Beat!", a model capable of rendering a drum MIDI with the timbre of a reference audio. It is built by fine-tuning a pre-trained text-to-audio model with our proposed content encoder and an effective hybrid conditioning mechanism. To enable this, we construct a new dataset of paired target-reference drum audio from existing drum audio datasets. Experiments demonstrate that our model generates high-quality drum audio that follows high-resolution drum MIDI, achieving strong performance across metrics of audio quality, rhythmic alignment, and beat continuity. This offers producers a new, controllable tool for creative production. Demo page: https://ik4sumii.github.io/break-the-beat/