🤖 AI Summary
This work addresses the dual challenges of model training and content editing under audio data scarcity. To this end, it applies Boomerang sampling—previously unexplored in the audio domain—to a pretrained music diffusion model (Stable Audio Open) for rhythm-preserving, controllable audio synthesis. Methodologically, Boomerang sampling partially re-noises an existing example in latent space and denoises it back, which largely preserves the original beat structure while enabling text-guided, single-track instrument replacement. Experiments show that the approach improves beat tracking performance, but only in low-data regimes (+4.2% F1), and demonstrates fine-grained, text-based audio editing on monophonic inputs. The implementation is publicly released, inviting further work on low-resource audio modeling and controllable generation.
📝 Abstract
Generative models of music audio are typically used to generate output based solely on a text prompt or melody. Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example using any pretrained diffusion model. In this work, we explore its application in the audio domain as a tool for data augmentation and content manipulation. Specifically, implementing Boomerang sampling for Stable Audio Open, we augment training data for a state-of-the-art beat tracker and attempt to replace musical instruments in recordings. Our results show that the rhythmic structure of existing examples is mostly preserved, that it improves the beat tracker's performance, but only in scenarios of limited training data, and that it can accomplish text-based instrument replacement on monophonic inputs. We publish our implementation to invite experiments on data augmentation in other tasks and to explore further applications.
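The core idea of Boomerang sampling is local sampling with a pretrained diffusion model: run the forward process only partway (to some timestep t_start well below the full noise level), then denoise back, so the output stays close to the input example. A minimal NumPy sketch of this idea, using a generic DDIM-style (deterministic) reverse step; the function names, noise schedule, and toy denoiser below are illustrative assumptions, not the paper's actual Stable Audio Open implementation:

```python
import numpy as np

def boomerang_sample(x0, denoise_fn, alphas_cumprod, t_start, rng):
    """Partially noise a clean latent x0 to timestep t_start (< T),
    then denoise it back with the pretrained model. Because t_start is
    far from the terminal timestep, the result stays close to x0."""
    # Forward process: sample x_t ~ q(x_t | x_0) in closed form.
    a_bar = alphas_cumprod[t_start]
    eps = rng.standard_normal(x0.shape)
    x = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

    # Reverse process: deterministic DDIM steps from t_start down to 0.
    for t in range(t_start, 0, -1):
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        eps_hat = denoise_fn(x, t)  # model's noise prediction
        # Estimate x_0, then step to the previous timestep.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x

# Toy usage with a linear beta schedule and a dummy denoiser.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)
a_bars = np.concatenate([[1.0], np.cumprod(1.0 - betas)])  # a_bars[0] = 1
x0 = rng.standard_normal((4,))
out = boomerang_sample(x0, lambda x, t: np.zeros_like(x), a_bars, 20, rng)
```

The closeness to the input is controlled by t_start: with t_start = 0 no noise is added and the input is returned unchanged, while larger values allow progressively stronger edits (e.g. text-guided instrument replacement) at the cost of fidelity to the original.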