AI Summary
Text-to-audio diffusion models suffer from training data memorization, posing privacy and copyright risks. To address this, we propose Anti-Memorization Guidance (AMG), a novel inference-time technique that dynamically modulates pre-trained Stable Audio Open models without altering training procedures or model architecture. We design and systematically evaluate three AMG mechanisms that suppress generation paths in the latent space highly similar to training samples, thereby reducing data reproduction while preserving text-audio semantic alignment and audio fidelity. Experiments demonstrate that AMG reduces data copying rates by an average of 62.3% across multiple metrics, with no statistically significant degradation in generation quality, as measured by CLAP Score and Mean Opinion Score (MOS). To our knowledge, this is the first method to achieve simultaneous privacy preservation and high-fidelity performance in diffusion-based audio generation.
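The core idea of inference-time guidance against memorization can be illustrated with a minimal sketch: at each sampling step, the model's noise prediction is nudged away from training latents that the current latent is about to reproduce. Everything below (function names, the cosine-similarity trigger, the additive penalty, the `scale` and `threshold` parameters) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two flattened latent vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def amg_guided_noise(eps_cond, latent, train_latents, scale=1.0, threshold=0.9):
    """Toy anti-memorization guidance (illustrative, not the paper's method).

    Adjusts the conditional noise prediction so that the denoising
    trajectory is pushed away from training latents the current latent
    is already too close to. Only intervenes above `threshold`, so
    generations that are not copying anything are left untouched.
    """
    eps = eps_cond.copy()
    for z in train_latents:
        sim = cosine_sim(latent, z)
        if sim > threshold:  # copying is likely; steer away
            direction = z / (np.linalg.norm(z) + 1e-8)
            eps += scale * sim * direction  # penalize the memorized direction
    return eps
```

Because the adjustment is purely a modification of the sampling step, it requires no retraining and no architectural change, which is the property the summary above highlights.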
Abstract
A persistent challenge in generative audio models is data replication, where the model unintentionally generates parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by exploring the use of anti-memorization strategies. We adopt Anti-Memorization Guidance (AMG), a technique that modifies the sampling process of pre-trained diffusion models to discourage memorization. Our study explores three types of guidance within AMG, each designed to reduce replication while preserving generation quality. We use Stable Audio Open as our backbone, leveraging its fully open-source architecture and training dataset. Our comprehensive experimental analysis suggests that AMG significantly mitigates memorization in diffusion-based text-to-audio generation without compromising audio fidelity or semantic alignment.