🤖 AI Summary
This work addresses the distortion of singing vocals caused by artificial reverb plugins—commonly employed in mainstream digital audio workstations (DAWs). To tackle the lack of high-quality training data for singing voice dereverberation under artificial reverb conditions, we introduce the first synthetic RIR dataset specifically designed for singing voice dereverberation, generated via automated, plugin-based RIR acquisition—a paradigm shift from conventional reliance on real-room measurements. Methodologically, we propose an end-to-end dereverberation framework integrating diffusion models and generative adversarial networks (GANs). Experiments demonstrate that models trained on our plugin-generated RIRs significantly outperform those trained on real-room RIRs on artificially reverberant singing speech: PSNR improves by 2.1 dB and STOI by 3.4%. This work bridges two critical gaps—namely, the absence of large-scale, realistic artificial-reverb training data and dedicated modeling approaches for singing voice dereverberation.
📝 Abstract
We present ReverbFX, a new room impulse response (RIR) dataset designed for singing voice dereverberation research. Unlike existing datasets based on real recorded RIRs, ReverbFX features a diverse collection of RIRs captured from various reverb audio effect plugins commonly used in music production. We conduct comprehensive experiments using the proposed dataset to benchmark the challenge of dereverberation of singing voice recordings affected by artificial reverbs. We train two state-of-the-art generative models using ReverbFX and demonstrate that models trained with plugin-derived RIRs outperform those trained on realistic RIRs in artificial reverb scenarios.