AI Summary
Consumer-device voice recordings frequently suffer from multiple concurrent degradations, including noise, reverberation, bandwidth limitation, and clipping. This paper proposes an efficient end-to-end speech restoration method that performs single-stage joint modeling in the complex-valued STFT domain, incorporates a phase-aware loss, and supports large analysis windows to improve frequency resolution. Its lightweight neural architecture achieves 10.5× real-time inference on an iPhone 12 CPU with sub-10 ms latency. Key contributions include: (1) EDB, the first open-source benchmark dataset targeting extreme degradation scenarios; (2) state-of-the-art performance on the DNS 5 blind test set, surpassing strong GAN-based baselines and approaching flow-matching methods; and (3) significant improvements over all open-source models on EDB, matching the quality of commercial systems.
Abstract
Vocal recordings on consumer devices commonly suffer from multiple concurrent degradations: noise, reverberation, band-limiting, and clipping. We present Smule Renaissance Small (SRS), a compact single-stage model that performs end-to-end vocal restoration directly in the complex STFT domain. By incorporating phase-aware losses, SRS enables large analysis windows for improved frequency resolution while achieving 10.5× real-time inference on an iPhone 12 CPU at 48 kHz. On the DNS 5 Challenge blind set, despite no training on speech, SRS outperforms a strong GAN baseline and closely matches a computationally expensive flow-matching system. To enable evaluation under realistic multi-degradation scenarios, we introduce the Extreme Degradation Bench (EDB): 87 singing and speech recordings captured under severe acoustic conditions. On EDB, SRS surpasses all open-source baselines on singing and matches commercial systems, while remaining competitive on speech despite no speech-specific training. We release both SRS and EDB under the MIT License.
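To make the "phase-aware loss in the complex STFT domain" idea concrete, here is a minimal numpy sketch. It is not the paper's actual loss or window configuration; the window/hop sizes, the Hann window, and the magnitude-plus-complex combination are illustrative assumptions. The key point it demonstrates is that a pure magnitude loss is blind to phase errors, while a complex-domain term penalizes them:

```python
import numpy as np

def stft(x, win_len=2048, hop=512):
    """Frame the signal with a Hann window and take an FFT per frame.
    Larger windows (e.g. 2048 samples at 48 kHz) give finer frequency
    resolution, at the cost of time resolution."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # complex spectrogram

def phase_aware_loss(ref, est, alpha=0.5):
    """Blend a magnitude term (insensitive to phase) with a
    complex-domain term that also penalizes phase mismatch."""
    S_ref, S_est = stft(ref), stft(est)
    mag_term = np.mean(np.abs(np.abs(S_ref) - np.abs(S_est)))
    complex_term = np.mean(np.abs(S_ref - S_est))
    return alpha * mag_term + (1 - alpha) * complex_term
```

A quarter-period phase shift of a pure tone leaves the magnitude term near zero, but the complex term catches it, which is why phase-aware training can exploit large analysis windows without sacrificing waveform fidelity.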