AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

📅 2025-01-26
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the fragmentation of voice enhancement tasks and the poor generalization of task-specific models by proposing AnyEnhance, a unified generative model for both speech and singing voice enhancement. It builds on a masked generative modeling framework, introduces a prompt-guidance mechanism that lets the model natively accept a reference speaker's timbre (which also enables target speaker extraction without architectural changes), and adds a self-critic mechanism that refines outputs through iterative self-assessment during generation. Without task-specific fine-tuning, AnyEnhance simultaneously performs denoising, dereverberation, declipping, super-resolution, and target speaker extraction. Experiments show that it outperforms existing methods on both objective metrics and subjective listening tests. Demo audio is publicly available.

📝 Abstract
We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance supports a wide range of enhancement tasks, including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. In this way, it can boost enhancement performance when a reference audio is available and enable the target speaker extraction task without altering the underlying architecture. Moreover, we introduce a self-critic mechanism into the generative process for masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate that AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audio is publicly available at https://amphionspace.github.io/anyenhance/.
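The abstract does not spell out the decoding procedure, so the following is only a rough sketch of how iterative masked generation with a critic-driven re-masking step might look. Everything in it is an assumption made for illustration: `toy_generator`, `toy_critic`, `decode`, the `MASK` id, the 8-token vocabulary, and the cosine re-masking schedule are hypothetical stand-ins, not the authors' model or algorithm. The "prompt" here is a short token pattern standing in for a reference audio prompt.

```python
import math
import random

MASK = -1  # hypothetical id marking a masked token slot


def toy_generator(tokens, prompt, rng):
    """Stand-in for the masked generative model: fills every masked slot.

    It copies the prompt pattern most of the time and guesses otherwise,
    so some of its outputs are deliberately wrong.
    """
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok == MASK:
            if rng.random() < 0.7:
                out[i] = prompt[i % len(prompt)]
            else:
                out[i] = rng.randrange(8)
    return out


def toy_critic(tokens, prompt):
    """Stand-in for the self-critic: 1.0 if a token matches the prompt pattern, else 0.0."""
    return [1.0 if tok == prompt[i % len(prompt)] else 0.0
            for i, tok in enumerate(tokens)]


def decode(length, prompt, steps=8, seed=0):
    """Iteratively fill masked tokens, re-masking the lowest-scored ones each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, steps + 1):
        tokens = toy_generator(tokens, prompt, rng)
        # Assumed cosine schedule: the re-mask count shrinks to 0 on the last step.
        n_remask = int(length * math.cos(math.pi / 2 * step / steps))
        if n_remask == 0:
            break
        scores = toy_critic(tokens, prompt)
        # Re-mask the positions the critic trusts least (ties keep index order).
        worst = sorted(range(length), key=lambda i: scores[i])[:n_remask]
        for i in worst:
            tokens[i] = MASK
    return tokens


if __name__ == "__main__":
    print(decode(16, [1, 2, 3, 4]))
```

In this toy setup the critic, rather than the generator's own confidence, decides which tokens are regenerated; that is the general idea the abstract attributes to the self-critic mechanism, though the real scoring model and schedule may differ.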
Problem

Research questions and friction points this paper is trying to address.

Audio Quality Enhancement
Self-improving Capability
Sound Feature Imitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-improvement
Multi-task enhancement
Quality superiority