AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

📅 2025-01-26

🤖 AI Summary

This work addresses two problems: voice enhancement is fragmented into separate task-specific models, and those models generalize poorly. To address them, it proposes AnyEnhance, a unified generative model that enhances both speech and singing voices. It builds on a masked generative model and adds two mechanisms. The first is a prompt-guidance mechanism that lets the model take a reference speaker's timbre as in-context input, which also enables target speaker extraction without changing the architecture. The second is a self-critic mechanism that iteratively assesses and refines the model's own outputs during generation. Without task-specific fine-tuning, a single model handles denoising, dereverberation, declipping, super-resolution, and target speaker extraction. Experiments report that AnyEnhance outperforms existing methods on both objective metrics and subjective listening tests across these tasks. Demo audio is publicly available.
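Each of the signal-level tasks listed above reverses a specific kind of damage. The sketch below makes that concrete by synthesizing each degradation on a plain list of samples. It is purely illustrative. The summary does not describe how AnyEnhance builds its training data, and the helpers `add_noise`, `clip`, `reverberate`, and `decimate_hold` are hypothetical, not part of the authors' code. Target speaker extraction is left out because it involves a competing talker rather than a signal-level distortion.

```python
import math
import random


def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` at a target SNR in dB (denoising undoes this).

    Assumes both lists have the same length and `noise` is not all zeros.
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose gain g so that 10 * log10(p_clean / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def clip(signal, threshold):
    """Hard-clip samples to [-threshold, threshold] (declipping undoes this)."""
    return [max(-threshold, min(threshold, x)) for x in signal]


def reverberate(signal, rir):
    """Convolve with a room impulse response, truncated to the input length
    (dereverberation undoes this). Direct O(N * len(rir)) loop for clarity."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(rir):
            if i - k < 0:
                break
            acc += h * signal[i - k]
        out[i] = acc
    return out


def decimate_hold(signal, factor):
    """Crude bandwidth reduction: keep every `factor`-th sample and repeat it
    to restore the length (super-resolution undoes this). There is no
    anti-aliasing filter, so this also introduces aliasing."""
    out = []
    for i in range(0, len(signal), factor):
        out.extend([signal[i]] * factor)
    return out[: len(signal)]


# Usage on a synthetic 1-second, 16 kHz tone with Gaussian noise.
random.seed(0)
sr = 16000
clean = [0.8 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
noise = [random.gauss(0.0, 1.0) for _ in range(sr)]
noisy = add_noise(clean, noise, snr_db=5.0)
clipped = clip(clean, threshold=0.5)
reverbed = reverberate(clean, rir=[1.0, 0.0, 0.5, 0.25])
narrowband = decimate_hold(clean, factor=4)
```

A unified model sees inputs with any mix of these degradations, which is why one network can be asked to perform all of the tasks at once.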

📝 Abstract

We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance supports a wide range of enhancement tasks, including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all handled simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. This can boost enhancement performance when reference audio is available, and it enables the target speaker extraction task without altering the underlying architecture. We also introduce a self-critic mechanism into the generative process of masked generative models, which yields higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate that AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audio samples are publicly available at https://amphionspace.github.io/anyenhance/.
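The self-critic idea can be illustrated with a MaskGIT-style iterative decoding loop. In this version, a separate scorer decides which tokens to re-mask at each step, instead of the generator's own confidence. This is a minimal sketch under stated assumptions. The cosine schedule, the choice to allow re-masking of any token (including ones committed earlier), and the callables `generator` and `critic` are all hypothetical stand-ins. The abstract does not specify how AnyEnhance implements its self-critic, and the sketch omits conditioning on the degraded input.

```python
import math
import random

MASK = -1  # hypothetical id for a masked token


def cosine_mask_ratio(step, total_steps):
    """Fraction of tokens left masked after `step` of `total_steps` (1.0 -> 0.0)."""
    return math.cos(math.pi / 2 * step / total_steps)


def self_critic_decode(length, generator, critic, total_steps=8, seed=0):
    """Iteratively fill masked tokens; a critic picks which ones to re-mask."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(1, total_steps + 1):
        # 1) The generator proposes a token for every currently masked position.
        proposal = [generator(tokens, i, rng) if t == MASK else t
                    for i, t in enumerate(tokens)]
        # 2) The critic scores every token; higher means "looks like real data".
        scores = [critic(proposal, i) for i in range(length)]
        # 3) Re-mask the lowest-scoring tokens, shrinking the count on a cosine schedule.
        n_mask = math.floor(length * cosine_mask_ratio(step, total_steps))
        worst = sorted(range(length), key=lambda i: scores[i])[:n_mask]
        tokens = proposal[:]
        for i in worst:
            tokens[i] = MASK
    return tokens  # n_mask is 0 at the last step, so nothing stays masked


# Toy usage: a hypothetical noisy generator and an oracle critic, only to run the loop.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def toy_generator(tokens, i, rng):
    # Right 70% of the time, a random token in 0..9 otherwise.
    return TARGET[i] if rng.random() < 0.7 else rng.randrange(10)


def toy_critic(tokens, i):
    # Oracle for illustration only: a real critic would be a learned model.
    return 1.0 if tokens[i] == TARGET[i] else 0.0


out = self_critic_decode(len(TARGET), toy_generator, toy_critic)
```

Because the critic scores the whole proposal, a token accepted early can still be revised later if it looks wrong. In a plain confidence-based MaskGIT loop, by contrast, tokens stay fixed once they are unmasked.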

Problem

Research questions and friction points this paper is trying to address.

Audio Quality Enhancement

Self-improving Capability

Sound Feature Imitation
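The "Sound Feature Imitation" item corresponds to the prompt-guidance mechanism in the abstract. That mechanism can be pictured as in-context conditioning. Tokens from a clean reference recording are placed in front of the sequence being generated and stay visible, so the model can pick up the speaker's timbre from them. The sketch below shows one possible way to build such inputs for a masked token model. It assumes discrete audio tokens, a `MASK` id of -1, and two hypothetical helpers, `build_prompted_example` and `build_inference_input`. The abstract does not specify AnyEnhance's actual input layout, and the sketch omits conditioning on the degraded audio.

```python
import math
import random

MASK = -1  # hypothetical id for a masked token


def build_prompted_example(prompt_tokens, target_tokens, mask_ratio, rng):
    """Return (input_tokens, loss_positions) for one training example.

    The reference prompt stays fully visible as an in-context prefix. Only a
    `mask_ratio` fraction (assumed in (0, 1]) of the non-empty target is
    masked, and the loss is computed on those masked target positions alone.
    """
    n_target = len(target_tokens)
    n_mask = max(1, math.ceil(mask_ratio * n_target))
    masked = set(rng.sample(range(n_target), n_mask))
    offset = len(prompt_tokens)
    inputs = list(prompt_tokens) + [
        MASK if i in masked else t for i, t in enumerate(target_tokens)
    ]
    loss_positions = sorted(offset + i for i in masked)
    return inputs, loss_positions


def build_inference_input(prompt_tokens, target_length):
    """At inference the whole target region starts masked. An empty prompt
    falls back to prompt-free enhancement."""
    return list(prompt_tokens) + [MASK] * target_length


# Usage: a 3-token reference prompt and a 10-token target with half masked.
rng = random.Random(0)
prompt = [7, 7, 8]
target = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
inputs, loss_pos = build_prompted_example(prompt, target, 0.5, rng)
```

Keeping the prompt optional matches the abstract's framing: a reference helps when it is available, and for target speaker extraction the prompt is what identifies the speaker to keep.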

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-improvement

Multi-task enhancement

Quality superiority
