AI Summary
Existing speech front-ends (e.g., noise reduction, dereverberation, source separation) often leave residual distortions or introduce perceptually salient artifacts, degrading subjective listening quality, yet conventional objective metrics (e.g., SI-SNR) fail to correlate well with such perceptual impairments. To address this, we propose SpeechRefiner, the first conditional flow matching (CFM)-based post-processing framework explicitly designed for perceptual speech quality enhancement. Its key contributions are: (i) the first application of CFM to waveform-level speech post-processing, enabling end-to-end distortion modeling; (ii) multi-distortion joint training, yielding strong generalization across diverse front-end algorithms and noise types; and (iii) seamless integration into industrial pipelines. Experiments demonstrate significant improvements in PESQ (+1.2), STOI (+0.08), and subjective MOS (+0.8), without retraining for specific front-ends or noise conditions. Code and audio demos are publicly available.
Abstract
Speech pre-processing techniques such as denoising, de-reverberation, and separation are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, leaving residual noise or introducing new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To address this, we introduce SpeechRefiner, a post-processing tool that utilizes Conditional Flow Matching (CFM) to improve the perceptual quality of speech. In this study, we benchmark SpeechRefiner against recent task-specific refinement methods and evaluate its performance within our internal processing pipeline, which integrates multiple front-end algorithms. Experiments show that SpeechRefiner generalizes strongly across diverse impairment sources, significantly enhancing speech perceptual quality. Audio demos can be found at https://speechrefiner.github.io/SpeechRefiner/.
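To make the CFM idea concrete, the sketch below builds one training example for a conditional flow matching objective under common assumptions (a straight-line probability path from a Gaussian prior to the clean waveform, with the front-end output as the conditioning signal). This is an illustrative toy, not the authors' implementation; `cfm_training_pair` and all variable names are hypothetical, and the real model would regress `v_target` with a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_training_pair(clean, degraded, rng):
    """Build one conditional flow matching training example.

    Hypothetical sketch assuming a straight-line (rectified-flow style)
    path between a Gaussian prior sample and the clean waveform.
    """
    x0 = rng.standard_normal(clean.shape)  # sample from the Gaussian prior
    t = rng.uniform()                      # random time in [0, 1]
    xt = (1.0 - t) * x0 + t * clean        # point on the interpolation path
    v_target = clean - x0                  # constant velocity along the path
    cond = degraded                        # front-end output as the condition
    return t, xt, cond, v_target

# Toy "clean" waveform and a lightly distorted front-end output
clean = rng.standard_normal(16)
degraded = clean + 0.1 * rng.standard_normal(16)

t, xt, cond, v = cfm_training_pair(clean, degraded, rng)

# Sanity check: on a straight path, x0 + t * v_target recovers xt
x0 = xt - t * v
assert np.allclose(x0 + t * v, xt)
```

A network trained to predict `v` from `(xt, t, cond)` can then be integrated with an ODE solver from the prior at `t = 0` to a refined waveform at `t = 1`.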