🤖 AI Summary
Conventional speech enhancement (SE) training relies on objectives such as SI-SNR, which align poorly with perceptual quality, generalize weakly across evaluation metrics, and depend on clean reference signals, limiting their applicability in real-world scenarios. This paper proposes an end-to-end SE training paradigm guided by a learned speech quality assessment (SQA) model. A trainable SQA model, jointly optimized over multiple metrics, replaces the conventional loss function as the supervisory signal; a multi-task quality prediction network jointly regresses SI-SNR, STOI, PESQ, and ESTOI; and an unsupervised adaptation strategy leverages real-world noisy data. To our knowledge, this is the first work to embed SQA in a closed loop within SE training, addressing three key bottlenecks: misalignment between optimization objectives and auditory perception, insufficient generalization, and reliance on ideal clean references. Experiments show improvements in PESQ, ESTOI, and SI-SNR under both simulated and real noise, better cross-dataset generalization, and training that does not require clean speech references.
📝 Abstract
Speech quality assessment (SQA) aims to predict the perceived quality of speech signals under a wide range of distortions. It is inherently connected to speech enhancement (SE), which seeks to improve speech quality by removing unwanted signal components. While SQA models are widely used to evaluate SE performance, their potential to guide SE training remains underexplored. In this work, we investigate a training framework that leverages an SQA model, trained to predict multiple evaluation metrics from a public SE leaderboard, as a supervisory signal for SE. This approach addresses a key limitation of conventional SE objectives, such as SI-SNR, which often fail to align with perceptual quality and generalize poorly across evaluation metrics. Moreover, it enables training on real-world data where clean references are unavailable. Experiments on both simulated and real-world test sets show that SQA-guided training consistently improves performance across a range of quality metrics.
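To make the reference dependence of a conventional objective concrete, the following is a minimal sketch of SI-SNR using its commonly used definition (zero-mean signals, projection of the estimate onto the reference). It is not taken from the paper; the function name `si_snr` and the `eps` stabilizer are illustrative assumptions. Note that the clean `reference` is a required argument, which is why such an objective cannot be computed on real-world recordings that lack one.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (common definition; illustrative sketch).

    `estimate` is the enhanced signal, `reference` the clean signal.
    Both are equal-length sequences of floats.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything not explained by the reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps

    return 10.0 * math.log10(target_energy / noise_energy + eps)


# Toy signals (assumed for illustration only).
clean = [math.sin(0.1 * i) for i in range(200)]
interference = [math.cos(1.7 * i) for i in range(200)]
noisy = [s + 0.3 * v for s, v in zip(clean, interference)]

score_db = si_snr(noisy, clean)
```

Rescaling the estimate leaves the score essentially unchanged, which is the "scale-invariant" property. An SQA-guided objective, as described in the abstract, would instead score the enhanced signal with a learned predictor and so would not take a `reference` argument at all.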