🤖 AI Summary
Conventional mean squared error (MSE) loss over-emphasizes high-energy low-frequency components and neglects high-frequency bands to which human hearing is perceptually sensitive, resulting in suboptimal perceptual quality in speech enhancement. To address this, we propose a perceptually weighted loss function grounded in equal-loudness contours, the first application of psychoacoustic equal-loudness curves to speech enhancement loss design. This approach enables frequency-adaptive weighting that explicitly prioritizes minimizing reconstruction errors in high-frequency bands. The proposed loss is model-agnostic and highly generalizable. When integrated with the GTCRN architecture, it achieves a substantial 0.76-point improvement in wideband perceptual evaluation of speech quality (WB-PESQ) on the VoiceBank+DEMAND corpus (from 2.17 to 2.93), accompanied by marked gains in subjective listening quality.
📝 Abstract
Mean squared error (MSE) is a ubiquitous loss function for speech enhancement, but it correlates poorly with auditory perceptual quality: MSE drives models to over-emphasize low-frequency components, which carry high energy, leaving perceptually important high-frequency information inadequately modeled. To overcome this limitation, we propose a perceptually weighted loss function grounded in psychoacoustic principles. Specifically, it leverages equal-loudness contours to assign frequency-dependent weights to the reconstruction error, thereby penalizing deviations in proportion to human auditory sensitivity. The proposed loss is model-agnostic and flexible, demonstrating strong generality. Experiments on the VoiceBank+DEMAND dataset show that replacing MSE with our loss in a GTCRN model raises the WB-PESQ score from 2.17 to 2.93, a significant improvement in perceptual quality.
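The abstract does not spell out the weighting formula, so the following is only an illustrative sketch of the idea: scale each frequency bin's squared spectral error by a gain derived from an equal-loudness curve. Here the standard A-weighting curve (which approximates the 40-phon equal-loudness contour) stands in for the paper's actual contour, and the function names, the use of STFT magnitudes, and the weight normalization are all assumptions, not the authors' implementation.

```python
import numpy as np

def a_weighting_gain(freqs_hz):
    """Linear A-weighting gain, a standard approximation of the 40-phon
    equal-loudness contour. Small near DC (~0 at 0 Hz), roughly 0.79 at
    1 kHz, peaking in the 2-4 kHz region where hearing is most sensitive."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return num / np.maximum(den, 1e-30)  # guard against division by zero

def weighted_spectral_mse(est_mag, ref_mag, freqs_hz):
    """Perceptually weighted MSE between two spectrogram magnitude
    arrays of shape (freq_bins, frames). Each bin's squared error is
    scaled by its (normalized) equal-loudness gain, so errors in
    perceptually sensitive bands are penalized more heavily."""
    w = a_weighting_gain(freqs_hz)
    w = w / w.sum()                       # weights sum to 1 across bins
    err = (np.asarray(est_mag) - np.asarray(ref_mag)) ** 2
    return float(np.sum(w[:, None] * err) / err.shape[1])
```

With this weighting, a fixed-size magnitude error at 3 kHz contributes far more to the loss than the same error at 100 Hz, which is the opposite of plain MSE's energy-driven bias toward low frequencies.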