🤖 AI Summary
Problem: Conventional methods provide insufficient noise masking when music is played in noisy environments. Method: This paper proposes an active masking enhancement approach based on deep spectral envelope reshaping. Leveraging psychoacoustic simultaneous masking, it introduces, for the first time, a differentiable psychoacoustic masking model into an end-to-end neural network training framework, jointly optimizing masking efficacy, musical fidelity, and perceptual loudness. The method employs a CNN-LSTM frequency-response prediction network with a custom perceptual loss function, trained on synthetically generated data simulating headphone listening conditions. Results: Experiments demonstrate significant improvements over state-of-the-art methods across multiple objective masking metrics; notably, masking depth in the target noise frequency bands is substantially increased, while the structural integrity of the original mix and subjective auditory consistency are preserved.
📄 Abstract
People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. Indeed, a music signal can mask some of the noise's frequency components due to the effect of simultaneous masking. In this article, we propose a neural network based on a psychoacoustic masking model, designed to enhance the music's ability to mask ambient noise by reshaping its spectral envelope with predicted filter frequency responses. The model is trained with a perceptual loss function that balances two constraints: effectively masking the noise while preserving the original music mix and the user's chosen listening level. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment. The results, based on defined objective metrics, demonstrate that our system improves the state of the art.
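The core trade-off in the loss, between masking the noise and preserving the original mix, can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual formulation: the function name, the dB-domain inputs, and the weights `alpha` and `beta` are all assumptions.

```python
import numpy as np

def perceptual_loss(music_spec, reshaped_spec, noise_spec, masking_threshold,
                    alpha=1.0, beta=0.5):
    """Illustrative perceptual loss balancing noise masking against mix fidelity.

    All inputs are per-band magnitude spectra in dB, shape (n_bands,).
    `masking_threshold` is the simultaneous-masking threshold induced by the
    reshaped music; this simple additive form is an assumption for illustration.
    """
    # Masking term: average noise energy that remains audible, i.e. that
    # exceeds the masking threshold set by the (reshaped) music.
    audible_noise = np.maximum(noise_spec - masking_threshold, 0.0)
    masking_term = audible_noise.mean()
    # Fidelity term: deviation of the reshaped spectral envelope from the
    # original mix, so the music still sounds like the user's chosen mix.
    fidelity_term = np.mean((reshaped_spec - music_spec) ** 2)
    return alpha * masking_term + beta * fidelity_term
```

Minimizing the first term pushes the network to boost the music in bands where noise pokes above the masking threshold; the second term penalizes departures from the original envelope, which is how the two constraints in the abstract are balanced.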