🤖 AI Summary
This study addresses the performance degradation of deep neural networks for audio source separation when inference is run at a sampling frequency lower than the one used during training, a degradation attributed primarily to the loss of high-frequency components. The authors systematically investigate this mechanism and validate the hypothesis that the mere presence of high-frequency information matters more than its precise representation. To this end, they introduce two resampling strategies: noisy-kernel resampling, which perturbs the interpolation kernel with Gaussian noise to enrich the high-frequency bands, and trainable-kernel resampling, which learns the interpolation kernel during training. Evaluated on multiple state-of-the-art music source separation models, both methods effectively mitigate the degradation, with noisy-kernel resampling proving consistently robust across architectures and offering a simple yet practical solution.
📝 Abstract
Audio processing methods based on deep neural networks are typically trained at a single sampling frequency (SF). To handle untrained SFs, signal resampling is commonly employed, but it can degrade performance, particularly when the input SF is lower than the trained SF. This paper investigates the causes of this degradation through two hypotheses: (i) up-sampling leaves the high-frequency bands empty, and (ii) the presence of high-frequency components matters more than their precise representation. To examine these hypotheses, we compare conventional resampling with three alternatives: post-resampling noise addition, which adds Gaussian noise to the resampled signal; noisy-kernel resampling, which perturbs the kernel with Gaussian noise to enrich high-frequency components; and trainable-kernel resampling, which adapts the interpolation kernel through training. Experiments on music source separation show that noisy-kernel and trainable-kernel resampling alleviate the degradation observed with conventional resampling. We further demonstrate that noisy-kernel resampling is effective across diverse models, highlighting it as a simple yet practical option.
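The kernel-perturbation idea behind noisy-kernel resampling can be sketched in a few lines. The following is an illustrative implementation only, not the paper's code: the function name, tap count, window choice, and noise level are our assumptions. It performs 2x up-sampling by zero-insertion followed by convolution with a windowed-sinc kernel to which Gaussian noise has been added, so the output carries energy above the original Nyquist frequency instead of leaving that band empty.

```python
import numpy as np

def noisy_kernel_upsample_2x(x, taps=33, noise_std=0.02, seed=0):
    """2x up-sampling with a Gaussian-noise-perturbed interpolation kernel
    (illustrative sketch; parameters are assumptions, not the paper's)."""
    n = np.arange(taps) - (taps - 1) / 2
    # Ideal half-band low-pass kernel for 2x interpolation, Hann-windowed.
    kernel = 0.5 * np.sinc(n / 2) * np.hanning(taps)
    # Perturb the kernel itself (not the signal) with Gaussian noise.
    rng = np.random.default_rng(seed)
    kernel = kernel + rng.normal(0.0, noise_std, taps)
    # Zero-stuffing, then filtering; gain 2 compensates the inserted zeros.
    up = np.zeros(2 * len(x))
    up[::2] = x
    return 2.0 * np.convolve(up, kernel, mode="same")

x = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)  # 440 Hz tone at 16 kHz
y = noisy_kernel_upsample_2x(x)  # 1024 samples, nominally at 32 kHz
```

With `noise_std=0`, this reduces to conventional windowed-sinc resampling; the noise term is what populates the otherwise-empty high-frequency band.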