A Machine Learning Approach for Denoising and Upsampling HRTFs

📅 2025-04-24
🤖 AI Summary
Personalized HRTF measurement is time-consuming and requires anechoic conditions, and measurements taken outside such conditions are sparse and noisy. To address these limitations, this paper proposes an end-to-end joint denoising and upsampling framework. It introduces an HRTF-specific Denoisy U-Net cascaded with an Autoencoding Generative Adversarial Network (AE-GAN), jointly optimized using a log-spectral distortion (LSD) loss and a cosine similarity constraint to achieve noise-robust reconstruction from sparse HRTFs. The method reconstructs HRTFs from as few as three measurement points, achieving an LSD error of 5.41 dB and a cosine similarity loss of 0.0070. By reducing reliance on dense sampling and high-SNR measurements, the approach aims to make the localization accuracy and auditory realism of personalized HRTFs easier to obtain in virtual immersive audio, pointing toward lightweight, widely deployable personalized HRTF modeling.

📝 Abstract
The demand for realistic virtual immersive audio continues to grow, with Head-Related Transfer Functions (HRTFs) playing a key role. HRTFs capture how sound reaches our ears, reflecting unique anatomical features and enhancing spatial perception. It has been shown that personalized HRTFs improve localization accuracy, but their measurement remains time-consuming and requires a noise-free environment. Although machine learning has been shown to reduce the required measurement points and, thus, the measurement time, a controlled environment is still necessary. This paper proposes a method to address this constraint by presenting a novel technique that can upsample sparse, noisy HRTF measurements. The proposed approach combines an HRTF Denoisy U-Net for denoising and an Autoencoding Generative Adversarial Network (AE-GAN) for upsampling from three measurement points. The proposed method achieves a log-spectral distortion (LSD) error of 5.41 dB and a cosine similarity loss of 0.0070, demonstrating the method's effectiveness in HRTF upsampling.
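The two reported metrics can be illustrated with a minimal sketch. It assumes the common definitions: LSD as the root-mean-square of the dB ratio between reference and estimated magnitude spectra over frequency bins, and cosine similarity loss as one minus the cosine similarity of the two magnitude vectors. The paper's exact formulation (frequency range, averaging order, weighting) may differ, and the function names here are hypothetical.

```python
import numpy as np


def log_spectral_distortion(h_true, h_pred, eps=1e-12):
    """Mean LSD in dB between magnitude spectra of shape (..., n_bins).

    Assumed definition: RMS over frequency bins of 20*log10(|H| / |H_hat|),
    then averaged over all leading axes (e.g. positions, ears).
    """
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    ratio_db = 20.0 * np.log10((h_true + eps) / (h_pred + eps))
    per_spectrum = np.sqrt(np.mean(ratio_db ** 2, axis=-1))
    return float(np.mean(per_spectrum))


def cosine_similarity_loss(h_true, h_pred, eps=1e-12):
    """Mean of (1 - cosine similarity) over spectra of shape (..., n_bins)."""
    h_true = np.asarray(h_true, dtype=float)
    h_pred = np.asarray(h_pred, dtype=float)
    dot = np.sum(h_true * h_pred, axis=-1)
    norms = np.linalg.norm(h_true, axis=-1) * np.linalg.norm(h_pred, axis=-1)
    return float(np.mean(1.0 - dot / (norms + eps)))


# Toy data (not from the paper): 3 positions, 4 frequency bins each.
reference = np.ones((3, 4))
scaled = 10.0 * reference  # same spectral shape, 20 dB louder

lsd_same = log_spectral_distortion(reference, reference)
lsd_scaled = log_spectral_distortion(reference, scaled)
cos_scaled = cosine_similarity_loss(reference, scaled)
```

In this toy case the scaled copy has an LSD of about 20 dB but a cosine similarity loss near zero, because cosine similarity ignores overall gain. This is one reason the two measures are complementary: LSD penalizes level errors, while the cosine term tracks spectral shape.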
Problem

Research questions and friction points this paper is trying to address.

Denoising and upsampling sparse HRTF measurements
Reducing HRTF measurement points and time
Improving HRTF accuracy with machine learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

U-Net for HRTF denoising
AE-GAN for sparse upsampling
Cascades both stages to upsample sparse, noisy measurements

Authors

Xuyi Hu
Audio Experience Design, Dyson School of Design Engineering, Imperial College London, UK
Jian Li
Audio Experience Design, Dyson School of Design Engineering, Imperial College London, UK
Lorenzo Picinali
Dyson School of Design Engineering - Imperial College London
spatial audio, spatial hearing, immersive VR/AR, audiology and hearing aids technology
Aidan O. T. Hogg
Assistant Professor/Lecturer, Queen Mary University of London
Spatial Audio & Virtual Reality