🤖 AI Summary
Addressing the challenges of high acquisition costs for individual anthropometric measurements and scarce labeled data in personalized HRTF modeling, this paper proposes a source-location-conditioned autoencoder framework. The method first learns a disentangled latent representation of HRTF magnitude, enabling cross-domain fusion across multi-source-layout datasets; it then establishes a lightweight mapping from anthropometric features to this low-dimensional latent space, substantially reducing the complexity of high-dimensional HRTF reconstruction. Its key innovation lies in embedding geometric priors—specifically, source azimuth and elevation—into the autoencoder architecture, thereby balancing physical interpretability with data efficiency. Experiments demonstrate that, under limited labeled samples, the proposed approach achieves an average 12.6% improvement in HRTF estimation accuracy over state-of-the-art DNN-based methods, validating the effectiveness of latent-space dimensionality reduction and multi-dataset collaborative modeling.
📝 Abstract
A method for head-related transfer function (HRTF) individualization from the subject's anthropometric parameters is proposed. Due to the high cost of measurement, the number of subjects included in many HRTF datasets is limited, and the number of those that include anthropometric parameters is even smaller. Therefore, HRTF individualization based on deep neural networks (DNNs) is a challenging task. We propose a HRTF individualization method using the latent representation of HRTF magnitude obtained through an autoencoder conditioned on sound source positions, which makes it possible to combine multiple HRTF datasets with different measured source positions, and makes the network training tractable by reducing the number of parameters to be estimated from anthropometric parameters. Experimental evaluation shows that high estimation accuracy is achieved by the proposed method, compared to current DNN-based methods.