🤖 AI Summary
Conventional interpolation and supervised learning approaches to personalized HRTF modeling generalize poorly across subjects. To address this, the paper proposes a first approach to HRIR generation based on a conditional denoising diffusion probabilistic model (DDPM). Anthropometric features, including pinna and head-torso geometry, are encoded as conditional inputs that guide the diffusion process, so that HRIRs are synthesized directly in the time domain without explicit physical modeling. Experiments indicate that the generated HRIRs reach performance in line with state-of-the-art models. The work provides an initial empirical demonstration that diffusion models are feasible for acoustic personalization, and suggests a possible route to HRTF customization in immersive audio applications.
📝 Abstract
Head-Related Transfer Functions (HRTFs) are fundamental to realistic rendering in immersive audio scenarios. However, they are strongly subject-dependent, varying considerably with the shape of the ears, head, and torso. Personalization procedures are therefore required for accurate binaural rendering. Recently, Denoising Diffusion Probabilistic Models (DDPMs), a class of generative learning techniques, have been applied to a variety of signal processing problems. In this paper, we propose a first approach that uses a DDPM conditioned on anthropometric measurements to generate personalized Head-Related Impulse Responses (HRIRs), the time-domain representation of HRTFs. The results show the feasibility of DDPMs for HRTF personalization, with performance in line with state-of-the-art models.
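To make the idea of a DDPM conditioned on anthropometric measurements more concrete, the following is a minimal NumPy sketch of the standard DDPM forward noising step and noise-prediction loss, with the anthropometric vector passed to the denoiser as an extra input. It is not the paper's implementation: the number of steps `T`, the linear noise schedule, the HRIR length, the number of anthropometric features, and the single-matrix `toy_denoiser` are all illustrative assumptions, and the random arrays stand in for real measurements.

```python
import numpy as np

T = 1000                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal retention per step


def q_sample(x0, t, eps):
    """Forward process: noise a clean HRIR x0 to step t in closed form."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def toy_denoiser(x_t, t, cond, W):
    """Hypothetical stand-in for the noise-prediction network.

    Concatenates the noisy HRIR, the normalized step index and the
    anthropometric vector, then applies one linear map W.
    """
    features = np.concatenate([x_t, [t / T], cond])
    return W @ features


def ddpm_loss(x0, cond, W, rng):
    """Single-sample noise-prediction loss (mean squared error)."""
    t = int(rng.integers(0, T))           # random diffusion step
    eps = rng.standard_normal(x0.shape)   # noise the model should recover
    x_t = q_sample(x0, t, eps)
    eps_hat = toy_denoiser(x_t, t, cond, W)
    return float(np.mean((eps - eps_hat) ** 2))


rng = np.random.default_rng(0)
hrir_len, n_anthro = 256, 10              # illustrative sizes, not from the paper
x0 = rng.standard_normal(hrir_len)        # placeholder for a measured HRIR
cond = rng.standard_normal(n_anthro)      # placeholder anthropometric features
W = 0.01 * rng.standard_normal((hrir_len, hrir_len + 1 + n_anthro))
loss = ddpm_loss(x0, cond, W, rng)
```

In a real system the linear map would be replaced by a trained neural network, and generation would run the learned reverse process from pure noise while holding the anthropometric vector fixed for the target subject.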