AI Summary
This work proposes a binaural target speaker extraction framework that uses the listener's head-related transfer function (HRTF) as a spatial prior, addressing the limitations of conventional methods that rely on direction-of-arrival estimation or enrollment signals and often introduce spatial auditory distortions. For the first time, individualized HRTFs are explicitly incorporated into a multichannel deep blind source separation model, enabling HRTF-guided conditional extraction that generalizes across listeners without user-specific calibration. The approach preserves binaural spatial localization cues while substantially improving speech quality and intelligibility. Experiments with real-measured HRTF data demonstrate superior performance in both simulated and real-recorded scenarios.
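The summary does not specify how the HRTF embedding conditions the separation model; one common choice for injecting a conditioning vector into a network's intermediate features is feature-wise linear modulation (FiLM). The sketch below illustrates that mechanism under stated assumptions: `film_condition`, the embedding, and the projection matrices are all hypothetical names, not the paper's actual architecture.

```python
import numpy as np

def film_condition(features, hrtf_embedding, scale_w, shift_w):
    """FiLM-style conditioning: scale and shift mixture features with
    vectors projected from an HRTF-derived embedding (illustrative only)."""
    gamma = hrtf_embedding @ scale_w   # per-feature scale, shape (F,)
    beta = hrtf_embedding @ shift_w    # per-feature shift, shape (F,)
    return features * gamma + beta

# Toy dimensions: T time frames, F feature channels, E embedding size.
rng = np.random.default_rng(1)
T, F, E = 50, 16, 8
mix_feats = rng.standard_normal((T, F))      # stand-in for encoder output
hrtf_emb = rng.standard_normal(E)            # stand-in for an HRTF embedding
scale_w = rng.standard_normal((E, F))
shift_w = rng.standard_normal((E, F))

conditioned = film_condition(mix_feats, hrtf_emb, scale_w, shift_w)
```

In a full model, the scale/shift projections would be learned jointly with the separation backbone so the network can steer extraction toward the target direction encoded by the HRTF.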
Abstract
This paper presents a Head-Related Transfer Function (HRTF)-guided framework for binaural Target Speaker Extraction (TSE) from mixtures of concurrent sources. Unlike conventional TSE methods based on Direction of Arrival (DOA) estimation or enrollment signals, which often distort the perceived spatial location, the proposed approach leverages the listener's HRTF as an explicit spatial prior. The framework is built on a multi-channel deep blind source separation backbone adapted to the binaural TSE setting, and is trained on HRTFs measured from a diverse population, enabling cross-listener generalization rather than subject-specific tuning. By conditioning the extraction on HRTF-derived spatial information, the method preserves binaural cues while enhancing speech quality and intelligibility. Performance is validated through simulations and real recordings obtained from a head and torso simulator (HATS).
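To make "binaural cues" concrete: an HRTF pair (one impulse response per ear) encodes the interaural level and time differences (ILD/ITD) that let a listener localize a source. A minimal sketch, using toy hand-made HRIRs rather than any measured data, of how convolving a mono source with a left/right HRIR pair produces a binaural signal carrying those cues:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with a left/right head-related impulse
    response (HRIR) pair; the output carries the spatial cues (ILD/ITD)
    that a TSE system should preserve."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIR pair for a source on the listener's left: the left ear receives
# an earlier, stronger tap than the right ear (illustrative values only).
hrir_l = np.array([0.9, 0.1, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.4, 0.1])

rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)
binaural = render_binaural(mono, hrir_l, hrir_r)

# ILD cue: the left channel carries more energy for a source on the left.
ild_db = 10 * np.log10(np.sum(binaural[0]**2) / np.sum(binaural[1]**2))
```

In this picture, the paper's evaluation criterion amounts to checking that the extracted target, when compared with the HRTF-rendered reference, retains these level and timing differences rather than collapsing the source toward the center of the head.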