🤖 AI Summary
This study systematically investigates the impact of time-frequency feature design on binaural sound source localization (SSL) performance. Addressing both in-domain and cross-HRTF generalization scenarios, we quantitatively evaluate the combined utility of magnitude features (magnitude spectrogram, interaural level difference—ILD) and phase features (phase spectrogram, interaural phase difference—IPD), using a lightweight CNN architecture. Results demonstrate that strategic feature selection yields significantly greater performance gains than merely increasing model capacity. Our proposed rich-feature input—comprising channel-wise spectrograms, ILD, and IPD—achieves state-of-the-art accuracy in in-domain evaluation while substantially improving cross-HRTF generalization, reducing mean absolute azimuth error by 28.6%. This work establishes that meticulous, domain-informed feature engineering—not just architectural sophistication—is pivotal for enhancing robustness and generalizability in binaural SSL systems.
📝 Abstract
This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram, interaural level difference - ILD) and phase-based features (phase spectrogram, interaural phase difference - IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.