Systematic evaluation of time-frequency features for binaural sound source localization

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates the impact of time-frequency feature design on binaural sound source localization (SSL) performance. Addressing both in-domain and cross-HRTF generalization scenarios, we quantitatively evaluate combinations of magnitude features (magnitude spectrogram and interaural level difference, ILD) and phase features (phase spectrogram and interaural phase difference, IPD) using a lightweight CNN architecture. Results demonstrate that strategic feature selection yields significantly greater performance gains than merely increasing model capacity. Our proposed rich-feature input, comprising channel-wise spectrograms, ILD, and IPD, achieves state-of-the-art accuracy in in-domain evaluation while substantially improving cross-HRTF generalization, reducing mean absolute azimuth error by 28.6%. This work establishes that domain-informed feature engineering, not just architectural sophistication, is pivotal for robustness and generalizability in binaural SSL systems.
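For readers unfamiliar with the metric named above, the following is a minimal sketch of how a mean absolute azimuth error and a relative error reduction could be computed. It assumes azimuths in degrees and assumes differences are wrapped onto the circle (so 350° vs. 10° counts as 20°); the summary does not state whether the paper wraps angles or restricts the azimuth range. The function names `mean_abs_azimuth_error` and `relative_reduction` are hypothetical.

```python
import numpy as np

def mean_abs_azimuth_error(pred_deg, true_deg):
    """Mean absolute angular error in degrees.

    Assumption: differences are wrapped into [-180, 180) so that
    errors across the 0/360 boundary are not overcounted.
    """
    pred = np.asarray(pred_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    diff = (pred - true + 180.0) % 360.0 - 180.0
    return float(np.mean(np.abs(diff)))

def relative_reduction(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err
```

Under this reading, a reported 28.6% reduction would mean the cross-HRTF error with the rich-feature input is 71.4% of the baseline error; the absolute error values are not given in this summary.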

📝 Abstract
This study presents a systematic evaluation of time-frequency feature design for binaural sound source localization (SSL), focusing on how feature selection influences model performance across diverse conditions. We investigate the performance of a convolutional neural network (CNN) model using various combinations of amplitude-based features (magnitude spectrogram and interaural level difference, ILD) and phase-based features (phase spectrogram and interaural phase difference, IPD). Evaluations on in-domain and out-of-domain data with mismatched head-related transfer functions (HRTFs) reveal that carefully chosen feature combinations often outperform increases in model complexity. While two-feature sets such as ILD + IPD are sufficient for in-domain SSL, generalization to diverse content requires richer inputs combining channel spectrograms with both ILD and IPD. Using the optimal feature sets, our low-complexity CNN model achieves competitive performance. Our findings underscore the importance of feature design in binaural SSL and provide practical guidance for both domain-specific and general-purpose localization.
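To make the feature definitions concrete, here is a minimal NumPy sketch of how per-channel magnitude spectrograms, ILD, and IPD might be derived from a two-channel signal and stacked as CNN input channels. It is an illustration under stated assumptions, not the authors' pipeline: the STFT parameters (`n_fft=512`, `hop=256`, Hann window), the ILD in decibels, the left-minus-right sign convention, and the function names `stft` and `binaural_features` are all assumptions not taken from the paper.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (freq, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def binaural_features(left, right, n_fft=512, hop=256, eps=1e-8):
    """Stack [|L|, |R|, ILD, IPD] into an array of shape (4, freq, frames)."""
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    mag_l, mag_r = np.abs(L), np.abs(R)
    # ILD in dB (assumed convention: left relative to right);
    # eps guards against division by zero in silent bins.
    ild = 20.0 * np.log10((mag_l + eps) / (mag_r + eps))
    # IPD as the phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return np.stack([mag_l, mag_r, ild, ipd])
```

A two-feature input such as ILD + IPD would correspond to keeping only the last two channels of this array, while the richer input described in the abstract keeps the channel spectrograms as well.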
Problem

Research questions and friction points this paper is trying to address.

Evaluating time-frequency features for binaural sound localization
Investigating the impact of feature combinations on CNN model performance
Optimizing features to enhance generalization across diverse conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining amplitude and phase features for localization
Using ILD and IPD features with spectrograms
Low-complexity CNN with optimal feature selection