A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
The neural mechanisms underlying sound elevation perception in binaural audio remain poorly understood; in particular, the causal relationship between spectral cues and elevation discrimination lacks systematic validation. To address this, we conducted the first explainability-driven elevation classification study across multiple heterogeneous HRTF datasets—integrating 11 public HRTF databases and data from over 600 subjects—using a CNN architecture enhanced with Class Activation Mapping (CAM) for interpretability. We systematically compared time-frequency preprocessing strategies and rigorously evaluated intra-dataset performance, cross-dataset generalization, and robustness. Our analysis identified two stable, elevation-discriminative frequency bands: 8–12 kHz and 4–6 kHz. The model achieved a cross-dataset average accuracy of 89.3%, outperforming conventional peak-based feature methods by 12.7%. These findings advance HRTF modeling from opaque, black-box prediction toward neurobiologically grounded, interpretable perceptual mechanisms.

📝 Abstract
Precise elevation perception in binaural audio remains a challenge, despite extensive research on head-related transfer functions (HRTFs) and spectral cues. While prior studies have advanced our understanding of sound localization cues, the interplay between spectral features and elevation perception is still not fully understood. This paper presents a comprehensive analysis of over 600 subjects from 11 diverse public HRTF datasets, employing a convolutional neural network (CNN) model combined with explainable artificial intelligence (XAI) techniques to investigate elevation cues. In addition to testing various HRTF pre-processing methods, we focus on both within-dataset and inter-dataset generalization and explainability, assessing the model's robustness across different HRTF variations stemming from subjects and measurement setups. By leveraging class activation mapping (CAM) saliency maps, we identify key frequency bands that may contribute to elevation perception, providing deeper insights into the spectral features that drive elevation-specific classification. This study offers new perspectives on HRTF modeling and elevation perception by analyzing diverse datasets and pre-processing techniques, expanding our understanding of these cues across a wide range of conditions.
Problem

Research questions and friction points this paper is trying to address.

Explores elevation perception in binaural audio using HRTFs.
Investigates spectral features' role in elevation perception via AI.
Analyzes HRTF datasets to identify key frequency bands for elevation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN model with XAI for HRTF analysis
Class activation mapping identifies key frequencies
Inter-dataset generalization and robustness assessment
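The CAM technique named above can be summarized in a few lines: each feature map from the final convolutional layer is weighted by the classifier weight tying it to the target class, and the weighted sum gives a per-frequency saliency map. The sketch below is a minimal, framework-free illustration of that computation with toy numbers — it is not the paper's model, and the feature values and weights are made up for demonstration.

```python
# Minimal sketch of class activation mapping (CAM).
# Toy values only; not the paper's trained CNN or real HRTF features.

def class_activation_map(feature_maps, class_weights):
    """Weight each final-conv feature map by the target class's
    classifier weight and sum, yielding per-bin saliency."""
    n_bins = len(feature_maps[0])
    cam = [0.0] * n_bins
    for fmap, w in zip(feature_maps, class_weights):
        for i in range(n_bins):
            cam[i] += w * fmap[i]
    # Normalize to [0, 1] so maps are comparable across classes.
    lo, hi = min(cam), max(cam)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in cam]

# Two toy feature maps over four frequency bins.
maps = [[0.1, 0.9, 0.2, 0.0],
        [0.0, 0.8, 0.1, 0.3]]
weights = [1.0, 0.5]
cam = class_activation_map(maps, weights)
print(max(range(len(cam)), key=lambda i: cam[i]))  # index of the most salient frequency bin
```

In the paper's setting, the bins would correspond to frequency bands of the HRTF magnitude spectrum, and consistently high-saliency bins across subjects and datasets point to the elevation-discriminative bands (8–12 kHz and 4–6 kHz) reported in the summary.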
J. A. D. Rus
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain
Mario Montagud
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain; i2CAT Foundation, 08034 Barcelona, Spain
Jesus Lopez-Ballester
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain
Francesc J. Ferri
Universitat de Valencia
Artificial intelligence · Computer Science
Maximo Cobos
Full Professor, Universitat de Valencia
audio signal processing · acoustic source localization · spatial audio · machine learning · wireless acoustic sensor networks