A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
The neural mechanisms underlying sound elevation perception in binaural audio remain poorly understood; in particular, the causal relationship between spectral cues and elevation discrimination lacks systematic validation. To address this, we conducted the first explainability-driven elevation classification study across multiple heterogeneous HRTF datasets—integrating 11 public HRTF databases and data from over 600 subjects—using a CNN architecture enhanced with Class Activation Mapping (CAM) for interpretability. We systematically compared time-frequency preprocessing strategies and rigorously evaluated intra-dataset performance, cross-dataset generalization, and robustness. Our analysis identified two stable, elevation-discriminative frequency bands: 8–12 kHz and 4–6 kHz. The model achieved a cross-dataset average accuracy of 89.3%, outperforming conventional peak-based feature methods by 12.7%. These findings advance HRTF modeling from opaque, black-box prediction toward neurobiologically grounded, interpretable perceptual mechanisms.

📝 Abstract
Precise elevation perception in binaural audio remains a challenge, despite extensive research on head-related transfer functions (HRTFs) and spectral cues. While prior studies have advanced our understanding of sound localization cues, the interplay between spectral features and elevation perception is still not fully understood. This paper presents a comprehensive analysis of over 600 subjects from 11 diverse public HRTF datasets, employing a convolutional neural network (CNN) model combined with explainable artificial intelligence (XAI) techniques to investigate elevation cues. In addition to testing various HRTF pre-processing methods, we focus on both within-dataset and inter-dataset generalization and explainability, assessing the model's robustness across different HRTF variations stemming from subjects and measurement setups. By leveraging class activation mapping (CAM) saliency maps, we identify key frequency bands that may contribute to elevation perception, providing deeper insights into the spectral features that drive elevation-specific classification. This study offers new perspectives on HRTF modeling and elevation perception by analyzing diverse datasets and pre-processing techniques, expanding our understanding of these cues across a wide range of conditions.
Problem

Research questions and friction points this paper is trying to address.

Explores elevation perception in binaural audio using HRTFs.
Investigates spectral features' role in elevation perception via AI.
Analyzes HRTF datasets to identify key frequency bands for elevation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN model with XAI for HRTF analysis
Class activation mapping identifies key frequencies
Inter-dataset generalization and robustness assessment
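The CAM technique named above can be summarized in a few lines: each feature map from the final convolutional layer is weighted by the classifier weight tying it to the target class, and the weighted sum gives a per-frequency saliency map. The sketch below is a minimal, framework-free illustration of that computation with toy numbers — it is not the paper's model, and the feature values and weights are made up for demonstration.

```python
# Minimal sketch of class activation mapping (CAM).
# Toy values only; not the paper's trained CNN or real HRTF features.

def class_activation_map(feature_maps, class_weights):
    """Weight each final-conv feature map by the target class's
    classifier weight and sum, yielding per-bin saliency."""
    n_bins = len(feature_maps[0])
    cam = [0.0] * n_bins
    for fmap, w in zip(feature_maps, class_weights):
        for i in range(n_bins):
            cam[i] += w * fmap[i]
    # Normalize to [0, 1] so maps are comparable across classes.
    lo, hi = min(cam), max(cam)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in cam]

# Two toy feature maps over four frequency bins.
maps = [[0.1, 0.9, 0.2, 0.0],
        [0.0, 0.8, 0.1, 0.3]]
weights = [1.0, 0.5]
cam = class_activation_map(maps, weights)
print(max(range(len(cam)), key=lambda i: cam[i]))  # index of the most salient frequency bin
```

In the paper's setting, the bins would correspond to frequency bands of the HRTF magnitude spectrum, and consistently high-saliency bins across subjects and datasets point to the elevation-discriminative bands (8–12 kHz and 4–6 kHz) reported in the summary.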
J. A. D. Rus
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain
Mario Montagud
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain; i2CAT Foundation, 08034 Barcelona, Spain
Jesus Lopez-Ballester
Departament d’Informàtica, Universitat de València, 46100 Burjassot, Spain
Francesc J. Ferri
Universitat de Valencia
Artificial intelligence · Computer Science
Maximo Cobos
Full Professor, Universitat de Valencia
audio signal processing · acoustic source localization · spatial audio · machine learning · wireless acoustic sensor networks