Binaspect -- A Python Library for Binaural Audio Analysis, Visualization & Feature Generation

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of analyzing spatial cue degradation in binaural audio under blind-source conditions. We propose an azimuth-aware modeling method that requires no prior knowledge of head-related transfer functions (HRTFs). By fusing modified interaural time difference (ITD) and interaural level difference (ILD) spectrograms and applying time-frequency adaptive clustering, we construct a robust and interpretable time–azimuth histogram (Azimuthogram), enabling multi-source azimuth separation and degradation localization. The method eliminates reliance on anatomical head models and supports degradation visualization across standard spatial audio processing pipelines—including codec-based compression (e.g., bitrate reduction), Ambisonic rendering, and vector-base amplitude panning (VBAP) localization. The resulting structured azimuth features significantly improve performance in no-reference quality prediction and spatial audio classification tasks, establishing a novel paradigm for reference-free spatial audio assessment.

📝 Abstract
We present Binaspect, an open-source Python library for binaural audio analysis, visualization, and feature generation. Binaspect generates interpretable "azimuth maps" by calculating modified interaural time and level difference spectrograms, and clustering those time-frequency (TF) bins into stable time-azimuth histogram representations. This allows multiple active sources to appear as distinct azimuthal clusters, while degradations manifest as broadened, diffused, or shifted distributions. Crucially, Binaspect operates blindly on audio, requiring no prior knowledge of head models. These visualizations enable researchers and engineers to observe how binaural cues are degraded by codec and renderer design choices, among other downstream processes. We demonstrate the tool on bitrate ladders, ambisonic rendering, and VBAP source positioning, where degradations are clearly revealed. In addition to their diagnostic value, the proposed representations can be exported as structured features suitable for training machine learning models in quality prediction, spatial audio classification, and other binaural tasks. Binaspect is released under an open-source license with full reproducibility scripts at https://github.com/QxLabIreland/Binaspect.
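Binaspect's actual API is documented in its repository; as a rough illustration of the interaural cues the abstract describes (not Binaspect's interface — the function name and parameters below are invented for this sketch), per-TF-bin ILD and phase-derived ITD spectrograms can be computed from the left/right STFTs like this:

```python
import numpy as np
from scipy.signal import stft

def interaural_spectrograms(left, right, fs, nperseg=512):
    """Hypothetical sketch: per-time-frequency-bin interaural level
    difference (dB) and phase-derived time difference (seconds)
    computed from the STFTs of a binaural signal pair."""
    f, t, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    eps = 1e-12
    # ILD: left/right magnitude ratio per TF bin, in decibels
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # ITD: interaural phase difference divided by angular frequency
    ipd = np.angle(L * np.conj(R))
    with np.errstate(divide="ignore", invalid="ignore"):
        itd = ipd / (2.0 * np.pi * f[:, None])
    itd[0, :] = 0.0  # the DC bin carries no phase-based delay estimate
    return f, t, ild, itd
```

Note that phase-derived ITD wraps above the frequency where the true delay exceeds half a period, which is one reason the paper speaks of *modified* ITD spectrograms.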
Problem

Research questions and friction points this paper is trying to address.

Analyzes binaural audio degradation through codec and rendering processes
Generates interpretable azimuth maps for multiple sound source localization
Produces structured features for machine learning in spatial audio tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates azimuth maps via modified interaural difference spectrograms
Clusters time-frequency bins into stable azimuth histograms
Operates blindly on audio without requiring head models
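The histogram step above can be illustrated with a deliberately simplified stand-in: instead of the paper's time-frequency adaptive clustering, the sketch below just builds an energy-weighted histogram of per-bin ITD estimates for each time frame. Sources then appear as peaks along the ITD axis and degradations as broadened or shifted distributions, which is the qualitative behaviour the abstract describes; the function name and binning scheme are invented here, not Binaspect's.

```python
import numpy as np

def itd_histogram(itd, power, itd_edges):
    """Simplified stand-in for the clustering step: one energy-weighted
    histogram of per-TF-bin ITD estimates per time frame."""
    n_frames = itd.shape[1]
    hist = np.zeros((len(itd_edges) - 1, n_frames))
    for j in range(n_frames):
        # Weight each TF bin's ITD vote by its energy in that frame
        hist[:, j], _ = np.histogram(itd[:, j], bins=itd_edges,
                                     weights=power[:, j])
    # Normalise each frame's histogram to sum to 1
    col = hist.sum(axis=0, keepdims=True)
    return np.divide(hist, col, out=np.zeros_like(hist), where=col > 0)
```

Stacking these per-frame histograms over time yields a time-ITD image in the spirit of the paper's time-azimuth "Azimuthogram", without requiring any head model to map delays to angles.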