BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

📅 2026-03-29

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

This work addresses the challenge of reconstructing full-sphere head-related impulse responses (HRIRs) from sparse individualized measurements by proposing the first time-domain, end-to-end, grid-free binaural Transformer model. Departing from conventional frequency-domain approaches that rely on minimum-phase assumptions and fixed directional grids, the proposed method integrates sinusoidal spatial encoding, a Conv1D refinement module, and multi-task auxiliary heads for interaural time and level differences (ITD/ILD) to directly predict complete HRIRs at arbitrary directions from limited measurements. Evaluated on the SONICOM dataset, the model significantly outperforms existing methods in terms of normalized mean squared error (NMSE), cosine similarity, and ITD/ILD estimation accuracy, while simultaneously enhancing temporal fidelity and spatial continuity.

Technology Category

Application Category

📝 Abstract

Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose BiFormer3D, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase pre-processing is unnecessary.

Problem

Research questions and friction points this paper is trying to address.

HRIR reconstruction

spatial up-sampling

individualized binaural rendering

time-domain modeling

grid-free prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

BiFormer3D

time-domain HRIR reconstruction

grid-free spatial up-sampling