🤖 AI Summary
This work addresses real-time binaural rendering for continuously moving talkers in dynamic acoustic environments. We propose an implicit spatial tracking method that bypasses explicit sound source localization and Ambisonics-domain processing. Our approach employs a signal-dependent mixture-of-experts architecture to fuse multiple direction-specific binaural filters online, augmented by field-of-view enhancement and real-time weighting mechanisms, enabling adaptive binauralization for arbitrary microphone array geometries. It supports interactive user control, letting users emphasize or suppress sounds from specific directions while preserving natural binaural cues. Experiments demonstrate significant improvements in speech intelligibility and spatial immersion under complex noise conditions, validating applicability to AR/VR voice focus and consumer-grade personalized audio systems. The core innovation is a dynamic filter-combination scheme driven by implicit localization, which avoids the structural and computational constraints of conventional direction-of-arrival estimation and high-order Ambisonics.
📝 Abstract
We propose a novel mixture-of-experts framework for field-of-view enhancement in binaural signal matching. Our approach enables dynamic spatial audio rendering that adapts to continuous talker motion, allowing users to emphasize or suppress sounds from selected directions while preserving natural binaural cues. Unlike traditional methods that rely on explicit direction-of-arrival estimation or operate in the Ambisonics domain, our signal-dependent framework combines multiple binaural filters in an online manner using implicit localization. This allows for real-time tracking and enhancement of moving sound sources, supporting applications such as speech focus, noise reduction, and world-locked audio in augmented and virtual reality. The method is agnostic to array geometry, offering a flexible solution for spatial audio capture and personalized playback in next-generation consumer audio devices.
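To make the idea of combining multiple binaural filters with signal-dependent weights more concrete, here is a minimal sketch for a single STFT frame. It is an illustration under stated assumptions, not the authors' implementation: the gating rule (a softmax over log steered-response power), the function name `moe_binaural_frame`, and the parameters `steering`, `fov_gains`, and `temperature` are all hypothetical choices that do not come from the abstract. The abstract does not say how the weights are computed; the steered-power gate is only a simple stand-in for "implicit localization".

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def moe_binaural_frame(x, expert_filters, steering, fov_gains, temperature=1.0):
    """Combine K direction-specific binaural filters for one STFT frame.

    All names and the gating rule are illustrative assumptions.

    x              : (F, M) complex, microphone signals per frequency bin
    expert_filters : (K, F, M, 2) complex, one left/right filter set per expert
    steering       : (K, F, M) complex, one array response per expert direction
    fov_gains      : (K,) real, user-chosen emphasis (>1) or suppression (<1)
    Returns (y, weights) with y of shape (F, 2) and weights of shape (K,).
    """
    # Implicit localization (assumed): score each expert by how much frame
    # energy aligns with its direction, without picking a single DOA.
    proj = np.einsum("kfm,fm->kf", steering.conj(), x)
    power = np.sum(np.abs(proj) ** 2, axis=1)
    weights = softmax(np.log(power + 1e-12) / temperature)

    # Each expert renders its own binaural (left, right) estimate.
    expert_out = np.einsum("kfmc,fm->kfc", expert_filters.conj(), x)

    # Signal-dependent mixture, scaled by the field-of-view gains.
    y = np.einsum("k,kfc->fc", weights * fov_gains, expert_out)
    return y, weights
```

Because the weights are recomputed for every frame, the mixture can follow a moving talker without an explicit direction-of-arrival estimate, and setting an entry of `fov_gains` below 1 attenuates whatever that expert contributes. In a real system the filters and steering vectors would come from the array's measured or simulated responses, which this sketch leaves unspecified.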