Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios

2026-01-18
AI Summary
This work addresses robust speaker tracking and extraction in dynamic acoustic environments, where spatial cues degrade when multiple speakers are in close proximity or their trajectories cross, making spatial filtering less effective. Building on deep spatial filtering for Ambisonics, in which the sound field is rotated toward the target speaker before multi-channel enhancement, the authors automate this rotary steering with an interleaved tracking algorithm conditioned on the target's initial direction. Their joint autoregressive framework additionally feeds the processed recording back as a guide into both the tracking and the enhancement algorithm, so that both can exploit temporal-spectral correlations of speech when spatial cues alone are ambiguous. On a synthetic dataset, the method consistently outperforms comparable non-autoregressive methods in tracking and enhancement of closely spaced speakers; real-world recordings with multiple speaker crossings and varying speaker-to-array distances complement these findings.
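The data flow described in this summary can be sketched as a per-frame loop. This is only an illustration under stated assumptions, not the authors' implementation: `joint_autoregressive_extraction`, `toy_tracker`, and `toy_enhancer` are hypothetical names, each frame is reduced to one first-order Ambisonics sample `(w, x, y, z)`, and the toy tracker and enhancer are simple stand-ins for the learned models (they accept the fed-back output but do not use it).

```python
import math

def joint_autoregressive_extraction(frames, initial_azimuth, tracker, enhancer):
    """Interleave tracking, rotation, and enhancement frame by frame.

    `tracker` and `enhancer` are hypothetical callables; both receive the
    previous enhanced output, which is the autoregressive feedback path.
    """
    azimuth = initial_azimuth      # tracking starts from the known direction
    prev_output = None             # no processed output before the first frame
    azimuths, outputs = [], []
    for (w, x, y, z) in frames:
        # 1) Update the direction estimate, guided by the previous output.
        azimuth = tracker((w, x, y, z), azimuth, prev_output)
        # 2) Rotate the sound field about the vertical axis so the
        #    estimated target direction faces azimuth 0.
        c, s = math.cos(azimuth), math.sin(azimuth)
        rotated = (w, c * x + s * y, -s * x + c * y, z)
        # 3) Enhance the rotated frame, again guided by the previous output.
        prev_output = enhancer(rotated, prev_output)
        azimuths.append(azimuth)
        outputs.append(prev_output)
    return azimuths, outputs

def toy_tracker(frame, prev_azimuth, prev_output):
    """Stand-in tracker: direction of the horizontal intensity vector."""
    w, x, y, _ = frame
    if w == 0.0:
        return prev_azimuth        # keep the last estimate during silence
    return math.atan2(w * y, w * x)

def toy_enhancer(rotated_frame, prev_output):
    """Stand-in enhancer: cardioid pointing at azimuth 0."""
    w, x, _, _ = rotated_frame
    return 0.5 * (w + x)

# One source with constant amplitude 1.0 moving from 0 to 1 rad.
true_azimuths = [0.0, 0.25, 0.5, 0.75, 1.0]
frames = [(1.0, math.cos(a), math.sin(a), 0.0) for a in true_azimuths]
est_azimuths, enhanced = joint_autoregressive_extraction(
    frames, initial_azimuth=0.0, tracker=toy_tracker, enhancer=toy_enhancer
)
```

In this single-source toy case the tracker follows the source exactly, so the rotated source always faces the front and the cardioid passes it unchanged. The situations the summary focuses on, close or crossing speakers, are where a purely spatial tracker like this stand-in would become ambiguous.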

Abstract
Recent advances in deep spatial filtering for Ambisonics demonstrate strong performance in stationary multi-speaker scenarios by rotating the sound field toward a target speaker prior to multi-channel enhancement. For applicability in dynamic acoustic conditions with moving speakers, we propose to automate this rotary steering using an interleaved tracking algorithm conditioned on the target's initial direction. However, for nearby or crossing speakers, robust tracking becomes difficult and spatial cues become less effective for enhancement. By incorporating the processed recording as an additional guide into both algorithms, our novel joint autoregressive framework leverages temporal-spectral correlations of speech to resolve spatially challenging speaker constellations. Consequently, our proposed method significantly improves tracking and enhancement of closely spaced speakers, consistently outperforming comparable non-autoregressive methods on a synthetic dataset. Real-world recordings complement these findings in complex scenarios with multiple speaker crossings and varying speaker-to-array distances.
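The rotation step mentioned in the abstract can be illustrated for first-order Ambisonics, where a rotation about the vertical axis only mixes the X and Y channels. The sketch below is a minimal example under assumptions that are not taken from the paper: the helper names `encode_foa` and `rotate_foa_yaw` are hypothetical, azimuth is measured counterclockwise from the front, and the omnidirectional channel W carries the signal with unit gain (normalization conventions differ between Ambisonics formats).

```python
import math

def encode_foa(signal, azimuth_rad, elevation_rad=0.0):
    """Encode a mono signal as a plane wave in first-order Ambisonics.

    Assumed convention: W = s, X = s*cos(az)*cos(el),
    Y = s*sin(az)*cos(el), Z = s*sin(el).
    """
    ca, sa = math.cos(azimuth_rad), math.sin(azimuth_rad)
    ce, se = math.cos(elevation_rad), math.sin(elevation_rad)
    w = [s for s in signal]
    x = [s * ca * ce for s in signal]
    y = [s * sa * ce for s in signal]
    z = [s * se for s in signal]
    return w, x, y, z

def rotate_foa_yaw(w, x, y, z, target_azimuth_rad):
    """Rotate the sound field so a source at target_azimuth_rad moves to azimuth 0.

    W and Z are invariant under rotation about the vertical axis;
    X and Y are mixed by a 2-D rotation through -target_azimuth_rad.
    """
    c, s = math.cos(target_azimuth_rad), math.sin(target_azimuth_rad)
    x_rot = [c * xi + s * yi for xi, yi in zip(x, y)]
    y_rot = [-s * xi + c * yi for xi, yi in zip(x, y)]
    return list(w), x_rot, y_rot, list(z)

# A short test signal arriving from 60 degrees to the left.
signal = [0.5, -1.0, 0.25, 0.75]
w, x, y, z = encode_foa(signal, math.radians(60.0))
w_r, x_r, y_r, z_r = rotate_foa_yaw(w, x, y, z, math.radians(60.0))
```

After the rotation the target lies on the front axis, so its energy sits in W and X while Y is zero for this single-source example. A downstream enhancement stage can then always expect the target from the same direction, which is the purpose of steering before filtering.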
Problem

Research questions and friction points this paper is trying to address.

moving speakers
dynamic scenarios
closely spaced speakers
spatial filtering
speaker tracking
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint autoregression
adaptive rotary steering
spatial filtering
moving speaker separation
Ambisonics