🤖 AI Summary
Addressing the acoustic echo cancellation (AEC) challenge posed by the nonlinear distortion of low-cost loudspeakers and by complex room acoustics in real-world scenarios, this paper proposes a dual-microphone architecture: an auxiliary microphone placed in the loudspeaker’s near field directly captures the nonlinearly distorted far-end signal for use as a reference. Because this reference also picks up near-end speech, a Wiener-filtering-based preprocessing module first estimates a compressed time-frequency mask that suppresses the near-end components, giving the linear AEC stage a cleaner reference. The residual error of the linear AEC is then passed to a deep neural network that jointly suppresses residual echo and background noise. On matched test sets, the method outperforms the baseline approaches. On a mismatched dataset with strong nonlinear distortion, it still achieves substantial performance gains, indicating robustness in practical deployments where the loudspeaker nonlinearity is typically unknown.
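The Wiener-filter pre-suppression step can be illustrated with a minimal NumPy sketch. The exact mask formula is not given here, so everything below is an assumption: the near-end PSD estimate is simply passed in as an input (how the paper obtains it is not stated), the echo PSD in the reference channel is estimated by power subtraction, and "compressed" is interpreted as a power-law exponent `alpha` applied to the gain. The function name `wiener_reference_mask` and its parameters are hypothetical.

```python
import numpy as np


def wiener_reference_mask(ref_stft, near_psd, alpha=0.5, floor=0.05, smooth=0.8):
    """Hypothetical Wiener-style mask that keeps the echo in the reference mic.

    ref_stft : complex array (frames, bins), STFT of the auxiliary reference mic
    near_psd : real array (frames, bins), ASSUMED estimate of the near-end speech PSD
    alpha    : ASSUMED power-law compression exponent applied to the gain
    floor    : lower bound on the gain, limits musical-noise artifacts
    smooth   : recursive (first-order IIR) smoothing factor for the reference PSD
    """
    ref_pow = np.abs(ref_stft) ** 2

    # Recursively smooth the reference power over time to obtain a PSD estimate.
    ref_psd = np.empty_like(ref_pow)
    ref_psd[0] = ref_pow[0]
    for t in range(1, ref_pow.shape[0]):
        ref_psd[t] = smooth * ref_psd[t - 1] + (1.0 - smooth) * ref_pow[t]

    # Echo PSD in the reference channel by power subtraction (never negative).
    echo_psd = np.maximum(ref_psd - near_psd, 0.0)

    # Classic Wiener gain: echo / (echo + near-end); eps avoids 0/0 in silent bins.
    gain = echo_psd / (echo_psd + near_psd + 1e-12)

    # Floor the gain, then apply the (assumed) power-law compression.
    return np.clip(gain, floor, 1.0) ** alpha


if __name__ == "__main__":
    ref = np.ones((4, 3), dtype=complex)
    # Near-end absent: the mask should stay close to 1 (reference kept intact).
    print(wiener_reference_mask(ref, np.zeros((4, 3)))[0])
```

Multiplying the mask with the reference STFT (`mask * ref_stft`) yields the purified reference. Bins where the near-end estimate dominates are attenuated down to `floor ** alpha` rather than zeroed, a common choice to avoid over-suppression artifacts.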
📝 Abstract
Acoustic echo cancellation (AEC) remains challenging in real-world environments due to the nonlinear distortions introduced by low-cost loudspeakers and to complex room acoustics. To mitigate these issues, we introduce a dual-microphone configuration, where an auxiliary reference microphone is placed near the loudspeaker to capture the nonlinearly distorted far-end signal. Because this reference signal is also contaminated by near-end speech, we propose a Wiener-filtering-based preprocessing module that estimates a compressed time-frequency mask to suppress the near-end components. The purified reference signal enables a more effective linear AEC stage, whose residual error signal is then fed to a deep neural network for joint residual echo and noise suppression. Evaluation results show that our method outperforms baseline approaches on matched test sets. To evaluate its robustness under strong nonlinearities, we further test it on a mismatched dataset and observe substantial performance gains. These results demonstrate the method's effectiveness in practical scenarios where the nonlinear distortions are typically unknown.
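The linear AEC stage that consumes the purified reference can be sketched with a textbook time-domain NLMS adaptive filter. This is a generic stand-in, not the paper's filter: the abstract does not say which adaptive algorithm, processing domain, or filter length is used, and the sketch assumes the masked reference has already been resynthesized to the time domain. The names `nlms_aec`, `taps`, and `mu` are hypothetical.

```python
import numpy as np


def nlms_aec(reference, mic, taps=128, mu=0.5, eps=1e-6):
    """Generic time-domain NLMS echo canceller (illustrative, not the paper's).

    reference : 1-D array, purified far-end reference (e.g. masked aux-mic signal)
    mic       : 1-D array, primary microphone signal of the same length
    taps      : adaptive filter length (hypothetical choice)
    mu        : NLMS step size, stable for 0 < mu < 2
    Returns (err, w): the error signal mic - estimated echo, and the final filter.
    """
    w = np.zeros(taps)
    buf = np.zeros(taps)  # most recent reference samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        # Shift the delay line and insert the newest reference sample.
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        # Linear echo estimate and a-priori error.
        echo_hat = w @ buf
        err[n] = mic[n] - echo_hat
        # NLMS update: step normalized by the reference energy in the window.
        w += mu * err[n] * buf / (buf @ buf + eps)
    return err, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)           # stand-in for the purified reference
    h = 0.5 * rng.standard_normal(16)       # toy echo path
    d = np.convolve(x, h)[: len(x)]         # microphone = pure linear echo
    e, w = nlms_aec(x, d, taps=32)
    print("residual/echo power:", np.mean(e[-1000:] ** 2) / np.mean(d[-1000:] ** 2))
```

The returned `err` plays the role of the residual error signal that the abstract feeds to the neural network; the DNN stage is not sketched here. For long room impulse responses a frequency-domain or subband adaptive filter is usually preferred for efficiency, but plain NLMS keeps the update rule visible.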