🤖 AI Summary
This work addresses the dual imbalance challenge in real-world sound source localization (SSL): long-tailed direction-of-arrival (DoA) distributions within each task, and distribution skews and overlaps across tasks. To tackle both without replaying stored exemplars, the authors propose a unified incremental learning framework. It uses the peak characteristics of GCC-PHAT for data augmentation to mitigate intra-task imbalance, and introduces an analytic dynamic imbalance rectifier (ADIR) whose task-adaptive regularization enables analytic updates that adapt to inter-task dynamics. The method mitigates catastrophic forgetting and achieves state-of-the-art results on the SSLR benchmark: 89.0% accuracy, a mean absolute error (MAE) of 5.3°, and a backward transfer (BWT) of 1.6.
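The augmentation builds on GCC-PHAT, whose cross-correlation peak encodes the time difference of arrival between two microphones. The sketch below shows only that standard building block: it whitens the cross-spectrum (the phase transform) and picks the peak lag. It is not the authors' GDA method, whose augmentation rule the summary does not describe. The function name `gcc_phat` and its parameters (`max_tau`, `interp`) are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    Returns (tau, cc), where cc is the correlation centred on zero lag.
    A positive tau means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)              # cross-power spectrum
    cross = cross / np.maximum(np.abs(cross), 1e-12)  # PHAT: keep phase only
    cc = np.fft.irfft(cross, n=interp * n)  # whitened cross-correlation

    max_shift = interp * n // 2
    if max_tau is not None:                 # restrict to physically possible lags
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Reorder so index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    shift = int(np.argmax(np.abs(cc))) - max_shift  # peak location = lag in samples
    return shift / float(interp * fs), cc
```

For example, delaying white noise by 5 samples at 16 kHz yields a sharp peak at `tau * fs == 5`; the height and sharpness of that peak are the kind of "peak characteristics" the method reportedly exploits.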
📝 Abstract
Sound source localization (SSL) achieves remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These imbalances often lead to catastrophic forgetting, significantly degrading localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an analytic dynamic imbalance rectifier (ADIR) with task-adaptive regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer, demonstrating robustness to evolving imbalances without exemplar storage.
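"Analytic updates" usually refers to closed-form training of a linear head on fixed features, updated recursively as new tasks arrive. The sketch below shows a generic recursive ridge-regression update of that kind, assuming a fixed grid of DoA classes and one-hot targets. It omits ADIR's task-adaptive regularization and imbalance rectification, which the abstract does not specify. The class `AnalyticRidgeHead` and the parameter `gamma` are hypothetical names.

```python
import numpy as np

class AnalyticRidgeHead:
    """Hypothetical linear head solving min ||Y - X W||^2 + gamma * ||W||^2.

    Data arrive task by task; only W and the inverse correlation matrix R
    are kept, so no exemplars are stored.
    """

    def __init__(self, feat_dim, n_classes, gamma=1.0):
        self.R = np.eye(feat_dim) / gamma          # (X^T X + gamma I)^{-1}, no data yet
        self.W = np.zeros((feat_dim, n_classes))   # ridge solution so far

    def update(self, X, Y):
        """Absorb one task's features X (n x d) and targets Y (n x c)."""
        RXt = self.R @ X.T
        # Woodbury identity: (R^{-1} + X^T X)^{-1} = R - R X^T (I + X R X^T)^{-1} X R
        K = np.linalg.solve(np.eye(X.shape[0]) + X @ RXt, RXt.T)
        self.R = self.R - RXt @ K
        # Recursive least-squares correction of the previous solution.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        """Return the index of the highest-scoring DoA class per sample."""
        return np.argmax(X @ self.W, axis=1)
```

After any sequence of `update` calls, `W` equals the ridge-regression solution computed jointly on all data seen so far. This equivalence is why analytic learners of this type avoid forgetting without replay.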