Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the dual imbalance challenge in real-world sound source localization (SSL): long-tailed direction-of-arrival (DoA) distributions within a task, and distribution skews and overlaps across tasks. To tackle both without replay or exemplar storage, the authors propose a unified incremental learning framework. A GCC-PHAT-based data augmentation (GDA) method exploits peak characteristics to mitigate intra-task imbalance, and an analytic dynamic imbalance rectifier (ADIR) with task-adaptive regularization performs analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, the method reports state-of-the-art results of 89.0% accuracy, a mean absolute error of 5.3°, and a backward transfer of 1.6, while mitigating catastrophic forgetting.
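
For readers unfamiliar with the feature that the augmentation builds on, the following is a minimal NumPy sketch of standard GCC-PHAT (generalized cross-correlation with phase transform) between two microphone channels. It is not the authors' GDA method, whose details are not given here; the function name `gcc_phat` and the parameters `fs`, `max_tau`, and `interp` are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Standard GCC-PHAT between two 1-D signals (illustrative sketch).

    Returns (tau, cc): the estimated delay of `sig` relative to `ref`
    in seconds, and the correlation curve centred on zero lag.
    """
    # Zero-pad to the combined length so the correlation is linear, not circular.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, then PHAT weighting: discard magnitude, keep phase.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12

    # Back to the lag domain; interp > 1 upsamples the correlation curve.
    cc = np.fft.irfft(R, n=interp * n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)

    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The peak location is the delay estimate; its sharpness is the
    # "peak characteristic" that the summary refers to.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(interp * fs), cc
```

A positive `tau` means `sig` lags `ref`. In a DoA pipeline the curve `cc` (or a window around its peak) would typically be computed per microphone pair and fed to the localization model.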

📝 Abstract
Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These often lead to catastrophic forgetting, significantly degrading localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an analytic dynamic imbalance rectifier (ADIR) with task-adaptive regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer, demonstrating robustness to evolving imbalances without exemplar storage.
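
The backward transfer figure can be read against the definition commonly used in the continual-learning literature: the average change in performance on earlier tasks after the final task has been learned. The sketch below assumes that common definition; whether the paper computes the metric in exactly this way is not stated here. The function name `backward_transfer` and the accuracy matrix are hypothetical illustration values, not results from the paper.

```python
import numpy as np

def backward_transfer(acc):
    """Backward transfer from a T x T accuracy matrix (illustrative sketch).

    acc[t, i] is the accuracy on task i measured after training on task t.
    Entries above the diagonal (tasks not yet seen) are ignored.
    """
    acc = np.asarray(acc, dtype=float)
    T = acc.shape[0]
    if acc.ndim != 2 or acc.shape != (T, T) or T < 2:
        raise ValueError("acc must be a square matrix with at least 2 tasks")

    final = acc[-1, :-1]                # accuracy on old tasks after the last task
    just_learned = np.diag(acc)[:-1]    # accuracy on each task right after learning it
    return float(np.mean(final - just_learned))

# Hypothetical 3-task run (made-up numbers, in percent).
acc = [
    [80.0,  0.0,  0.0],
    [82.0, 85.0,  0.0],
    [83.0, 86.0, 90.0],
]
bwt = backward_transfer(acc)
```

Under this definition a negative value indicates forgetting, while a positive value means that learning later tasks improved performance on earlier ones.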
Problem

Research questions and friction points this paper is trying to address.

sound source localization
class imbalance
catastrophic forgetting
direction-of-arrival
incremental learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

analytic incremental learning
imbalance rectification
sound source localization
GCC-PHAT-based data augmentation
task-adaptive regularization
Zexia Fan
University of Science and Technology Beijing, Beijing, China
Yu Chen
University of Science and Technology Beijing, Beijing, China; The Chinese University of Hong Kong, Shenzhen, China
Qiquan Zhang
UNSW, Australia | NUS, Singapore | HIT, China
speech processing, speech enhancement, audio-visual learning, NLP, computer vision
Kainan Chen
Eigenspace GmbH, Germany
Xinyuan Qian
Associate Professor, University of Science and Technology Beijing, China
speech processing, multimedia, human robot interaction