AI Summary
In real-world scenarios, long-tailed class distributions severely degrade the performance of deep learning-based sound event localization and detection (SELD), as standard regression losses inherently bias optimization toward high-frequency classes, undermining modeling fidelity for rare events. To address this, we propose MAGENTA, a novel method that, for the first time, unifies magnitude (radial) and direction (angular) regression errors within an interpretable vector space. MAGENTA introduces a rarity-aware geometric decomposition loss, grounded in physical principles, to explicitly guide optimization and enhance model sensitivity and robustness to infrequent events. Extensive experiments on realistic long-tailed SELD benchmarks demonstrate that MAGENTA significantly improves both localization accuracy and event detection F1-score. This work establishes the first geometric error-decoupling optimization framework tailored for long-tailed acoustic perception tasks.
Abstract
Deep learning-based Sound Event Localization and Detection (SELD) systems degrade significantly on real-world, long-tailed datasets. Standard regression losses bias learning toward frequent classes, causing rare events to be systematically under-recognized. To address this challenge, we introduce MAGENTA (Magnitude And Geometry-ENhanced Training Approach), a unified loss function that counteracts this bias within a physically interpretable vector space. MAGENTA geometrically decomposes the regression error into radial and angular components, enabling targeted, rarity-aware penalties and strengthened directional modeling. Empirically, MAGENTA substantially improves SELD performance on imbalanced real-world data, providing a principled foundation for a new class of geometry-aware SELD objectives. Code is available at: https://github.com/itsjunwei/MAGENTA_ICASSP
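To make the idea of a radial/angular decomposition concrete, the following is a minimal illustrative sketch, not the MAGENTA loss itself. It assumes each prediction and target is a 3-D vector whose length encodes activity and whose direction encodes the source direction. The function names (`decompose_error`, `rarity_weights`, `sketch_loss`), the inverse-frequency weighting, the squared-norm radial term, the cosine-based angular term, and the `angular_scale` parameter are all assumptions introduced here for illustration; the actual formulation is in the linked repository.

```python
import numpy as np


def decompose_error(pred, target, eps=1e-8):
    """Split the error between predicted and target vectors into two parts.

    radial:  squared difference of vector lengths (magnitude error)
    angular: 1 - cosine similarity (direction error), set to 0 where the
             target is the zero vector and so has no defined direction
    """
    pred_norm = np.linalg.norm(pred, axis=-1)
    target_norm = np.linalg.norm(target, axis=-1)
    radial = (pred_norm - target_norm) ** 2

    cos = np.sum(pred * target, axis=-1) / (pred_norm * target_norm + eps)
    cos = np.clip(cos, -1.0, 1.0)
    active = target_norm > 0  # assumption: inactive targets are zero vectors
    angular = np.where(active, 1.0 - cos, 0.0)
    return radial, angular


def rarity_weights(class_counts):
    """Hypothetical rarity weighting: inverse class frequency, mean-normalized."""
    counts = np.asarray(class_counts, dtype=float)
    weights = 1.0 / counts
    return weights / weights.mean()


def sketch_loss(pred, target, class_ids, class_counts, angular_scale=1.0):
    """Rarity-weighted sum of radial and angular errors, averaged over samples."""
    radial, angular = decompose_error(pred, target)
    weights = rarity_weights(class_counts)[class_ids]
    return float(np.mean(weights * (radial + angular_scale * angular)))


if __name__ == "__main__":
    # Two samples: one with a pure magnitude error, one with a pure direction error.
    pred = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    target = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    class_ids = np.array([0, 1])   # class 0 is frequent, class 1 is rare
    class_counts = [90, 10]        # made-up training counts per class
    print(decompose_error(pred, target))
    print(sketch_loss(pred, target, class_ids, class_counts))
```

In this sketch the two error terms are computed separately, so each can be scaled or weighted on its own; a plain mean-squared error on the vectors would mix magnitude and direction errors into a single number and offer no such handle.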