🤖 AI Summary
Event-oriented multi-track approaches to sound event localization and detection (SELD) rely on a fixed number of output tracks, which limits how well they generalize to polyphonic scenes with an arbitrary number of overlapping events. To address this, the paper proposes Spatial Mapping and Regression Localization for SELD (SMRL-SELD), a location-oriented method that drops the multi-track assumption. SMRL-SELD segments 3D space and maps it to a 2D plane, learns event features by orientation, and uses a new regression localization loss to pull predictions toward the location of the corresponding event. This lets the model detect and localize overlapping sound events regardless of their number. On the STARSS23 and STARSS22 datasets, SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic conditions.
📝 Abstract
Sound Event Localization and Detection (SELD) combines Sound Event Detection (SED) with estimation of the corresponding Direction of Arrival (DOA). Recently adopted event-oriented multi-track methods generalize poorly in polyphonic environments because the number of tracks is limited. To improve generality in polyphonic environments, we propose Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD segments 3D space and maps it to a 2D plane, and a new regression localization loss is proposed to help the results converge toward the location of the corresponding event. Because SMRL-SELD is location-oriented, the model learns event features based on orientation. The method can therefore process polyphonic sounds regardless of the number of overlapping events. We conducted experiments on the STARSS23 and STARSS22 datasets, and the proposed SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic environments.
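The abstract does not say how the 3D space is segmented or how the regression targets are defined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes an equal-angle azimuth/elevation grid; the function names `doa_to_cell` and `cell_to_doa` and the grid sizes `n_azi` and `n_ele` are hypothetical. Each DOA vector is assigned to one cell of a 2D grid, and the fractional position inside that cell serves as a regression target.

```python
import math

def doa_to_cell(x, y, z, n_azi=12, n_ele=6):
    """Map a 3D DOA vector to a cell of a 2D azimuth/elevation grid.

    Returns (row, col, d_row, d_col), where (row, col) indexes the cell and
    (d_row, d_col) is the fractional offset inside it (a regression target).
    The equal-angle grid is an assumption made for illustration only.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("DOA vector must be non-zero")
    azi = math.degrees(math.atan2(y, x))                           # [-180, 180]
    ele = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))   # [-90, 90]
    u = (azi + 180.0) / 360.0 * n_azi   # continuous column coordinate
    v = (ele + 90.0) / 180.0 * n_ele    # continuous row coordinate
    col = min(int(u), n_azi - 1)        # clamp the upper edge into the last cell
    row = min(int(v), n_ele - 1)
    return row, col, v - row, u - col

def cell_to_doa(row, col, d_row, d_col, n_azi=12, n_ele=6):
    """Invert doa_to_cell: recover a unit DOA vector from cell and offset."""
    azi = math.radians((col + d_col) / n_azi * 360.0 - 180.0)
    ele = math.radians((row + d_row) / n_ele * 180.0 - 90.0)
    return (math.cos(ele) * math.cos(azi),
            math.cos(ele) * math.sin(azi),
            math.sin(ele))

# Two simultaneous events occupy different cells, so no track slots are needed.
front = doa_to_cell(1.0, 0.0, 0.0)
left = doa_to_cell(0.0, 1.0, 0.0)
print(front[:2], left[:2])
```

In a layout like this, the number of events the output can represent is bounded by the grid resolution rather than by a fixed track count, which is one way to read the "location-oriented" claim in the abstract. Events that fall into the same cell would need a finer grid or extra handling.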