Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization

📅 2025-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Sound event localization and detection (SELD) methods struggle in overlapping multi-source scenarios because they rely on a fixed number of output tracks and generalize poorly to arbitrary event counts. To address this, the paper proposes Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD introduces a location-guided modeling paradigm that abandons the conventional multi-track assumption. It applies a geometric mapping from 3D spatial coordinates to a 2D polar plane, integrates direction-aware feature learning, and adopts regression-based direction-of-arrival (DOA) estimation. A joint SED-DOA loss function enables end-to-end detection and localization of an arbitrary number of overlapping sound events. Evaluated on the STARSS23 and STARSS22 datasets, SMRL-SELD outperforms existing SELD methods, particularly in high-order overlapping scenarios, addressing the generalization bottleneck of track-based approaches.

📝 Abstract

Sound Event Localization and Detection (SELD) combines Sound Event Detection (SED) with estimation of the corresponding Direction of Arrival (DOA). Recently adopted event-oriented multi-track methods generalize poorly in polyphonic environments because the number of tracks is limited. To improve generality in polyphonic environments, we propose Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD segments 3D space and maps it to a 2D plane, and a new regression localization loss helps the predictions converge toward the location of the corresponding event. Because SMRL-SELD is location-oriented, the model learns event features based on orientation, which lets it process polyphonic sounds regardless of the number of overlapping events. We conducted experiments on the STARSS23 and STARSS22 datasets, where SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic environments.
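
To make the idea of segmenting 3D space and mapping it to a 2D plane more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the segmentation scheme, so the azimuth/elevation grid, the 10-degree cell size, and the names `AZI_BINS`, `ELE_BINS`, `doa_to_cell`, and `cell_to_doa` are all hypothetical illustrations, not the paper's method. Each direction of arrival is assigned to one cell of the 2D grid, and the position inside the cell is kept as a fractional offset that a regression head could predict.

```python
# Hypothetical grid resolution (an assumption, not taken from the paper):
# 36 azimuth bins x 18 elevation bins, i.e. 10 degrees per cell.
AZI_BINS = 36
ELE_BINS = 18


def doa_to_cell(azimuth_deg, elevation_deg):
    """Map a DOA (azimuth in [-180, 180), elevation in [-90, 90]) to a
    2D grid cell (row, col) plus fractional offsets in [0, 1] inside it."""
    azi_width = 360.0 / AZI_BINS
    ele_width = 180.0 / ELE_BINS
    # Shift both angles so they start at zero, then express them in cell units.
    azi_pos = ((azimuth_deg + 180.0) % 360.0) / azi_width
    ele_pos = min(max(elevation_deg + 90.0, 0.0), 180.0) / ele_width
    # Clamp the indices so the upper edge falls into the last cell.
    col = min(int(azi_pos), AZI_BINS - 1)
    row = min(int(ele_pos), ELE_BINS - 1)
    return row, col, ele_pos - row, azi_pos - col


def cell_to_doa(row, col, ele_offset, azi_offset):
    """Invert doa_to_cell: recover (azimuth, elevation) in degrees."""
    azi_width = 360.0 / AZI_BINS
    ele_width = 180.0 / ELE_BINS
    azimuth = (col + azi_offset) * azi_width - 180.0
    elevation = (row + ele_offset) * ele_width - 90.0
    return azimuth, elevation


# Two simultaneous events land in different cells, so no track count is needed.
event_a = doa_to_cell(35.0, 12.0)
event_b = doa_to_cell(-120.0, 0.0)
```

In this sketch the number of events that can be represented at once is bounded by the number of grid cells rather than by a fixed number of output tracks, which is the property the abstract attributes to a location-oriented formulation.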

Problem

Research questions and friction points this paper is trying to address.

Enhancing sound event localization in polyphonic environments

Mapping 3D space to a 2D plane for improved detection

Overcoming track limitations in overlapping sound events

Innovation

Methods, ideas, or system contributions that make the work stand out.

Segments 3D space and maps it to a 2D plane

Uses regression localization loss

Location-oriented polyphonic sound processing
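
The points above mention a regression localization loss without giving its form. As a rough illustration only, the following sketch combines a per-cell detection term with a regression term on the within-cell offsets. The binary cross-entropy and squared-error choices, the `reg_weight` parameter, and the function name `joint_cell_loss` are assumptions made for this example; the paper's actual loss may differ.

```python
import math


def joint_cell_loss(pred_prob, pred_offset, true_active, true_offset,
                    reg_weight=1.0):
    """Hypothetical joint loss over a flattened list of grid cells.

    pred_prob:   predicted event probability per cell
    pred_offset: predicted (elevation, azimuth) offset per cell
    true_active: 1 if an event lies in the cell, else 0
    true_offset: ground-truth offset per cell (ignored for inactive cells)
    """
    eps = 1e-7
    bce = 0.0
    sq_err = 0.0
    n_active = 0
    for p, po, t, to in zip(pred_prob, pred_offset, true_active, true_offset):
        # Detection term: binary cross-entropy on cell activity.
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
        # Regression term: squared offset error, only where an event exists.
        if t == 1:
            sq_err += (po[0] - to[0]) ** 2 + (po[1] - to[1]) ** 2
            n_active += 1
    bce /= len(pred_prob)
    reg = sq_err / n_active if n_active else 0.0
    return bce + reg_weight * reg
```

Masking the regression term to active cells means the offsets are pulled toward the true event location only where an event is present, while empty cells contribute to the detection term alone.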
