Location-Oriented Sound Event Localization and Detection with Spatial Mapping and Regression Localization

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Sound event localization and detection (SELD) methods struggle in overlapping multi-source scenarios because they rely on a fixed number of output tracks and generalize poorly to arbitrary event counts. To address this, the paper proposes Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD takes a location-oriented approach that abandons the conventional multi-track assumption. It segments 3D space and maps it to a 2D plane, learns event features based on orientation, and uses a regression localization loss for direction-of-arrival (DOA) estimation, so that overlapping sound events can be detected and localized regardless of their number. On the STARSS23 and STARSS22 datasets, SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic conditions.

📝 Abstract
Sound Event Localization and Detection (SELD) combines Sound Event Detection (SED) with estimation of the corresponding Direction of Arrival (DOA). Recently adopted event-oriented multi-track methods generalize poorly in polyphonic environments because the number of tracks is limited. To improve generality in polyphonic environments, we propose Spatial Mapping and Regression Localization for SELD (SMRL-SELD). SMRL-SELD segments 3D space and maps it to a 2D plane, and a new regression localization loss is proposed to help the results converge toward the location of the corresponding event. SMRL-SELD is location-oriented, allowing the model to learn event features based on orientation. The method therefore enables the model to process polyphonic sounds regardless of the number of overlapping events. We conducted experiments on the STARSS23 and STARSS22 datasets, and the proposed SMRL-SELD outperforms existing SELD methods both in overall evaluation and in polyphonic environments.
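The abstract does not say how 3D space is segmented, so the following is only a rough sketch of one way a location-oriented mapping could work. It assumes the sphere of directions is cut into a uniform azimuth/elevation grid, and that each cell stores a fractional offset to be regressed. The function names `doa_to_cell` and `cell_to_doa`, the 10-degree cell size, and the grid layout are all hypothetical and are not taken from the paper.

```python
def doa_to_cell(azimuth_deg, elevation_deg, cell_deg=10.0):
    """Map a DOA to a 2D grid cell plus the fractional offset inside the cell.

    Assumed conventions (not from the paper): azimuth in [-180, 180),
    elevation in [-90, 90], uniform cells of `cell_deg` degrees.
    """
    n_cols = int(360 / cell_deg)
    n_rows = int(180 / cell_deg)
    # Shift both angles to non-negative ranges, then scale to grid units.
    col_f = ((azimuth_deg + 180.0) % 360.0) / cell_deg
    row_f = (elevation_deg + 90.0) / cell_deg
    # Clamp so that elevation == 90 falls into the last row.
    col = min(int(col_f), n_cols - 1)
    row = min(int(row_f), n_rows - 1)
    return row, col, row_f - row, col_f - col


def cell_to_doa(row, col, d_row, d_col, cell_deg=10.0):
    """Invert doa_to_cell: recover (azimuth, elevation) in degrees."""
    azimuth = (col + d_col) * cell_deg - 180.0
    elevation = (row + d_row) * cell_deg - 90.0
    return azimuth, elevation


# Two simultaneous events land in different cells, so no track count is needed.
event_a = doa_to_cell(35.0, -12.0)
event_b = doa_to_cell(-120.0, 40.0)
```

Under this layout the number of simultaneous events is bounded by the number of cells rather than by a fixed number of output tracks, which is the property the abstract attributes to a location-oriented design.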
Problem

Research questions and friction points this paper is trying to address.

Enhancing sound event localization in polyphonic environments
Mapping 3D spatial space to 2D for improved detection
Overcoming track limitations in overlapping sound events
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segments 3D space and maps it to a 2D plane
Uses regression localization loss
Location-oriented polyphonic sound processing
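The page gives no formula for the regression localization loss, so the sketch below is an assumption rather than the paper's loss. It only illustrates the general pattern of pairing a per-location detection term with a localization regression term that is counted only where an event is actually present. The name `joint_loss`, the use of binary cross-entropy and squared error, and the `weight` parameter are all hypothetical choices.

```python
import math


def joint_loss(pred_act, pred_off, true_act, true_off, weight=1.0):
    """Detection + regression loss over a flat list of grid cells.

    pred_act / true_act: per-cell activity probability / 0-1 label.
    pred_off / true_off: per-cell (d_row, d_col) offsets.
    Assumed form (not from the paper): BCE + weight * masked squared error.
    """
    eps = 1e-7
    # Detection term: binary cross-entropy averaged over all cells.
    bce = 0.0
    for p, t in zip(pred_act, true_act):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(true_act)
    # Regression term: squared offset error, only on cells with an event.
    active = [i for i, t in enumerate(true_act) if t == 1]
    reg = 0.0
    if active:
        for i in active:
            reg += sum((a - b) ** 2 for a, b in zip(pred_off[i], true_off[i]))
        reg /= len(active)
    return bce + weight * reg


# Two cells: one holds an event, one is empty.
true_act = [1, 0]
true_off = [(0.8, 0.5), (0.0, 0.0)]
loss = joint_loss([0.9, 0.1], [(0.7, 0.5), (0.3, 0.3)], true_act, true_off)
```

Masking the regression term means offsets predicted for empty cells do not affect training, so the localization error pulls predictions only toward cells where an event is labeled.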
Xueping Zhang
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Yaxiong Chen
Wuhan University of Technology
deep hashing, deep learning
Ruilin Yao
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Yunfei Zi
School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
Shengwu Xiong
Wuhan University of Technology
Artificial Intelligence