Metric Analysis for Spatial Semantic Segmentation of Sound Scenes

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluation metrics for spatial semantic segmentation of sound scenes (S5), in particular the class-aware signal-to-distortion ratio (CA-SDR), do not disentangle errors due to audio source separation from errors due to sound event classification. This is most visible when a system makes only classification errors. To address this, we propose a modified version of the CA-SDR that first computes a class-agnostic SDR and then accounts for wrongly labeled sources, so that separation quality is measured independently of the predicted labels. We analyze how both metrics respond to typical degradations, including labeling errors and cross-contamination between separated sources, and propose a first set of penalties intended to make the metric better reflect labeling and separation errors. The aim is a joint metric that allows fairer and more interpretable comparison of S5 systems.

📝 Abstract
Spatial semantic segmentation of sound scenes (S5) consists of jointly performing audio source separation and sound event classification from a multichannel audio mixture. To evaluate S5 systems, one can consider two individual metrics, i.e., one for source separation and another for sound event classification, but this approach makes it challenging to compare S5 systems. Thus, a joint class-aware signal-to-distortion ratio (CA-SDR) metric was proposed to evaluate S5 systems. In this work, we first compare the CA-SDR with the classical SDR on scenarios with only classification errors. We then analyze the cases where the metric might not allow proper comparison of the systems. To address this problem, we propose a modified version of the CA-SDR which first focuses on class-agnostic SDR and then accounts for the wrongly labeled sources. We also analyze the performance of the two metrics under cross-contamination between separated audio sources. Finally, we propose a first set of penalties in an attempt to make the metric more reflective of the labeling and separation errors.
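To make the comparison between the classical SDR and a class-aware variant concrete, the following minimal sketch computes both on a toy case with a pure classification error. The SDR formula is the standard energy ratio in dB. The class-aware aggregation (averaging over the union of reference and estimated labels, with a fixed score of 0 dB for any label present on only one side) is an assumption made for illustration, as are the function names `sdr` and `class_aware_sdr`; the page does not give the exact CA-SDR definition.

```python
import math


def sdr(reference, estimate, eps=1e-12):
    """Classical SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def class_aware_sdr(references, estimates, missing_score=0.0):
    """Average per-label SDR over the union of reference and estimated labels.

    Assumption (illustrative only): a label that appears on only one side
    contributes `missing_score` instead of an SDR value.
    """
    labels = set(references) | set(estimates)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        if label in references and label in estimates:
            scores.append(sdr(references[label], estimates[label]))
        else:
            scores.append(missing_score)
    return sum(scores) / len(scores)


# Toy example: the separation is good, but the label is wrong.
ref_signal = [1.0, -1.0, 0.5, -0.5]
est_signal = [0.9 * x for x in ref_signal]

plain = sdr(ref_signal, est_signal)                               # about 20 dB
aware_ok = class_aware_sdr({"dog": ref_signal}, {"dog": est_signal})
aware_bad = class_aware_sdr({"dog": ref_signal}, {"cat": est_signal})
```

Under these assumptions, the mislabeled estimate scores 0 dB on the class-aware metric although its separation quality is about 20 dB, which is the kind of case where separation and classification errors cannot be told apart from the joint score alone.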
Problem

Research questions and friction points this paper is trying to address.

Proposing a modified joint metric for evaluating spatial sound segmentation systems
Analyzing limitations of existing metrics in system comparison scenarios
Addressing cross-contamination and labeling errors in audio separation evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed modified CA-SDR metric for evaluation
Introduced class-agnostic SDR with labeling correction
Added penalty set for labeling and separation errors
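The idea of scoring separation first in a class-agnostic way and only then accounting for wrong labels can be sketched as follows. This is not the paper's definition: the exhaustive permutation matching, the fixed per-source penalty `penalty_db`, and the function names `match_sources` and `penalized_score` are all hypothetical choices made for illustration, and the sketch assumes the same number of reference and estimated sources.

```python
import math
from itertools import permutations


def sdr(reference, estimate, eps=1e-12):
    """Classical SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def match_sources(ref_signals, est_signals):
    """Class-agnostic matching: pick the permutation with the highest total SDR."""
    best_perm, best_total = None, -math.inf
    for perm in permutations(range(len(ref_signals))):
        total = sum(sdr(ref_signals[i], est_signals[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm


def penalized_score(ref_signals, ref_labels, est_signals, est_labels, penalty_db=10.0):
    """Mean SDR after label-free matching, minus a hypothetical fixed
    penalty (in dB) for each matched pair whose labels disagree."""
    if not ref_signals or len(ref_signals) != len(est_signals):
        raise ValueError("sketch assumes equal, non-zero source counts")
    perm = match_sources(ref_signals, est_signals)
    scores, wrong = [], 0
    for i, j in enumerate(perm):
        score = sdr(ref_signals[i], est_signals[j])
        if ref_labels[i] != est_labels[j]:
            wrong += 1
            score -= penalty_db
        scores.append(score)
    return sum(scores) / len(scores), wrong


# Toy example: both sources are well separated (output order swapped),
# but one of them carries the wrong label.
refs = [[1.0, -1.0, 0.5, -0.5], [0.2, 0.4, -0.4, -0.2]]
ests = [[0.9 * x for x in refs[1]], [0.9 * x for x in refs[0]]]
score, n_wrong = penalized_score(refs, ["dog", "siren"], ests, ["siren", "cat"])
```

In this toy case each matched pair has an SDR of about 20 dB, so a single labeling error lowers the mean score by half the penalty rather than zeroing the contribution of the mislabeled source. The exhaustive search over permutations is only practical for a handful of sources.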
Mayank Mishra
Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
Paul Magron
Researcher
Audio signal processing
Romain Serizel
Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France