AI Summary
This work addresses a gap in anomalous sound detection (ASD) for real-world multi-machine monitoring, where machine identity labels are often unavailable at test time. Current methods rely on such labels, which limits practical deployment. To bridge this gap, the authors propose a minimal modification of the evaluation protocol: the original training data and evaluation metrics are preserved, but test recordings from multiple machines are merged and machine identity is withheld during inference to better reflect real-world conditions. Experiments show a performance drop under this more realistic setting, and the extent of the degradation is strongly related to how accurately a model implicitly identifies the machine. This finding reveals method-specific differences in robustness that conventional machine-wise evaluation masks.
Abstract
Anomalous sound detection (ASD) benchmarks typically assume that the identity of the monitored machine is known at test time and that recordings are evaluated in a machine-wise manner. However, in realistic monitoring scenarios with multiple known machines operating concurrently, test recordings may not be reliably attributable to a specific machine, and requiring machine identity imposes deployment constraints such as dedicated sensors per machine. To reveal performance degradations and method-specific differences in robustness that are hidden under standard machine-wise evaluation, we consider a minimal modification of the ASD evaluation protocol in which test recordings from multiple machines are merged and evaluated jointly without access to machine identity at inference time. Training data and evaluation metrics remain unchanged, and machine identity labels are used only for post hoc evaluation. Experiments with representative ASD methods confirm that relaxing this assumption exposes such degradations and robustness differences, and that the degradations are strongly related to implicit machine identification accuracy.
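The protocol change described above can be illustrated with a small toy sketch. It is not the authors' implementation. The nearest-center "models", the rule of taking the best-matching machine when identity is withheld, the machine names, and all data values are assumptions chosen only to make the idea concrete. Machine labels are used solely after scoring, to compute per-machine AUC and the implicit identification accuracy.

```python
from statistics import mean

def auc(scores, labels):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical training data: one scalar feature per normal recording.
train = {"fan": [0.5, 1.0, 1.5], "pump": [4.5, 5.0, 5.5]}

# Hypothetical merged test set: (feature, true machine, label), label 1 = anomalous.
test = [
    (1.25, "fan", 0), (0.75, "fan", 0), (2.5, "fan", 1), (5.0, "fan", 1),
    (5.25, "pump", 0), (4.75, "pump", 0), (7.0, "pump", 1), (1.0, "pump", 1),
]

# Assumed per-machine "model": the mean of that machine's normal features.
centers = {m: mean(xs) for m, xs in train.items()}

def score_with_identity(x, machine):
    # Standard machine-wise setting: the true machine is known at inference.
    return abs(x - centers[machine])

def score_without_identity(x):
    # Identity withheld: use the best-matching machine (an assumed strategy).
    guess = min(centers, key=lambda m: abs(x - centers[m]))
    return abs(x - centers[guess]), guess

def evaluate(use_identity):
    """Per-machine AUC; machine labels are used only here, post hoc."""
    result = {}
    for m in centers:
        rows = [(x, y) for x, true_m, y in test if true_m == m]
        if use_identity:
            scores = [score_with_identity(x, m) for x, _ in rows]
        else:
            scores = [score_without_identity(x)[0] for x, _ in rows]
        result[m] = auc(scores, [y for _, y in rows])
    return result

# Implicit machine identification accuracy, also computed post hoc.
id_accuracy = mean(score_without_identity(x)[1] == m for x, m, _ in test)

print("machine-wise :", evaluate(use_identity=True))
print("identity-free:", evaluate(use_identity=False))
print("implicit identification accuracy:", id_accuracy)
```

In this toy setup, the anomalous fan recording that resembles a normal pump (and vice versa) is scored as normal once identity is withheld. Per-machine AUC falls exactly where implicit identification fails, which mirrors the relationship the abstract reports.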