From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms

📅 2025-02-25

🤖 AI Summary

This study addresses two limitations of audio anomaly detection (AAD), the lack of fine-grained localization and the lack of interpretability, by adapting anomaly detection algorithms built on pretrained vision models to the audio domain. Input audio is converted into time-frequency spectrograms, which are processed by vision backbones (e.g., ViT or ResNet) to extract multi-scale features. These features are then passed to established unsupervised visual anomaly detection algorithms (e.g., PatchCore, SPADE) to localize anomalies at the pixel level of the spectrogram. Unlike conventional AAD methods, which yield only a clip-level normal/anomalous label, the approach produces localized anomaly heatmaps that show where in time and frequency an anomaly occurs, making the results easier to interpret and to act on in deployment. Evaluated on industrial and environmental audio benchmarks, the method achieves state-of-the-art detection accuracy, supporting the effectiveness of vision-to-audio transfer.
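The pipeline described above can be illustrated with a minimal NumPy sketch in the spirit of PatchCore: build a memory bank of patch features from normal clips, then score each patch of a test spectrogram by its distance to the nearest stored patch. This is not the paper's implementation. In particular, raw log-spectrogram patches stand in for the embeddings of a pretrained vision backbone, and the function names, the synthetic `synth_clip` signal, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def synth_clip(seed, sr=8000, burst=None):
    """Hypothetical test signal: a steady 440 Hz tone in faint noise.
    burst=(start, end) adds loud broadband noise over that sample range."""
    rng = np.random.default_rng(seed)
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)
    if burst is not None:
        start, end = burst
        x[start:end] += rng.standard_normal(end - start)
    return x

def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT without padding, shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[k * hop : k * hop + n_fft] * window
                       for k in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)).T)

def patch_features(spec, patch=8):
    """Cut the spectrogram into non-overlapping patch x patch tiles and
    flatten each tile. Assumption: this replaces a pretrained backbone."""
    f = spec.shape[0] - spec.shape[0] % patch
    t = spec.shape[1] - spec.shape[1] % patch
    tiles = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # (rows, cols, patch, patch)
    return tiles.reshape(f // patch, t // patch, patch * patch)

def build_memory_bank(normal_specs, patch=8):
    """Stack the patch features of all normal spectrograms."""
    return np.concatenate([patch_features(s, patch).reshape(-1, patch * patch)
                           for s in normal_specs])

def anomaly_map(spec, bank, patch=8):
    """Score each patch by Euclidean distance to its nearest bank entry."""
    feats = patch_features(spec, patch)
    flat = feats.reshape(-1, feats.shape[-1])
    d2 = ((flat ** 2).sum(1)[:, None]
          - 2.0 * flat @ bank.T
          + (bank ** 2).sum(1)[None, :])
    nearest = np.sqrt(np.maximum(d2, 0.0).min(axis=1))
    return nearest.reshape(feats.shape[:2])

# Usage: train on three normal clips, then score a clip with a noise burst.
bank = build_memory_bank([log_spectrogram(synth_clip(s)) for s in range(3)])
heatmap = anomaly_map(log_spectrogram(synth_clip(11, burst=(4000, 4400))), bank)
clip_score = heatmap.max()  # one common way to get a clip-level score
```

The heatmap has one cell per spectrogram patch, so high values point to the time-frequency region that differs from the normal data. Taking the maximum as the clip-level score is a common convention in patch-based methods and is assumed here rather than taken from the paper.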

📝 Abstract

Recent advances in Visual Anomaly Detection (VAD) have introduced sophisticated algorithms leveraging embeddings generated by pre-trained feature extractors. Inspired by these developments, we investigate the adaptation of such algorithms to the audio domain to address the problem of Audio Anomaly Detection (AAD). Unlike most existing AAD methods, which primarily classify anomalous samples, our approach introduces fine-grained temporal-frequency localization of anomalies within the spectrogram, significantly improving explainability. This capability enables a more precise understanding of where and when anomalies occur, making the results more actionable for end users. We evaluate our approach on industrial and environmental benchmarks, demonstrating the effectiveness of VAD techniques in detecting anomalies in audio signals. Moreover, these techniques improve explainability by enabling localized anomaly identification, making audio anomaly detection systems more interpretable and practical.
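To make "where and when" concrete, a patch-level anomaly map can be translated back into time spans in seconds and frequency bands in hertz. The sketch below assumes an unpadded STFT in which frame k starts at sample k * hop, square patches of `patch` frames by `patch` frequency bins, and a fixed score threshold. The function name `localize`, its parameters, and the thresholding rule are hypothetical and are not described in the abstract.

```python
import numpy as np

def localize(amap, sr, n_fft, hop, patch, threshold):
    """Map anomaly-map cells above `threshold` to physical coordinates.

    amap has shape (freq_patches, time_patches). Returns a list of
    (t_start_s, t_end_s, f_low_hz, f_high_hz, score), highest score first.
    """
    regions = []
    for i, j in zip(*np.nonzero(amap > threshold)):
        # First frame of the patch starts at sample j * patch * hop.
        t_start = j * patch * hop / sr
        # Last frame of the patch ends n_fft samples after its start.
        t_end = (((j + 1) * patch - 1) * hop + n_fft) / sr
        # Bin b of an n_fft-point transform is centred at b * sr / n_fft Hz.
        f_low = i * patch * sr / n_fft
        f_high = (i + 1) * patch * sr / n_fft
        regions.append((float(t_start), float(t_end),
                        float(f_low), float(f_high), float(amap[i, j])))
    return sorted(regions, key=lambda r: -r[4])

# Usage: a single hot cell in an otherwise quiet 4 x 5 map.
demo = np.zeros((4, 5))
demo[1, 2] = 5.0
regions = localize(demo, sr=8000, n_fft=256, hop=128, patch=8, threshold=1.0)
```

Each returned tuple can be reported directly to an end user or drawn as a box over the spectrogram. A different STFT convention (for example centred, padded frames) would shift the time bounds, so the formulas would need to be adjusted to match the front end actually used.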

Problem

Research questions and friction points this paper is trying to address.

Adapting vision-based algorithms for audio anomaly detection

Enhancing temporal-frequency localization in spectrograms

Improving explainability and interpretability of audio anomaly systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-based audio anomaly detection

Temporal-frequency spectrogram localization

Enhanced explainability and interpretability
