Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalization of machine anomalous sound detection (ASD) caused by data scarcity, complex acoustic environments, and missing attribute labels, this paper proposes a self-supervised learning framework for generalized ASD. Methodologically, it (1) fine-tunes a large-scale self-supervised audio model jointly through a Machine-aware Group Adapter and Fully-Connected Low-Rank Adaptation (LoRA); (2) introduces a vector-quantization-driven dual-level contrastive learning objective that explicitly models the hierarchical relationship between machine categories and anomaly semantics, mitigating attribute-label scarcity; and (3) uses vector quantization to dynamically cluster unattributed data, enhancing feature discriminability. Evaluated on all five DCASE 2020–2024 ASD benchmarks, the framework achieves significant gains over state-of-the-art methods, demonstrating robust generalization across diverse machines and under low-label regimes.
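As a rough illustration of the group-adapter idea in the summary above, the sketch below routes features through a per-machine-group bottleneck adapter with a residual connection. The class name, shapes, and zero-initialization are assumptions for illustration, not the paper's actual module:

```python
import numpy as np

class GroupAdapter:
    """Hypothetical per-machine-group bottleneck adapter (residual form).

    Each machine group gets its own down/up projection pair; the group id
    routes a sample to its adapter. Zero-initializing the up-projection
    makes the module an identity mapping at the start of fine-tuning.
    """
    def __init__(self, dim, bottleneck, n_groups, seed=0):
        rng = np.random.default_rng(seed)
        # one (dim x bottleneck) down-projection per group
        self.down = rng.standard_normal((n_groups, dim, bottleneck)) * 0.01
        # zero-initialized up-projections: adapter starts as identity
        self.up = np.zeros((n_groups, bottleneck, dim))

    def forward(self, x, group_id):
        h = np.maximum(x @ self.down[group_id], 0.0)  # down-project + ReLU
        return x + h @ self.up[group_id]              # residual add
```

Because `up` is zero-initialized, the adapter initially passes features through unchanged, so fine-tuning can depart gradually from the pre-trained representation.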

📝 Abstract
Machine anomalous sound detection (ASD) is a valuable technique across various applications. However, its generalization performance is often limited due to challenges in data collection and the complexity of acoustic environments. Inspired by the success of large pre-trained models in numerous fields, this paper introduces a robust ASD model that leverages self-supervised pre-trained models trained on large-scale speech and audio datasets. Although there are inconsistencies between the pre-training datasets and the ASD task, our findings indicate that pre-training still provides substantial benefits for ASD. To mitigate overfitting and retain learned knowledge when fine-tuning with limited data, we explore Fully-Connected Low-Rank Adaptation (LoRA) as an alternative to full fine-tuning. Additionally, we propose a Machine-aware Group Adapter module, which enables the model to capture differences between various machines within a unified framework, thereby enhancing the generalization performance of ASD systems. To address the challenge of missing attribute labels, we design a novel objective function that dynamically clusters unattributed data using vector quantization and optimizes through a dual-level contrastive learning loss. The proposed methods are evaluated on all benchmark datasets from the five DCASE 2020–2024 ASD challenges, and the experimental results show significant improvements from our approach, demonstrating the effectiveness of the proposed strategies.
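The LoRA-style fine-tuning mentioned in the abstract can be sketched as a frozen weight plus a trainable low-rank update. This is a generic LoRA sketch under assumed shapes and hyperparameters, not the paper's Fully-Connected LoRA module:

```python
import numpy as np

class LoRALinear:
    """Generic LoRA sketch: y = x @ (W + (alpha/r) * A @ B).T

    W is the frozen pre-trained weight; only the low-rank factors A and B
    (rank r << min(in_dim, out_dim)) are trained. Zero-initializing B means
    the layer starts out identical to the pre-trained one.
    """
    def __init__(self, in_dim, out_dim, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)  # frozen
        self.A = rng.standard_normal((out_dim, rank)) * 0.01               # trainable
        self.B = np.zeros((rank, in_dim))                                  # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        delta = self.A @ self.B                      # low-rank weight update
        return x @ (self.W + self.scale * delta).T
```

Training only `A` and `B` keeps the number of updated parameters small, which is why LoRA helps retain pre-trained knowledge when fine-tuning on scarce ASD data.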
Problem

Research questions and friction points this paper is trying to address.

Improving generalization in anomalous sound detection using self-supervised models
Addressing data scarcity and overfitting in audio anomaly detection
Enhancing machine-specific feature learning without attribute labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages self-supervised pre-trained audio models
Uses Fully-Connected Low-Rank Adaptation (LoRA)
Introduces Machine-aware Group Adapter module
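The VQ-driven clustering and dual-level contrastive objective can be sketched as nearest-codebook pseudo-labeling followed by a supervised contrastive loss applied at each label level (machine labels and VQ pseudo-attributes). Function names and the exact loss form here are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def vq_assign(z, codebook):
    """Assign each embedding to its nearest codebook vector.

    The resulting indices act as pseudo-attribute labels for unattributed
    data, a simplified stand-in for VQ-driven dynamic clustering.
    """
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # squared L2
    return d.argmin(axis=1)

def sup_contrastive(z, labels, tau=0.1):
    """Supervised contrastive loss over one level of labels.

    Applying it once with machine labels and once with VQ pseudo-labels
    gives a dual-level objective of the kind the paper describes.
    """
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)                        # exclude self-pairs
    losses = []
    for i in range(len(z)):
        pos = labels == labels[i]
        pos[i] = False                                # anchor is not its own positive
        if pos.any():
            losses.append(-np.log(sim[i, pos] / sim[i].sum()).mean())
    return float(np.mean(losses))
```

A total objective might then weight the two levels, e.g. `loss = sup_contrastive(z, machine_ids) + lam * sup_contrastive(z, vq_assign(z, codebook))`, with `lam` a hypothetical balancing weight.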
Bing Han
AudioCC Lab, Department of Computer Science and Engineering & MoE Key Laboratory of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240 P. R. China
Anbai Jiang
Tsinghua University
Anomaly Detection, Audio Processing
Xinhu Zheng
Assistant Professor, The Hong Kong University of Science and Technology (Guangzhou)
Wei-Qiang Zhang
Department of Electronic Engineering, Tsinghua University, Beijing, 100084 P. R. China
Jia Liu
Department of Electronic Engineering, Tsinghua University, Beijing, 100084 P. R. China
Pingyi Fan
Professor of Electronic Engineering, Tsinghua University
Wireless Communications, Information Theory, Computer Science
Yanmin Qian
Professor, Shanghai Jiao Tong University
Speech and Language Processing, Signal Processing, Machine Learning