ESTM: An Enhanced Dual-Branch Spectral-Temporal Mamba for Anomalous Sound Detection

📅 2025-09-02
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Industrial anomalous sound detection (ASD) faces two key challenges: modeling the time-frequency coupling in acoustic features and capturing long-range temporal dependencies. To address these, we propose a novel dual-branch Mamba architecture, the first to integrate selective state space models (SSMs) into industrial ASD. Our design employs a time-frequency decoupled dual-path structure to separately model temporal dynamics and spectral-band interactions, augmented by a TriStat-Gating (TSG) module for adaptive feature gating. Furthermore, we fuse enhanced Mel-spectrogram representations with raw waveform features to improve sensitivity to subtle anomalies. Evaluated on the DCASE 2020 Task 2 benchmark, our method improves on existing state-of-the-art approaches, demonstrating its effectiveness and robustness under complex industrial noise conditions.

📝 Abstract
The core challenge in anomalous sound detection (ASD) for industrial equipment lies in modeling the time-frequency coupling characteristics of acoustic features. Existing modeling methods are limited by local receptive fields, making it difficult to capture long-range temporal patterns and cross-band dynamic coupling effects in machine acoustic features. In this paper, we propose a novel framework, ESTM, which is based on a dual-path Mamba architecture with time-frequency decoupled modeling and uses selective state-space models (SSMs) for long-range sequence modeling. ESTM extracts rich feature representations from different time segments and frequency bands by fusing enhanced Mel spectrograms and raw audio features, while further improving sensitivity to anomalous patterns through the TriStat-Gating (TSG) module. Our experiments demonstrate that ESTM improves anomaly detection performance on the DCASE 2020 Task 2 dataset, validating the effectiveness of the proposed method.
Problem

Research questions and friction points this paper is trying to address.

Modeling time-frequency coupling in acoustic features
Capturing long-range temporal patterns in machine sounds
Detecting cross-band dynamic coupling effects in audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-path Mamba architecture for time-frequency decoupling
Selective State-Space Models for long-range sequence modeling
TriStat-Gating module enhances anomalous pattern sensitivity
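
The time-frequency decoupling idea listed above can be illustrated with a toy sketch. This is not the ESTM implementation: the function names `gated_scan` and `dual_path` are hypothetical, the input-dependent gated recurrence is only a simplified stand-in for a selective SSM (it is not a Mamba block), and the TSG module is omitted because the summary does not specify it. The sketch only shows how the same sequence operator can be run once along the time axis and once along the frequency axis of a spectrogram, with the two results concatenated.

```python
import numpy as np


def gated_scan(x):
    """Toy input-dependent recurrence along axis 0 (assumption, not Mamba).

    h[t] = (1 - g[t]) * h[t-1] + g[t] * x[t], with g[t] = sigmoid(x[t]).
    The gate depends on the input, loosely mimicking the 'selective' idea.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1:], dtype=float)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        gate = 1.0 / (1.0 + np.exp(-x[t]))
        h = (1.0 - gate) * h + gate * x[t]
        out[t] = h
    return out


def dual_path(spec):
    """Apply the scan along time and along frequency, then concatenate.

    spec: array of shape (frames, mel_bins).
    Returns an array of shape (frames, 2 * mel_bins).
    """
    spec = np.asarray(spec, dtype=float)
    temporal = gated_scan(spec)        # recurrence over frames, per band
    spectral = gated_scan(spec.T).T    # recurrence over bands, per frame
    return np.concatenate([temporal, spectral], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = dual_path(rng.standard_normal((8, 4)))
    print(features.shape)
```

In a real model each branch would be a learned layer, and the fusion would likely be more elaborate than plain concatenation; both choices here are assumptions made for brevity.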
Chengyuan Ma
Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
Peng Jia
Collaborative Innovation Center for Transport Studies, Dalian Maritime University, Dalian 116026, China
Hongyue Guo
Collaborative Innovation Center for Transport Studies, Dalian Maritime University, Dalian 116026, China
Wenming Yang
Tsinghua University
Computer Vision, Image Processing