🤖 AI Summary
Industrial anomalous sound detection (ASD) faces two key challenges: modeling the time-frequency coupling in acoustic features and capturing long-range temporal dependencies. To address these, we propose a novel dual-branch Mamba architecture, the first to integrate selective state space models (SSMs) into industrial ASD. Our design employs a time-frequency decoupled dual-path structure to separately model temporal dynamics and spectral-band interactions, augmented by a TriStat-Gating (TSG) module for adaptive feature gating. Furthermore, we fuse enhanced Mel-spectrogram representations with raw waveform features to improve sensitivity to subtle anomalies. Evaluated on the DCASE 2020 Task 2 benchmark, our method achieves significant improvements over existing state-of-the-art approaches, demonstrating its effectiveness and robustness under complex industrial noise conditions.
📝 Abstract
The core challenge in anomalous sound detection (ASD) for industrial equipment lies in modeling the time-frequency coupling characteristics of acoustic features. Existing methods are limited by local receptive fields, which makes it difficult to capture long-range temporal patterns and cross-band dynamic coupling effects in machine acoustic features. In this paper, we propose a novel framework, ESTM, which is based on a dual-path Mamba architecture with time-frequency decoupled modeling and uses Selective State-Space Models (SSMs) for long-range sequence modeling. ESTM extracts rich feature representations from different time segments and frequency bands by fusing enhanced Mel spectrograms with raw audio features, and it further improves sensitivity to anomalous patterns through the TriStat-Gating (TSG) module. Our experiments show that ESTM improves anomaly detection performance on the DCASE 2020 Task 2 dataset, validating the effectiveness of the proposed method.
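To make the idea of time-frequency decoupled modeling concrete, the following is a minimal sketch and not the ESTM implementation. It assumes a spectrogram stored as a `(n_bands, n_frames)` array and replaces Mamba's selective SSM with a fixed-decay linear recurrence, which is only a stand-in. The names `linear_scan` and `dual_path`, the `decay` parameter, and the additive fusion are all hypothetical choices made for illustration. The TSG module and the raw-waveform branch are omitted because the abstract does not specify them.

```python
import numpy as np

def linear_scan(x, decay=0.9):
    """Run h[t] = decay * h[t-1] + (1 - decay) * x[t] along the last axis.

    A fixed-decay recurrence used here as a simplified stand-in for a
    selective state-space layer (assumption, not the paper's layer).
    """
    h = np.zeros(x.shape[:-1], dtype=float)
    out = np.empty(x.shape, dtype=float)
    for t in range(x.shape[-1]):
        h = decay * h + (1.0 - decay) * x[..., t]
        out[..., t] = h
    return out

def dual_path(spec, decay=0.9):
    """spec: (n_bands, n_frames). Returns fused features of the same shape."""
    # Time path: each frequency band is scanned as a sequence over frames.
    time_path = linear_scan(spec, decay)
    # Frequency path: each frame is scanned as a sequence over bands.
    freq_path = linear_scan(spec.T, decay).T
    # Hypothetical fusion: simple addition of the two paths.
    return time_path + freq_path

spec = np.random.default_rng(0).random((8, 16))
fused = dual_path(spec)
print(fused.shape)  # (8, 16)
```

The point of the sketch is only the decoupling: the same sequence operator is applied once along the time axis and once along the frequency axis, so each path sees long-range context in one dimension before the results are combined.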