🤖 AI Summary
Industrial acoustic analysis suffers from poor generalization, scarce labeled data, and the absence of large-scale open benchmarks. To address these challenges, the authors introduce DINOS, a large-scale, open-access industrial audio dataset, and IMPACT, a self-supervised foundation model built on a Transformer architecture. IMPACT jointly optimizes utterance-level and frame-level objectives to combine global semantic understanding with fine-grained temporal modeling, which supports transfer across devices and tasks. Evaluated on 30 downstream tasks spanning four machine types, IMPACT outperforms existing models on 24 tasks. Together, DINOS and IMPACT provide an open, reproducible benchmark for industrial audio analysis, with applications such as anomaly detection and predictive maintenance, and a baseline for standardized evaluation in future research.
📝 Abstract
Acoustic signals from industrial machines offer valuable insights for anomaly detection, predictive maintenance, and operational efficiency enhancement. However, existing task-specific, supervised learning methods often scale poorly and fail to generalize across diverse industrial scenarios, whose acoustic characteristics are distinct from those of general audio. Furthermore, the scarcity of accessible, large-scale datasets and pretrained models tailored for industrial audio impedes community-driven research and benchmarking. To address these challenges, we introduce DINOS (Diverse INdustrial Operation Sounds), a large-scale open-access dataset. DINOS comprises over 74,149 audio samples (exceeding 1,093 hours) collected from various industrial acoustic scenarios. We also present IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), a novel foundation model for industrial machine sound analysis. IMPACT is pretrained on DINOS in a self-supervised manner. By jointly optimizing utterance-level and frame-level losses, it captures both global semantics and fine-grained temporal structures. This makes its representations suitable for efficient fine-tuning on various industrial downstream tasks with minimal labeled data. Comprehensive benchmarking across 30 distinct downstream tasks (spanning four machine types) demonstrates that IMPACT outperforms existing models on 24 tasks, demonstrating its effectiveness and robustness and providing a new performance benchmark for future research.
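
The abstract does not specify the form of the two losses, so the following is only a minimal sketch of what jointly optimizing an utterance-level and a frame-level objective could look like. Everything in it is an assumption made for illustration: the use of mean squared error, the mean pooling over time, the set of masked frame positions, and the names `joint_loss`, `mean_pool`, and `frame_weight` are all hypothetical and are not taken from IMPACT.

```python
from typing import List, Sequence

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame vectors over time to get one clip-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def joint_loss(
    pred_frames: Sequence[Sequence[float]],
    target_frames: Sequence[Sequence[float]],
    masked_idx: Sequence[int],
    frame_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Weighted sum of an utterance-level and a frame-level loss (illustrative)."""
    # Utterance-level term: compare clip-level (mean-pooled) vectors,
    # which rewards getting the global content of the clip right.
    utterance_term = mse(mean_pool(pred_frames), mean_pool(target_frames))
    # Frame-level term: average error over the masked frame positions only,
    # which rewards getting the local temporal detail right.
    frame_term = sum(
        mse(pred_frames[i], target_frames[i]) for i in masked_idx
    ) / len(masked_idx)
    return utterance_term + frame_weight * frame_term
```

In this toy formulation, the utterance-level term only sees the time-averaged representation, while the frame-level term penalizes errors at individual time steps; summing them is one simple way to ask a single encoder to serve both purposes.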