IMPACT: Industrial Machine Perception via Acoustic Cognitive Transformer

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Industrial acoustic analysis suffers from poor generalization, scarce labeled data, and a lack of large-scale open datasets and pretrained models. To address these challenges, we introduce DINOS, a large-scale open-access industrial audio dataset, and IMPACT, a self-supervised foundation model built on a Transformer architecture. IMPACT jointly optimizes utterance-level and frame-level objectives to combine global semantic understanding with fine-grained temporal modeling, which makes its representations suitable for fine-tuning on industrial downstream tasks with minimal labeled data. Evaluated on 30 downstream tasks spanning four machine types, IMPACT outperforms existing models on 24 tasks. Together, DINOS and IMPACT provide a new performance benchmark for industrial audio analysis and a basis for future research.

📝 Abstract
Acoustic signals from industrial machines offer valuable insights for anomaly detection, predictive maintenance, and operational efficiency enhancement. However, existing task-specific, supervised learning methods often scale poorly and fail to generalize across diverse industrial scenarios, whose acoustic characteristics are distinct from general audio. Furthermore, the scarcity of accessible, large-scale datasets and pretrained models tailored for industrial audio impedes community-driven research and benchmarking. To address these challenges, we introduce DINOS (Diverse INdustrial Operation Sounds), a large-scale open-access dataset. DINOS comprises over 74,149 audio samples (exceeding 1,093 hours) collected from various industrial acoustic scenarios. We also present IMPACT (Industrial Machine Perception via Acoustic Cognitive Transformer), a novel foundation model for industrial machine sound analysis. IMPACT is pretrained on DINOS in a self-supervised manner. By jointly optimizing utterance and frame-level losses, it captures both global semantics and fine-grained temporal structures. This makes its representations suitable for efficient fine-tuning on various industrial downstream tasks with minimal labeled data. Comprehensive benchmarking across 30 distinct downstream tasks (spanning four machine types) demonstrates that IMPACT outperforms existing models on 24 tasks, establishing its superior effectiveness and robustness, while providing a new performance benchmark for future research.
Problem

Research questions and friction points this paper is trying to address.

Existing methods scale poorly for diverse industrial acoustic scenarios
Lack of large datasets and models hinders industrial audio research
Proposes the IMPACT model and the DINOS dataset to address both gaps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pretraining on the DINOS industrial audio dataset
Joint optimization of utterance-level and frame-level losses
Efficient fine-tuning with minimal labeled data
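
The joint utterance-level and frame-level objective listed above can be illustrated with a minimal sketch. The abstract does not specify the form of either loss, so everything here is an assumption made for illustration: the mean-squared-error terms, the mean pooling over time, the masked-frame selection, and the names `joint_loss`, `utt_weight`, and `frame_weight` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_pool(frames: Sequence[Vector]) -> List[float]:
    """Average frame embeddings over time into one clip-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def joint_loss(
    pred_frames: Sequence[Vector],
    target_frames: Sequence[Vector],
    masked: Sequence[bool],
    utt_weight: float = 1.0,    # assumed weighting, not from the paper
    frame_weight: float = 1.0,  # assumed weighting, not from the paper
) -> float:
    """Weighted sum of a clip-level loss and a masked frame-level loss."""
    # Utterance-level term (assumed form): compare time-pooled embeddings,
    # which rewards agreement on the global content of the clip.
    utt = mse(mean_pool(pred_frames), mean_pool(target_frames))

    # Frame-level term (assumed form): average per-frame error over the
    # masked positions only, which rewards fine-grained temporal detail.
    idx = [i for i, is_masked in enumerate(masked) if is_masked]
    if not idx:
        raise ValueError("at least one frame must be masked")
    frame = sum(mse(pred_frames[i], target_frames[i]) for i in idx) / len(idx)

    return utt_weight * utt + frame_weight * frame


# Toy example: two frames with 2-dimensional embeddings, second frame masked.
pred = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 0.0]]
loss = joint_loss(pred, target, masked=[False, True])
```

In this sketch, setting `frame_weight=0.0` leaves only the clip-level term and setting `utt_weight=0.0` leaves only the frame-level term, which makes it easy to see how each term contributes to the combined objective.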
Changheon Han
School of Mechanical Engineering, Purdue University
Yuseop Sim
School of Mechanical Engineering, Purdue University
Hoin Jung
Elmore Family School of Electrical and Computer Engineering, Purdue University
Jiho Lee
School of Mechanical Engineering, Purdue University
Hojun Lee
Co-founder & CEO at Xperty Corp.
Deep Learning, Machine Learning, Pattern Recognition, Object Detection, Uncertainty
Yun Seok Kang
Department of Mechanical Engineering, UNIST
Sucheol Woo
Polytechnic Institute, Purdue University
Garam Kim
The School of Aviation and Transportation Technology, Purdue University
Hyung Wook Park
Department of Mechanical Engineering, UNIST
Martin Byung-Guk Jun
Professor, Purdue University
Manufacturing, Micro machining, Femtosecond laser machining, Electrospinning, Nanoparticle coating