🤖 AI Summary
Industrial acoustic analysis faces dual bottlenecks: deep learning models require extensive labeled data and struggle with real-time deployment on resource-constrained edge devices. To address these, we propose the first kilobyte-scale (KB-scale) lightweight industrial sound foundation model. Our method integrates knowledge distillation, extreme Transformer architecture compression, and co-optimization for edge computing, reducing model size to a few KB and inference latency to milliseconds and enabling end-to-end real-time execution on low-power embedded hardware. The model exhibits strong cross-task generalization: with only minimal labeled data, it achieves performance comparable to large teacher models on downstream tasks such as anomaly detection and predictive maintenance. Validated in real manufacturing workshops, it delivers high accuracy, ultra-low computational overhead, and plug-and-play deployability. This work establishes a practical, deployable paradigm for edge-intelligent auditory perception.
📝 Abstract
Deep learning-based machine listening is broadening the scope of industrial acoustic analysis for applications like anomaly detection and predictive maintenance, thereby improving manufacturing efficiency and reliability. Nevertheless, its reliance on large, annotated datasets for every new task limits widespread implementation on shop floors. While emerging sound foundation models aim to alleviate this data dependency, they are too large and computationally expensive, requiring cloud infrastructure or high-end hardware that is impractical for on-site, real-time deployment. We address this gap with LISTEN (Lightweight Industrial Sound-representable Transformer for Edge Notification), a kilobyte-sized industrial sound foundation model. Built using knowledge distillation, LISTEN runs in real time on low-cost edge devices. On benchmark downstream tasks, it performs nearly identically to its much larger parent model, even when fine-tuned with minimal datasets and training resources. Beyond the model itself, we demonstrate its real-world utility by integrating LISTEN into a complete machine monitoring framework on an edge device with an Industrial Internet of Things (IIoT) sensor and system, validating its performance and generalization capabilities on a live manufacturing shop floor.
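The abstract does not say which form of knowledge distillation LISTEN uses. As a purely illustrative sketch, one common variant trains the small student to reproduce the teacher's embedding of each audio clip by minimizing the mean squared error between the two. The function names below, and the assumption that student and teacher embeddings share the same dimension, are hypothetical and not taken from the paper.

```python
# Illustrative embedding-matching distillation loss (an assumption, not the
# paper's stated method). Embeddings are plain lists of floats.

def distillation_loss(student_emb, teacher_emb):
    """Mean squared error between one student and one teacher embedding."""
    if len(student_emb) != len(teacher_emb):
        raise ValueError("embeddings must have the same dimension")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_emb, teacher_emb)]
    return sum(squared_errors) / len(squared_errors)


def batch_distillation_loss(student_batch, teacher_batch):
    """Average the per-clip loss over a batch of embedding pairs."""
    if len(student_batch) != len(teacher_batch):
        raise ValueError("batches must contain the same number of clips")
    losses = [distillation_loss(s, t) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical two-dimensional embeddings for two audio clips.
    teacher = [[1.0, 1.0], [1.0, 1.0]]
    student = [[0.0, 0.0], [1.0, 1.0]]
    print(batch_distillation_loss(student, teacher))
```

In such a setup the teacher stays frozen and only the student's parameters are updated to reduce this loss, so no task labels are needed during distillation.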