LISTEN: Lightweight Industrial Sound-representable Transformer for Edge Notification

📅 2025-07-10

🤖 AI Summary

Industrial acoustic analysis faces two bottlenecks: deep learning models require extensive labeled data, and they struggle with real-time deployment on resource-constrained edge devices. To address this, we propose the first kilobyte-scale (KB-scale) lightweight industrial sound foundation model. Our method combines knowledge distillation, extreme Transformer architecture compression, and co-optimization for edge computing, reducing model size to a few KB and inference latency to milliseconds and enabling end-to-end real-time execution on low-power embedded hardware. The model generalizes well across tasks: with only minimal labeled data, it achieves performance comparable to large teacher models on downstream tasks such as anomaly detection and predictive maintenance. Validated in real manufacturing workshops, it delivers high accuracy, ultra-low computational overhead, and plug-and-play deployability. This work establishes a practical, deployable paradigm for edge-intelligent auditory perception.
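To make the "few KB" figure concrete, the following back-of-the-envelope sketch counts the weights of a very small Transformer encoder and converts the count to kilobytes. All hyperparameters here (`n_mels`, `d_model`, `d_ff`, `n_layers`) and the int8 storage assumption are hypothetical and chosen only for illustration; the summary does not state LISTEN's actual configuration. Positional embeddings and any task head are ignored.

```python
def encoder_param_count(n_mels, d_model, d_ff, n_layers):
    """Count weights and biases of a minimal Transformer encoder (hypothetical layout)."""
    # Linear projection from spectrogram bins to the model dimension.
    input_proj = n_mels * d_model + d_model
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return input_proj + n_layers * (attention + feed_forward + layer_norms)


def size_kb(n_params, bytes_per_param):
    """Storage in KB (1 KB = 1024 bytes) for a given numeric precision."""
    return n_params * bytes_per_param / 1024


# Assumed toy configuration, not taken from the paper.
n_params = encoder_param_count(n_mels=32, d_model=16, d_ff=32, n_layers=2)
print(n_params)               # 4976
print(size_kb(n_params, 1))   # int8 weights: 4.859375 KB
print(size_kb(n_params, 4))   # float32 weights: 19.4375 KB
```

Under these assumptions the same network is roughly 5 KB at int8 precision and roughly 19 KB at float32, which shows why both a small hidden dimension and low-precision storage matter at this scale.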

📝 Abstract

Deep learning-based machine listening is broadening the scope of industrial acoustic analysis for applications like anomaly detection and predictive maintenance, thereby improving manufacturing efficiency and reliability. Nevertheless, its reliance on a large annotated dataset for every new task limits widespread implementation on shop floors. While emerging sound foundation models aim to alleviate this data dependency, they are too large and computationally expensive, requiring cloud infrastructure or high-end hardware that is impractical for on-site, real-time deployment. We address this gap with LISTEN (Lightweight Industrial Sound-representable Transformer for Edge Notification), a kilobyte-sized industrial sound foundation model. Using knowledge distillation, LISTEN runs in real time on low-cost edge devices. On benchmark downstream tasks, it performs nearly identically to its much larger parent model, even when fine-tuned with minimal datasets and training resources. Beyond the model itself, we demonstrate its real-world utility by integrating LISTEN into a complete machine monitoring framework on an edge device with an Industrial Internet of Things (IIoT) sensor and system, validating its performance and generalization capabilities on a live manufacturing shop floor.
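The abstract names knowledge distillation but does not give the training objective. As a minimal sketch of the general idea, the example below trains a small student to reproduce the embeddings of a frozen teacher by minimizing mean squared error. Both networks are stand-in linear maps, and every name and value (`teacher_w`, `student_w`, `distill_step`, the dimensions, the learning rate) is an assumption made for illustration, not the paper's method.

```python
import random


def linear(weights, features):
    """Apply a linear map given as a list of weight rows."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_step(student_w, features, teacher_embedding, lr):
    """One SGD step pulling the student embedding toward the teacher embedding."""
    pred = linear(student_w, features)
    n = len(pred)
    for i, row in enumerate(student_w):
        # Gradient of the MSE with respect to output i, before the chain rule.
        grad_out = 2.0 * (pred[i] - teacher_embedding[i]) / n
        for j in range(len(row)):
            row[j] -= lr * grad_out * features[j]
    return mse(pred, teacher_embedding)  # loss measured before the update


random.seed(0)
IN_DIM, EMB_DIM = 8, 4  # hypothetical feature and embedding sizes

# Frozen "teacher": a fixed random linear map standing in for a large model.
teacher_w = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]
# "Student": starts at zero and learns only from teacher outputs, no labels.
student_w = [[0.0] * IN_DIM for _ in range(EMB_DIM)]

losses = []
for _ in range(500):
    features = [random.uniform(-1, 1) for _ in range(IN_DIM)]  # unlabeled input
    target = linear(teacher_w, features)
    losses.append(distill_step(student_w, features, target, lr=0.2))

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

The point of the sketch is that the student needs only unlabeled inputs and the teacher's outputs. In practice the student would be a compressed Transformer rather than a linear map, and the objective may combine several terms.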

Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on large annotated datasets for industrial sound analysis

Overcoming computational limitations of large sound foundation models

Enabling real-time edge deployment for industrial acoustic monitoring
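One common way to work with very little labeled data is to keep the encoder frozen and fit a simple detector on its embeddings. The sketch below scores a clip by its distance from the centroid of a handful of normal-condition embeddings. This is a swapped-in illustration, not the fine-tuning procedure the paper evaluates; the two-dimensional embeddings, the `margin` value, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_detector(normal_embeddings, margin=1.5):
    """Fit a centroid and a distance threshold from a few normal examples."""
    center = centroid(normal_embeddings)
    radius = max(distance(e, center) for e in normal_embeddings)
    return center, radius * margin


def is_anomalous(embedding, center, threshold):
    """Flag an embedding that lies farther from the centroid than the threshold."""
    return distance(embedding, center) > threshold


# Toy embeddings standing in for the output of a frozen encoder.
normal = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]]
center, threshold = fit_detector(normal)

print(is_anomalous([1.05, 0.05], center, threshold))  # False: close to normal
print(is_anomalous([3.0, 2.0], center, threshold))    # True: far from normal
```

Because only a mean vector and one threshold are stored per machine, a detector of this kind adds almost nothing to the memory footprint on an edge device.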

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight sound-representable Transformer for edge devices

Knowledge distillation enables real-time edge deployment

Integrates with IIoT for live manufacturing monitoring
