🤖 AI Summary
This work addresses two limitations of existing deep learning approaches: they lack a unified framework for modeling multi-scale temporal dependencies in industrial IoT settings, and they are not robust when data is scarce. To this end, we propose MsFormer, a lightweight multi-scale Transformer architecture that integrates a multi-scale sampling module, a tailored positional encoding, and a streamlined attention mechanism that replaces conventional self-attention with pooling operations. Together, these components capture complex temporal correlations across heterogeneous streaming data sources. MsFormer balances modeling capacity against service efficiency and significantly outperforms state-of-the-art methods on multiple real-world industrial equipment datasets. It generalizes well across devices and operating conditions while maintaining reliable Quality of Service (QoS) even when little data is available.
📝 Abstract
Reliable predictive maintenance is a critical industrial AI service, essential for ensuring the high availability of manufacturing devices. Existing deep-learning methods achieve competitive results on such tasks but lack a general service-oriented framework for capturing complex dependencies in industrial IoT sensor data. While Transformer-based models show strong sequence modeling capabilities, deploying them directly as robust predictive services faces two significant bottlenecks. First, streaming sensor data collected in real-world service environments often exhibits multi-scale temporal correlations driven by machine working principles. Second, the datasets available for training time-to-failure predictive services are typically limited in size. To address these challenges, we propose MsFormer, a lightweight Multi-scale Transformer designed as a unified AI service model for reliable industrial predictive maintenance. MsFormer incorporates a Multi-scale Sampling (MS) module and a tailored position encoding mechanism to capture sequential correlations across multi-stream service data. Additionally, to accommodate data-scarce service environments, MsFormer adopts a lightweight attention mechanism that uses straightforward pooling operations instead of self-attention. Extensive experiments on real-world datasets demonstrate that the proposed framework achieves significant performance improvements over state-of-the-art methods. Furthermore, MsFormer outperforms these methods across industrial devices and operating conditions, demonstrating strong generalizability while maintaining a highly reliable Quality of Service (QoS).
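The abstract does not specify how the MS module or the pooling-based attention replacement is implemented, so the following is only a minimal sketch of the two general ideas under stated assumptions. Both function names, the strided sampling scheme, and the edge-clipped average-pooling window are hypothetical choices made for illustration; they are not taken from MsFormer itself.

```python
from typing import List


def multi_scale_sample(series: List[float], scales: List[int]) -> List[List[float]]:
    """Return one strided view of `series` per scale.

    Assumption: each scale is a plain stride, so scale 1 keeps every
    reading and scale 4 keeps every fourth reading. The real MS module
    may sample differently.
    """
    return [series[::stride] for stride in scales]


def average_pool_mix(tokens: List[float], window: int = 3) -> List[float]:
    """Mix neighbouring tokens by average pooling instead of self-attention.

    Each output position is the mean of the inputs inside a centred
    window, clipped at the sequence edges. There are no learned
    parameters, which is the sense in which pooling is "lightweight".
    """
    half = window // 2
    mixed = []
    for i in range(len(tokens)):
        lo = max(0, i - half)
        hi = min(len(tokens), i + half + 1)
        mixed.append(sum(tokens[lo:hi]) / (hi - lo))
    return mixed


# A toy univariate sensor stream viewed at three temporal scales.
stream = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
views = multi_scale_sample(stream, scales=[1, 2, 4])
mixed_views = [average_pool_mix(view) for view in views]
```

Because the pooling step has no weights to fit, a mixer of this kind adds no trainable parameters on top of the rest of the network, which is one plausible reason such a design suits small time-to-failure datasets.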