🤖 AI Summary
To address weak cross-dataset generalization, limited-sample learning, and heterogeneous-data modeling in mechanical fault diagnosis, this paper introduces UniFault, the first foundation model specifically designed for bearing vibration signal analysis. It proposes a multivariate-to-univariate normalization method that preserves local inter-channel relations, giving heterogeneous time-series signals from diverse sources a unified representation, and a cross-domain temporal fusion strategy that mitigates distribution shift while increasing robustness and sample diversity. The model combines a data harmonization pipeline, temporal feature encoding, self-supervised pretraining, and few-shot fine-tuning. Pretrained on over 9 billion vibration data points, it achieves state-of-the-art performance on real-world industrial datasets, surpassing existing task-specific models with only a small number of labeled samples.
📝 Abstract
Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific and generalize poorly across diverse datasets. Foundation models (FMs) have demonstrated remarkable potential in both visual and language domains, achieving impressive generalization even with minimal data through few-shot or zero-shot learning. However, translating these advances to FD presents unique hurdles. Unlike the large-scale, cohesive datasets available for images and text, FD datasets are typically smaller and more heterogeneous, with significant variations in sampling frequency and number of channels across systems and applications. This heterogeneity complicates the design of a universal architecture that can process such diverse data while maintaining robust feature extraction and learning capabilities. In this paper, we introduce UniFault, a foundation model for fault diagnosis that systematically addresses these issues. Specifically, the model incorporates a comprehensive data harmonization pipeline featuring two key innovations. First, a unification scheme transforms multivariate inputs into standardized univariate sequences while retaining local inter-channel relationships. Second, a novel cross-domain temporal fusion strategy mitigates distribution shifts and increases both the diversity and the number of training samples, improving the model's generalization across varying conditions. UniFault is pretrained on over 9 billion data points spanning diverse FD datasets, enabling superior few-shot performance. Extensive experiments on real-world FD datasets demonstrate that UniFault achieves state-of-the-art (SoTA) performance, setting a new benchmark for fault diagnosis models and paving the way for more scalable and robust predictive maintenance solutions.
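The abstract does not say how the two harmonization steps are implemented, so the following Python sketch is only one possible reading. It assumes the unification scheme can be approximated by interleaving fixed-length patches from each channel, so that patches covering the same time window stay adjacent, and that temporal fusion can be approximated by splicing aligned segments from two source domains. The function names `to_univariate` and `fuse_segments`, and the parameters `patch_len` and `seg_len`, are hypothetical and do not come from the paper.

```python
import random


def to_univariate(channels, patch_len):
    """Flatten a multichannel signal into one sequence (illustrative assumption).

    `channels` is a list of equal-length lists, one per sensor channel.
    Patches that cover the same time window are written back to back,
    so local inter-channel structure stays close together in the output.
    Samples that do not fill a whole patch are dropped.
    """
    n_patches = min(len(ch) for ch in channels) // patch_len
    out = []
    for p in range(n_patches):
        start = p * patch_len
        for ch in channels:
            out.extend(ch[start:start + patch_len])
    return out


def fuse_segments(a, b, seg_len, rng):
    """Build a new sequence by taking each aligned segment from `a` or `b`.

    This is a simple segment-splicing augmentation, assumed here as a
    stand-in for cross-domain temporal fusion; it is not the paper's method.
    """
    n_segs = min(len(a), len(b)) // seg_len
    fused = []
    for s in range(n_segs):
        source = a if rng.random() < 0.5 else b
        fused.extend(source[s * seg_len:(s + 1) * seg_len])
    return fused


# Two channels, four samples each, grouped into patches of two samples.
signal = [[0, 1, 2, 3], [10, 11, 12, 13]]
flat = to_univariate(signal, patch_len=2)
print(flat)  # [0, 1, 10, 11, 2, 3, 12, 13]

# Mix segments from two "domains" (all zeros vs. all ones).
mixed = fuse_segments([0.0] * 8, [1.0] * 8, seg_len=2, rng=random.Random(0))
print(mixed)
```

In this sketch, a single-channel recording passes through `to_univariate` unchanged apart from truncation to a whole number of patches, which is one way datasets with different channel counts could share a single input format.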