Towards a more realistic evaluation of machine learning models for bearing fault diagnosis

📅 2025-09-26
🤖 AI Summary
Data leakage, which is pervasive in bearing fault diagnosis, artificially inflates the measured generalization performance of machine learning models in controlled experiments and undermines their credibility for industrial deployment. To address this, the authors propose a bearing-wise data splitting paradigm in which no physical bearing contributes data to both the training and test sets, removing cross-bearing information leakage at its source. They combine bearing-wise splitting with a multi-label classification formulation and use Macro AUROC, among other metrics, to quantify generalization. A key finding is that the number of unique bearings in the training set, not just the sample count, is a decisive factor for generalization. The methodology covers vibration-based fault diagnosis, leakage identification and mitigation, and evaluation across several datasets, yielding a "leakage-immune" evaluation pipeline. Experiments on three widely adopted benchmarks (CWRU, PU, and UORED-VAFCLS) show that common segment-wise and condition-wise splits inflate performance metrics, and that the proposed protocol gives a more faithful and comparable picture of model quality.

šŸ“ Abstract
Reliable detection of bearing faults is essential for maintaining the safety and operational efficiency of rotating machinery. While recent advances in machine learning (ML), particularly deep learning, have shown strong performance in controlled settings, many studies fail to generalize to real-world applications due to methodological flaws, most notably data leakage. This paper investigates the issue of data leakage in vibration-based bearing fault diagnosis and its impact on model evaluation. We demonstrate that common dataset partitioning strategies, such as segment-wise and condition-wise splits, introduce spurious correlations that inflate performance metrics. To address this, we propose a rigorous, leakage-free evaluation methodology centered on bearing-wise data partitioning, ensuring no overlap between the physical components used for training and testing. Additionally, we reformulate the classification task as a multi-label problem, enabling the detection of co-occurring fault types and the use of prevalence-independent metrics such as Macro AUROC. Beyond preventing leakage, we also examine the effect of dataset diversity on generalization, showing that the number of unique training bearings is a decisive factor for achieving robust performance. We evaluate our methodology on three widely adopted datasets: CWRU, Paderborn University (PU), and University of Ottawa (UORED-VAFCLS). This study highlights the importance of leakage-aware evaluation protocols and provides practical guidelines for dataset partitioning, model selection, and validation, fostering the development of more trustworthy ML systems for industrial fault diagnosis applications.
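
The contrast between segment-wise and bearing-wise partitioning described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy bearing identifiers (`B0` to `B7`), the 50 segments per bearing, and the 25% test fraction are all assumptions chosen for illustration. It only shows that a random split over segments puts the same physical bearing on both sides, while a split over bearing identifiers does not.

```python
import random


def bearing_wise_split(bearing_ids, test_fraction=0.25, seed=0):
    """Split segment indices so that no bearing appears in both sets.

    bearing_ids[i] is the (hypothetical) identifier of the physical
    bearing that segment i was recorded from.
    """
    unique = sorted(set(bearing_ids))
    if len(unique) < 2:
        raise ValueError("need at least two distinct bearings to split")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole bearings, keeping at least one bearing on each side.
    n_test = min(max(1, round(len(unique) * test_fraction)), len(unique) - 1)
    test_bearings = set(unique[:n_test])
    train_idx = [i for i, b in enumerate(bearing_ids) if b not in test_bearings]
    test_idx = [i for i, b in enumerate(bearing_ids) if b in test_bearings]
    return train_idx, test_idx


def segment_wise_split(n_segments, test_fraction=0.25, seed=0):
    """Leak-prone baseline: shuffle individual segments, ignoring bearings."""
    idx = list(range(n_segments))
    random.Random(seed).shuffle(idx)
    n_test = round(n_segments * test_fraction)
    return idx[n_test:], idx[:n_test]


def shared_bearings(bearing_ids, train_idx, test_idx):
    """Bearings that contribute segments to both training and test sets."""
    train = {bearing_ids[i] for i in train_idx}
    test = {bearing_ids[i] for i in test_idx}
    return train & test


# Toy data (assumption): 8 bearings, 50 vibration segments each.
bearing_ids = [f"B{b}" for b in range(8) for _ in range(50)]

bw_train, bw_test = bearing_wise_split(bearing_ids)
sw_train, sw_test = segment_wise_split(len(bearing_ids))

print("bearing-wise overlap:", sorted(shared_bearings(bearing_ids, bw_train, bw_test)))
print("segment-wise overlap:", sorted(shared_bearings(bearing_ids, sw_train, sw_test)))
```

In this sketch the overlap check doubles as a simple leakage test: an empty intersection of bearing identifiers is a necessary condition for a bearing-wise split, and it can be run on any existing partition before training.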
Problem

Research questions and friction points this paper is trying to address.

Addresses data leakage in bearing fault diagnosis model evaluation
Proposes leakage-free methodology using bearing-wise data partitioning
Examines dataset diversity impact on model generalization performance
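
One way to separate the effect of bearing count from the effect of sample count is to hold the number of training segments fixed while varying how many unique bearings they come from. The sketch below illustrates that idea only; it is an assumption about how such a study could be set up, not the paper's protocol, and the helper name `fixed_budget_subset`, the toy identifiers, and the budget of 60 segments are hypothetical.

```python
import random


def fixed_budget_subset(bearing_ids, n_bearings, budget, seed=0):
    """Pick `budget` segment indices spread evenly over `n_bearings` bearings.

    The total number of segments stays constant, so any change in
    downstream performance reflects bearing diversity, not sample count.
    """
    if budget % n_bearings != 0:
        raise ValueError("budget must be divisible by n_bearings")
    rng = random.Random(seed)

    # Group segment indices by the bearing they were recorded from.
    by_bearing = {}
    for i, b in enumerate(bearing_ids):
        by_bearing.setdefault(b, []).append(i)

    chosen = rng.sample(sorted(by_bearing), n_bearings)
    per_bearing = budget // n_bearings
    subset = []
    for b in chosen:
        if len(by_bearing[b]) < per_bearing:
            raise ValueError(f"bearing {b} has too few segments")
        subset.extend(rng.sample(by_bearing[b], per_bearing))
    return subset


# Toy training pool (assumption): 6 bearings, 60 segments each.
train_bearing_ids = [f"B{b}" for b in range(6) for _ in range(60)]

for k in (1, 2, 3, 6):
    idx = fixed_budget_subset(train_bearing_ids, n_bearings=k, budget=60)
    n_unique = len({train_bearing_ids[i] for i in idx})
    print(f"bearings={n_unique}  segments={len(idx)}")
```

Each subset would then be used to train a model that is evaluated on the same held-out bearings, so the only quantity that changes across runs is the number of unique training bearings.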
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bearing-wise data partitioning prevents data leakage
Multi-label classification detects co-occurring fault types
Dataset diversity analysis shows that the number of unique training bearings is decisive for generalization
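
The multi-label formulation and the Macro AUROC metric mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' evaluation code: the two label columns (inner-race fault, outer-race fault) and the toy scores are hypothetical. Each label gets its own binary AUROC, computed here as the probability that a positive segment is scored above a negative one (ties count as one half), and the macro average is the unweighted mean over labels, which is why it does not depend on how common each fault type is.

```python
def binary_auroc(y_true, y_score):
    """AUROC for one label via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined when only one class is present")
    # Count positive/negative pairs ranked correctly; ties count as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(Y_true, Y_score):
    """Unweighted mean of per-label AUROCs for multi-label targets."""
    n_labels = len(Y_true[0])
    per_label = [
        binary_auroc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels, per_label


# Toy multi-label targets (assumption): columns are
# [inner-race fault, outer-race fault]; [1, 1] is a co-occurring fault
# and [0, 0] is a healthy bearing.
Y_true = [[0, 0], [1, 0], [0, 1], [1, 1]]
Y_score = [[0.1, 0.2], [0.8, 0.3], [0.3, 0.9], [0.7, 0.25]]

macro, per_label = macro_auroc(Y_true, Y_score)
print("per-label AUROC:", per_label)
print("macro AUROC:", macro)
```

The pairwise computation is quadratic in the number of segments, which is fine for a sketch; a library routine would be the practical choice on full datasets.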
João Paulo Vieira
Department of Electrical and Electronic Engineering, Federal University of Santa Catarina, Florianópolis, Brazil
Victor Afonso Bauler
Department of Mechanical Engineering, Federal University of Santa Catarina, Florianópolis, Brazil
Rodrigo Kobashikawa Rosa
Department of Electrical and Electronic Engineering, Federal University of Santa Catarina, Florianópolis, Brazil
Danilo Silva
Associate Professor, Federal University of Santa Catarina
Machine Learning, Deep Learning, Information Theory