🤖 AI Summary
This work addresses the lack of a standardized evaluation framework across tasks, datasets, and domains in prognostics and health management (PHM) research, which has hindered the reproducibility and comparability of results. The authors propose a modular, reproducible evaluation infrastructure that formalizes the PHM evaluation pipeline as an explicit, executable protocol and supports fault detection, diagnostics (classification), and prognostics (regression) through a unified interface. By standardizing data contracts and evaluation boundaries, the framework enables fair cross-task comparisons and can be extended to new datasets and model classes without violating protocol invariants. It enforces deterministic, leakage-safe dataset construction and integrates modules for preprocessing, temporal windowing, label alignment, and metric computation. An empirical evaluation of 13 models on 12 datasets, spanning batteries, bearings, turbofan engines, hydraulics, filtration systems, and buildings, illustrates the framework's generality, fairness, and reproducibility.
📝 Abstract
Progress in Prognostics and Health Management (PHM) is hindered by the lack of standardized and reusable evaluation practices across tasks, datasets, and application domains. Reported results are often difficult to reproduce and compare because key protocol choices, such as data splits, preprocessing, label alignment, temporal windowing, and metrics, are left implicit or implemented ad hoc. We introduce \picid, a modular evaluation infrastructure that formalizes the PHM evaluation pipeline as an explicit, executable, and reproducible protocol. Through well-defined abstractions, \picid enforces deterministic, leakage-safe dataset construction while remaining flexible across diverse PHM settings. The framework supports fault detection, diagnostics, and prognostics through a unified interface and can be extended to new datasets and model classes without violating protocol invariants. By standardizing data contracts and evaluation boundaries, \picid also enables fair cross-task comparisons between diagnostics (classification) and prognostics (regression), allowing identical model families to be evaluated consistently across heterogeneous settings. We demonstrate \picid through an empirical evaluation of thirteen models on twelve datasets spanning batteries, bearings, turbofan engines, hydraulics, filtration systems, and buildings. This work establishes a reusable foundation for standardized, fair, and reproducible evaluation in PHM.
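To make "deterministic, leakage-safe dataset construction" concrete, the following is a minimal sketch of one common way to obtain that property. It is not the framework's API. The function names (`split_units`, `make_windows`, `build_dataset`), the hash-based ordering, the last-step label alignment, and the synthetic data are all assumptions introduced here for illustration. The idea is to assign whole units (for example, individual bearings or engines) to train or test before any windowing, so that overlapping windows from one unit can never appear on both sides of the split.

```python
import hashlib

import numpy as np


def split_units(unit_ids, test_fraction=0.25, seed=0):
    """Assign whole units to train or test (hypothetical helper).

    Units are ordered by a hash of (seed, unit id), so the split depends
    only on the ids and the seed, not on input order or global RNG state.
    """
    def key(uid):
        return hashlib.sha256(f"{seed}:{uid}".encode()).hexdigest()

    ordered = sorted(set(unit_ids), key=key)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[n_test:], ordered[:n_test]  # (train ids, test ids)


def make_windows(series, labels, window, stride):
    """Slice one unit's series into fixed-length windows.

    Label alignment (an assumption of this sketch): each window takes the
    label at its last time step, so no future label information is used.
    """
    xs, ys = [], []
    for start in range(0, len(series) - window + 1, stride):
        end = start + window
        xs.append(series[start:end])
        ys.append(labels[end - 1])
    return np.asarray(xs), np.asarray(ys)


def build_dataset(units, unit_ids, window, stride):
    """Window each selected unit separately, then stack the results."""
    parts = [make_windows(*units[uid], window, stride) for uid in unit_ids]
    x = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return x, y


# Synthetic example: 8 units, 50 time steps, 3 sensor channels each.
# The label is a remaining-useful-life style countdown (regression target).
rng = np.random.default_rng(0)
units = {
    f"unit_{i}": (rng.normal(size=(50, 3)), np.arange(49, -1, -1))
    for i in range(8)
}

train_ids, test_ids = split_units(units.keys(), test_fraction=0.25, seed=0)
x_train, y_train = build_dataset(units, train_ids, window=10, stride=5)
x_test, y_test = build_dataset(units, test_ids, window=10, stride=5)
```

Splitting at the unit level before windowing is the step that matters: if windows were created first and then shuffled into train and test, overlapping windows from the same unit would leak near-duplicate samples across the split. The same window tensors could feed either a classifier or a regressor by swapping the label array, which is one plausible reading of how a shared data contract supports cross-task comparison.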