🤖 AI Summary
In industrial Cyber-Physical Systems (CPS), massive heterogeneous time-series data frequently suffer from incompleteness and structural ambiguity, while existing preprocessing methods lack both generality and domain adaptability. To address this, we propose CPSLint—a domain-specific language (DSL) tailored for CPS data. CPSLint supports column-type inference, constraint validation, and missing-value imputation, and uniquely integrates row-level (e.g., execution-phase identification) and column-level structural inference to enable semantic-aware data cleaning and structuring. By unifying validation rules, adaptive imputation strategies, and pattern extraction within its DSL design, CPSLint significantly enhances data utility for downstream machine learning tasks. A proof-of-concept evaluation demonstrates that CPSLint achieves efficient end-to-end cleaning and structuring in representative industrial scenarios, outperforming generic tools by 23.6% in accuracy and 1.8× in processing speed.
📝 Abstract
Raw datasets are often too large and unstructured to work with directly, and require a data preparation process. The domain of industrial Cyber-Physical Systems (CPS) is no exception, as raw data typically consists of large amounts of time-series data logging the system's status at regular time intervals. Such data has to be sanity-checked and preprocessed to be consumable by data-centric workflows. We introduce CPSLint, a Domain-Specific Language designed to provide data preparation for industrial CPS. We build on the fact that many raw data collections in the CPS domain require similar actions to render them suitable for Machine-Learning (ML) solutions, e.g., Fault Detection and Identification (FDI) workflows, yet still vary too much to hope for one universally applicable solution.
CPSLint's main features include type checking and enforcing constraints through validation and remediation for data columns, such as imputing missing data from surrounding rows. More advanced features cover inference of extra CPS-specific data structures, both column-wise and row-wise. For instance, descriptive execution phases, a row-wise structure and an effective method of data compartmentalisation, are extracted and prepared for ML-assisted FDI workflows. We demonstrate CPSLint's features through a proof-of-concept implementation.
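To make two of these features concrete, the following is a minimal sketch in plain Python, not CPSLint syntax, which the abstract does not show. It illustrates imputing missing values from surrounding rows and grouping rows into execution phases. The function names `impute_from_neighbours` and `extract_phases`, the sample columns, and the two strategies (linear interpolation between the nearest known neighbours, and treating each run of identical state labels as one phase) are assumptions made for illustration only.

```python
from itertools import groupby
from typing import Optional, Sequence


def impute_from_neighbours(values: Sequence[Optional[float]]) -> list[float]:
    """Fill None gaps by linear interpolation between the nearest known
    neighbours; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("column has no known values to impute from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest known row indices on each side of the gap (None if absent).
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def extract_phases(states: Sequence[str]) -> list[tuple[str, int, int]]:
    """Group consecutive rows with the same state label into phases,
    returned as (label, first_row, last_row) with inclusive row indices."""
    phases = []
    row = 0
    for label, run in groupby(states):
        length = len(list(run))
        phases.append((label, row, row + length - 1))
        row += length
    return phases


# Hypothetical sample columns: a sensor reading with gaps and a state label.
temperature = [20.0, None, 22.0, 24.0, None, None]
state = ["idle", "idle", "heating", "heating", "heating", "cooling"]

print(impute_from_neighbours(temperature))
print(extract_phases(state))
```

Each phase tuple can then be used to slice the cleaned columns into per-phase segments before they are passed to a downstream FDI model. Real CPS logs would likely need a richer phase definition than runs of a single label.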