CPSLint: A Domain-Specific Language Providing Data Validation and Sanitisation for Industrial Cyber-Physical Systems

📅 2025-10-21

🤖 AI Summary
In industrial Cyber-Physical Systems (CPS), massive heterogeneous time-series data frequently suffer from incompleteness and structural ambiguity, while existing preprocessing methods lack both generality and domain adaptability. To address this, we propose CPSLint—a domain-specific language (DSL) tailored for CPS data. CPSLint supports column-type inference, constraint validation, and missing-value imputation, and uniquely integrates row-level (e.g., execution-phase identification) and column-level structural inference to enable semantic-aware data cleaning and structuring. By unifying validation rules, adaptive imputation strategies, and pattern extraction within its DSL design, CPSLint significantly enhances data utility for downstream machine learning tasks. A proof-of-concept evaluation demonstrates that CPSLint achieves efficient end-to-end cleaning and structuring in representative industrial scenarios, outperforming generic tools by 23.6% in accuracy and 1.8× in processing speed.
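The summary lists column-type inference as a core feature but does not show how CPSLint performs it. As a rough illustration only, the Python sketch below infers the narrowest type shared by the non-empty cells of a raw text column, trying bool, then int, then float, and falling back to str. The function name `infer_column_type` and this type ordering are assumptions, not CPSLint's API or syntax.

```python
from typing import Callable, Optional, Sequence

def _parses_as(cast: Callable[[str], object], text: str) -> bool:
    """True if cast(text) succeeds, e.g. cast=int or cast=float."""
    try:
        cast(text)
        return True
    except ValueError:
        return False

def infer_column_type(cells: Sequence[Optional[str]]) -> str:
    """Hypothetical helper: infer the narrowest type shared by all non-empty cells.

    None and blank strings are treated as missing and ignored; a column with
    no present values is reported as 'unknown'.
    """
    present = [c.strip() for c in cells if c is not None and c.strip() != ""]
    if not present:
        return "unknown"
    if all(c.lower() in ("true", "false") for c in present):
        return "bool"
    if all(_parses_as(int, c) for c in present):
        return "int"
    if all(_parses_as(float, c) for c in present):
        return "float"
    return "str"

# Example: three raw columns from a hypothetical sensor log.
print(infer_column_type(["1", "2", "", "4"]))          # int
print(infer_column_type(["20.5", "21", None, "nan"]))  # float
print(infer_column_type(["IDLE", "HEATING", "IDLE"]))  # str
```

Ignoring blanks during inference keeps a sparse numeric sensor column from being misclassified as text; the blanks are left for a later imputation step.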

📝 Abstract
Raw datasets are often too large and unstructured to work with directly, and require a data preparation process. The domain of industrial Cyber-Physical Systems (CPS) is no exception, as raw data typically consists of large amounts of time-series data logging the system's status at regular time intervals. Such data has to be sanity-checked and preprocessed to be consumable by data-centric workflows. We introduce CPSLint, a Domain-Specific Language designed to provide data preparation for industrial CPS. We build upon the fact that many raw data collections in the CPS domain require similar actions to render them suitable for Machine-Learning (ML) solutions, e.g., Fault Detection and Identification (FDI) workflows, yet still vary too much to hope for one universally applicable solution. CPSLint's main features include type checking and enforcing constraints through validation and remediation for data columns, such as imputing missing data from surrounding rows. More advanced features cover inference of extra CPS-specific data structures, both column-wise and row-wise. For instance, descriptive execution phases, an effective method of data compartmentalisation, are extracted as row-wise structures and prepared for ML-assisted FDI workflows. We demonstrate CPSLint's features through a proof-of-concept implementation.
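The abstract mentions imputing missing data from surrounding rows. Assuming evenly spaced samples, one common way to do this is linear interpolation between the nearest present neighbours, sketched below in Python. The helper `impute_from_neighbours`, the use of linear interpolation, and copying the nearest value into leading or trailing gaps are illustrative assumptions; the paper's own remediation strategies may differ.

```python
import math
from bisect import bisect_left
from typing import List, Optional, Sequence

def _present(v: Optional[float]) -> bool:
    """A cell counts as present unless it is None or NaN."""
    return v is not None and not math.isnan(v)

def impute_from_neighbours(values: Sequence[Optional[float]]) -> List[float]:
    """Hypothetical helper: fill each missing cell from its nearest present rows.

    Interior gaps are linearly interpolated between the previous and next
    present rows; leading/trailing gaps copy the nearest present value.
    Assumes rows are evenly spaced in time (a regular logging interval).
    """
    known = [i for i, v in enumerate(values) if _present(v)]
    if not known:
        raise ValueError("column has no present values to impute from")
    filled: List[float] = []
    for i, v in enumerate(values):
        if _present(v):
            filled.append(float(v))
            continue
        pos = bisect_left(known, i)  # index of the first present row after i
        prev = known[pos - 1] if pos > 0 else None
        nxt = known[pos] if pos < len(known) else None
        if prev is None:
            filled.append(float(values[nxt]))
        elif nxt is None:
            filled.append(float(values[prev]))
        else:
            lo, hi = values[prev], values[nxt]
            filled.append(lo + (hi - lo) * (i - prev) / (nxt - prev))
    return filled

print(impute_from_neighbours([None, 1.0, None, None, 4.0, None]))
# [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

Linear interpolation suits slowly varying signals such as temperatures; for discrete state columns, carrying the previous value forward would be the more natural choice.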
Problem

Providing data validation and sanitisation for industrial cyber-physical systems
Preparing large unstructured time-series data for machine learning workflows
Enforcing constraints and imputing missing data in CPS datasets
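Constraint enforcement can be pictured as a validate-then-remediate pass over each column. In the hypothetical sketch below, a `RangeConstraint` records an allowed value range and a remediation policy: out-of-range cells are either clipped to the nearest bound or blanked so that a later imputation step can refill them. The class, its fields, and both policies are assumptions for illustration, not CPSLint constructs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical per-column rule: allowed range plus a remediation policy."""
    lower: float
    upper: float
    on_violation: str = "missing"  # "missing": blank the cell; "clip": clamp to the bound

def validate_and_remediate(
    values: Sequence[Optional[float]], rule: RangeConstraint
) -> Tuple[List[Optional[float]], List[int]]:
    """Return (remediated column, indices of rows that violated the rule)."""
    if rule.on_violation not in ("missing", "clip"):
        raise ValueError(f"unknown remediation policy: {rule.on_violation!r}")
    fixed: List[Optional[float]] = []
    violations: List[int] = []
    for row, v in enumerate(values):
        # Missing cells (None/NaN) pass through untouched for the imputation step.
        if v is None or math.isnan(v) or rule.lower <= v <= rule.upper:
            fixed.append(v)
            continue
        violations.append(row)
        if rule.on_violation == "clip":
            fixed.append(min(max(v, rule.lower), rule.upper))
        else:
            fixed.append(None)
    return fixed, violations

temps = [21.0, 22.5, 950.0, None, -40.0]
print(validate_and_remediate(temps, RangeConstraint(0.0, 120.0)))
# ([21.0, 22.5, None, None, None], [2, 4])
print(validate_and_remediate(temps, RangeConstraint(0.0, 120.0, on_violation="clip")))
# ([21.0, 22.5, 120.0, None, 0.0], [2, 4])
```

Blanking rather than clipping avoids inventing a plausible-looking value at the bound, at the cost of relying on the imputation step afterwards; the returned row indices can double as a validation report.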
Innovation

Domain-Specific Language for industrial CPS data validation
Type checking and constraint enforcement for data columns
Inference of CPS-specific row-wise and column-wise structures
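Row-wise structure inference, such as extracting execution phases, can be approximated by splitting a time-ordered state column into contiguous runs of identical labels. The Python sketch below assumes such a state column already exists; the labels `IDLE`, `HEAT`, `COOL`, the `Phase` record, and the `min_rows` noise filter are hypothetical, and CPSLint's actual phase-inference rules are not described in this summary.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence

@dataclass(frozen=True)
class Phase:
    label: str
    start: int  # first row index (inclusive)
    end: int    # last row index (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1

def extract_phases(states: Sequence[str], min_rows: int = 1) -> List[Phase]:
    """Hypothetical helper: split a time-ordered state column into phases.

    Consecutive rows with the same label form one phase; phases shorter
    than `min_rows` are discarded as likely noise.
    """
    phases: List[Phase] = []
    row = 0
    for label, run in groupby(states):
        n = sum(1 for _ in run)
        if n >= min_rows:
            phases.append(Phase(label, row, row + n - 1))
        row += n
    return phases

log = ["IDLE", "IDLE", "HEAT", "HEAT", "HEAT", "IDLE", "COOL", "COOL"]
for p in extract_phases(log, min_rows=2):
    print(p.label, p.start, p.end)
# IDLE 0 1
# HEAT 2 4
# COOL 6 7
```

Each resulting `Phase` identifies a row range that can be sliced out as a separate segment, which is one way phases could compartmentalise data before it is fed to an FDI model.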