Implementing CPSLint: A Data Validation and Sanitisation Tool for Industrial Cyber-Physical Systems

2026-04-20

AI Summary
This work addresses the challenges posed by the large, unstructured time-series data generated in industrial cyber-physical systems (CPS), where existing preprocessing approaches rely on ad hoc scripts that suffer from poor readability, reusability, and maintainability. To overcome these limitations, the authors propose and implement CPSLint, a domain-specific language (DSL) tailored for industrial CPS data preparation. CPSLint raises the level of abstraction of common data validation and sanitisation operations, so that both data scientists and domain experts can carry out the data preparation task and similar actions can be reused across cases. The tool is publicly available, and the authors state that the data preparation process can be expressed in just a few lines of code, reducing repetitive scripting work. CPSLint is presented as applicable to any case involving time-series data collections in need of sanitisation.

๐Ÿ“ Abstract
Raw datasets are often too large and unstructured to work with directly, and require a data preparation phase. The domain of industrial Cyber-Physical Systems (CPSs) is no exception, as raw data typically consists of large time-series data collections that log the system's status at regular time intervals. The processing of such raw data is typically carried out using ad hoc, case-specific, one-off Python scripts that neglect aspects of readability, reusability, and maintainability. In practice, this can cause professionals such as data scientists to write similar data preparation scripts for each case, requiring them to do much repetitive work. We introduce CPSLint, a Domain-Specific Language (DSL) designed to support the data preparation process for industrial CPSs. CPSLint raises the level of abstraction to the point where both data scientists and domain experts can perform the data preparation task. We leverage the fact that many raw data collections in the industrial CPS domain require similar actions to render them suitable for data-centric workflows. In our DSL one can express the data preparation process in just a few lines of code. CPSLint is a publicly available tool applicable to any case involving time-series data collections in need of sanitisation.
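
To make the motivation concrete, the following is a minimal sketch of the kind of one-off Python script the abstract describes. It is not CPSLint syntax, which this page does not show. The function name `sanitise`, the sample layout of `(timestamp, value)` pairs, and the forward-fill policy are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def sanitise(samples, interval):
    """Place samples on a regular time grid (hypothetical helper).

    samples  -- iterable of (datetime, value-or-None) pairs, in any order
    interval -- timedelta giving the expected logging period
    Duplicate timestamps keep the last reading; missing or None readings
    are forward-filled; timestamps that fall off the grid are dropped.
    """
    by_time = {}
    # sorted() is stable, so for equal timestamps the later input wins.
    for ts, value in sorted(samples, key=lambda s: s[0]):
        by_time[ts] = value
    if not by_time:
        return []

    times = sorted(by_time)
    cleaned = []
    current = times[0]
    last_value = by_time[current]
    while current <= times[-1]:
        # Only update the carried value when a real reading exists here.
        if by_time.get(current) is not None:
            last_value = by_time[current]
        cleaned.append((current, last_value))
        current += interval
    return cleaned


if __name__ == "__main__":
    start = datetime(2026, 1, 1)
    step = timedelta(seconds=10)
    raw = [
        (start, 1.0),
        (start + 2 * step, None),   # sensor dropout
        (start + 3 * step, 4.0),
        (start + 3 * step, 4.5),    # duplicate timestamp
    ]
    for ts, value in sanitise(raw, step):
        print(ts.isoformat(), value)
```

Each new dataset tends to need a slightly different variant of such a script, which is the repetition the abstract says a DSL is meant to remove.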
Problem

Research questions and friction points this paper is trying to address.

Cyber-Physical Systems
data preparation
time-series data
data validation
data sanitisation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-Specific Language
Data Sanitisation
Cyber-Physical Systems
Time-Series Data
Data Preparation
Uraz Odyurt
Dynamics Based Maintenance, ET Faculty, University of Twente, The Netherlands
Ömer Sayilir
Formal Methods and Tools, EEMCS Faculty, University of Twente, The Netherlands
Mariëlle Stoelinga
Formal Methods and Tools, EEMCS Faculty, University of Twente, The Netherlands
Vadim Zaytsev
Associate Professor, University of Twente
Software Language Engineering, Grammarware, Domain Specific Languages, Automation, Legacy