🤖 AI Summary
This work addresses the scarcity of large, well-annotated time-series datasets for deep-learning-based anomaly detection in chemical processes. The authors present an automated workflow that translates experimental records into simulation scenarios and runs them in a Python-based process simulator, which uses a tailored index-reduction strategy for the underlying differential-algebraic equations. After calibration to a single reference experiment, the simulator predicts the dynamics of the remaining experiments well, which allows fully automated, consistent simulation of a large number of runs. Combining the previously published experimental dataset with the corresponding simulation data yields a novel hybrid anomaly detection dataset for batch distillation, covering normal operation and a wide range of actuator- and control-related anomalies. The openly released dataset supports simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep anomaly detection in chemical process monitoring.
📝 Abstract
Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets, which are rarely available from industrial operations. In recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data are generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the simulator predicts the dynamics of the other experiments well. This enables the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
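As a rough illustration of what translating an annotated experimental record into a simulation scenario could look like, the following Python sketch converts a record with nominal setpoints and annotated anomaly intervals into piecewise-constant input schedules plus ground-truth labels. It is not the authors' implementation: the names `ExperimentRecord`, `Anomaly`, `to_scenario`, and `input_at`, the field layout, and the actuator names are all hypothetical, and the sketch assumes that anomalies on the same actuator do not overlap in time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Schedule = List[Tuple[float, float]]  # [(start time in s, value), ...]


@dataclass
class Anomaly:
    """Hypothetical annotation: an actuator held at `value` during an interval."""
    actuator: str
    start_s: float
    end_s: float
    value: float


@dataclass
class ExperimentRecord:
    """Hypothetical experimental record with nominal setpoints and annotations."""
    run_id: str
    duration_s: float
    setpoints: Dict[str, float]
    anomalies: List[Anomaly] = field(default_factory=list)


def to_scenario(record: ExperimentRecord) -> dict:
    """Build piecewise-constant input schedules and anomaly labels for a simulator."""
    # Every actuator starts at its nominal setpoint at t = 0.
    schedules: Dict[str, Schedule] = {
        name: [(0.0, value)] for name, value in record.setpoints.items()
    }
    labels = []
    for a in sorted(record.anomalies, key=lambda x: x.start_s):
        nominal = record.setpoints[a.actuator]
        schedules[a.actuator].append((a.start_s, a.value))
        # Return to the nominal value if the anomaly ends before the run does.
        if a.end_s < record.duration_s:
            schedules[a.actuator].append((a.end_s, nominal))
        labels.append((a.start_s, min(a.end_s, record.duration_s), a.actuator))
    return {
        "run_id": record.run_id,
        "duration_s": record.duration_s,
        "schedules": schedules,
        "labels": labels,
    }


def input_at(schedule: Schedule, t: float) -> float:
    """Evaluate a piecewise-constant schedule at time t (in seconds)."""
    value = schedule[0][1]
    for t_start, v in schedule:
        if t_start <= t:
            value = v
        else:
            break
    return value


# Illustrative record: heater switched off between 1200 s and 1800 s.
record = ExperimentRecord(
    run_id="run_001",
    duration_s=3600.0,
    setpoints={"heater_power": 1.0, "reflux_ratio": 2.0},
    anomalies=[Anomaly("heater_power", 1200.0, 1800.0, 0.0)],
)
scenario = to_scenario(record)
```

In a workflow of this kind, a simulator would query `input_at` for each actuator while integrating the process model, and the `labels` list would supply the anomaly annotations for the simulated time series, so that experimental and simulated runs share the same labeling scheme.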