Batch Distillation Data for Developing Machine Learning Anomaly Detection Methods

📅 2025-10-20
🤖 AI Summary
Anomaly detection in chemical processes is hindered by the scarcity of openly available experimental data. To address this, the authors built a laboratory-scale batch distillation plant and ran 119 experiments covering fault-free operation and intentionally induced anomalies across a wide range of operating conditions and mixtures. The database includes time-series data from sensors and actuators with estimates of measurement uncertainty, concentration profiles from online benchtop NMR spectroscopy, and video and audio recordings. Anomalies are annotated by experts using an ontology developed in the work, and most anomalous experiments are paired with a corresponding fault-free one, which supports root-cause analysis and interpretable models. The database is freely available on Zenodo (DOI: 10.5281/zenodo.17395544).

📝 Abstract
Machine learning (ML) holds great potential to advance anomaly detection (AD) in chemical processes. However, the development of ML-based methods is hindered by the lack of openly available experimental data. To address this gap, we have set up a laboratory-scale batch distillation plant and operated it to generate an extensive experimental database, covering fault-free experiments and experiments in which anomalies were intentionally induced, for training advanced ML-based AD methods. In total, 119 experiments were conducted across a wide range of operating conditions and mixtures. Most experiments containing anomalies were paired with a corresponding fault-free one. The database that we provide here includes time-series data from numerous sensors and actuators, along with estimates of measurement uncertainty. In addition, unconventional data sources -- such as concentration profiles obtained via online benchtop NMR spectroscopy and video and audio recordings -- are provided. Extensive metadata and expert annotations of all experiments are included. The anomaly annotations are based on an ontology developed in this work. The data are organized in a structured database and made freely available via doi.org/10.5281/zenodo.17395544. This new database paves the way for the development of advanced ML-based AD methods. As it includes information on the causes of anomalies, it further enables the development of interpretable and explainable ML approaches, as well as methods for anomaly mitigation.
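One way the paired design and the uncertainty estimates described in the abstract could be combined is sketched here: the residual between an anomalous run and its fault-free counterpart is scaled by the combined measurement uncertainty. This is a minimal illustration and not the authors' method. The function names, the threshold of 3, and all numeric values are hypothetical, and the sketch assumes both runs are sampled on the same time grid with independent standard uncertainties. It does not reflect the actual file layout of the Zenodo database.

```python
import math


def uncertainty_scaled_residuals(anomalous, reference, sigma_anomalous, sigma_reference):
    """Return |anomalous - reference| divided by the combined standard uncertainty.

    All four sequences are assumed to share the same time grid (hypothetical setup).
    """
    scores = []
    for x, r, sx, sr in zip(anomalous, reference, sigma_anomalous, sigma_reference):
        # Independent uncertainties are combined in quadrature.
        combined = math.sqrt(sx ** 2 + sr ** 2)
        scores.append(abs(x - r) / combined)
    return scores


def flag_anomalies(scores, threshold=3.0):
    """Return the indices whose scaled residual exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Synthetic, purely illustrative temperature readings in degrees Celsius.
reference = [78.0, 78.2, 78.4, 78.6, 78.8]   # fault-free run
anomalous = [78.1, 78.2, 79.9, 81.0, 78.9]   # paired run with an induced anomaly
sigma = [0.2] * 5                            # assumed standard uncertainty per reading

scores = uncertainty_scaled_residuals(anomalous, reference, sigma, sigma)
flagged = flag_anomalies(scores)
print(flagged)
```

Scaling by the uncertainty keeps deviations that lie within measurement noise from being flagged. A learned model would replace the fixed threshold in practice.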
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of experimental data for machine learning anomaly detection
Providing extensive batch distillation data with intentional anomalies and metadata
Enabling development of interpretable ML methods for chemical process safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

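Established a laboratory-scale batch distillation plant
Generated an experimental database of 119 experiments, including runs with intentionally induced anomalies
Included unconventional data sources such as online benchtop NMR spectroscopy and video and audio recordings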
Justus Arweiler
Laboratory of Engineering Thermodynamics, RPTU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
Indra Jungjohann
Laboratory of Engineering Thermodynamics, RPTU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
Aparna Muraleedharan
Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Laboratory for Chemical Process Engineering, Uferstraße 53, 94315 Straubing, Germany
Heike Leitte
Professor of Computer Science, TU Kaiserslautern
Visualization, Visual Analytics, Data Science
Jakob Burger
Technical University of Munich
Synthetic Fuels, Optimisation, Raw Material Change, C1 chemistry, Biotechnology
Kerstin Münnemann
Laboratory of Engineering Thermodynamics, RPTU Kaiserslautern, Erwin-Schrödinger-Straße 44, 67663 Kaiserslautern, Germany
Fabian Jirasek
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern
Chemical Engineering, Bioprocess Engineering, Thermodynamics, Machine Learning
Hans Hasse
University of Kaiserslautern
Chemical Engineering