🤖 AI Summary
In predictive maintenance (PdM) for automotive systems, poor-quality maintenance logs—exhibiting misspellings, missing fields, near-duplicates, erroneous dates, and other noise—severely hinder real-world deployment. This paper presents the first systematic investigation of large language model (LLM)-based agents for industrial log cleaning. We propose an instruction-guided, task-decomposed LLM agent framework that handles six representative log corruption types end-to-end. Experimental results demonstrate high cleaning accuracy and strong generalization across diverse data quality issues, significantly reducing reliance on domain experts and labeled training data. Our approach establishes a scalable, low-cost, and easily deployable data preprocessing paradigm for PdM, bridging the gap between academic research and industrial implementation.
📝 Abstract
Economic constraints, limited availability of datasets for reproducibility, and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement of predictive maintenance (PdM) in the automotive sector. Recent progress in large language models (LLMs) presents an opportunity to overcome these barriers and speed up the transition of PdM from research to industrial practice. In this context, we explore the potential of LLM-based agents to support PdM data-cleaning pipelines. Specifically, we focus on maintenance logs, a critical data source for training well-performing machine learning (ML) models, but one often affected by errors such as typos, missing fields, near-duplicate entries, and incorrect dates. We evaluate LLM agents on cleaning tasks involving six distinct types of noise. Our findings show that LLMs are effective at handling generic cleaning tasks and offer a promising foundation for future industrial applications. While domain-specific errors remain challenging, these results highlight the potential for further improvements through specialized training and enhanced agentic capabilities.
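To make the task-decomposed idea concrete, the sketch below shows one possible shape of such a pipeline: each noise type (typos, missing fields, malformed dates, near-duplicates) gets its own subtask applied in sequence. This is a hypothetical illustration, not the paper's implementation; in particular, the `TYPO_FIXES` lookup table is a deterministic stand-in for the fuzzy repair step that the paper delegates to an LLM agent, and near-duplicate removal is reduced to exact-key deduplication for brevity.

```python
from datetime import datetime

# Stand-in for an LLM-based typo-repair agent (hypothetical examples).
TYPO_FIXES = {"oil chnage": "oil change", "brake pds": "brake pads"}

def fix_typos(record):
    record["action"] = TYPO_FIXES.get(record["action"], record["action"])
    return record

def normalize_date(record):
    # Accept a few common formats and emit ISO 8601; flag unparseable dates.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"):
        try:
            record["date"] = datetime.strptime(record["date"], fmt).strftime("%Y-%m-%d")
            return record
        except (ValueError, TypeError):
            continue
    record["date"] = None
    return record

def fill_missing(record, default="unknown"):
    # Fill absent fields with a sentinel rather than dropping the record.
    for key in ("vehicle_id", "action", "date"):
        record.setdefault(key, default)
    return record

def deduplicate(records):
    # Exact-key dedup; a real system would use fuzzy/near-duplicate matching.
    seen, out = set(), []
    for r in records:
        key = (r["vehicle_id"], r["action"], r["date"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def clean_logs(records):
    cleaned = [normalize_date(fix_typos(fill_missing(dict(r)))) for r in records]
    return deduplicate(cleaned)

logs = [
    {"vehicle_id": "V1", "action": "oil chnage", "date": "12/03/2024"},
    {"vehicle_id": "V1", "action": "oil change", "date": "2024-03-12"},  # near-duplicate
    {"action": "brake pds", "date": "2024-04-01"},                       # missing vehicle_id
]
print(clean_logs(logs))
```

After typo repair and date normalization, the first two entries collapse into one record, and the third has its missing field flagged; chaining small, inspectable subtasks like this mirrors the task-decomposition strategy described above.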