🤖 AI Summary
This work addresses the challenge of LLM-driven autonomous health monitoring in sensor-intensive industrial environments. We propose a multi-LLM collaborative diagnostic framework that takes statistical features, rather than raw time-series sensor data, as input. Lightweight, specialized LLMs jointly perform anomaly detection, fault classification, and natural-language diagnostic reasoning, thereby enhancing classification sensitivity and decision interpretability. Experiments demonstrate that the statistical-feature-based multi-LLM architecture outperforms both single-LLM baselines and raw-sequence-input approaches in fault detection accuracy and F1 score; moreover, its generated diagnostic reports incorporate causal reasoning chains and explicit uncertainty quantification. Our primary contribution is the first instantiation of a statistical-feature-guided multi-LLM collaborative diagnosis paradigm, empirically validated for effectiveness and interpretability in industrial edge deployment. However, the framework adapts poorly in continual learning settings and often struggles to calibrate its predictions across repeated fault cycles.
📝 Abstract
Large Language Model (LLM)-based systems present new opportunities for autonomous health monitoring in sensor-rich industrial environments. This study explores the potential of LLMs to detect and classify faults directly from sensor data, while producing inherently explainable outputs through natural language reasoning. We systematically evaluate how LLM-system architecture (single-LLM vs. multi-LLM), input representation (raw sensor data vs. descriptive statistics), and context window size affect diagnostic performance. Our findings show that LLM systems perform most effectively when provided with summarized statistical inputs, and that systems with multiple LLMs using specialized prompts offer improved sensitivity for fault classification compared to single-LLM systems. While LLMs can produce detailed and human-readable justifications for their decisions, we observe limitations in their ability to adapt over time in continual learning settings, where they often struggle to calibrate predictions during repeated fault cycles. These insights point to both the promise and the current boundaries of LLM-based systems as transparent, adaptive diagnostic tools in complex environments.
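To make the "summarized statistical inputs" and "specialized prompts" ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It reduces each raw sensor window to a few descriptive statistics and sends the same summary to several role-specific LLMs. The choice of statistics, the role names, the prompt wording, and the `llms` mapping of plain callables are all assumptions made for illustration; the abstract does not specify any of them.

```python
import statistics
from typing import Callable, Dict, List


def summarize_window(values: List[float]) -> Dict[str, float]:
    """Reduce one raw sensor window to descriptive statistics.

    The particular statistics chosen here are an assumption.
    """
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


def format_features(sensor: str, stats: Dict[str, float]) -> str:
    """Render the statistics of one sensor as a single prompt line."""
    parts = ", ".join(f"{name}={value:.3f}" for name, value in stats.items())
    return f"{sensor}: {parts}"


# Hypothetical role-specific instructions; the wording is illustrative only.
ROLE_PROMPTS: Dict[str, str] = {
    "detector": "Decide whether these readings indicate a fault.",
    "classifier": "Name the most likely fault type and justify it briefly.",
}


def diagnose(
    windows: Dict[str, List[float]],
    llms: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send the same statistical summary to one LLM per role.

    `llms` maps a role name to any callable that takes a prompt string and
    returns a response string (an assumed stand-in for a real model client).
    """
    summary = "\n".join(
        format_features(sensor, summarize_window(values))
        for sensor, values in windows.items()
    )
    return {
        role: llms[role](f"{instruction}\n\nSensor statistics:\n{summary}")
        for role, instruction in ROLE_PROMPTS.items()
    }
```

In this sketch each role receives a compact, fixed-size description of the window regardless of how many raw samples it contained, which is one plausible reason summarized inputs could be easier for an LLM to reason over than long raw sequences.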