🤖 AI Summary
Detecting attacks that violate physical laws in industrial-scale cyber-physical systems (CPS) remains challenging: existing invariant-based detection methods either lack semantic interpretability (when data-driven) or scale poorly because they rely on expert-crafted physical models.
Method: This paper proposes the first large language model (LLM)-driven framework for automated physical invariant extraction. It integrates retrieval-augmented generation (RAG) with domain-specific prompting to enable multimodal semantic parsing of design documentation. Rather than applying the inferred invariants directly, it introduces an invariant-constrained statistical learning mechanism that integrates them into the training data to mitigate LLM hallucination and concept drift.
Contribution/Results: Evaluated on a real-world public CPS security dataset containing 58 attack cases, the framework achieves a precision of 0.923, detecting anomalies while keeping false alarms low. It offers an interpretable and scalable approach to CPS anomaly detection.
📝 Abstract
Modern industrial infrastructures rely heavily on Cyber-Physical Systems (CPS), which are vulnerable to cyber-attacks with potentially catastrophic effects. To reduce these risks, anomaly detection methods based on physical invariants have been developed. However, these methods often require domain-specific expertise to manually define invariants, making them costly and difficult to scale. To address this limitation, we propose a novel approach to extract physical invariants from CPS testbeds for anomaly detection. Our insight is that CPS design documentation often contains semantically rich descriptions of physical procedures, which can profile inter-correlated dynamics among system components. Leveraging the built-in physics and engineering knowledge of recent generative AI models, we aim to automate this traditionally manual process, improving scalability and reducing costs. This work focuses on designing and optimizing a Retrieval-Augmented Generation (RAG) workflow with a customized prompting system tailored for CPS documentation, enabling accurate extraction of semantic information and inference of physical invariants from complex, multimodal content. Then, rather than directly applying the inferred invariants for anomaly detection, we introduce an innovative statistics-based learning approach that integrates these invariants into the training dataset. This method addresses limitations such as hallucination and concept drift, enhancing the reliability of the model. We evaluate our approach on a real-world public CPS security dataset that contains 86 data points and 58 attack cases. The results show that our approach achieves a high precision of 0.923, accurately detecting anomalies while minimizing false alarms.
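To make the idea of a physical invariant combined with a statistical check more concrete, here is a minimal sketch. It is not the paper's method: the tank mass-balance invariant, the field names (`inflow`, `outflow`, `level_delta`, `dt`, `area`), and the k-sigma tolerance are all assumptions chosen purely for illustration. The sketch fits a tolerance on the invariant's residual from attack-free data, then flags samples whose residual falls outside it.

```python
from statistics import mean, stdev


def residual(sample):
    """Residual of a hypothetical mass-balance invariant for a tank.

    Assumed invariant (illustrative only): the change in level over one
    step should equal (inflow - outflow) * dt / area.
    """
    expected = (sample["inflow"] - sample["outflow"]) * sample["dt"] / sample["area"]
    return sample["level_delta"] - expected


def fit_tolerance(normal_samples, k=3.0):
    """Learn the residual's center and a k-sigma tolerance from normal data."""
    residuals = [residual(s) for s in normal_samples]
    return mean(residuals), k * stdev(residuals)


def is_anomalous(sample, center, tolerance):
    """Flag a sample whose residual deviates from the center by more than the tolerance."""
    return abs(residual(sample) - center) > tolerance
```

Learning the tolerance from data, rather than requiring the invariant to hold exactly, is one simple way to absorb sensor noise and an imperfectly stated invariant. The abstract's approach of integrating invariants into the training dataset is more involved than this threshold check.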