🤖 AI Summary
This survey addresses the reliability challenges that cyber-physical systems (CPS) face under multiple kinds of perturbation, including security attacks, environmental disruptions, and hardware or software faults. It treats resilience as a system-wide property that emerges from the interaction of hardware, software, and human users, and organizes the field into five interconnected themes: system-wide resilience across hardware, software, and humans; learning under data scarcity, supported by synthetic data generation and foundation model adaptation; proactive measures such as verification, testing, and redundancy; recovery guided by the “just good enough” principle, which prioritizes safety-critical functions; and human-CPS teaming built on trust calibration and explainable AI. Illustrated through safety-critical domains such as connected and autonomous transportation and medical CPS, the survey offers a systematic roadmap toward resilient CPS in complex, adversarial environments.
📝 Abstract
Resilience in cyber-physical systems (CPS) is the fundamental ability to maintain safety and critical functionality despite adverse "perturbations," which include security attacks, environmental disruptions, and hardware or software failures. This survey provides a comprehensive review of CPS resilience, framing the field through five interconnected themes that must work together as an integrated whole to achieve real-world resilience.
The article first posits that resilience is a system-wide property emerging from interactions between hardware, software, and human users. Second, it addresses the challenges of learning-enabled CPS, which often operate in data-scarce environments with imbalanced or noisy data and therefore call for solutions such as synthetic data generation and foundation model adaptation. Third, the survey examines proactive measures for resilience, including the distinctive aspects of verification, testing, and redundancy in CPS. Fourth, it explores recovery mechanisms, moving beyond traditional fault models to design "just good enough" recovery strategies that prioritize safety-critical functions during perturbations. Finally, it highlights the central role of the human, focusing on the different levels of human intervention, the necessity of trust calibration, and the requirement for explainable AI to support human-CPS teaming.
These themes are illustrated through representative application domains, primarily Connected and Autonomous Transportation Systems (CATS) and Medical CPS (MCPS). By integrating the five themes, this survey provides a systematic roadmap for achieving resilient CPS in increasingly complex and adversarial environments.