🤖 AI Summary
This work addresses the problem of detecting man-in-the-middle attacks on model-free reinforcement learning in cyber-physical systems by proposing a detection framework based on extended Bellman residual analysis. The approach introduces a refined MDP-based attack model in which the reward function depends on both the current and next states, capturing the reward perturbations induced by the attacker's transition estimation errors. An optimal system identification policy is designed to facilitate detection, and theoretical analysis shows that the learning time required for detection scales linearly with the attacker's learning time, matching the information-theoretic lower bound and making the scheme order-optimal in detection efficiency. The method remains effective under asynchronous and intermittent attacks, demonstrating robustness across diverse attack patterns.
📝 Abstract
We consider the problem of detecting learning-based man-in-the-middle (MITM) attacks in cyber-physical systems (CPS), and extend our previously proposed Bellman Deviation Detection (BDD) framework for model-free reinforcement learning (RL). We refine the standard MDP attack model by allowing the reward function to depend on both the current and subsequent states, thereby capturing reward variations induced by errors in the adversary's transition estimate. We also derive an optimal system-identification strategy for the adversary that minimizes detectable value deviations. Further, we prove that the agent's asymptotic learning time required to secure the system scales linearly with the adversary's learning time, and that this matches the optimal lower bound; the proposed detection scheme is therefore order-optimal in detection efficiency. Finally, we extend the framework to asynchronous and intermittent attack scenarios, where reliable detection is preserved.
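The detection idea behind a Bellman-residual test can be illustrated with a minimal sketch: for a converged Q-table, observed transitions should satisfy the Bellman optimality equation, so a persistent residual signals that rewards or transitions have been tampered with. Everything here is an illustrative assumption (tabular Q-values, a toy two-state MDP, the threshold value, and the function names), not the paper's actual detector.

```python
import numpy as np

def bellman_residuals(Q, transitions, gamma):
    """Empirical Bellman residuals r + gamma * max_a' Q[s', a'] - Q[s, a]
    for a batch of observed transitions (s, a, r, s').
    For an honest system and a converged Q, these are near zero."""
    return np.array([
        r + gamma * Q[s_next].max() - Q[s, a]
        for (s, a, r, s_next) in transitions
    ])

def bdd_alarm(Q, transitions, gamma, threshold):
    """Raise an alarm when the mean absolute residual exceeds a
    (hypothetical) detection threshold -- a stand-in for the paper's test."""
    return float(np.mean(np.abs(bellman_residuals(Q, transitions, gamma)))) > threshold

# Toy deterministic MDP: state 0 -> 1 with reward 1, state 1 -> 0 with reward 0.
gamma = 0.9
q0 = 1.0 / (1.0 - gamma**2)          # closed-form Bellman fixed point
Q = np.array([[q0], [gamma * q0]])   # one action per state

clean    = [(0, 0, 1.0, 1), (1, 0, 0.0, 0)]            # honest rewards
tampered = [(0, 0, 1.5, 1), (1, 0, 0.5, 0)]            # MITM shifts rewards by +0.5

alarm_clean = bdd_alarm(Q, clean, gamma, threshold=0.2)      # → False
alarm_attack = bdd_alarm(Q, tampered, gamma, threshold=0.2)  # → True
```

Under the honest dynamics the residuals vanish exactly, while the reward perturbation leaves a constant residual of 0.5 per transition, which the threshold test picks up; the paper's refinement of letting rewards depend on (current, next) state is what makes such perturbations visible as residuals in the first place.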