🤖 AI Summary
Existing reinforcement learning approaches to security management are mostly confined to simulated environments, and it is unclear how well they generalize to operational systems. This work presents CSLE, a platform that combines two systems: an emulation system that replicates key components of the target system in a virtualized environment, and a simulation system built on a system model, such as a Markov decision process, identified from measurements and logs gathered in the emulation. Strategies learned in simulation are then evaluated and refined in the emulation system, which narrows the gap between theoretical and operational performance. Four use cases (flow control, replication control, segmentation control, and recovery control) show that CSLE enables near-optimal security management in an environment that approximates an operational system.
📝 Abstract
Reinforcement learning is a promising approach to autonomous and adaptive security management in networked systems. However, current reinforcement learning solutions for security management are mostly limited to simulation environments, and it is unclear how they generalize to operational systems. In this paper, we address this limitation by presenting CSLE: a reinforcement learning platform for autonomous security management that enables experimentation under realistic conditions. Conceptually, CSLE encompasses two systems. First, it includes an emulation system that replicates key components of the target system in a virtualized environment. We use this system to gather measurements and logs, based on which we identify a system model, such as a Markov decision process. Second, it includes a simulation system where security strategies are efficiently learned through simulations of the system model. The learned strategies are then evaluated and refined in the emulation system to close the gap between theoretical and operational performance. We demonstrate CSLE through four use cases: flow control, replication control, segmentation control, and recovery control. Through these use cases, we show that CSLE enables near-optimal security management in an environment that approximates an operational system.
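The identify-then-learn loop described in the abstract can be illustrated with a toy sketch. This is not CSLE's API or the authors' method: the function names, the two-state model (0 = healthy, 1 = compromised), the two actions (0 = wait, 1 = recover), the logged transitions, and the reward values are all hypothetical. The sketch assumes a small tabular Markov decision process, estimates it from logged transitions by maximum likelihood, and solves the estimated model with value iteration in place of a reinforcement learning algorithm.

```python
def estimate_mdp(transitions, n_states, n_actions):
    """Estimate transition probabilities and mean rewards from logged
    (state, action, reward, next_state) tuples by counting."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in transitions:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r

    P = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    R = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            total = sum(counts[s][a])
            if total == 0:
                # Assumption: an unobserved (state, action) pair is modeled
                # as a zero-reward self-loop.
                P[s][a][s] = 1.0
                continue
            R[s][a] = reward_sum[s][a] / total
            for s_next in range(n_states):
                P[s][a][s_next] = counts[s][a][s_next] / total
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve the estimated MDP; return state values and a greedy policy."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        Q = [
            [
                R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
            return V_new, policy
        V = V_new


# Hypothetical log, standing in for measurements gathered in an emulation.
logged = (
    [(0, 0, 1.0, 0)] * 3      # healthy, wait: stays healthy
    + [(0, 0, 1.0, 1)]        # healthy, wait: becomes compromised
    + [(0, 1, 0.5, 0)] * 2    # healthy, recover: stays healthy at a cost
    + [(1, 0, -1.0, 1)] * 2   # compromised, wait: stays compromised
    + [(1, 1, -0.5, 0)] * 2   # compromised, recover: returns to healthy
)

P, R = estimate_mdp(logged, n_states=2, n_actions=2)
V, policy = value_iteration(P, R)
print("estimated P[healthy][wait]:", P[0][0])
print("greedy policy (per state):", policy)
```

In the workflow the abstract describes, a strategy obtained this way would next be evaluated and refined in the emulation system rather than trusted on the strength of the estimated model alone.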