🤖 AI Summary
Cybersecurity policies must frequently adapt to dynamic threats and environmental changes, yet existing reinforcement learning approaches lack theoretical guarantees and adapt slowly. To address this, we propose an efficient, adaptive policy-adjustment framework with provable performance guarantees. Methodologically, it integrates particle-filter-based belief estimation, feature-driven offline policy aggregation, and online rollout optimization. Theoretically, we establish the first verifiable performance bound for feature-based policy aggregation. From an engineering standpoint, the framework significantly improves the scalability and responsiveness of policy updates. Evaluated on benchmark environments, including CAGE-2, and on realistic simulation platforms, our approach outperforms state-of-the-art methods in convergence speed, security assurance, and generalization.
📝 Abstract
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
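To make the three components concrete, the following is a minimal sketch of how particle-filter belief estimation and rollout-based adaptation fit together. It uses an invented toy model, a two-state host (safe or compromised) with noisy alert observations and a monitor/reimage action pair, and a thresholded base policy standing in for the paper's feature-based aggregation; all function names, probabilities, and costs here are illustrative assumptions, not the paper's actual environment or algorithm.

```python
import random

# Toy intrusion model (illustrative assumptions): hidden state s is 0 (safe)
# or 1 (compromised); actions are 0 (monitor) or 1 (reimage the host).

def transition(s, a, rng):
    if a == 1:                          # reimaging restores the host
        return 0
    if s == 0 and rng.random() < 0.1:   # assumed intrusion probability
        return 1
    return s

def obs_likelihood(o, s):
    # P(alert | state): alerts are noisy indicators of compromise.
    p_alert = 0.7 if s == 1 else 0.1
    return p_alert if o == 1 else 1.0 - p_alert

def cost(s, a):
    # Operating cost: compromise is expensive; reimaging has a small cost.
    return (5.0 if s == 1 else 0.0) + (1.0 if a == 1 else 0.0)

def pf_update(particles, a, o, rng):
    # Belief estimation: propagate particles through the dynamics,
    # weight them by the observation likelihood, then resample.
    propagated = [transition(s, a, rng) for s in particles]
    weights = [obs_likelihood(o, s) for s in propagated]
    total = sum(weights) or len(weights)
    weights = [w / total for w in weights]
    return rng.choices(propagated, weights=weights, k=len(particles))

def base_policy(belief):
    # Stand-in for the offline aggregated policy: reimage when the
    # believed probability of compromise exceeds a threshold.
    return 1 if belief > 0.5 else 0

def rollout_action(particles, horizon, n_sims, rng):
    # Online adaptation: one-step lookahead over actions, then follow the
    # base policy for `horizon` simulated steps (with full observability
    # inside the simulation, a simplification for this sketch).
    best_a, best_cost = 0, float("inf")
    for a in (0, 1):
        total = 0.0
        for _ in range(n_sims):
            s = rng.choice(particles)          # sample a state from the belief
            total += cost(s, a)
            s = transition(s, a, rng)
            for _ in range(horizon):
                a2 = base_policy(float(s))     # base policy on simulated state
                total += cost(s, a2)
                s = transition(s, a2, rng)
        avg = total / n_sims
        if avg < best_cost:
            best_a, best_cost = a, avg
    return best_a
```

In this sketch, `pf_update` maintains the belief over the hidden security state, while `rollout_action` improves on the base policy online by simulating it from sampled particles, mirroring the estimate-aggregate-rollout structure described in the abstract.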