A Scalable Approach to Solving Simulation-Based Network Security Games

πŸ“… 2026-02-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the computational redundancy and scalability limitations of conventional multi-agent reinforcement learning in large-scale network security games. To overcome these challenges, the authors propose MetaDOAR, a lightweight meta-controller that leverages graph-structured embeddings to learn compressed state representations. MetaDOAR combines quantized state caching with a conservative k-hop invalidation policy, partition-aware filtering, and hierarchical action selection to substantially reduce redundant computation. Its hierarchical architecture pairs top-level top-k partition selection with bottom-level focused beam search, building on the Double Oracle/PSRO framework with batched critics and an LRU cache. Experimental results demonstrate that MetaDOAR outperforms state-of-the-art methods on large-scale network topologies in player utility, while scaling more favorably in memory consumption and training time.
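The caching scheme described above can be illustrated with a minimal sketch: an LRU store keyed by a quantized state projection plus a local action identifier, with conservative k-hop invalidation over the network topology. All names here (`QuantizedQCache`, the `resolution` parameter, the assumption that an action identifier names a device node) are hypothetical illustrations, not the paper's actual interfaces.

```python
from collections import OrderedDict

class QuantizedQCache:
    """Hypothetical sketch: LRU cache for critic Q-values, keyed by a
    quantized state projection and a local action identifier."""

    def __init__(self, capacity=10000, resolution=0.1):
        self.capacity = capacity
        self.resolution = resolution  # coarser resolution -> more cache hits
        self._store = OrderedDict()

    def _key(self, state_projection, action_id):
        # Quantize the continuous projection so near-identical states collide
        # on the same cache key.
        q = tuple(round(x / self.resolution) for x in state_projection)
        return (q, action_id)

    def get(self, state_projection, action_id):
        key = self._key(state_projection, action_id)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        return None

    def put(self, state_projection, action_id, q_value):
        key = self._key(state_projection, action_id)
        self._store[key] = q_value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

    def invalidate_k_hop(self, changed_node, neighbors_fn, k=2):
        # Conservative invalidation: drop every entry whose action touches a
        # node within k hops of the changed node (BFS over the topology).
        # Assumes action_id names a node in the network graph.
        affected = {changed_node}
        frontier = {changed_node}
        for _ in range(k):
            frontier = {n for u in frontier for n in neighbors_fn(u)} - affected
            affected |= frontier
        self._store = OrderedDict(
            (key, v) for key, v in self._store.items()
            if key[1] not in affected
        )
```

Quantization trades exactness for hit rate: two nearby state projections map to the same key, so one cached critic evaluation serves both, while the k-hop rule over-approximates which entries a topology change could have affected.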

Technology Category

Application Category

πŸ“ Abstract
We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning in very large cyber-network environments. MetaDOAR learns a compact state projection from per-node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search guided by a critic. Selected candidate actions are evaluated with batched critic forward passes and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than state-of-the-art baselines on large network topologies without significant growth in memory usage or training time. This contribution provides a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems.
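The two-level control loop in the abstract, top-k partition selection followed by focused beam search over the selected partitions, can be sketched as follows. The partition scores and the `critic` callable stand in for the learned projection and the batched critic network; all function names and parameters here are illustrative assumptions, not the authors' implementation.

```python
import heapq

def select_top_k_partitions(partition_scores, k=3):
    """Top level: keep the k highest-scoring partitions.
    partition_scores maps partition id -> learned score (a placeholder
    for the meta-controller's state projection)."""
    return heapq.nlargest(k, partition_scores, key=partition_scores.get)

def focused_beam_search(actions_by_partition, partitions, critic,
                        beam_width=2, depth=2):
    """Bottom level: expand action sequences only within the selected
    partitions, keeping the beam_width best prefixes at each step.
    critic(seq) scores a candidate sequence (stand-in for a batched
    critic forward pass)."""
    beam = [((), 0.0)]  # (action sequence, cumulative score)
    for _ in range(depth):
        candidates = []
        for prefix, score in beam:
            for p in partitions:
                for a in actions_by_partition[p]:
                    seq = prefix + (a,)
                    candidates.append((seq, score + critic(seq)))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beam[0]  # best action sequence and its cumulative score
```

The point of the hierarchy is cost control: the actor never enumerates actions outside the top-k partitions, so the beam search (and hence the number of critic evaluations) scales with the partition size rather than the full network.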
Problem

Research questions and friction points this paper is trying to address.

network security games
scalable multi-agent reinforcement learning
large cyber-network environments
simulation-based decision making
hierarchical policy learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

MetaDOAR
partition-aware filtering
Q-value caching
scalable multi-agent reinforcement learning
hierarchical policy learning
πŸ”Ž Similar Papers
No similar papers found.