ACDZero: MCTS Agent for Mastering Automated Cyber Defense

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenges of automated cyber defense in complex environments, where large state and action spaces coupled with inefficient exploration hinder traditional deep reinforcement learning approaches due to poor sample efficiency. The problem is formalized as a context-dependent partially observable Markov decision process, and a planning-centric policy is proposed that integrates graph neural networks (GNNs) with Monte Carlo tree search (MCTS). Specifically, the method employs GNNs to generate permutation-invariant embeddings of network topologies and leverages graph-editing action priors to guide MCTS toward more efficient exploration. Policy distillation is further utilized to jointly enable model-free generalization and forward-looking planning. Evaluated on diverse scenarios in CAGE-4, the approach significantly outperforms existing reinforcement learning baselines in both defensive reward and robustness.
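The summary's claim that GNNs yield permutation-invariant embeddings of network topologies can be illustrated with a minimal sketch. This is not the paper's architecture: the mean-aggregation message passing, sum pooling, host names, and feature vectors below are all illustrative assumptions, chosen only to show why relabeling or reordering hosts leaves the graph-level embedding unchanged.

```python
# Illustrative sketch (not the paper's GNN): one style of message passing
# over a host graph, followed by sum pooling into a graph-level embedding.
# Sum pooling is order-independent, which is what makes the result
# invariant to how hosts are labeled or enumerated.

def message_pass(features, adjacency):
    """One round: each node's new vector = its own + mean of its neighbors'."""
    updated = {}
    for node, vec in features.items():
        neigh = adjacency.get(node, [])
        if neigh:
            mean = [sum(features[n][i] for n in neigh) / len(neigh)
                    for i in range(len(vec))]
        else:
            mean = [0.0] * len(vec)
        updated[node] = [v + m for v, m in zip(vec, mean)]
    return updated

def graph_embedding(features, adjacency, rounds=2):
    """Run a few message-passing rounds, then sum-pool node vectors."""
    for _ in range(rounds):
        features = message_pass(features, adjacency)
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(dim)]
```

Because the readout is a sum over hosts, feeding the same topology with hosts listed in a different order produces exactly the same embedding, which is the property the summary attributes to the GNN encoder.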

📝 Abstract
Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting services, deploying decoys, or updating access controls. However, existing approaches for ACD, such as deep reinforcement learning (RL), often struggle to explore complex networks with large decision/state spaces and thus require a prohibitive number of samples. Motivated by the need for sample-efficient defense policies, we frame ACD in CAGE Challenge 4 (CAGE-4 / CC4) as a context-based partially observable Markov decision process and propose a planning-centric defense policy based on Monte Carlo Tree Search (MCTS). It explicitly models the exploration-exploitation tradeoff in ACD and uses statistical sampling to guide exploration and decision making. We make novel use of graph neural networks (GNNs) to embed observations from the network as attributed graphs, enabling permutation-invariant reasoning over hosts and their relationships. To make our solution practical in complex search spaces, we guide MCTS with learned graph embeddings and priors over graph-edit actions, combining model-free generalization and policy distillation with look-ahead planning. We evaluate the resulting agent on CC4 scenarios involving diverse network structures and adversary behaviors, and show that our search-guided, graph-embedding-based planning improves defense reward and robustness relative to state-of-the-art RL baselines.
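The abstract's idea of guiding MCTS with learned action priors can be sketched with a PUCT-style selection rule of the kind popularized by AlphaZero-family agents. This is a minimal sketch under stated assumptions, not the paper's implementation: the action names, prior values, and exploration constant are hypothetical, and only a single tree node is shown.

```python
import math

# Hedged sketch (not the paper's search): PUCT-style action selection at one
# tree node, where learned priors bias exploration toward promising defensive
# actions while accumulated value estimates take over as visits grow.

class Node:
    def __init__(self, priors):
        self.priors = priors                      # action -> prior probability
        self.visits = {a: 0 for a in priors}      # per-action visit counts
        self.values = {a: 0.0 for a in priors}    # per-action summed returns
        self.total = 0                            # total visits at this node

    def select(self, c_puct=1.5):
        """Pick the action maximizing Q(a) + c * P(a) * sqrt(N) / (1 + n(a))."""
        def score(a):
            q = self.values[a] / self.visits[a] if self.visits[a] else 0.0
            u = c_puct * self.priors[a] * math.sqrt(self.total + 1) / (1 + self.visits[a])
            return q + u
        return max(self.priors, key=score)

    def update(self, action, reward):
        """Backpropagation step for one simulation through this node."""
        self.visits[action] += 1
        self.values[action] += reward
        self.total += 1
```

Before any simulations, the prior term dominates, so the highest-prior action is tried first; as poor outcomes accumulate for an action, its Q-value drags the score down and the search shifts to alternatives. This is the exploration-exploitation tradeoff the abstract says the planner models explicitly.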
Problem

Research questions and friction points this paper is trying to address.

automated cyber defense
sample efficiency
large decision spaces
complex networks
exploration difficulty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo Tree Search
Graph Neural Networks
Automated Cyber Defense
Graph Embedding
Policy Distillation
Yu Li
Dept of ECE, George Washington University, Washington, D.C., USA
Sizhe Tang
Dept of ECE, George Washington University, Washington, D.C., USA
Rongqian Chen
George Washington University
Fei Xu Yu
Dept of ECE, George Washington University, Washington, D.C., USA
Guangyu Jiang
Dept of ECE, George Washington University, Washington, D.C., USA
Mahdi Imani
Assistant Professor of Electrical and Computer Engineering, Northeastern University
Machine Learning, Reinforcement Learning, Bayesian Statistics, Reasoning Under Uncertainty
Nathaniel D. Bastian
United States Military Academy
artificial intelligence, operations research, data science, systems engineering, applied economics
Tian Lan
George Washington University
Machine Learning, Optimization, Cyber Security