🤖 AI Summary
This study addresses the sharp drop in generalization that autonomous cyber-attack agents exhibit when confronted with previously unseen IP address reassignments. Focusing on IP space variation—a minimal yet critical form of distributional shift—the work presents the first systematic evaluation within the NetSecGame environment: agents are trained on five IP-range variants and tested on a sixth, unseen one. It assesses the generalization of reinforcement learning, meta-learning, and large language model (LLM)-driven agents, complemented by behavioral analysis and explainable AI techniques that localize failure modes. Experimental results show that pretrained LLM-based agents achieve the highest success rates but suffer from high inference overhead, low transparency, and frequent invalid actions; adaptive methods exhibit some transferability, yet their performance still degrades substantially under IP space shifts.
📝 Abstract
Autonomous offensive agents often fail to transfer beyond the networks on which they are trained. We isolate a minimal but fundamental shift -- unseen host/subnet IP reassignment in an otherwise fixed enterprise scenario -- and evaluate attacker generalization in the NetSecGame environment. Agents are trained on five IP-range variants and tested on a sixth unseen variant; only the meta-learning agent may adapt at test time. We compare three agent families (traditional RL, adaptation agents, and LLM-based agents) and use action-distribution-based behavioral/XAI analyses to localize failure modes. Some adaptation methods show partial transfer but significant degradation under unseen reassignment, indicating that even address-space changes can break long-horizon attack policies. Under our evaluation protocol and agent-specific assumptions, prompt-driven pretrained LLM agents achieve the highest success on the held-out reassignment, but at the cost of increased inference-time compute, reduced transparency, and practical failure modes such as repetition/invalid-action loops.
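The held-out evaluation protocol described above—train on five IP-range variants, test on the remaining unseen one—follows a leave-one-variant-out pattern. A minimal sketch of that split logic is shown below; the variant names and helper function are illustrative assumptions, not identifiers from NetSecGame or the paper's code.

```python
# Hypothetical sketch of the leave-one-variant-out protocol:
# for each of six IP-range variants, train on the other five
# and hold out the remaining variant for generalization testing.
VARIANTS = [f"ip_variant_{i}" for i in range(6)]  # illustrative names


def leave_one_out_splits(variants):
    """Yield (train_variants, held_out_variant) pairs, one per variant."""
    for held_out in variants:
        train = [v for v in variants if v != held_out]
        yield train, held_out


splits = list(leave_one_out_splits(VARIANTS))
# Six splits in total: each has five training variants and one
# held-out variant that never appears in its training set.
```

The paper's experiments correspond to a single such split (one fixed unseen variant); iterating over all splits would give the full cross-validated version of the protocol.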