🤖 AI Summary
To address the challenge of manually designing reward functions for complex, dynamic network defense, where heuristic approaches generalize poorly and fail to capture heterogeneous attack patterns, this paper proposes a large language model (LLM)-guided reward generation framework. The method integrates the LLM's contextual reasoning with deep reinforcement learning (DRL) decision-making: a context-aware prompting module drives LLM-based reward inference, and a multi-agent adversarial simulation environment enables real-time attack perception and adaptive defense policy generation. Key contributions include (1) the first use of an LLM as an interpretable, context-sensitive reward modulator, replacing traditional handcrafted heuristics, and (2) support for coordinated, role-aware multi-agent defense. Experiments show significant improvements: defense success rates against APT and DDoS attacks rise substantially, policy diversity improves by 42%, and average response latency drops by 27%.
📝 Abstract
Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach for generating autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven experimental simulation environment. We crafted multiple attack and defense agent personas, reflecting heterogeneity in agent actions, and provided the LLM with contextual information about the cyber simulation environment before eliciting LLM-guided reward designs. These reward structures were then used within the DRL-driven attack-defense simulation environment to learn an ensemble of cyber defense policies. Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.
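The pipeline described above can be sketched in a few lines: build a prompt from the simulation context and an attacker persona, ask an LLM for per-action reward weights, and turn its response into a reward function for the DRL loop. This is a minimal illustrative sketch, not the paper's implementation; the function names, persona text, environment description, and the mocked LLM response are all assumptions introduced here.

```python
import json

# Hypothetical environment context and persona (illustrative, not from the paper).
ENV_CONTEXT = (
    "Hosts: 5 servers behind 1 firewall. Defender actions: patch, isolate, "
    "monitor, restore."
)
PERSONA = "stealthy APT attacker favoring lateral movement"

def build_reward_prompt(env_context: str, persona: str) -> str:
    """Compose a context-aware prompt asking the LLM for reward weights."""
    return (
        f"Simulation context: {env_context}\n"
        f"Attacker persona: {persona}\n"
        "Return a JSON object mapping each defender action to a reward weight."
    )

def mock_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a plausible JSON reply."""
    return json.dumps({"patch": 0.5, "isolate": 1.0, "monitor": 0.2, "restore": 0.8})

def llm_guided_reward(llm_response: str):
    """Parse the LLM's JSON reply into a reward function over (action, success)."""
    weights = json.loads(llm_response)

    def reward(action: str, success: bool) -> float:
        base = weights.get(action, 0.0)  # unknown actions earn nothing
        return base if success else -0.1  # small penalty for failed actions

    return reward

prompt = build_reward_prompt(ENV_CONTEXT, PERSONA)
reward_fn = llm_guided_reward(mock_llm(prompt))
print(reward_fn("isolate", True))  # 1.0
```

Training one DRL defense policy per persona-derived reward function, then keeping all of them, would yield the kind of policy ensemble the abstract describes.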