CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a limitation of existing LLM-driven cybersecurity agents: they rely on fixed, handcrafted scaffolds and struggle to adapt to diverse targets and failure modes. The authors propose a self-evolving agent framework with three components: a four-layer evolvable architecture, a trace-to-diagnosis feedback mechanism, and a population-based beam search strategy. These components respectively target the unstructured space of possible scaffold changes, sparse and noisy execution feedback, and low-diversity updates that let errors compound, enabling the agent to revise its own scaffold from failed attempts. Evaluated on capture-the-flag (CTF), vulnerability exploitation, and penetration-testing tasks with four open-source LLMs, the approach improves the seed agent's success rate by 13.6% on average and outperforms six human-designed agent baselines and two self-improvement methods adapted from other domains.

📝 Abstract

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver improves the seed agent's success rate by 13.6% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.
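To make the population-based beam search idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the four layers, the diagnosis format, or the scoring rule, so `beam_evolve`, `Variant`, and the toy `evaluate`, `diagnose`, and `revise` functions are all hypothetical. A scaffold is modeled as a tuple with one integer setting per layer, the "diagnosis" is simply the index of the first layer that differs from a hidden target, and the beam keeps several distinct variants per iteration instead of only the single best one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Scaffold = Tuple[int, ...]  # hypothetical: one setting per evolvable layer


@dataclass(frozen=True)
class Variant:
    scaffold: Scaffold
    score: float


def beam_evolve(
    seed: Scaffold,
    evaluate: Callable[[Scaffold], float],
    diagnose: Callable[[Scaffold], Optional[int]],
    revise: Callable[[Scaffold, int, int], Scaffold],
    beam_width: int = 3,
    children_per_parent: int = 2,
    iterations: int = 4,
) -> List[Variant]:
    """Keep the top `beam_width` distinct variants after each round of revisions."""
    beam = [Variant(seed, evaluate(seed))]
    for _ in range(iterations):
        candidates = list(beam)  # parents compete with their children
        for parent in beam:
            hint = diagnose(parent.scaffold)  # stand-in for trace-to-diagnosis
            if hint is None:  # nothing left to fix for this variant
                continue
            for i in range(children_per_parent):
                child = revise(parent.scaffold, hint, i)
                candidates.append(Variant(child, evaluate(child)))
        # Deduplicate by scaffold so the beam holds distinct variants.
        unique = {v.scaffold: v for v in candidates}
        beam = sorted(unique.values(), key=lambda v: v.score, reverse=True)[:beam_width]
    return beam


# Toy problem (assumption): success means matching a hidden 4-layer target.
TARGET: Scaffold = (1, 2, 0, 1)


def evaluate(s: Scaffold) -> float:
    """Fraction of layers that match the target; a proxy for success rate."""
    return sum(a == b for a, b in zip(s, TARGET)) / len(TARGET)


def diagnose(s: Scaffold) -> Optional[int]:
    """Return the index of the first mismatched layer, or None if all match."""
    for idx, (a, b) in enumerate(zip(s, TARGET)):
        if a != b:
            return idx
    return None


def revise(s: Scaffold, layer: int, i: int) -> Scaffold:
    """Propose the i-th alternative setting (values cycle over 0..2) for one layer."""
    changed = list(s)
    changed[layer] = (s[layer] + i + 1) % 3
    return tuple(changed)


final_beam = beam_evolve((0, 0, 0, 0), evaluate, diagnose, revise)
best = final_beam[0]
```

In this toy setting, `best` is the highest-scoring variant and `final_beam` also retains lower-scoring alternatives. Those alternatives are what give the search a chance to recover if the top variant's revisions stop helping, which is the diversity argument the abstract makes for a population over a single evolving agent.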

Problem

Research questions and friction points this paper is trying to address.

cybersecurity agents

self-evolution

scaffold adaptation

LLM-based agents

adaptive security testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-evolution

structured scaffold

trace-to-diagnosis