CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

📅 2026-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the pronounced attack-selection bias that large language models (LLMs) exhibit when deployed as autonomous agents in offensive cybersecurity. These agents over-concentrate on specific attack families while neglecting other viable strategies. The authors introduce CyBiasBench, a 630-session benchmark spanning three targets, four prompt conditions, and ten attack families, and use it to systematically evaluate the attack-family distributions of five prominent LLM agents. They characterize the intrinsic bias of LLM agents in cyber attacks and identify a “bias momentum” phenomenon: agents resist explicit steering toward attack families that conflict with their bias, and this forced redirection does not measurably improve attack performance. Results show that each agent has a stable dominant attack family and a characteristic distributional entropy, both of which differ across agents, and that bias is not directly associated with attack success rate. An interactive result dashboard and a reproducibility artifact with aggregated session-level statistics and full evaluation scripts are publicly released.
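The two per-agent descriptors in the summary, a dominant attack family and a distributional entropy, can be computed from labeled session actions roughly as follows. This is a minimal sketch that assumes each agent action has already been tagged with one of the ten attack families, and that "entropy" means Shannon entropy normalized by log(10) so values fall in [0, 1]. The paper's exact labeling and normalization may differ, and the function names and family labels (`sqli`, `xss`, `path_traversal`) are hypothetical.

```python
import math
from collections import Counter

def allocation_distribution(actions):
    """Share of labeled actions per attack family, as {family: fraction}."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled actions")
    return {family: n / total for family, n in counts.items()}

def normalized_entropy(dist, num_families=10):
    """Shannon entropy of dist divided by log(num_families).

    0.0 means all effort goes to one family; 1.0 means effort is spread
    uniformly over all num_families families (assumed to be 10 here).
    """
    if num_families < 2 or len(dist) > num_families:
        raise ValueError("num_families must be >= 2 and cover every observed family")
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_families)

def dominant_family(dist):
    """Family with the largest share; ties go to the alphabetically first name."""
    return max(sorted(dist), key=lambda f: dist[f])

if __name__ == "__main__":
    # Hypothetical family labels and counts for one agent's sessions.
    actions = ["sqli"] * 6 + ["xss"] * 2 + ["path_traversal"] * 2
    dist = allocation_distribution(actions)
    print(dominant_family(dist))  # sqli
    print(round(normalized_entropy(dist), 3))
```

Computing these two numbers per agent under each prompt condition is one way to check whether an agent's bias stays stable across prompts, as the summary reports.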
📝 Abstract
Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evaluates five agents on three targets under four prompt conditions, spanning ten attack families. We identify explicit bias across agents, with different dominant attack families and varying entropy levels in their attack-family allocation distributions. Such bias is better characterized as a trait of each agent than as a factor associated with attack success rate. Furthermore, our experiments reveal a bias momentum effect, where agents resist explicit steering toward attack families that conflict with their bias. This forced distribution shift does not yield measurable improvements in attack performance. To ensure reproducibility and facilitate future research, we release an interactive result dashboard at https://trustworthyai.co.kr/CyBiasBench/ and a reproducibility artifact with aggregated session-level statistics and full evaluation scripts at https://github.com/Harry24k/CyBiasBench.
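The bias momentum effect compares an agent's behavior under a baseline prompt with its behavior under a prompt that steers it toward a specific attack family. A minimal sketch of such a comparison follows. The proxy metrics used here are assumptions, not the paper's definitions: the share of actions that land on the steered family, the total variation distance between the two allocation distributions, and the change in attack success rate (ASR), with each session's outcome recorded as a boolean. All function names and family labels are hypothetical.

```python
from collections import Counter

def family_shares(actions):
    """Fraction of labeled actions falling in each attack family."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled actions")
    return {family: n / total for family, n in counts.items()}

def total_variation(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    families = set(p) | set(q)
    return 0.5 * sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in families)

def success_rate(outcomes):
    """Attack success rate: fraction of sessions whose outcome is True."""
    if not outcomes:
        raise ValueError("no sessions")
    return sum(outcomes) / len(outcomes)

def steering_report(baseline_actions, steered_actions, target_family,
                    baseline_outcomes, steered_outcomes):
    """Compare a baseline condition with one that steers toward target_family.

    Hypothetical proxy metrics, not the paper's definitions:
    - target_share_*: how much effort lands on the steered family
    - shift_tv: how far the overall allocation moved
    - asr_delta: change in success rate observed under steering
    """
    base = family_shares(baseline_actions)
    steer = family_shares(steered_actions)
    return {
        "target_share_before": base.get(target_family, 0.0),
        "target_share_after": steer.get(target_family, 0.0),
        "shift_tv": total_variation(base, steer),
        "asr_delta": success_rate(steered_outcomes) - success_rate(baseline_outcomes),
    }

if __name__ == "__main__":
    # Illustrative, made-up sessions for one agent.
    report = steering_report(
        baseline_actions=["sqli"] * 8 + ["xss"] * 2,
        steered_actions=["sqli"] * 6 + ["xss"] * 1 + ["ssrf"] * 3,
        target_family="ssrf",
        baseline_outcomes=[True, False, False, True],
        steered_outcomes=[True, False, False, False],
    )
    print(report)
```

In this made-up example, the agent keeps most of its effort on its preferred family even when told to use `ssrf`, and ASR does not improve. A pattern of partial compliance with no ASR gain would be consistent with the bias momentum the abstract reports, but the numbers here are purely illustrative.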
Problem

Research questions and friction points this paper is trying to address.

LLM agents
attack-selection bias
cybersecurity
bias characterization
autonomous agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

attack-selection bias
CyBiasBench
LLM agents
bias momentum effect
cybersecurity benchmarking
Taein Lim
Chung-Ang University
Seongyong Ju
Chung-Ang University
Munhyeok Kim
Chung-Ang University
Hyunjun Kim
Myongji University
Hoki Kim
Professor, Department of Industrial Security, Chung-Ang University
Artificial Intelligence, Trustworthy AI, Adversarial Robustness, Generalization, Explainable AI