🤖 AI Summary
Large language models (LLMs) exhibit limited autonomy in executing multi-stage cyberattacks, frequently failing at cross-host reconnaissance, lateral movement, and data exfiltration.
Method: We propose Incalmo, a model-agnostic, high-level attack abstraction framework that decouples attack logic from low-level commands via attack-graph modeling and environment-state awareness, and integrates LLM-tool co-execution for generalized attack planning on multi-host emulation platforms.
Contribution/Results: Evaluated across ten realistic emulated networks, Incalmo-enabled LLMs executed successful end-to-end multistage attacks in nine. Notably, smaller-parameter LLMs augmented with Incalmo fully succeeded in five of the ten environments, whereas larger-parameter models without Incalmo fully succeeded in none under identical conditions. This work introduces the first high-level task abstraction layer designed specifically for multistage cyberattacks, substantially enhancing LLM autonomy and robustness in complex offensive-security tasks.
📘 Abstract
LLMs have shown preliminary promise in some security tasks and CTF challenges. However, it is unclear whether LLMs can realize multistage network attacks, which involve executing a wide variety of actions across multiple hosts, such as conducting reconnaissance, exploiting vulnerabilities to gain initial access, leveraging internal hosts to move laterally, and using multiple compromised hosts to exfiltrate data. We evaluate LLMs across 10 multistage networks and find that popular LLMs are unable to realize these attacks. To enable LLMs to do so, we introduce Incalmo, an LLM-agnostic high-level attack abstraction layer that sits between an LLM and the environment. Rather than having LLMs issue low-level command-line instructions, which can lead to incorrect implementations, Incalmo allows LLMs to specify high-level tasks (e.g., infect a host, scan a network), which Incalmo then carries out by translating them into low-level primitives (e.g., commands to exploit tools). Incalmo also provides an environment state service and an attack graph service that give LLMs structure in selecting actions relevant to a multistage attack. In 9 of 10 realistic emulated networks (25 to 50 hosts), LLMs using Incalmo successfully execute multistage attacks autonomously. We also conduct an ablation analysis to show the key role the high-level abstractions play; for instance, we find that both Incalmo's high-level tasks and its services are crucial. Furthermore, even smaller-parameter LLMs with Incalmo can fully succeed in 5 of 10 environments, while larger-parameter LLMs without Incalmo fully succeed in none.
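The core idea described above, where the LLM issues high-level tasks and a translation layer expands them into low-level primitives while tracking environment state, can be sketched as a toy example. This is a minimal illustration under assumed names (`AbstractionLayer`, `EnvironmentState`, `scan_network`, `infect_host` are all hypothetical and not Incalmo's actual API), with inert command strings standing in for real tool invocations:

```python
# Hypothetical sketch of a high-level attack-task abstraction layer, in the
# spirit of the abstract's description. All class/method names are invented
# for illustration; the "commands" are plain strings and are never executed.
from dataclasses import dataclass, field


@dataclass
class EnvironmentState:
    """Analogue of an environment state service: what the attacker knows so far."""
    discovered_hosts: set = field(default_factory=set)
    infected_hosts: set = field(default_factory=set)


class AbstractionLayer:
    """Translates high-level tasks into low-level primitives (command strings)."""

    def __init__(self, state: EnvironmentState):
        self.state = state

    def scan_network(self, subnet: str) -> list[str]:
        # High-level task "scan a network" -> a scanner invocation.
        cmd = f"nmap -sV {subnet}"
        # A real layer would parse scan output to update state; we fake one host.
        self.state.discovered_hosts.add(f"{subnet.rsplit('.', 1)[0]}.10")
        return [cmd]

    def infect_host(self, host: str) -> list[str]:
        # High-level task "infect a host" -> exploit-tool primitives, gated on
        # the tracked state so the LLM cannot target unknown hosts.
        if host not in self.state.discovered_hosts:
            raise ValueError(f"{host} has not been discovered yet")
        self.state.infected_hosts.add(host)
        return [f"exploit --target {host}", f"deploy-agent --target {host}"]


layer = AbstractionLayer(EnvironmentState())
cmds = layer.scan_network("10.0.0.0/24")
cmds += layer.infect_host("10.0.0.10")
```

The point of the gating in `infect_host` mirrors the abstract's claim: the layer, not the LLM, owns correct low-level execution, and the state service constrains the LLM to actions that are actually possible in the current environment.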