ARACNE: An LLM-Based Autonomous Shell Pentesting Agent

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing LLM-driven penetration testing agents struggle to carry out end-to-end autonomous attacks in real shell environments.

Method: The paper proposes a multi-LLM collaborative agent architecture designed for SSH services. It combines reasoning, action generation, and feedback-driven dynamic planning so that the agent can autonomously execute commands, identify vulnerabilities, and escalate privileges in native Linux shells.

Contribution/Results: The authors present it as the first work to achieve closed-loop autonomous penetration testing directly in unmodified shell environments, with improved action accuracy and task success rates. Evaluations against the ShelLM autonomous defender and the OverTheWire Bandit CTF yield success rates of 60.0% and 57.58%, respectively, and successful runs needed an average of only 4.8 steps to reach their objectives.
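The closed loop summarized above (reasoning, action generation, and feedback from the shell) can be sketched as follows. This is a minimal illustration under stated assumptions, not ARACNE's implementation: the split into a "planner" role and an "interpreter" role, the stop condition, and all names (`run_agent`, `toy_planner`, `toy_interpreter`, `toy_shell`) are hypothetical. Each LLM and the SSH session are replaced by plain Python functions so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical role interfaces. In a real agent each would wrap an LLM call or an
# SSH session; here they are plain functions so the sketch runs offline.
Planner = Callable[[str, List[str]], Optional[str]]  # (goal, history) -> next step, or None when done
Interpreter = Callable[[str], str]                   # natural-language step -> shell command
Executor = Callable[[str], str]                      # shell command -> command output


@dataclass
class EpisodeResult:
    success: bool
    steps: int                                       # number of commands executed
    transcript: List[str] = field(default_factory=list)


def run_agent(goal: str, plan: Planner, interpret: Interpreter,
              execute: Executor, max_steps: int = 20) -> EpisodeResult:
    """Plan -> act -> observe loop: shell output is fed back into the next plan."""
    history: List[str] = []
    for steps_taken in range(max_steps):
        next_step = plan(goal, history)              # reasoning role
        if next_step is None:                        # planner judges the goal reached
            return EpisodeResult(True, steps_taken, history)
        command = interpret(next_step)               # action-generation role
        output = execute(command)                    # run in the (here simulated) shell
        history.append(f"$ {command}\n{output}")     # feedback for the next planning round
    return EpisodeResult(False, max_steps, history)  # gave up after the step budget


# --- Toy stand-ins (hypothetical) for the two LLM roles and the remote shell ---
def toy_planner(goal: str, history: List[str]) -> Optional[str]:
    if not history:
        return "list the files in the home directory"
    if "readme" in history[-1] and "cat" not in history[-1]:
        return "print the contents of readme"
    return None  # assume the goal is met once readme has been read


def toy_interpreter(step: str) -> str:
    commands = {
        "list the files in the home directory": "ls",
        "print the contents of readme": "cat readme",
    }
    return commands.get(step, "true")


def toy_shell(command: str) -> str:
    outputs = {"ls": "readme", "cat readme": "The password is example-password"}
    return outputs.get(command, "")


result = run_agent("find the password", toy_planner, toy_interpreter, toy_shell)
print(result.success, result.steps)
for entry in result.transcript:
    print(entry)
```

Keeping the planner and the interpreter as separate callables mirrors the idea of assigning different models to different roles; swapping either one for a real LLM client would not change the loop itself.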

📝 Abstract

We introduce ARACNE, a fully autonomous LLM-based pentesting agent tailored for SSH services that can execute commands on real Linux shell systems. ARACNE introduces a new agent architecture with multi-LLM model support. Experiments show that ARACNE reaches a 60% success rate against the autonomous defender ShelLM and a 57.58% success rate against the OverTheWire Bandit CTF challenges, improving on the state of the art. In successful runs, the agent took fewer than 5 actions on average to accomplish its goals. The results show that using multiple LLMs is a promising approach to increasing the accuracy of the agent's actions.
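The abstract reports two metrics: the success rate and the average number of actions taken when the agent wins. The sketch below shows one way to compute them, assuming the step average is taken over successful episodes only, as "when winning" suggests. The `summarize` function and the episode data are hypothetical; they are not the paper's evaluation code or results.

```python
from statistics import mean
from typing import List, Tuple


def summarize(episodes: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (success rate in %, mean steps over successful episodes only).

    Each episode is a (success, steps) pair. Failed episodes count toward the
    success rate but not toward the step average (the "when winning" reading).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    winning_steps = [steps for success, steps in episodes if success]
    rate = 100.0 * len(winning_steps) / len(episodes)
    avg_steps = mean(winning_steps) if winning_steps else float("nan")
    return rate, avg_steps


# Hypothetical toy episodes, not data from the paper.
toy_episodes = [(True, 3), (False, 10), (True, 5), (True, 4)]
rate, avg_steps = summarize(toy_episodes)
print(f"success rate: {rate:.2f}%, mean steps when winning: {avg_steps:.2f}")
```

Excluding failed runs from the step average keeps long unsuccessful episodes from inflating it, which is why the two numbers are reported separately.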

Problem

Research questions and friction points this paper is trying to address.

Autonomous SSH pentesting agent

Multi-LLM model architecture

Improving success rates in CTF challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based autonomous pentesting agent

Multi-LLM model support architecture

Improved accuracy in shell commands
