xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems

📅 2025-09-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the scalability limits of manual penetration testing, this paper proposes xOffense, a knowledge-enhanced multi-agent framework that pairs the Qwen3-32B large language model, fine-tuned on chain-of-thought penetration testing data, with a tool-aware multi-agent coordination mechanism. The framework reasons and executes autonomously across the reconnaissance, scanning, and exploitation phases through fine-grained task decomposition and closed-loop workflow orchestration. Its key contribution is embedding domain-specific cybersecurity knowledge into the multi-agent decision-making pipeline, which the authors present as improving reproducibility and extensibility. Evaluated on AutoPenBench and AI-Pentest-Benchmark, the framework achieves a sub-task completion rate of 79.17%, outperforming baselines including VulnBot and PentestGPT.

📝 Abstract

This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.

Problem

Research questions and friction points this paper is trying to address.

Automating penetration testing to replace manual expert efforts

Enhancing LLMs for precise vulnerability scanning and exploitation

Coordinating multi-agent systems for scalable security assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for automated penetration testing

Fine-tuned LLM for security reasoning

Orchestration layer coordinating specialized agents
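
The phase-by-phase coordination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PhaseResult`, `Orchestrator`, and the three stub agents are hypothetical, and the stubs return canned findings where the real system would presumably prompt the fine-tuned LLM and run security tools. The sketch only shows one plausible shape for the control flow, in which each specialized agent receives the accumulated context and the orchestrator stops when a phase fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PhaseResult:
    """Outcome of one phase (hypothetical structure, not from the paper)."""
    phase: str
    findings: List[str]
    success: bool


# An agent maps the shared context (a list of facts) to a phase result.
Agent = Callable[[List[str]], PhaseResult]


@dataclass
class Orchestrator:
    """Runs specialized agents in a fixed phase order, passing findings forward."""
    agents: Dict[str, Agent]
    order: List[str] = field(
        default_factory=lambda: ["reconnaissance", "scanning", "exploitation"]
    )

    def run(self, target: str) -> List[PhaseResult]:
        context = [f"target={target}"]
        results: List[PhaseResult] = []
        for phase in self.order:
            result = self.agents[phase](context)
            results.append(result)
            if not result.success:
                break  # later phases depend on earlier findings
            context.extend(result.findings)
        return results


# Stub agents with canned outputs; no network or tool calls are made.
def recon_agent(context: List[str]) -> PhaseResult:
    return PhaseResult("reconnaissance", ["open_port=80"], True)


def scan_agent(context: List[str]) -> PhaseResult:
    found = "open_port=80" in context
    findings = ["candidate_vuln=placeholder"] if found else []
    return PhaseResult("scanning", findings, found)


def exploit_agent(context: List[str]) -> PhaseResult:
    ready = any(item.startswith("candidate_vuln=") for item in context)
    findings = ["access=simulated"] if ready else []
    return PhaseResult("exploitation", findings, ready)


if __name__ == "__main__":
    orchestrator = Orchestrator(
        agents={
            "reconnaissance": recon_agent,
            "scanning": scan_agent,
            "exploitation": exploit_agent,
        }
    )
    for result in orchestrator.run("192.0.2.10"):
        print(result.phase, result.success, result.findings)
```

Keeping the shared context as a plain list makes the hand-off between phases explicit; a real system would likely use a richer task or state representation.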
