xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi-agent systems

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scalability limitations of manual penetration testing, this paper proposes xOffense, a knowledge-enhanced multi-agent framework that integrates the Qwen3-32B large language model, fine-tuned on chain-of-thought penetration testing data, with a tool-aware multi-agent coordination mechanism. The framework enables autonomous reasoning and automated execution across the reconnaissance, scanning, and exploitation phases through fine-grained task decomposition and closed-loop workflow orchestration. Its key innovation lies in embedding domain-specific cybersecurity knowledge directly into the multi-agent decision-making pipeline, improving reproducibility and extensibility. Evaluated on AutoPenBench and AI-Pentest-Benchmark, the framework achieves a subtask completion rate of 79.17%, outperforming state-of-the-art baselines including VulnBot and PentestGPT.

📝 Abstract
This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.
Problem

Research questions and friction points this paper is trying to address.

Automating penetration testing to replace manual expert efforts
Enhancing LLMs for precise vulnerability scanning and exploitation
Coordinating multi-agent systems for scalable security assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for automated penetration testing
Fine-tuned LLM for security reasoning
Orchestration layer coordinating specialized agents
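The agent specialization and orchestration described above can be sketched as follows. This is an illustrative Python skeleton only, not the paper's implementation: all class names, the `Finding` record, and the lambda "actions" (which stand in for LLM-driven tool-command generation) are assumptions made for exposition of the phase-ordered, closed-loop coordination pattern.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Result produced by one phase of the workflow."""
    phase: str
    detail: str


class Agent:
    """One specialized agent per pentest phase (illustrative placeholder)."""

    def __init__(self, phase, action):
        self.phase = phase
        # `action` stands in for the fine-tuned LLM choosing and running a tool.
        self.action = action

    def run(self, context):
        return Finding(self.phase, self.action(context))


class Orchestrator:
    """Closed-loop coordination: each phase consumes prior phases' findings."""

    def __init__(self, agents):
        self.agents = agents

    def execute(self, target):
        context = {"target": target, "findings": []}
        for agent in self.agents:
            finding = agent.run(context)
            context["findings"].append(finding)  # feed results forward
        return context["findings"]


agents = [
    Agent("reconnaissance", lambda c: f"hosts discovered on {c['target']}"),
    Agent("scanning", lambda c: "open ports and services enumerated"),
    Agent("exploitation", lambda c: "candidate exploit selected"),
]
findings = Orchestrator(agents).execute("10.0.0.0/24")
print([f.phase for f in findings])
# → ['reconnaissance', 'scanning', 'exploitation']
```

The point of the sketch is the control flow: a fixed phase ordering with a shared, growing context, so later agents can condition on earlier results rather than acting independently.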
Phung Duc Luong
Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
Le Tran Gia Bao
Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
Nguyen Vu Khai Tam
Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
Dong Huu Nguyen Khoa
Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
Nguyen Huu Quyen
Information Security Lab, University of Information Technology, Ho Chi Minh City, Vietnam
Van-Hau Pham
Lecturer of Information Security, University of Information Technology - VNU
Network & application security, AI for security, security of AI, blockchain, cloud computing
Phan The Duy
University of Information Technology, VNU-HCM, Ho Chi Minh City
Cybersecurity, blockchain, machine learning, software security, malware detection