Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

📅 2024-06-02

🏛️ arXiv.org

📈 Citations: 23

✨ Influential: 0

🤖 AI Summary

Existing LLM-based agents can exploit real-world vulnerabilities when given a description of them. However, when a vulnerability is unknown ahead of time (a zero-day), they struggle to explore many candidate vulnerabilities and to plan over long horizons. To address these limitations, we propose HPTSA, a multi-agent system in which a planning agent explores the target system and decides which specialized subagents to launch. Separating high-level planning from vulnerability-specific work lets the system try different vulnerability types without losing track of its long-term plan. We construct a benchmark of 14 real-world zero-day vulnerabilities, on which HPTSA improves over prior agent frameworks by up to 4.3×. These results show that teams of LLM agents can exploit real-world zero-day vulnerabilities that single agents struggle with.

📝 Abstract

LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability, and can solve toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior agents, when used alone, struggle with exploring many different vulnerabilities and with long-range planning. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 14 real-world vulnerabilities and show that our team of agents improves over prior agent frameworks by up to 4.3×.
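The division of labor between the planning agent and its subagents can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Planner`, `Observation`, and `SubagentResult` names, the feature-to-subagent registry, and the subagent names are assumptions made for illustration, and the subagents are stub functions returning a boolean rather than LLM agents. The sketch shows only the control flow: the planner maps what it observed on each page to the subagents worth launching, and remembers which subagent it already tried on which page so that it does not repeat work over a long session.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical: what the planning agent observed while exploring one page.
@dataclass
class Observation:
    page: str
    features: List[str]  # e.g. "form", "template" (illustrative labels only)


# Hypothetical record of one subagent launch and its outcome.
@dataclass
class SubagentResult:
    name: str
    page: str
    succeeded: bool


# A subagent is modeled as a stub callable; in HPTSA it would be an LLM agent.
Subagent = Callable[[Observation], bool]


@dataclass
class Planner:
    # Assumed registry: observed feature -> names of subagents worth launching.
    registry: Dict[str, List[str]]
    subagents: Dict[str, Subagent]
    # Long-term memory of every launch, kept by the planner across pages.
    history: List[SubagentResult] = field(default_factory=list)

    def plan(self, obs: Observation) -> List[str]:
        """Choose subagents for this page, skipping any already tried on it."""
        tried = {(r.name, r.page) for r in self.history}
        chosen: List[str] = []
        for feature in obs.features:
            for name in self.registry.get(feature, []):
                if (name, obs.page) not in tried and name not in chosen:
                    chosen.append(name)
        return chosen

    def run(self, observations: List[Observation]) -> List[SubagentResult]:
        """Launch the planned subagents page by page and record each outcome."""
        for obs in observations:
            for name in self.plan(obs):
                ok = self.subagents[name](obs)
                self.history.append(SubagentResult(name, obs.page, ok))
        return self.history


# Usage with stub subagents (hypothetical names; no real testing is performed).
planner = Planner(
    registry={"form": ["sqli_agent", "csrf_agent"], "template": ["ssti_agent"]},
    subagents={
        "sqli_agent": lambda obs: False,
        "csrf_agent": lambda obs: obs.page == "/login",
        "ssti_agent": lambda obs: False,
    },
)
results = planner.run([
    Observation("/login", ["form"]),
    Observation("/profile", ["form", "template"]),
])
print([(r.name, r.page) for r in results if r.succeeded])
# → [('csrf_agent', '/login')]
```

Keeping the launch history in the planner rather than in each subagent is one way to model the claim that the planning agent resolves long-term planning issues; the source does not state how HPTSA itself tracks this state.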

Problem

Research questions and friction points this paper is trying to address.

Can teams of LLM agents exploit real-world zero-day vulnerabilities?

Single LLM agents perform poorly on vulnerabilities that are unknown to them ahead of time

Prior agents struggle to explore many different vulnerabilities and to plan over long horizons

Innovation

Methods, ideas, or system contributions that make the work stand out.

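Shows that teams of LLM agents can exploit real-world zero-day vulnerabilities

HPTSA: a planning agent explores the target system and launches subagents, resolving long-term planning issues

Improves over prior agent frameworks by up to 4.3× on a benchmark of 14 real-world vulnerabilities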