Patch-to-PoC: A Systematic Study of Agentic LLM Systems for Linux Kernel N-Day Reproduction

📅 2026-02-07
🤖 AI Summary
This work addresses the lack of systematic evaluation of large language model (LLM) agents for automatically reproducing N-day vulnerabilities in the Linux kernel. The authors propose K-Repro, an end-to-end agentic system that takes security patches as input and combines controlled code exploration, virtual machine orchestration, and automated debugging to generate proof-of-concept (PoC) programs that reproduce the underlying bug. In the first large-scale evaluation of its kind, covering 100 real-world vulnerabilities collected from KernelCTF, K-Repro reproduces over 50% of the cases at practical time and monetary cost. The work also analyzes the key factors that influence performance, offering actionable guidance for building reliable security automation agents.

📝 Abstract
Autonomous large language model (LLM) based systems have recently shown promising results across a range of cybersecurity tasks. However, there is no systematic study on their effectiveness in autonomously reproducing Linux kernel vulnerabilities with concrete proofs-of-concept (PoCs). Owing to the size, complexity, and low-level nature of the Linux kernel, such tasks are widely regarded as particularly challenging for current LLM-based approaches. In this paper, we present the first large-scale study of LLM-based Linux kernel vulnerability reproduction. For this purpose, we develop K-Repro, an LLM-based agentic system equipped with controlled code-browsing, virtual machine management, interaction, and debugging capabilities. Using kernel security patches as input, K-Repro automates end-to-end bug reproduction of N-day vulnerabilities in the Linux kernel. On a dataset of 100 real-world exploitable Linux kernel vulnerabilities collected from KernelCTF, our results show that K-Repro can generate PoCs that reproduce over 50% of the cases with practical time and monetary cost. Beyond aggregate success rates, we perform an extensive study of effectiveness, efficiency, stability, and impact factors to explain when agentic reproduction succeeds, where it fails, and which components drive performance. These findings provide actionable guidance for building more reliable autonomous security agents and for assessing real-world N-day risk from both offensive and defensive perspectives.
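The abstract states that K-Repro takes kernel security patches as input. As a rough illustration of what a patch-driven starting point could look like, the sketch below parses a unified diff and lists the files it touches together with the function context shown in each hunk header. This is not K-Repro's implementation: the function name `summarize_patch` and the example patch (including the path `net/example/foo.c`) are hypothetical and exist only for illustration.

```python
import re

# Matches the per-file header that git emits, e.g. "diff --git a/x.c b/x.c".
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
# Matches a hunk header; the trailing text is usually the enclosing function.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@ ?(.*)$")


def summarize_patch(patch_text):
    """Map each file touched by a unified diff to its hunk context strings."""
    touched = {}
    current = None
    for line in patch_text.splitlines():
        header = DIFF_HEADER.match(line)
        if header:
            current = header.group(2)  # post-patch path
            touched.setdefault(current, [])
            continue
        hunk = HUNK_HEADER.match(line)
        if hunk and current is not None:
            context = hunk.group(1).strip()
            if context and context not in touched[current]:
                touched[current].append(context)
    return touched


# Hypothetical patch, made up for illustration (not a real kernel commit).
EXAMPLE_PATCH = "\n".join([
    "diff --git a/net/example/foo.c b/net/example/foo.c",
    "--- a/net/example/foo.c",
    "+++ b/net/example/foo.c",
    "@@ -10,4 +10,6 @@ static int foo_release(struct foo *f)",
    " \tstruct bar *b = f->bar;",
    "+\tif (!b)",
    "+\t\treturn -EINVAL;",
    " \tkfree(b);",
    " \treturn 0;",
    " }",
])

if __name__ == "__main__":
    for path, contexts in summarize_patch(EXAMPLE_PATCH).items():
        print(path, contexts)
```

A summary like this only narrows where to look. Working out how to reach the patched code from user space and trigger the bug is the hard part that the paper assigns to the agent's code-browsing, VM, and debugging capabilities.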
Problem

Research questions and friction points this paper is trying to address.

Linux kernel
vulnerability reproduction
proof-of-concept
N-Day vulnerabilities
autonomous LLM systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic LLM
Linux kernel vulnerability reproduction
Proof-of-Concept (PoC)
K-Repro
N-Day exploitation
Juefei Pu
University of California, Riverside
Xingyu Li
UC Riverside
Haonan Li
University of California, Riverside
Zhengchuan Liang
University of California, Riverside
Jonathan Cox
Hydrologist Caribbean Institute for Meteorology and Hydrology
Yifan Wu
University of California, Riverside
Kareem Shehada
University of California, Riverside
Arrdya Srivastav
University of California, Riverside
Zhiyun Qian
University of California, Riverside