SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

Date: 2026-02-07
AI Summary
This study addresses the development of autonomous cyber reasoning systems (CRSs) capable of discovering and patching vulnerabilities in real-world open-source software, offering a comprehensive analysis of DARPA's AIxCC (2023–2025). Through a systematic review integrating competition design documents, source code, execution traces, and discussions with organizers and competing teams, this work presents the first holistic examination of large language model–based autonomous CRSs. It identifies the key factors influencing system performance, delineates current capabilities and limitations, and highlights the technical advances achieved by teams. Furthermore, the study distills lessons on competition design and on the obstacles to deployment, providing practical guidance for advancing research on, and real-world application of, autonomous vulnerability repair systems.

πŸ“ Abstract
DARPA's AI Cyber Challenge (AIxCC, 2023–2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI, particularly large language models (LLMs), to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights toward deploying autonomous CRSs in practice.

Problem

Research questions and friction points this paper is trying to address.

autonomous cyber reasoning systems
AI cybersecurity
vulnerability remediation
large language models
AI competition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cyber Reasoning Systems
Large Language Models
Autonomous Vulnerability Remediation
AI Cybersecurity Competition
LLM-based CRS Architecture

Authors

Cen Zhang, Research Fellow of Nanyang Technological University (Fuzzing, Testing, Vulnerability)
Younggi Park, Independent Researcher
Fabian Fleischer, Georgia Institute of Technology
Yu-Fu Fu, Georgia Institute of Technology (Program Analysis, Software Verification, Software Security, Software Engineering)
Jiho Kim, Ph.D. student, KAIST (Computer Architecture)
Dongkwan Kim, Texas A&M University (Graph Neural Network, Large Language Model)
Youngjoon Kim, Georgia Institute of Technology
Qingxiao Xu, Texas A&M University
Andrew Chin, Georgia Institute of Technology
Ze Sheng, PhD Student in Computer Science @ Texas A&M University (Large Language Model, Machine Learning, Cybersecurity)
Hanqing Zhao, Research Fellow, Nanyang Technological University (Computer Vision, Deep Learning)
Brian J. Lee, Georgia Institute of Technology
Joshua Wang, Georgia Institute of Technology
Michael Pelican, Smart Information Flow Technologies (SIFT)
David J. Musliner, Smart Information Flow Technologies (SIFT)
Jeff Huang, Professor, Computer Science and Engineering (Edge AI, Blockchain Security, Language-based Security, High Performance Computing)
Jon Silliman, Kudu Dynamics
Mikel Mcdaniel, Kudu Dynamics
Jefferson Casavant, Kudu Dynamics
Isaac Goldthwaite, Kudu Dynamics
Nicholas Vidovich, Kudu Dynamics
Matthew Lehman, Kudu Dynamics
Taesoo Kim, Georgia Institute of Technology (Security, Operating System, Systems)