STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

📅 2026-03-23
🤖 AI Summary
This work addresses the difficulty large language models have in maintaining coherent reasoning and effective environment interaction during multi-step, stateful offensive cybersecurity tasks, a problem compounded by static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. The authors propose a modular agentic framework built on the Model Context Protocol (MCP), which standardizes tool interfaces for system introspection, decompilation, and runtime debugging. According to the authors, this design keeps context coherent across extended exploit trajectories and, in their analysis of decision-making logs, significantly reduces hallucination compared to naive prompting. The agent operated autonomously in a university-hosted CTF competition in late 2025, identifying and exploiting vulnerabilities in real time and taking first place ahead of 21 human teams, which the authors present as evidence of adaptability in a live competitive setting.

📝 Abstract
Large Language Models (LLMs) have demonstrated potential in code generation, yet they struggle with the multi-step, stateful reasoning required for offensive cybersecurity operations. Existing research often relies on static benchmarks that fail to capture the dynamic nature of real-world vulnerabilities. In this work, we introduce STRIATUM-CTF (A Search-based Test-time Reasoning Inference Agent for Tactical Utility Maximization in Cybersecurity), a modular agentic framework built upon the Model Context Protocol (MCP). By standardizing tool interfaces for system introspection, decompilation, and runtime debugging, STRIATUM-CTF enables the agent to maintain a coherent context window across extended exploit trajectories. We validate this approach not merely on synthetic datasets, but in a live competitive environment. Our system participated in a university-hosted Capture-the-Flag (CTF) competition in late 2025, where it operated autonomously to identify and exploit vulnerabilities in real time. STRIATUM-CTF secured First Place, outperforming 21 human teams and demonstrating strong adaptability in a dynamic problem-solving setting. We analyze the agent's decision-making logs to show how MCP-based tool abstraction significantly reduces hallucination compared to naive prompting strategies. These results suggest that standardized context protocols are a critical path toward robust autonomous cyber-reasoning systems.
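
The abstract's central idea, a standardized tool interface that the agent must go through, can be illustrated with a small sketch. This is not the authors' implementation and it does not use any MCP SDK; it is a standard-library Python stand-in, and every name in it (`Tool`, `ToolRegistry`, `list_strings`, `fake_strings`) is hypothetical. It shows one plausible reason a declared tool schema could curb hallucinated tool use: a call to an undeclared tool, or a call with mistyped arguments, is rejected with a structured error instead of being executed.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    """A declared tool: name, description, typed parameters, and a handler."""
    name: str
    description: str
    params: Dict[str, type]          # parameter name -> expected Python type
    handler: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry standing in for a protocol-level tool server."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """Return the tool listing that would be shown to the model."""
        listing = [
            {
                "name": t.name,
                "description": t.description,
                "params": {k: v.__name__ for k, v in t.params.items()},
            }
            for t in self._tools.values()
        ]
        return json.dumps(listing, indent=2)

    def call(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Validate a model-issued request before running anything."""
        name = request.get("tool")
        args = request.get("arguments", {})
        tool = self._tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        if set(args) != set(tool.params):
            return {"ok": False,
                    "error": f"expected parameters {sorted(tool.params)}"}
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                return {"ok": False,
                        "error": f"{key} must be {expected.__name__}"}
        return {"ok": True, "result": tool.handler(**args)}


def fake_strings(path: str, min_len: int) -> list:
    """Placeholder handler: returns canned data rather than reading a binary."""
    sample = ["/bin/sh", "flag", "ok"]
    return [s for s in sample if len(s) >= min_len]


registry = ToolRegistry()
registry.register(Tool(
    name="list_strings",
    description="List printable strings in a binary (illustrative only).",
    params={"path": str, "min_len": int},
    handler=fake_strings,
))

# A well-formed request is dispatched; an invented tool name is refused.
good = registry.call({"tool": "list_strings",
                      "arguments": {"path": "./chal", "min_len": 4}})
bad = registry.call({"tool": "decompile_everything", "arguments": {}})
print(good)
print(bad)
```

In a real deployment the registry role would be played by protocol-compliant tool servers, and the handlers would wrap actual introspection, decompilation, and debugging tools; the sketch only illustrates the validate-then-dispatch pattern.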
Problem

Research questions and friction points this paper is trying to address.

cybersecurity
CTF
large language models
stateful reasoning
dynamic vulnerabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Context Protocol
agentic framework
cybersecurity reasoning
tool abstraction
Capture-the-Flag
James Hugglestone
Department of Computer Science, Florida State University, Tallahassee, USA
Samuel Jacob Chacko
Research Scholar
Dawson Stoller
Department of Computer Science, Florida State University, Tallahassee, USA
Ryan Schmidt
Research Scientist, Autodesk Research
computer graphics
Xiuwen Liu
Department of Computer Science, Florida State University
Pattern recognition, computer vision, cyber security, image analysis