Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?

📅 2026-05-27
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Current AI systems for cybersecurity rely on isolated, mutually incompatible execution scaffolds, and no single scaffold performs best across all security tasks. This work proposes CSI, a unified meta-scaffold with a blackboard architecture in which agents built on heterogeneous LLM-driven scaffolds (e.g., CSI::Claude, CSI::Codex, CSI::GCAI) run in parallel and share intermediate findings. Evaluated through a common orchestration layer on the cybench benchmark, with the model fixed at alias2-mini, the blackboard solves 19 of 33 challenges (57.6%), a 27% relative gain over CSI::Claude, one of the best individual scaffolds (15/33, 45.5%). It finishes 25% faster (20.2 h vs. 26.8 h) at comparable cost ($5,480 vs. $5,122). On this benchmark, combining structurally heterogeneous scaffolds yields higher coverage than any individual scaffold tested.
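The blackboard pattern mentioned in the summary can be illustrated with a minimal Python sketch. This is not CSI's implementation: the `Blackboard` class, the `run_agent` and `run_all` functions, the two-phase flow, and the skill labels are all hypothetical, and the scaffold names are used only as labels. The sketch shows agents running in parallel, posting findings to a shared store, and then reading what their peers posted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Thread-safe shared store of findings (hypothetical design, not CSI's)."""
    _entries: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def post(self, agent: str, kind: str, note: str) -> None:
        with self._lock:
            self._entries.append((agent, kind, note))

    def read(self) -> list:
        # Return a copy so callers never iterate over a list being mutated.
        with self._lock:
            return list(self._entries)


def run_agent(name: str, skill: str, board: Blackboard,
              barrier: threading.Barrier) -> None:
    # Phase 1: publish an intermediate finding.
    board.post(name, "finding", f"{skill}: partial result")
    # Wait until every agent has published before reading.
    barrier.wait()
    # Phase 2: read peers' findings and record how many were visible.
    peers = [e for e in board.read() if e[1] == "finding" and e[0] != name]
    board.post(name, "summary", f"saw {len(peers)} peer finding(s)")


def run_all(agents: dict) -> Blackboard:
    board = Blackboard()
    barrier = threading.Barrier(len(agents))
    # One worker per agent; fewer workers would deadlock on the barrier.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(run_agent, n, s, board, barrier)
                   for n, s in agents.items()]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return board


# Scaffold names are labels only; the skills are invented for illustration.
AGENTS = {"CSI::Claude": "web", "CSI::Codex": "crypto", "CSI::GCAI": "reversing"}

if __name__ == "__main__":
    for entry in run_all(AGENTS).read():
        print(entry)
```

The barrier is a simplification that makes the exchange deterministic. A real system would more likely let agents poll or subscribe to the blackboard continuously while they work.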
📝 Abstract
What is the best harness for cybersecurity AI? Cybersecurity systems are converging on a single execution scaffold per agent: an iterative shell loop driven by a Large Language Model (LLM). However, scaffolds are not interchangeable and rarely interoperable, and no single scaffold dominates across all challenge types. On our path towards Cybersecurity SuperIntelligence (CSI), we present a meta-scaffold that unifies heterogeneous agent harnesses under a common orchestration layer, enabling any LLM-driven scaffold to be deployed, benchmarked, and composed within the same infrastructure. Using CSI, we benchmark five scaffolds (CSI::Claude, CSI::Codex, CSI::GCAI, CSI::Mistral, CSI::CAI) on the 33 cybench challenges, holding the model fixed at alias2-mini. The best individual scaffolds solve 15/33 (45.5%); the four-scaffold union solves 17/33 (51.5%), with the fifth (CSI::Mistral, 10/33) contributing one exclusive solve. We find that no single scaffold is the best harness: it is the combination of structurally heterogeneous scaffolds that yields the highest coverage. We validate this through CSI's blackboard-based multi-agent architecture, in which scaffold-specialised agents run in parallel and exchange intermediate findings via a shared substrate (a blackboard). The blackboard solves 19/33 (57.6%), a 27% relative gain over CSI::Claude, one of the best individual scaffolds (15/33, 45.5%), and runs 25% faster (20.2 h vs. 26.8 h) at comparable cost ($5,480 vs. $5,122).
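The headline percentages follow directly from the raw counts in the abstract, which a short Python check makes explicit. The variable and function names are illustrative. The cost comparison assumes the two dollar figures are listed in the same order as the two runtimes (blackboard first, CSI::Claude second), which the abstract does not state outright.

```python
TOTAL_CHALLENGES = 33

# Solve counts as reported in the abstract.
solved = {
    "blackboard": 19,
    "CSI::Claude": 15,
    "four-scaffold union": 17,
    "CSI::Mistral": 10,
}


def pct(value: float) -> float:
    """Express a ratio as a percentage rounded to one decimal place."""
    return round(100 * value, 1)


solve_rate = {name: pct(n / TOTAL_CHALLENGES) for name, n in solved.items()}

# Relative gain of the blackboard over CSI::Claude in challenges solved.
relative_gain = pct(solved["blackboard"] / solved["CSI::Claude"] - 1)

# Wall-clock reduction: 20.2 h (blackboard) vs. 26.8 h (CSI::Claude).
time_reduction = pct(1 - 20.2 / 26.8)

# Assumption: $5,480 is the blackboard run and $5,122 is CSI::Claude.
cost_increase = pct(5480 / 5122 - 1)

if __name__ == "__main__":
    print(solve_rate)
    print(relative_gain, time_reduction, cost_increase)
```

The relative gain comes out at about 26.7% and the time reduction at about 24.6%, which the abstract rounds to 27% and 25%. Under the ordering assumption above, the blackboard run costs roughly 7% more than CSI::Claude.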
Problem

Research questions and friction points this paper is trying to address.

cybersecurity
AI scaffold
heterogeneous agents
execution framework
Cybersecurity SuperIntelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-scaffold
blackboard architecture
heterogeneous agent harnesses
Cybersecurity SuperIntelligence
LLM-driven cybersecurity
Víctor Mayoral-Vilches
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Francesco Balassone
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
María Sanz-Gómez
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Paul Zabalegui Landa
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Daniel Sánchez Prieto
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Marina Oteiza Álvarez
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Davide Quarta
Alias Robotics, Vitoria-Gasteiz, Álava, Spain
Martin Pinzger
Professor of Software Engineering, Universität Klagenfurt, Austria
AI for Software Engineering, Mining Software Repositories, Program Analysis, Software Visualization