Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI systems for cybersecurity are constrained by isolated, incompatible execution scaffolds, and no single scaffold performs best across all security tasks. This work proposes CSI, a unified meta-scaffold with a blackboard-based multi-agent architecture in which heterogeneous LLM-driven agent scaffolds (e.g., CSI::Claude, CSI::Codex, CSI::GCAI) run in parallel and share intermediate findings. With the model held fixed, CSI's blackboard solves 19 of the 33 cybench challenges (57.6%), a 27% relative gain over CSI::Claude, one of the best individual scaffolds (15/33, 45.5%), while finishing 25% faster (20.2 h vs. 26.8 h) at comparable cost ($5,480 vs. $5,122). These results indicate that combining structurally heterogeneous scaffolds yields higher coverage than any single scaffold evaluated.
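The blackboard idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Blackboard` class, the two toy agents, the round-based scheduling, and the example findings are all hypothetical, chosen only to show agents running in parallel and building on each other's posted results.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Blackboard:
    """Hypothetical shared store that agents post findings to and read from."""

    def __init__(self):
        self._lock = threading.Lock()
        self._findings = []

    def post(self, agent, note):
        # Guard the shared list and skip findings that were already posted.
        with self._lock:
            if (agent, note) not in self._findings:
                self._findings.append((agent, note))

    def snapshot(self):
        # Return a copy so agents never mutate the shared state directly.
        with self._lock:
            return list(self._findings)


def recon_agent(findings):
    # Toy stand-in for a scaffold: always reports the same observation.
    return ["service: http on port 8080"]


def exploit_agent(findings):
    # Toy stand-in for a second scaffold: acts only once another
    # agent has posted a service finding to the blackboard.
    notes = [note for _, note in findings]
    if any(n.startswith("service: http") for n in notes):
        return ["candidate flag: FLAG{example}"]
    return []


def run(agents, rounds=2):
    """Run all agents in parallel each round against the same snapshot."""
    board = Blackboard()
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        for _ in range(rounds):
            seen = board.snapshot()
            futures = {name: pool.submit(fn, seen) for name, fn in agents.items()}
            for name, future in futures.items():
                for note in future.result():
                    board.post(name, note)
    return board.snapshot()


AGENTS = {"recon": recon_agent, "exploit": exploit_agent}
result = run(AGENTS)
for agent, note in result:
    print(f"{agent}: {note}")
```

In this toy run the second agent contributes nothing in the first round and produces its finding only after the first agent's note is on the board, which is the kind of exchange of intermediate findings the summary describes.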

📝 Abstract

What is the best harness for cybersecurity AI? Cybersecurity systems are converging on a single execution scaffold per agent, an iterative shell loop driven by a Large Language Model (LLM). However, scaffolds are not interchangeable, rarely interoperable, and no single scaffold dominates across all challenge types. In our path towards researching Cybersecurity SuperIntelligence (CSI), we present a meta-scaffold that unifies heterogeneous agent harnesses under a common orchestration layer, enabling any LLM-driven scaffold to be deployed, benchmarked, and composed within the same infrastructure. Using CSI, we benchmark five scaffolds (CSI::Claude, CSI::Codex, CSI::GCAI, CSI::Mistral, CSI::CAI) on the 33 cybench challenges, holding the model fixed at alias2-mini. The best individual scaffolds solve 15/33 (45.5%); the four-scaffold union solves 17/33 (51.5%), with the fifth (CSI::Mistral, 10/33) contributing one exclusive solve. We find that no single scaffold is the best harness: it is the combination of structurally heterogeneous scaffolds that yields the highest coverage. We validate this through CSI's blackboard-based multi-agent architecture, in which scaffold-specialised agents run in parallel and exchange intermediate findings via a shared substrate (a blackboard). The blackboard solves 19/33 (57.6%), a 27% relative gain over CSI::Claude, one of the best individual scaffolds (15/33, 45.5%), 25% faster (20.2 h vs. 26.8 h), at comparable cost ($5,480 vs. $5,122).
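The headline percentages in the abstract can be re-derived from the reported counts with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract. The variable names are illustrative, and reading the cost pair as blackboard first ($5,480) and CSI::Claude second ($5,122) is an assumption based on the order of the preceding time comparison.

```python
TOTAL_CHALLENGES = 33

# Solve counts as reported in the abstract.
blackboard_solved = 19
claude_solved = 15
four_scaffold_union_solved = 17


def solve_rate_pct(solved, total=TOTAL_CHALLENGES):
    """Share of challenges solved, in percent."""
    return 100 * solved / total


blackboard_rate = solve_rate_pct(blackboard_solved)      # about 57.6
claude_rate = solve_rate_pct(claude_solved)              # about 45.5
union_rate = solve_rate_pct(four_scaffold_union_solved)  # about 51.5

# Relative gain compares solve counts, not percentage points:
# (19 - 15) / 15 is about 26.7%, reported as 27%.
relative_gain_pct = 100 * (blackboard_solved - claude_solved) / claude_solved

# Wall-clock reduction: (26.8 - 20.2) / 26.8 is about 24.6%, reported as 25%.
blackboard_hours, claude_hours = 20.2, 26.8
time_reduction_pct = 100 * (claude_hours - blackboard_hours) / claude_hours

# Cost difference, ASSUMING the first figure belongs to the blackboard.
blackboard_cost, claude_cost = 5480, 5122
cost_increase_pct = 100 * (blackboard_cost - claude_cost) / claude_cost

print(f"blackboard solve rate: {blackboard_rate:.1f}%")
print(f"CSI::Claude solve rate: {claude_rate:.1f}%")
print(f"four-scaffold union:   {union_rate:.1f}%")
print(f"relative gain:         {relative_gain_pct:.1f}%")
print(f"time reduction:        {time_reduction_pct:.1f}%")
print(f"cost difference:       {cost_increase_pct:.1f}%")
```

Note that the 27% figure is a relative gain in solved challenges, while the absolute difference between the two solve rates is about 12 percentage points.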

Problem

Research questions and friction points this paper is trying to address.

cybersecurity

AI scaffold

heterogeneous agents

execution framework

Cybersecurity SuperIntelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-scaffold

blackboard architecture

heterogeneous agent harnesses