🤖 AI Summary
This study addresses the difficulty small organizations face in obtaining cybersecurity risk assessments, which are often too costly and complex for them. The authors propose a multi-agent architecture built around a persistent shared context. Six agents handle organizational profiling, asset mapping, threat analysis, control evaluation, risk scoring, and recommendation generation in turn, and each later stage builds on the conclusions of earlier ones. The authors present this accumulated context as the feature that distinguishes the system from standard sequential agent pipelines. On a real 15-person, HIPAA-covered healthcare organization, the system completed an assessment in under 15 minutes, covered 92% of the risks identified in independent assessments by three CISSP-certified practitioners, and agreed with those practitioners on 85% of severity classifications. In 30 repeated single-agent runs across five synthetic organizational profiles, a domain fine-tuned model detected industry-specific threats that the general-purpose Mistral-7B baseline missed entirely. On modest hardware, however, the full multi-agent pipeline failed in all 30 attempts on a Tesla T4 with a 4,096-token default context window, which indicates that context capacity, not model quality, is the binding constraint.
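To make the two headline metrics concrete, here is a minimal sketch of how severity agreement and risk coverage could be computed. The paper's exact scoring procedure is not described here, so this assumes two things. First, agreement means the system's severity matches the majority label of the three experts, counted over risks that the system and every expert rated. Second, coverage is the share of expert-identified risks that the system also flagged. The function names, the High/Medium/Low scale, and the toy data are illustrative only.

```python
from collections import Counter

def majority_label(labels):
    """Most common label; ties go to the label seen first (Counter.most_common order)."""
    return Counter(labels).most_common(1)[0][0]

def severity_agreement(system, experts):
    """Share of risks rated by the system and every expert where the system
    matches the experts' majority severity.

    system:  dict mapping risk id -> severity label
    experts: list of such dicts, one per expert
    """
    shared = [r for r in system if all(r in e for e in experts)]
    if not shared:
        return 0.0
    hits = sum(system[r] == majority_label([e[r] for e in experts]) for r in shared)
    return hits / len(shared)

def risk_coverage(system, experts):
    """Share of risks identified by at least one expert that the system also flagged."""
    reference = set().union(*(e.keys() for e in experts))
    if not reference:
        return 0.0
    return len(reference & set(system)) / len(reference)

# Toy data for illustration only (not from the study).
system = {"R1": "High", "R2": "Medium", "R3": "Medium"}
experts = [
    {"R1": "High", "R2": "High", "R3": "Low", "R4": "Medium"},
    {"R1": "High", "R2": "Medium", "R3": "Low"},
    {"R1": "Medium", "R2": "Medium", "R3": "Low", "R4": "Medium"},
]
print(f"agreement={severity_agreement(system, experts):.2f}")  # → agreement=0.67
print(f"coverage={risk_coverage(system, experts):.2f}")        # → coverage=0.75
```

Majority voting is only one way to reconcile three raters. Reporting per-expert agreement or a chance-corrected statistic would be equally reasonable choices.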
📝 Abstract
Getting a real cybersecurity risk assessment for a small organization is expensive: a NIST CSF-aligned engagement runs $15,000 on the low end, takes weeks, and depends on practitioners who are genuinely scarce. Most small companies skip it entirely. We built a six-agent AI system where each agent handles one analytical stage: profiling the organization, mapping assets, analyzing threats, evaluating controls, scoring risks, and generating recommendations. Agents share a persistent context that grows as the assessment proceeds, so later agents build on what earlier ones concluded. This shared context is what distinguishes the system from standard sequential agent pipelines. We tested it on a 15-person HIPAA-covered healthcare company and compared its outputs to independent assessments by three CISSP practitioners. The system agreed with them on 85% of severity classifications, covered 92% of the risks they identified, and finished in under 15 minutes. We then ran 30 repeated single-agent assessments across five synthetic but sector-realistic organizational profiles in healthcare, fintech, manufacturing, retail, and SaaS, comparing a general-purpose Mistral-7B against a domain fine-tuned model. Both completed every run. The fine-tuned model flagged threats the baseline missed entirely: PHI exposure in healthcare, OT/IIoT vulnerabilities in manufacturing, and platform-specific risks in retail. The full multi-agent pipeline, however, failed all 30 attempts on a Tesla T4 GPU with a 4,096-token default context window. Context capacity, not model quality, turned out to be the binding constraint.
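The persistent-context mechanism, and why it can collide with a 4,096-token window, can be sketched as follows. This is a minimal illustration, not the authors' implementation. `agent` is a hypothetical stand-in for an LLM call, `count_tokens` is a crude word-count proxy for a real tokenizer, the 512-token output reserve is an assumption, and the stage identifiers simply mirror the six stages named in the abstract. Because every stage's prompt includes all earlier stages' outputs, the prompt grows at each step until it no longer fits.

```python
from dataclasses import dataclass, field

# Stage order taken from the abstract; the identifiers themselves are illustrative.
STAGES = ["profile", "assets", "threats", "controls", "risk_scores", "recommendations"]
CONTEXT_WINDOW = 4096  # default window size cited in the abstract

def count_tokens(text: str) -> int:
    # Hypothetical proxy: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())

@dataclass
class SharedContext:
    # Append-only log of (stage, output) pairs visible to every later agent.
    entries: list = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(f"[{stage}] {output}" for stage, output in self.entries)

def run_pipeline(agent, org_description: str, window: int = CONTEXT_WINDOW, reserve: int = 512):
    """Run the stages in order, giving each agent everything produced so far.

    agent(stage, prompt) -> str is a hypothetical wrapper around an LLM call.
    `reserve` is an assumed number of tokens kept free for the agent's answer.
    Raises RuntimeError as soon as a prompt plus the reserve exceeds the window.
    """
    ctx = SharedContext()
    for stage in STAGES:
        prompt = f"{org_description}\n{ctx.render()}\nTask: {stage}"
        needed = count_tokens(prompt) + reserve
        if needed > window:
            raise RuntimeError(f"stage '{stage}' needs {needed} tokens, window is {window}")
        ctx.entries.append((stage, agent(stage, prompt)))
    return ctx

# Usage with a stub agent that emits 900 "tokens" per stage.
def verbose_agent(stage, prompt):
    return " ".join(["finding"] * 900)

try:
    run_pipeline(verbose_agent, "15-person healthcare company")
except RuntimeError as err:
    print(err)  # → stage 'risk_scores' needs 4121 tokens, window is 4096

ctx = run_pipeline(verbose_agent, "15-person healthcare company", window=8192)
print(len(ctx.entries))  # → 6
```

Failing fast is one possible design choice. Summarizing or truncating earlier entries would keep the run alive, but it would give up part of the cross-stage knowledge inheritance that the architecture relies on.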