🤖 AI Summary
Traditional vector retrieval struggles to support complex cyber threat intelligence (CTI) queries that require multi-hop reasoning across documents. This work systematically evaluates four RAG architectures: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid strategy that combines graph queries with text retrieval. By comparing graph-based, agentic, and hybrid methods in realistic CTI settings, it clarifies when graph grounding helps multi-hop relational reasoning and where it falls short. Results show that graph grounding improves performance on structured factual queries, and that the hybrid graph-text approach improves answer quality by up to 35% over vector RAG on multi-hop questions while remaining more reliable than graph-only retrieval.
📝 Abstract
Cyber threat intelligence (CTI) analysts must answer complex questions over large collections of narrative security reports. Retrieval-augmented generation (RAG) systems help language models access external knowledge, but traditional vector retrieval often struggles with queries that require reasoning over relationships between entities such as threat actors, malware, and vulnerabilities. This limitation arises because relevant evidence is often distributed across multiple text fragments and documents. Knowledge graphs address this challenge by enabling structured multi-hop reasoning through explicit representations of entities and relationships. However, multiple retrieval paradigms, including graph-based, agentic, and hybrid approaches, have emerged with different assumptions and failure modes. It remains unclear how these approaches compare in realistic CTI settings and when graph grounding improves performance. We present a systematic evaluation of four RAG architectures for CTI analysis: standard vector retrieval, graph-based retrieval over a CTI knowledge graph, an agentic variant that repairs failed graph queries, and a hybrid approach combining graph queries with text retrieval. We evaluate these systems on 3,300 CTI question-answer pairs spanning factual lookups, multi-hop relational queries, analyst-style synthesis questions, and unanswerable cases. Results show that graph grounding improves performance on structured factual queries. The hybrid graph-text approach improves answer quality by up to 35 percent on multi-hop questions compared to vector RAG, while maintaining more reliable performance than graph-only systems.
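To make the hybrid idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy in-memory triple store in place of a real CTI knowledge graph and a word-overlap ranker in place of vector search. Every entity name (`ActorA`, `MalwareX`, `VulnZ`) and every function name is hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy CTI knowledge graph as (subject, relation, object) triples.
# All entities are invented placeholders, not real threat data.
TRIPLES = [
    ("ActorA", "uses", "MalwareX"),
    ("MalwareX", "exploits", "VulnZ"),
    ("ActorB", "uses", "MalwareY"),
]

# Toy report snippets standing in for a chunked text corpus.
CHUNKS = [
    "ActorA was observed deploying MalwareX against energy firms.",
    "MalwareX exploits VulnZ to gain initial access.",
    "ActorB phishing campaigns deliver MalwareY.",
]


def build_index(triples):
    """Map (subject, relation) to the list of objects it points to."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index


def graph_multi_hop(index, start, relations):
    """Follow a chain of relations from `start`; return the entities reached."""
    frontier = {start}
    for rel in relations:
        frontier = {obj for node in frontier for obj in index.get((node, rel), [])}
        if not frontier:
            break  # the path does not exist in the graph
    return sorted(frontier)


def text_retrieve(chunks, query, k=2):
    """Rank chunks by word overlap with the query (a stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = []
    for chunk in chunks:
        chunk_words = set(chunk.lower().rstrip(".").split())
        score = len(query_words & chunk_words)
        if score > 0:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [chunk for _, chunk in scored[:k]]


def hybrid_retrieve(index, chunks, start, relations, query):
    """Collect both graph results and text evidence for the answer generator."""
    return {
        "graph": graph_multi_hop(index, start, relations),
        "text": text_retrieve(chunks, query),
    }
```

In this sketch, `graph_multi_hop(build_index(TRIPLES), "ActorA", ["uses", "exploits"])` joins two facts that sit in separate text snippets. When the graph path is missing, `hybrid_retrieve` still returns text evidence, which is one plausible reading of why a hybrid system could be more reliable than a graph-only one.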