RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-augmented generation (RAG) systems for cyber threat intelligence (CTI) are vulnerable to data poisoning attacks, and conventional defenses are especially weak in this setting: emerging threats are semantically novel, and adversaries can faithfully mimic legitimate formatting and terminology. To address this, we propose a robustness-enhancing method grounded in source credibility ranking. We adapt a PageRank-style algorithm, applied here for the first time in this setting, to model CTI document authority through a graph-based representation that quantifies source trustworthiness, so that poisoned content can be distinguished from authentic intelligence. The approach is integrated into the retrieval front-end of the RAG pipeline and evaluated on the MS MARCO dataset, with a proof-of-concept on CTI documents and feeds: malicious documents receive authority scores that are 37.2% lower on average, and top-5 recall for trustworthy intelligence improves by 21.8%, indicating substantial resilience against poisoning attacks that imitate legitimate formats.
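The summary describes the authority model only at a high level. As a minimal sketch, assuming the corpus is modeled as a directed graph in which an edge u → v means "source u cites or corroborates document v" (the paper's actual graph construction is not specified here), standard power-iteration PageRank assigns authority as follows. The function name `pagerank`, the node names, and the link structure are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank; returns authority scores that sum to 1.

    links maps each node (a source or document) to the nodes it cites.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        # Each node splits its damped rank evenly across the nodes it cites.
        for src, targets in links.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical CTI citation graph: edge u -> v means "u cites v".
links = {
    "nvd": ["cisa_advisory"],
    "cisa_advisory": ["nvd"],
    "vendor_blog": ["nvd", "cisa_advisory", "analyst_report"],
    "analyst_report": ["cisa_advisory", "vendor_blog", "nvd"],
    # The poisoned post mimics legitimate reports by citing trusted sources,
    # but no trusted source cites it back.
    "poisoned_post": ["nvd", "cisa_advisory"],
}
scores = pagerank(links)
for doc, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{doc:15s} {score:.3f}")
```

Because the poisoned post receives no inbound citations, its score stays at the teleportation floor (1 - d)/N = 0.15/5 = 0.03, however many trusted sources it cites. The flip side, a standard property of PageRank, is that a cluster of adversary-controlled documents citing one another can still accumulate rank, so how edges are defined matters as much as the ranking itself.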

📝 Abstract
Retrieval-Augmented Generation (RAG) has emerged as the dominant architectural pattern for operationalizing Large Language Models (LLMs) in Cyber Threat Intelligence (CTI) systems. However, this design is susceptible to poisoning attacks, and previously proposed defenses can fail in CTI contexts: information about emerging attacks is often entirely new, and sophisticated threat actors can mimic legitimate formats, terminology, and stylistic conventions. To address this issue, we propose that the robustness of modern RAG defenses can be strengthened by applying source credibility algorithms to the corpus, using PageRank as an example. Using the standardized MS MARCO dataset, we demonstrate quantitatively that our algorithm assigns lower authority scores to malicious documents while promoting trusted content. We also demonstrate proof-of-concept performance of our algorithm on CTI documents and feeds.
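The abstract frames PageRank as one example of a credibility signal applied to the corpus, and the summary places it at the retrieval front-end. One hypothetical way a RAG pipeline could consume such scores is to re-rank retriever hits by a linear blend of semantic similarity and authority. The blending rule, the `weight` parameter, the max-normalization, and the function name `credibility_rerank` are illustrative assumptions, not the paper's stated method; similarity is assumed to lie roughly in [0, 1], as with cosine similarity of normalized embeddings.

```python
from typing import Dict, List, Tuple


def credibility_rerank(
    candidates: List[Tuple[str, float]],
    authority: Dict[str, float],
    weight: float = 0.5,
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical re-ranker blending retriever similarity with authority.

    candidates: (doc_id, similarity) pairs from any retriever.
    authority: PageRank-style scores per doc_id (missing ids count as 0).
    weight: assumed mixing parameter in [0, 1]; 0 means similarity only.
    """
    if not candidates:
        return []
    # Max-normalize authority over the candidate set so it shares the
    # similarity scale; raw PageRank values shrink as the corpus grows.
    top_auth = max(authority.get(d, 0.0) for d, _ in candidates) or 1.0
    scored = [
        (d, (1.0 - weight) * sim + weight * authority.get(d, 0.0) / top_auth)
        for d, sim in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical retriever output: the poisoned post is the closest semantic
# match because it copies the terminology of legitimate advisories.
candidates = [("poisoned_post", 0.95), ("cisa_advisory", 0.90), ("vendor_blog", 0.80)]
authority = {"poisoned_post": 0.03, "cisa_advisory": 0.40, "vendor_blog": 0.10}

print(credibility_rerank(candidates, authority, weight=0.0, k=2))  # poisoned_post first
print(credibility_rerank(candidates, authority, weight=0.5, k=2))  # poisoned_post drops out
```

A hard authority cutoff is an alternative to blending, but it would also suppress genuinely new, not-yet-cited intelligence, which is exactly the emerging-threat case the abstract highlights; a soft blend lets strong semantic matches from low-authority sources still surface.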
Problem

Research questions and friction points this paper is trying to address.

Addressing poisoning attacks in CTI RAG systems using PageRank
Detecting malicious documents by assigning lower authority scores
Improving robustness of cyber threat intelligence LLM pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies PageRank algorithm to CTI RAG pipelines
Ranks source credibility to counter poisoning attacks
Assigns lower authority scores to poisoned (malicious) documents
Austin Jia
Applied Research Laboratories, The University of Texas at Austin, Texas, USA
Avaneesh Ramesh
Applied Research Laboratories, The University of Texas at Austin, Texas, USA
Zain Shamsi
Applied Research Laboratories, The University of Texas at Austin, Texas, USA
Daniel Zhang
Applied Research Laboratories, The University of Texas at Austin, Texas, USA
Alex Liu
University of Washington