RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework

πŸ“… 2025-11-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work identifies a novel adversarial vulnerability in LLM-driven IoT threat detection frameworks operating under retrieval-augmented generation (RAG) architectures: susceptibility to targeted data poisoning attacks. We propose a word-level, semantics-preserving adversarial perturbation method that performs fine-grained textual modifications on attack description datasets to contaminate the RAG knowledge base and degrade model reasoning. Human expert evaluation quantitatively demonstrates that minimal perturbations significantly impair ChatGPT-5 Thinking’s accuracy and practical utility in attack behavior correlation identification and device-specific mitigation recommendation. To our knowledge, this is the first systematic empirical validation of adversarial fragility in LLM-based network intrusion detection systems (LLM-NIDS) within realistic IoT security settings. Our findings provide critical evidence and actionable insights for developing robust, AI-powered cybersecurity systems resilient to data poisoning in RAG-enabled threat intelligence pipelines.

πŸ“ Abstract
The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches. Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestions in Network Intrusion Detection Systems (NIDS). Despite these advancements, the use of LLMs in such systems further expands the attack surface, putting entire networks at risk by introducing vulnerabilities such as prompt injection and data poisoning. In this work, we attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness. We construct an attack description dataset and use it in a targeted data poisoning attack that applies word-level, meaning-preserving perturbations to corrupt the Retrieval-Augmented Generation (RAG) knowledge base of the framework. We then compare pre-attack and post-attack mitigation responses from the target model, ChatGPT-5 Thinking, to measure the impact of the attack on model performance, using an established evaluation rubric designed for human experts and judge LLMs. Our results show that small perturbations degrade LLM performance by weakening the linkage between observed network traffic features and attack behavior, and by reducing the specificity and practicality of recommended mitigations for resource-constrained devices.
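The abstract's "word-level, meaning-preserving perturbations" can be illustrated with a minimal sketch. The paper's actual substitution strategy is not specified here; the toy `SYNONYMS` map, the `perturb` helper, and the word budget below are illustrative assumptions standing in for an embedding- or thesaurus-driven candidate search with semantic-similarity filtering.

```python
# Illustrative synonym map; a real attack would generate candidates
# from an embedding space or thesaurus and filter for meaning preservation.
SYNONYMS = {
    "malicious": "harmful",
    "flood": "surge",
    "packets": "frames",
}

def perturb(description: str, budget: int = 2) -> str:
    """Swap at most `budget` words for near-synonyms, keeping the meaning."""
    out, swapped = [], 0
    for word in description.split():
        if swapped < budget and word.lower() in SYNONYMS:
            repl = SYNONYMS[word.lower()]
            if word[0].isupper():  # preserve capitalization of the original token
                repl = repl.capitalize()
            out.append(repl)
            swapped += 1
        else:
            out.append(word)
    return " ".join(out)

print(perturb("The device sends malicious packets in a flood"))
# -> The device sends harmful frames in a flood
```

The perturbed description stays semantically close to the original, which is what lets the poisoned entry evade casual review while still shifting the text the model later retrieves.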
Problem

Research questions and friction points this paper is trying to address.

Targeted data poisoning attacks compromise RAG knowledge bases in LLM security systems
Word-level perturbations degrade IoT threat detection and mitigation framework performance
Adversarial attacks reduce specificity of security recommendations for resource-constrained devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Targeted data poisoning attack on RAG knowledge base
Word-level meaning-preserving perturbations applied
Measuring attack impact using expert evaluation rubric
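The poisoning step listed above can be sketched end to end with a toy RAG knowledge base: the attacker overwrites one entry with a subtly perturbed copy, retrieval still matches the query as before, but the text handed to the LLM is now corrupted. The bag-of-words cosine similarity, the knowledge-base entries, and the single-word swap below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for dense embeddings)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical RAG knowledge base of attack descriptions.
kb = {
    "syn_flood": "TCP SYN flood exhausts the connection backlog of the target",
    "mirai": "Mirai botnet scans for default credentials on IoT devices",
}

# Poisoning step: overwrite one entry with a perturbed copy
# (a single word swap here stands in for the word-level attack).
kb["syn_flood"] = kb["syn_flood"].replace("exhausts", "refreshes")

# Retrieval still selects the poisoned entry for a matching query...
query = "alerts show a SYN flood against the gateway backlog"
best = max(kb, key=lambda k: cosine(query, kb[k]))

# ...so the corrupted description is what gets injected into the prompt.
print(best, "->", kb[best])
```

Because the perturbation preserves surface similarity, retrieval behaves normally; the degradation shows up downstream, in the model's weakened link between traffic features and attack behavior.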
Seif Ikbarieh
Department of Computer Science, Tennessee Tech University, Cookeville, USA
Kshitiz Aryal
School of Interdisciplinary Informatics, University of Nebraska Omaha, Omaha, USA
Maanak Gupta
Associate Chair and Associate Professor of Computer Science, Tennessee Tech University
Cyber Security · AI for Cybersecurity · Security of AI