ControlNET: A Firewall for RAG-based LLM System

📅 2025-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

RAG-enhanced large language models (LLMs) face critical privacy and security risks, including data leakage and poisoning attacks, in sensitive domains such as healthcare and finance, and existing work lacks a controllable, end-to-end interception mechanism for the query–response pipeline. To address this, we propose the first AI firewall specifically designed for RAG systems, built on a two-stage control paradigm: “activation shift detection” followed by “semantic divergence mitigation.” Our approach integrates neural activation analysis, semantic similarity modeling, and a lightweight intervention module, and is compatible with mainstream open-source LLMs (e.g., Llama3, Vicuna, Mistral). Evaluated on four benchmark datasets (MS MARCO, HotpotQA, FinQA, and MedicalSys), it achieves an AUROC above 0.909 in detecting and mitigating security threats, blocking malicious queries and harmful responses while preserving system harmlessness. This work fills a fundamental gap in end-to-end query-flow governance for RAG systems.

📝 Abstract

Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs). This advancement has enabled their widespread deployment across sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, yet it introduces privacy and security risks, notably data breaches and data poisoning. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in comprehensive research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging the activation shift phenomenon to detect adversarial queries and by mitigating their impact through semantic divergence. We conduct comprehensive experiments on four benchmark datasets (MS MARCO, HotpotQA, FinQA, and MedicalSys) using state-of-the-art open-source LLMs (Llama3, Vicuna, and Mistral). Our results demonstrate that ControlNET achieves over 0.909 AUROC in detecting and mitigating security threats while preserving system harmlessness. Overall, ControlNET offers an effective, robust, and harmless defense mechanism, marking a significant advancement toward the secure deployment of RAG-based LLM systems.
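
To make the idea of detecting queries by an activation shift and scoring the detector with AUROC more concrete, here is a minimal toy sketch. It is not ControlNET's implementation, which this page does not describe. It assumes each query is already represented by a fixed-length activation vector, uses synthetic Gaussian vectors in place of real LLM activations, and scores a query by its Euclidean distance from the centroid of benign reference activations. The names `mean_vector`, `shift_score`, and `auroc` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the benign centroid)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def shift_score(activation, benign_mean):
    """Assumed shift measure: Euclidean distance from the benign centroid."""
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(activation, benign_mean)))


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sample(rng, mean, dim, count):
    """Draw `count` synthetic activation vectors around `mean` (toy data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8  # toy dimensionality; real hidden states are far larger

benign_reference = sample(rng, 0.0, DIM, 200)   # activations of known-benign queries
benign_test = sample(rng, 0.0, DIM, 50)         # held-out benign queries
adversarial_test = sample(rng, 1.5, DIM, 50)    # synthetic "shifted" queries

centroid = mean_vector(benign_reference)
scores = [shift_score(v, centroid) for v in benign_test + adversarial_test]
labels = [0] * len(benign_test) + [1] * len(adversarial_test)

synthetic_auroc = auroc(scores, labels)
print(f"AUROC on synthetic data: {synthetic_auroc:.3f}")
```

The distance-to-centroid score is only one plausible way to quantify a shift. A real system would extract activations from the deployed LLM and would likely need a learned or calibrated threshold before blocking a query.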

Problem

Research questions and friction points this paper is trying to address.

Mitigating privacy and security risks in RAG-based LLM systems

Detecting adversarial queries using activation shift phenomena

Ensuring secure deployment of RAG systems in sensitive domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI firewall for RAG-based LLM security

Detects adversarial queries via activation shift

Mitigates threats through semantic divergence
