RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies, for the first time, a dual-path coupling mechanism underlying reconstruction attacks against proprietary knowledge bases in retrieval-augmented generation (RAG) systems: intra-class fine-grained extraction coupled with cross-class knowledge diffusion. Existing defenses address only one path, leaving systems vulnerable through the other. To bridge this gap, the authors propose RAGFort, a structure-aware, dual-module defense framework: (1) contrastive reindexing at the retrieval stage to enforce cross-class semantic isolation, and (2) constrained cascade generation at the generation stage to suppress intra-class leakage of sensitive information. Experiments demonstrate that the method reduces knowledge base reconstruction success rate by over 80% while preserving answer quality and response latency. It thus achieves a balanced trade-off among security, utility, and robustness, and constitutes the first systematic defense tailored specifically to dual-path coupled threats in RAG-based knowledge protection.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths, progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining "contrastive reindexing" for inter-class isolation and "constrained cascade generation" for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering comprehensive defense against knowledge base extraction attacks.
Problem

Research questions and friction points this paper is trying to address.

Defends against proprietary knowledge base extraction in RAG systems
Addresses dual-path attacks through intra-class and inter-class protection
Prevents comprehensive knowledge reconstruction while maintaining answer quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-module defense combining contrastive reindexing and constrained generation
Contrastive reindexing isolates knowledge across semantic classes
Constrained cascade generation protects fine-grained intra-class knowledge
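
To make the idea of isolating knowledge across semantic classes more concrete, here is a minimal sketch of a generic pairwise margin contrastive loss over class-labeled chunk embeddings. This page does not describe RAGFort's actual reindexing objective, so everything below is an assumption for illustration: the function name `contrastive_isolation_loss`, the `margin` parameter, and the toy 2-D embeddings are hypothetical and are not taken from the paper.

```python
import numpy as np


def contrastive_isolation_loss(embeddings, labels, margin=1.0):
    """Generic pairwise margin contrastive loss (illustrative, not RAGFort's objective).

    Same-class pairs are penalized by their squared distance (pulled together);
    different-class pairs are penalized only when closer than `margin` (pushed apart).
    """
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)

    # Pairwise Euclidean distances between all embeddings, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = y[:, None] == y[None, :]
    # Strict upper triangle, so each unordered pair is counted exactly once.
    upper = np.triu(np.ones(same.shape, dtype=bool), k=1)

    pull = (dist ** 2)[same & upper]
    push = (np.maximum(0.0, margin - dist) ** 2)[~same & upper]
    return (pull.sum() + push.sum()) / upper.sum()


# Hypothetical toy index: two semantic classes, two chunks each.
labels = [0, 0, 1, 1]
separated = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]

# Same geometry style, but the classes are interleaved along one axis.
mixed_labels = [0, 1, 0, 1]
mixed = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]

separated_loss = contrastive_isolation_loss(separated, labels)
mixed_loss = contrastive_isolation_loss(mixed, mixed_labels)
print(separated_loss < mixed_loss)
```

In this toy setup, an index whose classes are well separated scores a lower loss than one whose classes are interleaved, which is the property an inter-class isolation objective of this kind would reward. How RAGFort balances such isolation against retrieval quality is not stated here.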