🤖 AI Summary
This work addresses two weaknesses of existing Retrieval-Augmented Generation (RAG) systems: the leakage of sensitive information during retrieval, and a susceptibility to prompt injection attacks that stems from relying on prompt-based privacy mechanisms. To mitigate these issues, the authors propose SD-RAG, a novel framework that shifts security and privacy controls to the retrieval stage, applying semantic-level filtering and selective disclosure before data reaches the large language model. SD-RAG decouples policy enforcement from text generation by introducing dynamically interpretable security policies, a semantics-driven policy parsing mechanism, and a fine-grained, policy-aware graph-based retrieval model. Experimental results show that SD-RAG improves privacy protection scores by up to 58% over baseline methods while substantially strengthening robustness against prompt injection attacks.
📝 Abstract
Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Most existing approaches, however, overlook the risks of exposing sensitive or access-controlled information directly to the generation model. A few approaches instruct the generative model to refrain from disclosing sensitive information; yet recent studies have demonstrated that LLMs remain vulnerable to prompt injection attacks that can override such behavioral constraints. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself. Rather than relying on prompt-level safeguards, SD-RAG applies sanitization and disclosure controls during the retrieval phase, before augmenting the language model's input. Moreover, we introduce a semantic mechanism that allows the ingestion of human-readable, dynamic security and privacy constraints, together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. Our experimental evaluation demonstrates the superiority of SD-RAG over existing baseline approaches, achieving up to a $58\%$ improvement in privacy score, while also showing strong resilience to prompt injection attacks targeting the generative model.
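The core idea of retrieval-stage selective disclosure can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Chunk` type, its sensitivity `labels`, and the `retrieve` helper are all hypothetical names, and the paper's semantics-driven policy parsing and graph-based retrieval model are replaced here by a simple label-intersection check applied before any text reaches the LLM prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved text fragment tagged with sensitivity labels (hypothetical schema)."""
    text: str
    labels: set  # e.g. {"pii", "finance"}; empty set means non-sensitive


def retrieve(query_hits: list, denied_labels: set) -> list:
    """Enforce the disclosure policy at retrieval time: drop any chunk whose
    labels intersect the denied set BEFORE it is appended to the model's input.
    A prompt injection in the generation phase cannot recover dropped chunks,
    because the model never sees them."""
    return [c for c in query_hits if not (c.labels & denied_labels)]


# Toy example: one sensitive and one non-sensitive hit for the same query.
hits = [
    Chunk("Q3 revenue was $4.2M", {"finance"}),
    Chunk("Office hours are 9am-5pm", set()),
]
safe_context = retrieve(hits, denied_labels={"finance"})
# Only the non-sensitive chunk survives and is passed to the LLM.
```

The point of the sketch is the ordering: policy enforcement happens on the retrieval output, so the generation model is never asked to withhold information it already holds in its context window.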