🤖 AI Summary
This work addresses the risk of sensitive data leakage in cloud-hosted Retrieval-Augmented Generation (RAG) systems, where existing privacy-preserving approaches either degrade retrieval quality through noise injection or encrypt only part of the pipeline. The authors propose PRAG, an end-to-end privacy-preserving RAG system with a dual-mode architecture: a non-interactive mode that uses homomorphic-friendly approximate computation for low-latency retrieval, and an interactive mode that relies on client assistance to recover the accuracy of non-private RAG. An Operation-Error Estimation (OEE) mechanism stabilizes ranking against homomorphic noise. PRAG keeps both documents and queries encrypted throughout retrieval. Experiments on large-scale datasets report recall of 72.45%–74.45%, practical retrieval latency, and strong resilience against graph reconstruction attacks, supporting the feasibility of secure, high-performance RAG at scale.
📝 Abstract
Retrieval-Augmented Generation (RAG) is essential for enhancing Large Language Models (LLMs) with external knowledge, but its reliance on cloud environments exposes sensitive data to privacy risks. Existing privacy-preserving solutions often sacrifice retrieval quality through noise injection or provide only partial encryption. We propose PRAG, a privacy-preserving RAG system that achieves end-to-end confidentiality for both documents and queries without sacrificing the scalability of cloud-hosted RAG. PRAG features a dual-mode architecture: a non-interactive mode, PRAG-I, uses homomorphic-friendly approximations for low-latency retrieval, while an interactive mode, PRAG-II, leverages client assistance to match the accuracy of non-private RAG. To ensure robust semantic ordering, we introduce Operation-Error Estimation (OEE), a mechanism that stabilizes ranking against homomorphic noise. Experiments on large-scale datasets demonstrate that PRAG achieves competitive recall (72.45%–74.45%), practical retrieval latency, and strong resilience against graph reconstruction attacks. These results support the feasibility of secure, high-performance RAG at scale.
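To make the ranking-stability concern concrete, the following is a minimal plaintext sketch, not the authors' method and not an encrypted implementation. It assumes that similarity is scored with an inner product (an operation built only from additions and multiplications) and models approximation error as additive Gaussian noise. The function names, the noise model, and all parameter values are hypothetical and chosen only for illustration. The sketch measures how much of the exact top-k survives when scores are perturbed, which is the kind of recall figure the abstract reports.

```python
import random


def inner_product(a, b):
    # Uses only additions and multiplications, the operations that
    # arithmetic homomorphic schemes evaluate natively.
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant ids that appear in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(retrieved)) / len(relevant)


def noisy_scores(query, docs, noise_std, rng):
    # Hypothetical stand-in for approximation error on encrypted scores:
    # each exact score is perturbed by zero-mean Gaussian noise.
    return [inner_product(query, d) + rng.gauss(0.0, noise_std) for d in docs]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_docs, k = 16, 200, 10  # illustrative sizes, not from the paper
    docs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_docs)]
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Ground truth: top-k under exact (noise-free) scoring.
    exact = top_k([inner_product(query, d) for d in docs], k)

    for noise_std in (0.0, 0.1, 1.0):
        approx = top_k(noisy_scores(query, docs, noise_std, rng), k)
        print(f"noise_std={noise_std}: recall@{k} = {recall_at_k(approx, exact):.2f}")
```

In this toy model, documents whose exact scores lie close together are the ones most likely to swap places once noise is added, so a mechanism that estimates the accumulated error could in principle flag such near-ties. How OEE actually does this is not described in the abstract.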