$p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

📅 2026-03-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inefficiency of existing privacy-preserving RAG systems, which rely on secure sorting and thus struggle to support arbitrary top-$k$ retrieval, limiting large language models’ ability to leverage larger retrieval sets for improved accuracy. To overcome this, the authors propose a novel privacy-preserving top-$k$ retrieval mechanism that eliminates explicit sorting. Operating under a two-server semi-honest, non-colluding model, the approach integrates secret sharing, interactive binary search, and access verification to ensure both query and data privacy while supporting any $k$ value. Experimental results demonstrate that, for $k$ ranging from 16 to 1024, the proposed method achieves a 3× to 300× speedup over the current state-of-the-art PRAG system, substantially enhancing retrieval efficiency and flexibility at scale.
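The two-server secret-sharing setup mentioned above can be illustrated with a minimal sketch of additive secret sharing. This is not the paper's implementation: the ring size `MODULUS` and the helper names `share`, `reconstruct`, and `add_shares` are assumptions chosen for illustration. It shows only that each server holds a random-looking share and that linear operations can be done locally without reconstructing the inputs.

```python
import secrets

# Assumption: values live in the ring of integers modulo 2**32.
MODULUS = 2**32


def share(value):
    """Split value into two additive shares, one per server."""
    share0 = secrets.randbelow(MODULUS)   # uniformly random mask
    share1 = (value - share0) % MODULUS   # complements the mask
    return share0, share1


def reconstruct(share0, share1):
    """Recombine both shares; neither share alone reveals the value."""
    return (share0 + share1) % MODULUS


def add_shares(a, b):
    """Each server adds its own shares locally, with no communication."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


# Hypothetical example: two secret vectors, each split across two servers.
u = [3, 10, 250]
v = [7, 20, 6]
u_pairs = [share(x) for x in u]
v_pairs = [share(x) for x in v]
u0, u1 = [p[0] for p in u_pairs], [p[1] for p in u_pairs]
v0, v1 = [p[0] for p in v_pairs], [p[1] for p in v_pairs]

sum0 = add_shares(u0, v0)  # computed by server 0 only
sum1 = add_shares(u1, v1)  # computed by server 1 only
total = [reconstruct(a, b) for a, b in zip(sum0, sum1)]
```

Under the non-colluding assumption, each server sees only one uniformly random share per value, so it learns nothing about `u` or `v` on its own.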

📝 Abstract

Retrieval-Augmented Generation (RAG) enables large language models to use external knowledge, but outsourcing the RAG service raises privacy concerns for both data owners and users. Privacy-preserving RAG systems address these concerns by performing secure top-$k$ retrieval, which typically relies on secure sorting to identify relevant documents. However, existing systems struggle to support arbitrary $k$: they cannot change $k$, they introduce new security issues, or their efficiency degrades with large $k$. This is a significant limitation because modern long-context models generally achieve higher accuracy with larger retrieval sets. We propose $p^2$RAG, a privacy-preserving RAG service that supports arbitrary top-$k$ retrieval. Unlike existing systems, $p^2$RAG avoids sorting candidate documents. Instead, it uses an interactive bisection method to determine the set of top-$k$ documents. For security, $p^2$RAG uses secret sharing on two semi-honest non-colluding servers to protect the data owner's database and the user's prompt. It enforces restrictions and verification to defend against malicious users and tightly bound the information leakage of the database. The experiments show that $p^2$RAG is 3--300$\times$ faster than the state-of-the-art PRAG for $k = 16$--$1024$.
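The idea of finding the top-$k$ set by bisection instead of sorting can be sketched in plaintext as follows. This is only an illustration under stated assumptions, not the $p^2$RAG protocol: scores are assumed to be non-negative integers below `2**score_bits`, the function names are hypothetical, and the count that the abstract's interactive procedure would obtain securely is computed here in the clear.

```python
def count_at_least(scores, threshold):
    """Number of documents whose score meets the threshold.

    Plaintext stand-in: a private protocol would compute this count
    without revealing the individual scores.
    """
    return sum(1 for s in scores if s >= threshold)


def topk_by_bisection(scores, k, score_bits=16):
    """Find the top-k set by bisecting on a score threshold, without sorting.

    Assumes integer scores in [0, 2**score_bits). Returns the largest
    threshold t such that at least k scores are >= t, together with the
    indices of all documents that meet it.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of documents")
    lo, hi = 0, (1 << score_bits) - 1
    # Invariant: at least k scores are >= lo (true for lo = 0).
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(scores, mid) >= k:
            lo = mid        # threshold can still be raised
        else:
            hi = mid - 1    # too few documents pass; lower the threshold
    selected = [i for i, s in enumerate(scores) if s >= lo]
    return lo, selected


# Hypothetical similarity scores for five documents.
threshold, selected = topk_by_bisection([5, 1, 9, 7, 3], k=2)
```

The loop runs at most `score_bits` rounds regardless of $k$, which is why the cost of this style of selection does not grow the way a full sort does. Note that ties at the threshold can make the selected set larger than $k$ in this sketch; how such ties are handled in the actual system is not stated in the abstract.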

Problem

Research questions and friction points this paper is trying to address.

Privacy-Preserving RAG

Top-k Retrieval

Arbitrary k

Secure Retrieval

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-Preserving RAG

Arbitrary Top-k Retrieval

Interactive Bisection