🤖 AI Summary
This work addresses privacy risks in retrieval-augmented generation (RAG) systems, whose outputs may inadvertently reveal that a sensitive document exists in the retrieval corpus. Existing membership inference attacks rely on rigid templates, incur high query costs, or are easily detected. To overcome these limitations, the authors propose MEntA, an attack grounded in natural language inference. MEntA uses an information-maximizing query strategy that needs only five non-templated queries to infer whether a target document resides in the retrieval corpus, without a proxy model and without assuming a specific defense mechanism. Evaluated on the NFCorpus, SCIDOCS, and TREC-COVID benchmarks, MEntA achieves up to 0.991 AUC, outperforms prior methods by 0.20 to 0.50 AUC, reduces total attack cost by up to 65×, and evades state-of-the-art RAG defenses.
📝 Abstract
Retrieval-augmented generation (RAG) has become central to large language model (LLM) deployments, grounding responses in enterprise or proprietary data to reduce hallucinations. However, this design introduces a new privacy risk: model outputs may signal the presence of specific documents in the retrieval corpus, enabling membership inference attacks (MIAs) that leak sensitive information. Existing MIAs are feasible, but they often rely on easily detected templated queries or require many non-templated yet costly and repetitive queries, limiting practicality. We ask: Can an adversary launch a limited-budget, surrogate-free, stealthy, and defense-agnostic membership inference attack using non-templated queries? We present MEntA (Membership Entailment Attack), a query-efficient MIA that leverages natural-language entailment to maximize information gained per query. By asking low-cost, broad, information-seeking questions and measuring entailment between model responses and candidate documents, MEntA eliminates the need for costly shadow models and large query budgets. Across NFCorpus, SCIDOCS, and TREC-COVID, MEntA achieves up to 0.991 AUC with only 5 queries, outperforming prior methods by 0.20 to 0.50 AUC under equivalent conditions. It remains effective under state-of-the-art (SOTA) RAG defenses, while current detectors either miss MEntA or flag benign queries at high rates. Regarding cost, MEntA reduces total attack cost by up to 65$\times$ compared to SOTA attacks under the same attack setting. Our findings expose the feasibility of realistic, low-cost privacy leakage in RAG systems and highlight the urgent need for privacy-aware retrieval and defense mechanisms.
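The scoring idea described in the abstract (ask a few broad questions, then measure entailment between the responses and a candidate document) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `toy_entailment`, `membership_score`, and `auc` are hypothetical, the token-overlap scorer is only a stand-in for a real NLI model, and treating the document as premise and the response as hypothesis is an assumption about the entailment direction.

```python
from typing import Callable, Sequence


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model (assumption): the fraction of hypothesis
    tokens that also appear in the premise, in [0, 1]."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return hits / len(hypothesis_tokens)


def membership_score(
    document: str,
    responses: Sequence[str],
    entail: Callable[[str, str], float] = toy_entailment,
) -> float:
    """Average how strongly the candidate document entails each response
    collected from the RAG system. Higher suggests the document was retrieved."""
    if not responses:
        return 0.0
    return sum(entail(document, r) for r in responses) / len(responses)


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Pairwise AUC: probability that a random member outscores a random
    non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical data: responses that echo the document vs. responses that do not.
    doc = "aspirin reduces fever and mild pain"
    in_corpus = ["aspirin reduces fever", "aspirin reduces mild pain"]
    not_in_corpus = ["the model cannot answer", "no information available"]
    print(membership_score(doc, in_corpus), membership_score(doc, not_in_corpus))
```

In practice the `entail` argument would be replaced by a real entailment model, and the threshold on the score would be chosen from held-out member and non-member documents.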