🤖 AI Summary
This work addresses the challenge of applying traditional Retrieval-Augmented Generation (RAG) in scenarios where relevant knowledge is fragmented across multiple private data silos. We propose the first RAG system that integrates confidential computing with federated retrieval: each participant performs retrieval locally, while a central server, operating within a remotely attested Trusted Execution Environment (TEE), securely aggregates the results and generates the final response. To enhance generation quality without compromising privacy, we introduce a novel cascaded inference mechanism that safely incorporates non-confidential third-party large language models. Implemented on the Flower framework, our system ensures end-to-end confidentiality against honest-but-curious or compromised servers while significantly improving the accuracy and practical utility of generated answers.
📝 Abstract
Retrieval-Augmented Generation (RAG) typically assumes centralized access to documents, an assumption that breaks down when knowledge is distributed across private data silos. We propose a secure Federated RAG system, built on Flower, in which each silo performs retrieval locally while server-side aggregation and text generation run inside an attested, confidential compute environment, enabling confidential remote LLM inference even in the presence of honest-but-curious or compromised servers. We also propose a cascading inference approach that incorporates a non-confidential third-party model (e.g., Amazon Nova) as auxiliary context without weakening confidentiality.
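The split described above (local retrieval at each silo, aggregation and generation inside the attested server enclave) can be sketched as plain Python. All names, the keyword-overlap scoring, and the stubbed generation step are illustrative assumptions, not the system's actual Flower-based implementation or a real TEE boundary:

```python
def retrieve_local(silo_docs, query, k=2):
    """Runs at one data silo: rank this silo's documents against the query.
    In the real system, raw documents would leave the silo only as payloads
    destined for the remotely attested TEE (scoring here is a toy keyword
    overlap, not the system's retriever)."""
    scored = sorted(
        silo_docs,
        key=lambda d: sum(w in d.lower() for w in query.lower().split()),
        reverse=True,
    )
    return scored[:k]


def tee_aggregate_and_generate(per_silo_results, query):
    """Stand-in for the server-side step that would run inside the attested
    confidential compute environment: merge the candidate passages from all
    silos, then generate an answer (generation is stubbed as a summary string)."""
    merged = [doc for results in per_silo_results for doc in results]
    return f"Answer to '{query}' grounded in {len(merged)} passages."


# Two hypothetical silos, each holding its own private corpus.
silos = [
    ["alpha report on federated learning", "unrelated memo"],
    ["beta notes on federated retrieval", "cafeteria menu"],
]
query = "federated retrieval"
answer = tee_aggregate_and_generate(
    [retrieve_local(docs, query, k=1) for docs in silos], query
)
```

The key property the sketch mirrors is the data-flow boundary: only per-silo retrieval results cross the network, and the aggregation/generation step is confined to the trusted server-side component.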