🤖 AI Summary
This study addresses the challenge of achieving security, accuracy, and efficiency simultaneously in cross-organizational Retrieval-Augmented Generation (RAG) systems: existing encryption-based approaches expose plaintext upon decryption, while federated methods leave resources fragmented and incur high overhead. The authors propose Trans-RAG, a secure retrieval framework built on a vector space language paradigm, whose core is vector2Trans, a query-centric multi-stage vector transformation mechanism. Each organization's knowledge resides in its own near-orthogonal semantic space, and queries are transformed into that space, so retrieval proceeds without decryption. Experiments across eight retrievers, three datasets, and three large language models show only a 3.5% drop in nDCG@10, a 99.81% isolation rate, and an 89.90° angular separation between spaces, with significant efficiency improvements over homomorphic encryption.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems deployed across organizational boundaries face fundamental tensions between security, accuracy, and efficiency. Current encryption methods expose plaintext during decryption, while federated architectures prevent resource integration and incur substantial overhead. We introduce Trans-RAG, implementing a novel vector space language paradigm where each organization's knowledge exists in a mathematically isolated semantic space. At the core lies vector2Trans, a multi-stage transformation technique that enables queries to dynamically "speak" each organization's vector space "language" through query-centric transformations, eliminating decryption overhead while maintaining native retrieval efficiency. Security evaluations demonstrate near-orthogonal vector spaces with 89.90° angular separation and 99.81% isolation rates. Experiments across 8 retrievers, 3 datasets, and 3 LLMs show minimal accuracy degradation (3.5% decrease in nDCG@10) and significant efficiency improvements over homomorphic encryption.
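The idea of a query "speaking" an organization's vector space "language" can be illustrated with a toy sketch. This is not vector2Trans: the abstract does not specify the transformation stages, so the sketch assumes a single secret random orthogonal rotation per organization as a stand-in. The names `random_orthogonal`, `key_a`, and `key_b`, the seeds, and the dimension are all hypothetical, and the sketch makes no security claim. It only shows two properties the abstract alludes to: a query transformed with the matching key ranks documents exactly as in plaintext, while the same vector transformed under two different keys lands at a large angle.

```python
import math
import random


def random_orthogonal(dim, seed):
    """Build a dim x dim orthogonal matrix via Gram-Schmidt on Gaussian rows.

    The seed stands in for a hypothetical per-organization secret key.
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along every row accepted so far.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def apply(matrix, vec):
    """Matrix-vector product: move vec into the space defined by matrix."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def rank(query, docs):
    """Return document indices sorted by descending inner-product score."""
    scores = [dot(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


DIM = 64  # assumed embedding size, chosen only to keep the demo fast
rng = random.Random(0)
docs = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(5)]
query = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

key_a = random_orthogonal(DIM, seed=101)  # hypothetical key, organization A
key_b = random_orthogonal(DIM, seed=202)  # hypothetical key, organization B

# Organization A stores only transformed document vectors.
stored_a = [apply(key_a, d) for d in docs]

# The query is transformed into A's space; documents are never mapped back.
query_a = apply(key_a, query)
query_b = apply(key_b, query)  # same query expressed in B's space

plain_ranking = rank(query, docs)
matched_ranking = rank(query_a, stored_a)

# Angle between the same query expressed in the two organizations' spaces.
cross_angle = math.degrees(math.acos(cosine(query_a, query_b)))

print("plaintext ranking:  ", plain_ranking)
print("matched-key ranking:", matched_ranking)
print("angle between spaces for the same query (degrees):", round(cross_angle, 2))
```

Because an orthogonal map preserves inner products, the matched-key ranking equals the plaintext ranking up to floating-point error. The cross-space angle concentrates around 90° as the dimension grows, which gives some intuition for why high-dimensional embedding spaces can be kept nearly orthogonal; the reported 89.90° and 99.81% figures come from the authors' own method and evaluation, not from this sketch.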