Secure Multifaceted-RAG for Enterprise: Hybrid Knowledge Retrieval with Security Filtering

📅 2025-04-18
🤖 AI Summary
Enterprise RAG systems face two challenges: insufficient coverage of internal domain knowledge and the risk of leaking sensitive data. To address these, we propose a three-source retrieval architecture that combines internal documents, pre-generated expert knowledge for anticipated queries, and on-demand knowledge generated by an external LLM. A security filtering mechanism decides, per prompt, whether the external LLM may be invoked, while a local open-source model (e.g., Llama 3) generates the final response, so prompts judged unsafe never leave the enterprise environment. The method combines hybrid retrieval, rule-based and lightweight classifier-guided prompt filtering, and multi-dimensional evaluation (LLM- and human-based). Evaluated on automotive report generation, the system achieves LLM-assessed win rates of 79.3–91.9% and human-assessed win rates of 56.3–70.4% against a conventional RAG baseline.

📝 Abstract
Existing Retrieval-Augmented Generation (RAG) systems face challenges in enterprise settings due to limited retrieval scope and data security risks. When relevant internal documents are unavailable, the system struggles to generate accurate and complete responses. Additionally, using closed-source Large Language Models (LLMs) raises concerns about exposing proprietary information. To address these issues, we propose the Secure Multifaceted-RAG (SecMulti-RAG) framework, which retrieves not only from internal documents but also from two supplementary sources: pre-generated expert knowledge for anticipated queries and on-demand external LLM-generated knowledge. To mitigate security risks, we adopt a local open-source generator and selectively utilize external LLMs only when prompts are deemed safe by a filtering mechanism. This approach enhances completeness, prevents data leakage, and reduces costs. In our evaluation on a report generation task in the automotive industry, SecMulti-RAG significantly outperforms traditional RAG, achieving 79.3 to 91.9 percent win rates across correctness, richness, and helpfulness in LLM-based evaluation, and 56.3 to 70.4 percent in human evaluation. This highlights SecMulti-RAG as a practical and secure solution for enterprise RAG.
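
The win rates quoted in the abstract come from pairwise comparisons between SecMulti-RAG and a traditional RAG baseline. A minimal sketch of how such a rate could be computed is shown below. The function name, the labels "A", "B", and "tie", and the choice to keep ties in the denominator are all assumptions made for illustration; the abstract does not specify how ties are handled.

```python
from collections import Counter
from typing import Iterable


def win_rate(judgments: Iterable[str], system: str = "A") -> float:
    """Share of pairwise judgments won by `system`.

    Assumption: every judgment (including ties) counts toward the
    denominator. The paper's actual protocol may differ.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments supplied")
    return counts[system] / total


# Hypothetical judgments for ten queries: "A" = proposed system preferred,
# "B" = baseline preferred, "tie" = no preference.
example_judgments = ["A"] * 8 + ["B"] * 1 + ["tie"] * 1
example_rate = win_rate(example_judgments, system="A")
```

Under this convention a tie lowers the reported win rate; excluding ties from the denominator would raise it, which is why the convention should be stated alongside any reported figure.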

Problem

Research questions and friction points this paper is trying to address.

Enhances retrieval scope in enterprise RAG systems
Mitigates data security risks with hybrid knowledge sources
Improves response accuracy using local and external knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid knowledge retrieval from multiple sources
Local open-source LLM with security filtering
Combines internal, expert, and external LLM knowledge
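
The three-source retrieval with a security gate described above can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Passage` type, `gather_context`, `keyword_filter`, and the blocked-term list are hypothetical names, and the retrievers and the external LLM are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    source: str  # "internal", "expert", or "external"
    text: str


# Hypothetical blocklist; a real filter would likely be a trained classifier.
BLOCKED_TERMS = {"confidential", "proprietary", "prototype"}


def keyword_filter(prompt: str) -> bool:
    """Return True when the prompt contains no blocked term (toy filter)."""
    words = set(prompt.lower().split())
    return not (words & BLOCKED_TERMS)


def gather_context(
    query: str,
    retrieve_internal: Callable[[str], List[str]],
    retrieve_expert: Callable[[str], List[str]],
    is_safe: Callable[[str], bool],
    ask_external_llm: Callable[[str], str],
) -> List[Passage]:
    """Collect passages from up to three sources.

    Internal documents and pre-generated expert knowledge are always
    consulted. The external LLM is called only if the filter judges the
    prompt safe, so unsafe prompts never leave the local environment.
    """
    passages = [Passage("internal", t) for t in retrieve_internal(query)]
    passages += [Passage("expert", t) for t in retrieve_expert(query)]
    if is_safe(query):
        passages.append(Passage("external", ask_external_llm(query)))
    return passages
```

The collected passages would then be handed to the local open-source generator as context. Keeping the filter as an injected callable makes it straightforward to swap the toy keyword check for a stronger classifier without touching the retrieval logic.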
Grace Byun
Emory University
Shinsun Lee
Emory University, Hyundai Motor Company
Nayoung Choi
PhD Student @ Emory CS
Natural Language Processing, Information Retrieval
Jinho Choi
Emory University