🤖 AI Summary
Enterprise RAG systems face two challenges: insufficient coverage of internal domain knowledge and the risk of sensitive data leakage. To address both, we propose a three-source collaborative retrieval architecture that integrates internal documents, pre-generated expert knowledge for anticipated queries, and secure, controllable external LLM-generated knowledge. The approach introduces a dynamic security filtering mechanism that invokes external LLMs on demand while coordinating with a local open-source model (e.g., Llama 3) for response generation, ensuring that proprietary data is never uploaded to external services or exposed to their training pipelines. The method combines hybrid retrieval, rule-based and lightweight classifier-guided prompt filtering, and multi-dimensional evaluation (LLM- and human-based). Evaluated on an automotive report generation task, the system achieves LLM-assessed win rates of 79.3–91.9% and human-assessed win rates of 56.3–70.4%, significantly outperforming a conventional RAG baseline.
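The security-filtered routing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the regex patterns, the classifier stub, and the threshold are all hypothetical placeholders standing in for the rule-based and lightweight-classifier filters the summary mentions.

```python
import re

# Hypothetical sensitivity rules; the actual enterprise rule set is not public.
SENSITIVE_PATTERNS = [
    r"\bconfidential\b",
    r"\bproject\s+[A-Z]\w+\b",   # internal code names (illustrative)
    r"\b\d{3}-\d{2}-\d{4}\b",    # ID-like number formats (illustrative)
]

def rule_based_filter(prompt: str) -> bool:
    """Return True if any sensitivity rule matches the prompt."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in SENSITIVE_PATTERNS)

def classifier_score(prompt: str) -> float:
    """Placeholder for a lightweight binary classifier returning a
    sensitivity score in [0, 1]; a real system would load a small
    fine-tuned model here."""
    return 0.9 if "prototype" in prompt.lower() else 0.1

def route_prompt(prompt: str, threshold: float = 0.5) -> str:
    """Send a prompt to the external LLM only when it is deemed safe;
    otherwise keep generation fully local."""
    if rule_based_filter(prompt) or classifier_score(prompt) >= threshold:
        return "local"     # sensitive: local open-source generator only
    return "external"      # safe: external LLM may supplement knowledge

print(route_prompt("Summarize confidential sales figures"))  # local
print(route_prompt("Explain common brake pad materials"))    # external
```

Because sensitive prompts never leave the local generator, this routing yields the "zero data upload" property while still letting safe queries benefit from external LLM knowledge.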
📝 Abstract
Existing Retrieval-Augmented Generation (RAG) systems face challenges in enterprise settings due to limited retrieval scope and data security risks. When relevant internal documents are unavailable, the system struggles to generate accurate and complete responses. Additionally, using closed-source Large Language Models (LLMs) raises concerns about exposing proprietary information. To address these issues, we propose the Secure Multifaceted-RAG (SecMulti-RAG) framework, which retrieves not only from internal documents but also from two supplementary sources: pre-generated expert knowledge for anticipated queries and on-demand external LLM-generated knowledge. To mitigate security risks, we adopt a local open-source generator and selectively utilize external LLMs only when prompts are deemed safe by a filtering mechanism. This approach enhances completeness, prevents data leakage, and reduces costs. In our evaluation on a report generation task in the automotive industry, SecMulti-RAG significantly outperforms traditional RAG, achieving 79.3 to 91.9 percent win rates across correctness, richness, and helpfulness in LLM-based evaluation, and 56.3 to 70.4 percent in human evaluation. This highlights SecMulti-RAG as a practical and secure solution for enterprise RAG.
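The three-source retrieval step can be sketched as a simple merge of candidate passages. This is a hedged illustration only: the `Passage` type, source labels, and plain score sort are assumptions; the paper's actual hybrid retriever would produce and rank these candidates differently.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str   # "internal", "expert_kb", or "external_llm" (illustrative labels)
    score: float  # retriever relevance score, higher is better

def assemble_context(internal, expert, external, k=4):
    """Pool candidates from all three knowledge sources and keep the
    top-k by relevance score for the generator's context window."""
    pool = internal + expert + external
    return sorted(pool, key=lambda p: p.score, reverse=True)[:k]

# Toy usage with made-up passages and scores.
internal = [Passage("Internal test report excerpt", "internal", 0.9)]
expert   = [Passage("Pre-generated expert answer", "expert_kb", 0.7)]
external = [Passage("External LLM background note", "external_llm", 0.8)]
context = assemble_context(internal, expert, external, k=2)
print([p.source for p in context])  # ['internal', 'external_llm']
```

When no relevant internal document exists, the expert and external sources still contribute passages, which is how the framework improves completeness over internal-only RAG.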