MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low precision, response inconsistency, and the absence of proactive security safeguards in Retrieval-Augmented Generation (RAG) for fine-grained entity information retrieval, this paper proposes a multimodal entity-aware RAG framework. Methodologically: (i) an entity-aware retrieval mechanism integrates entity indexing with dynamic re-ranking to enhance retrieval accuracy; (ii) a proactive zero-trust security architecture embeds fine-grained access control prior to data access; and (iii) a lightweight cross-modal alignment encoder and a generation adapter ensure consistent, real-time responses across text, image, audio, and video modalities. Experiments on a targeted question-answering task report an accuracy of 0.83 (an absolute improvement of 0.25) and significantly improved recall. The source code and dataset are publicly released.
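
Points (i) and (ii) above can be illustrated with a minimal sketch. It is not the MES-RAG implementation: the `EntityStore` class, the role-based allow-list, and the token-overlap score are hypothetical stand-ins chosen only to show an access check that runs before any data is read, followed by an entity-scoped lookup and a re-ranking step.

```python
from dataclasses import dataclass, field


@dataclass
class EntityStore:
    """Hypothetical per-entity passage store with a role allow-list (an assumption)."""
    docs: dict = field(default_factory=dict)  # entity name -> list of passages
    acl: dict = field(default_factory=dict)   # entity name -> set of roles allowed to read

    def add(self, entity, passages, roles):
        self.docs[entity] = list(passages)
        self.acl[entity] = set(roles)


def overlap_score(query, passage):
    """Toy relevance score: number of lowercase tokens shared by query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(store, query, entity, role, top_k=2):
    # The access check runs before any passage is read (the "proactive" step).
    if role not in store.acl.get(entity, set()):
        raise PermissionError(f"role {role!r} may not read entity {entity!r}")
    # Entity-scoped lookup, then re-rank the candidates against the query.
    candidates = store.docs.get(entity, [])
    ranked = sorted(candidates, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:top_k]


store = EntityStore()
store.add(
    "Model X",
    [
        "Model X ships in red and blue",
        "Model X has a range of 500 km",
        "Warranty is two years",
    ],
    roles={"customer"},
)
top = retrieve(store, "what is the range of Model X", "Model X", "customer", top_k=1)
```

In this sketch a request from a role outside the allow-list raises `PermissionError` before the passage list is touched, which is the ordering the summary describes. A real system would presumably replace the overlap score with a learned ranker.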

📝 Abstract
Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by drawing on external knowledge, but it struggles with precise entity information retrieval. In this paper, we propose the MES-RAG framework, which enhances entity-specific query handling and provides accurate, secure, and consistent responses. MES-RAG introduces proactive security measures that ensure system integrity by applying protections prior to data access. Additionally, the system supports real-time multi-modal outputs, including text, images, audio, and video, and integrates seamlessly into existing RAG architectures. Experimental results demonstrate that MES-RAG significantly improves both accuracy and recall, increasing accuracy to 0.83 (+0.25) on a targeted task and highlighting its effectiveness in advancing the security and utility of question-answering systems. Our code and data are available at https://github.com/wpydcr/MES-RAG.
Problem

Research questions and friction points this paper is trying to address.

Enhances entity-specific query handling in RAG systems.
Introduces proactive security measures for system integrity.
Supports real-time multi-modal outputs for diverse applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances entity-specific query handling
Introduces proactive security measures
Supports real-time multi-modal outputs
Pingyu Wu
University of Science and Technology of China
computer vision
Daiheng Gao
DINQ
AIGC
Jing Tang
HUST
Huimin Chen
Independent Researcher
Wenbo Zhou
USTC
Weiming Zhang
USTC
Neng H. Yu
USTC