VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval inaccuracies in RAG often introduce irrelevant or misleading documents, severely degrading LLM generation quality. To address this, we propose VaccineRAG: the first CoT-based retrieval-augmented generation dataset explicitly modeling adversarial retrieval scenarios with controllable positive-to-negative sample ratios. VaccineRAG features a multi-component output structure integrating CoT prompting with Partial-GRPO optimization to enable fine-grained preference learning. Crucially, it incorporates explicit CoT reasoning to enhance model robustness against noisy retrieval outputs. Experiments demonstrate that VaccineRAG significantly improves LLM resilience to retrieval noise and answer accuracy—achieving an average 12.7% accuracy gain on multimodal RAG benchmarks. The dataset and code will be publicly released.

📝 Abstract
Retrieval Augmented Generation enhances the response accuracy of Large Language Models (LLMs) by integrating retrieval and generation modules with external knowledge, demonstrating particular strength in real-time queries and Visual Question Answering tasks. However, the effectiveness of RAG is frequently hindered by the precision of the retriever: many retrieved samples fed into the generation phase are irrelevant or misleading, posing a critical bottleneck to LLMs' performance. To address this challenge, we introduce VaccineRAG, a novel Chain-of-Thought-based retrieval-augmented generation dataset. On one hand, VaccineRAG employs a benchmark to evaluate models using data with varying positive/negative sample ratios, systematically exposing inherent weaknesses in current LLMs. On the other hand, it enhances models' sample-discrimination capabilities by prompting LLMs to generate explicit Chain-of-Thought (CoT) analysis for each sample before producing final answers. Furthermore, to enhance the model's ability to learn long-sequence complex CoT content, we propose Partial-GRPO. By modeling the outputs of LLMs as multiple components rather than a single whole, our model can make more informed preference selections for complex sequences, thereby enhancing its capacity to learn complex CoT. Comprehensive evaluations and ablation studies on VaccineRAG validate the effectiveness of the proposed scheme. The code and dataset will be publicly released soon.
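The benchmark side of VaccineRAG controls the positive/negative mix of retrieved samples and prompts the model for an explicit per-sample CoT judgment before the final answer. The sketch below illustrates that idea in minimal form; the function names, prompt wording, and sampling scheme are illustrative assumptions, not the authors' released code.

```python
import random

def build_retrieval_context(positives, negatives, pos_ratio, k, seed=0):
    """Sample k retrieved documents with a controlled positive/negative mix.

    `positives`/`negatives` are lists of document strings. This mirrors the
    paper's idea of varying positive/negative sample ratios, not its actual
    dataset-construction pipeline.
    """
    rng = random.Random(seed)
    n_pos = max(1, round(k * pos_ratio))
    n_neg = k - n_pos
    docs = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(docs)  # hide which samples are positive
    return docs

def cot_prompt(question, docs):
    """Ask the model to judge each retrieved sample before answering."""
    lines = [f"Question: {question}", "Retrieved samples:"]
    for i, d in enumerate(docs, 1):
        lines.append(f"[{i}] {d}")
    lines.append(
        "For each sample, state whether it is relevant or misleading and why; "
        "then give the final answer."
    )
    return "\n".join(lines)
```

For example, `build_retrieval_context(pos, neg, pos_ratio=0.25, k=8)` would yield a context where only a quarter of the retrieved samples are relevant, stress-testing the model's sample discrimination.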
Problem

Research questions and friction points this paper is trying to address.

Addresses harmful, irrelevant retrieved samples that degrade RAG generation
Enhances discrimination of misleading retrieval samples
Improves learning of long, complex Chain-of-Thought reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought dataset for RAG evaluation and training
Partial-GRPO method for learning complex reasoning sequences
Explicit CoT analysis per sample to improve discrimination
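Partial-GRPO, as described in the abstract, treats the model's output as multiple components (e.g. one CoT segment per retrieved sample plus the final answer) rather than scoring the whole sequence at once. A minimal sketch of that component-wise, group-relative advantage computation follows; the reward layout and normalization are our reading of the abstract, not the authors' implementation.

```python
def partial_grpo_advantages(group_rewards):
    """Group-relative advantages computed per output component.

    `group_rewards[i][c]` is the reward of component c of rollout i in a
    GRPO group, e.g. one score per per-sample CoT segment plus one for the
    final answer. Standard GRPO would normalize a single scalar reward per
    rollout; here each component is normalized against the same component
    across the group, enabling finer-grained preference learning.
    """
    n = len(group_rewards)
    n_comp = len(group_rewards[0])
    per_component = []
    for c in range(n_comp):
        col = [r[c] for r in group_rewards]
        mean = sum(col) / n
        std = (sum((x - mean) ** 2 for x in col) / n) ** 0.5 or 1.0
        per_component.append([(x - mean) / std for x in col])
    # transpose back to one list of per-component advantages per rollout
    return [[per_component[c][i] for c in range(n_comp)] for i in range(n)]
```

A rollout whose CoT correctly rejects a misleading sample can then be preferred on that component even if its final answer ties with the rest of the group.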