NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation of LLM via Genetic Optimization of External Knowledge

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RAG adversarial attacks primarily target the retrieval or prompting stages, overlooking internal neuronal dynamics and the conflict between the model's parametric knowledge and external retrieved knowledge. Method: This paper introduces, for the first time, the concept of "Poison-Responsive Neurons," leveraging neuron attribution analysis to identify sensitive units and guiding a genetic algorithm to optimize adversarial external knowledge, enabling precise output manipulation. The method integrates attribution-driven evolution with large-scale mutation and reuse of knowledge variants to counteract the suppression of retrieved knowledge by strong parametric knowledge. Contribution/Results: The approach achieves a Population Overwrite Success Rate (POSR) above 90% across multiple LLMs and datasets while preserving textual fluency. Empirical results reveal the overwrite mechanism and the controllability of knowledge conflict in RAG systems.

📝 Abstract
Retrieval-Augmented Generation (RAG) empowers Large Language Models (LLMs) to dynamically integrate external knowledge during inference, improving their factual accuracy and adaptability. However, adversaries can inject poisoned external knowledge to override the model's internal memory. While existing attacks iteratively manipulate the retrieved content or prompt structure of RAG, they largely ignore the model's internal representation dynamics and neuron-level sensitivities. The underlying mechanism of RAG poisoning has not been fully studied, and the knowledge conflict that arises when poisoned context meets strong parametric knowledge is not considered. In this work, we propose NeuroGenPoisoning, a novel attack framework that generates adversarial external knowledge for RAG, guided by LLM internal neuron attribution and genetic optimization. Our method first identifies a set of Poison-Responsive Neurons whose activation strongly correlates with contextual poisoning knowledge. We then employ a genetic algorithm to evolve adversarial passages that maximally activate these neurons. Crucially, our framework enables massive-scale generation of effective poisoned RAG knowledge by identifying and reusing promising but initially unsuccessful external-knowledge variants via observed attribution signals. At the same time, poisoning guided by Poison-Responsive Neurons effectively resolves knowledge conflict. Experimental results across models and datasets show that our method consistently achieves a high Population Overwrite Success Rate (POSR) of over 90% while preserving fluency, and empirical evidence confirms that it effectively resolves knowledge conflict.
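The evolutionary loop described in the abstract can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the paper's implementation: the neuron-attribution step is replaced by a deterministic stand-in scoring function, and the neuron indices, vocabulary, function names, and GA hyperparameters are all illustrative assumptions.

```python
import random

# Hypothetical: neuron indices assumed to have been found via attribution.
POISON_NEURONS = {3, 7, 11}

def neuron_activations(passage, n_neurons=16):
    # Stand-in for a hooked LLM forward pass: deterministic pseudo-activations
    # derived from token content, so the GA has a signal to climb.
    tokens = passage.split()
    acts = [0.0] * n_neurons
    for t in tokens:
        acts[sum(ord(c) for c in t) % n_neurons] += 1.0
    total = max(len(tokens), 1)
    return [a / total for a in acts]

def fitness(passage):
    # Mean activation of the targeted poison-responsive neurons.
    acts = neuron_activations(passage)
    return sum(acts[i] for i in POISON_NEURONS) / len(POISON_NEURONS)

def mutate(passage, vocab, rng):
    # Replace one random token with a random vocabulary word.
    tokens = passage.split()
    tokens[rng.randrange(len(tokens))] = rng.choice(vocab)
    return " ".join(tokens)

def crossover(a, b, rng):
    # Single-point crossover at a random token boundary.
    ta, tb = a.split(), b.split()
    if min(len(ta), len(tb)) < 2:
        return a
    cut = rng.randrange(1, min(len(ta), len(tb)))
    return " ".join(ta[:cut] + tb[cut:])

def evolve(seed_passage, vocab, generations=30, pop_size=20, seed=0):
    # Elitist GA: top quarter survives each generation, rest are offspring.
    rng = random.Random(seed)
    pop = [seed_passage] + [mutate(seed_passage, vocab, rng)
                            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]
        next_pop = list(parents)
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            next_pop.append(mutate(crossover(p1, p2, rng), vocab, rng))
        pop = next_pop
    return max(pop, key=fitness)
```

Because the top-ranked passages are carried over unchanged each generation, the best fitness in the population is non-decreasing, mirroring the paper's idea of retaining and reusing promising variants rather than discarding them.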
Problem

Research questions and friction points this paper is trying to address.

Existing RAG attacks ignore the model's internal neuronal dynamics and neuron-level sensitivities
How to generate adversarial external knowledge guided by neuron attribution and genetic optimization
How to resolve knowledge conflict so that poisoned external knowledge overrides the model's parametric memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuron-guided poisoning via genetic optimization of knowledge
Identifying poison-responsive neurons for targeted attacks
Resolving knowledge conflicts through neuron attribution signals
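The neuron-identification idea above can be illustrated with a toy sketch: rank neurons by how much their activation shifts when the poisoned passage enters the context, and take the top-k as "poison-responsive." The `activations` function here is a hypothetical stand-in for hooking an LLM layer; no real attribution is performed, and all names and parameters are assumptions.

```python
def activations(context, n_neurons=16):
    # Stand-in for capturing per-neuron activations from a hooked LLM layer:
    # each token deterministically bumps one pseudo-neuron.
    acts = [0.0] * n_neurons
    for t in context.split():
        acts[sum(ord(c) for c in t) % n_neurons] += 1.0
    return acts

def poison_responsive_neurons(clean_ctx, poisoned_ctx, top_k=3):
    # Rank neurons by absolute activation shift between the clean context
    # and the context containing the poisoned passage; keep the top-k.
    clean = activations(clean_ctx)
    poisoned = activations(poisoned_ctx)
    diffs = [(abs(p - c), i) for i, (c, p) in enumerate(zip(clean, poisoned))]
    diffs.sort(reverse=True)
    return [i for _, i in diffs[:top_k]]
```

In this toy, the neuron most shifted by the injected tokens ranks first; in a real setting the same ranking would be computed from attribution scores over actual hidden states.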
Authors
Hanyu Zhu, Purple Mountain Laboratories (Information Science)
Lance Fiondella, University of Massachusetts Dartmouth
Jiawei Yuan, University of Massachusetts Dartmouth
Kai Zeng, George Mason University
Long Jiao, University of Massachusetts Dartmouth (Privacy Protection, 5G Security, Wireless Communication System)