The Impact of Ideological Discourses in RAG: A Case Study with COVID-19 Treatments

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how ideologically charged texts influence large language model outputs in retrieval-augmented generation (RAG) systems, an underexplored issue that carries risks of bias and manipulation. The authors build a corpus of 1,117 academic articles reflecting ideological disputes over COVID-19 treatments and apply a corpus linguistics framework based on Lexical Multidimensional Analysis (LMDA) to identify its salient ideological dimensions. Two prompting strategies are used to evaluate shifts in model output: one supplies the user question with retrieved ideological texts, and the other adds LMDA descriptions to that context. Results indicate that responses grounded in ideologically laden retrieved texts align more closely with the source ideology, and that adding LMDA-derived descriptions further influences the outputs. The findings point to the steering influence, and the associated risks, of ideological discourse embedded in the external knowledge sources of RAG systems.

📝 Abstract

This paper studies the impact of retrieved ideological texts on the outputs of large language models (LLMs). While interest in understanding ideology in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideologically loaded texts about COVID-19 treatments. Our corpus comprises 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify the ideologies within the corpus. LLMs are tasked with answering questions derived from three identified ideological dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideological texts; the second contains the question, ideological texts, and LMDA descriptions. Ideological alignment between the reference ideological texts and the LLMs' responses is assessed using cosine similarity over lexical and semantic representations. Results demonstrate that LLMs' responses based on retrieved ideological texts are more aligned with the ideology encountered in the external knowledge, with the enhanced prompt further influencing LLMs' outputs. Our findings highlight the importance of identifying ideological discourses within the RAG framework in order to mitigate not just unintended ideological bias, but also the risks of malicious manipulation of such models.
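The two prompt types and the lexical side of the alignment measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the function names (`build_prompt`, `term_counts`, `lexical_alignment`), and the use of raw term counts as the lexical representation are all assumptions made for illustration. The abstract does not specify the templates, the weighting scheme, or the embedding model behind the semantic representation.

```python
import math
import re
from collections import Counter


def build_prompt(question, retrieved_texts, lmda_description=None):
    """Assemble a contextual prompt (hypothetical template, not from the paper).

    Without lmda_description this mirrors the first prompt type
    (question + ideological texts); with it, the second (enhanced) type.
    """
    parts = ["Context:"]
    parts.extend(f"- {text}" for text in retrieved_texts)
    if lmda_description is not None:
        parts.append(f"Discourse description: {lmda_description}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens (assumed lexical representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as term-to-weight mappings."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so report no alignment
    return dot / (norm_a * norm_b)


def lexical_alignment(reference_text, response_text):
    """Score how closely a response's vocabulary matches a reference text."""
    return cosine_similarity(term_counts(reference_text), term_counts(response_text))
```

A higher `lexical_alignment` score between a reference ideological text and a model response would indicate closer vocabulary overlap. The same `cosine_similarity` formula applies to semantic representations once each text is mapped to a dense embedding vector, which this sketch leaves out because the abstract does not name an embedding model.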

Problem

Research questions and friction points this paper is trying to address.

ideological bias

Retrieval-Augmented Generation

large language models

misinformation

RAG

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)

Ideological Bias

Lexical Multidimensional Analysis (LMDA)