The Impact of Ideological Discourses in RAG: A Case Study with COVID-19 Treatments

📅 2026-03-16
🤖 AI Summary
This study addresses the underexplored influence of ideologically charged texts on large language model (LLM) outputs within retrieval-augmented generation (RAG) systems, highlighting potential risks of bias and manipulation. The authors construct a corpus of 1,117 academic articles reflecting ideological disputes over COVID-19 treatments and apply a corpus linguistics framework based on Lexical Multidimensional Analysis (LMDA) to identify three ideological dimensions. Two prompting strategies are designed to evaluate shifts in model output orientation: one supplies the user question with retrieved ideological texts, and the other adds LMDA descriptions to that context. Alignment between the reference texts and the model responses is measured with cosine similarity over lexical and semantic representations. Results show that responses grounded in ideologically laden retrieved texts align more closely with the source ideology, and that the LMDA-enhanced prompt further influences the outputs. The findings point to the steering influence, and the associated risks, of ideological discourse embedded in the external knowledge sources of RAG systems.
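The two prompting strategies can be sketched as follows. This is a minimal illustration, not the authors' templates: the function names, the prompt wording, and the section labels are all assumptions made for the example.

```python
def format_context(retrieved_texts):
    """Number each retrieved text so the model can tell the passages apart."""
    return "\n\n".join(
        f"[Text {i}] {text}" for i, text in enumerate(retrieved_texts, start=1)
    )


def build_basic_prompt(question, retrieved_texts):
    """Prompt type 1 (hypothetical wording): question plus retrieved ideological texts."""
    context = format_context(retrieved_texts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_enhanced_prompt(question, retrieved_texts, lmda_description):
    """Prompt type 2 (hypothetical wording): question, retrieved texts, and an LMDA description."""
    context = format_context(retrieved_texts)
    return (
        f"Context:\n{context}\n\n"
        f"Discourse description:\n{lmda_description}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The only difference between the two prompts is the extra description block, which makes it straightforward to attribute any change in the model's response to the LMDA description rather than to the retrieved texts.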

📝 Abstract
This paper studies the impact of retrieved ideological texts on the outputs of large language models (LLMs). While interest in understanding ideology in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideologically loaded texts about COVID-19 treatments. Our corpus is based on 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify the ideologies within the corpus. LLMs are tasked with answering questions derived from three identified ideological dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideological texts; the second contains the question, ideological texts, and LMDA descriptions. Ideological alignment between the reference ideological texts and the LLMs' responses is assessed using cosine similarity over lexical and semantic representations. Results demonstrate that LLMs' responses based on retrieved ideological texts are more aligned with the ideology encountered in the external knowledge, with the enhanced prompt further influencing the LLMs' outputs. Our findings highlight the importance of identifying ideological discourses within the RAG framework in order to mitigate not only unintended ideological bias but also the risk of malicious manipulation of such models.
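The alignment measure named in the abstract, cosine similarity, can be illustrated with a small sketch. The abstract does not specify how the lexical and semantic representations are built, so the choices here are assumptions: `lexical_cosine` uses plain word-count vectors, and `vector_cosine` accepts any pair of dense vectors, such as sentence embeddings produced by a model of the reader's choosing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_cosine(text_a, text_b):
    """Cosine similarity between word-count vectors (an assumed lexical representation)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def vector_cosine(u, v):
    """Cosine similarity between two dense vectors, e.g. precomputed embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under this sketch, a response would count as more aligned with a reference ideological text when its score against that text is higher than the score of a response generated without the retrieved context.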
Problem

Research questions and friction points this paper is trying to address.

ideological bias
Retrieval-Augmented Generation
large language models
misinformation
RAG
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation (RAG)
Ideological Bias
Lexical Multidimensional Analysis (LMDA)
Large Language Models (LLMs)
Corpus Linguistics
Elmira Salari
Wichita State University
Maria Claudia Nunes Delfino
São Paulo Catholic University
Corpus Linguistics
Hazem Amamou
Institut national de la recherche scientifique
José Victor de Souza
Institut national de la recherche scientifique
Shruti Kshirsagar
Wichita State University
Deep Learning, Healthcare & AI, Signal Processing, Emotion Recognition, Deep Fake
Alan Davoust
Université du Québec en Outaouais
Anderson Avila
Institut national de la recherche scientifique
voice biometrics, quality assessment, emotion recognition, natural language processing