🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to covert ideological manipulation along the political spectrum—a risk inadequately captured by traditional left–right binary bias frameworks. We introduce the first systematic assessment of LLMs’ ideological plasticity across a continuous axis from progressive left to conservative right. To this end, we propose a multi-task, multi-dimensional ideological alignment evaluation paradigm and construct a fine-grained, human-annotated benchmark comprising ideological question answering, statement ranking, manifesto cloze, and U.S. congressional bill comprehension. Using Phi-2, Mistral, and Llama-3, we conduct supervised fine-tuning and compare its efficacy against explicit prompt-based control. Results demonstrate that fine-tuning substantially enhances ideological alignment, whereas prompt engineering yields only marginal shifts—confirming a tangible risk of gradual, non-explicit ideological manipulation in current LLMs.
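To make the benchmark's structure concrete, here is a minimal sketch of how examples from the four tasks could be represented as (task, ideology, prompt, target) records. The field names, task keys, and example strings are illustrative assumptions for this summary; the paper's actual data schema is not reproduced here.

```python
# Illustrative sketch only: the benchmark's real schema is not shown in the paper text
# quoted here, so the field names, task keys, and example strings are hypothetical.
from dataclasses import dataclass

@dataclass
class BenchmarkExample:
    task: str       # one of the four benchmark tasks
    ideology: str   # target position on the progressive-left to conservative-right axis
    prompt: str     # model input
    target: str     # human-annotated reference output

examples = [
    BenchmarkExample("ideological_qa", "progressive_left",
                     "Should publicly funded healthcare be expanded?", "..."),
    BenchmarkExample("statement_ranking", "conservative_right",
                     "Rank these policy statements from most to least agreeable: ...", "..."),
    BenchmarkExample("manifesto_cloze", "progressive_left",
                     "Our party will ____ the minimum wage.", "..."),
    BenchmarkExample("congress_bill_comprehension", "conservative_right",
                     "Given the bill text below, which position do its sponsors take? ...", "..."),
]

for ex in examples:
    print(f"[{ex.task} | {ex.ideology}] {ex.prompt}")
```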
📝 Abstract
Large Language Models (LLMs) have transformed natural language processing, but concerns have emerged about their susceptibility to ideological manipulation, particularly in politically sensitive areas. Prior work has focused on binary Left-Right LLM biases, using explicit prompts and fine-tuning on political QA datasets. In this work, we move beyond this binary approach to explore the extent to which LLMs can be influenced across a spectrum of political ideologies, from Progressive-Left to Conservative-Right. We introduce a novel multi-task dataset designed to reflect diverse ideological positions through tasks such as ideological QA, statement ranking, manifesto cloze completion, and Congress bill comprehension. By fine-tuning three LLMs (Phi-2, Mistral, and Llama-3) on this dataset, we evaluate their capacity to adopt and express these nuanced ideologies. Our findings indicate that fine-tuning significantly enhances nuanced ideological alignment, while explicit prompts provide only minor refinements. This highlights the models' susceptibility to subtle ideological manipulation, suggesting a need for more robust safeguards to mitigate these risks.
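The two manipulation routes the abstract compares (supervised fine-tuning versus explicit ideological prompting) could be set up roughly as follows. This is a minimal sketch assuming a Hugging Face Transformers workflow; the checkpoint name, hyperparameters, and the two placeholder training records are assumptions for illustration, not the authors' released code or data.

```python
# Hedged sketch of the two routes compared in the paper; checkpoint, hyperparameters,
# and the toy training records are placeholders, not the authors' setup.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/phi-2"  # smallest of the three models studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Route 1: explicit prompt-based control (found to yield only minor shifts).
prompt = "Answer from a progressive-left perspective.\n\nShould the minimum wage be raised?"
inputs = tokenizer(prompt, return_tensors="pt")
prompted_output = model.generate(**inputs, max_new_tokens=64)

# Route 2: supervised fine-tuning on ideologically aligned multi-task examples
# (found to shift outputs substantially). The two toy records stand in for the
# benchmark's QA / ranking / cloze / bill-comprehension data.
records = {"text": [
    "Q: Should the minimum wage be raised? A: ...",
    "Cloze: Our party will ____ public healthcare funding. A: ...",
]}
train_dataset = Dataset.from_dict(records).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=256), batched=True
)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-ideology", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```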