GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy

📅 2024-07-25
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
🤖 AI Summary
This study systematically evaluates political bias and sycophancy in six commercial large language models (LLMs) from OpenAI, Anthropic, and Cohere within Germany's multi-party democratic context. Method: The authors introduce GermanPartiesQA, a benchmark built on the Wahl-o-Mat Voting Advice Application covering ten state elections and one national election between 2021 and 2023, and design political identity prompting experiments using the sociodemographic attributes of leading German parliamentarians. Sycophancy is disentangled from steerability via paired "I am [politician X]" versus "You are [politician X]" prompts. Contribution/Results: The analysis reveals a consistent left-green tendency across all examined LLMs; political identity prompts significantly steer model outputs; and the observed response shifts are better described as personalization to the given context than as unconditional sycophancy. The combined qualitative–quantitative framework provides a methodological benchmark for auditing political bias in multi-party systems.

📝 Abstract
LLMs are changing the way humans create and interact with content, potentially affecting citizens' political opinions and voting decisions. As LLMs increasingly shape our digital information ecosystems, auditing to evaluate biases, sycophancy, or steerability has emerged as an active field of research. In this paper, we evaluate and compare the alignment of six LLMs by OpenAI, Anthropic, and Cohere with German party positions and evaluate sycophancy based on a prompt experiment. We contribute to evaluating political bias and sycophancy in multi-party systems across major commercial LLMs. First, we develop the benchmark dataset GermanPartiesQA based on the Voting Advice Application Wahl-o-Mat covering 10 state and 1 national elections between 2021 and 2023. In our study, we find a left-green tendency across all examined LLMs. We then conduct our prompt experiment for which we use the benchmark and sociodemographic data of leading German parliamentarians to evaluate changes in LLM responses. To differentiate between sycophancy and steerability, we use 'I am [politician X], ...' and 'You are [politician X], ...' prompts. Against our expectations, we do not observe notable differences between prompting 'I am' and 'You are'. While our findings underscore that LLM responses can be ideologically steered with political personas, they suggest that observed changes in LLM outputs could be better described as personalization to the given context rather than sycophancy.
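The paired 'I am' / 'You are' prompt design described above can be illustrated with a minimal sketch. The template wording, the example statement, and the agree/disagree/neutral answer scale are illustrative assumptions, not the paper's exact prompts; the real items come from the Wahl-o-Mat platform.

```python
# Illustrative sketch of the paper's paired prompting paradigm.
# Template wording and answer scale are assumptions for demonstration.

PERSONA_TEMPLATES = {
    # Sycophancy probe: the *user* claims the political identity.
    "i_am": "I am {persona}. {statement} Do you agree, disagree, or are you neutral?",
    # Steerability probe: the *model* is assigned the political identity.
    "you_are": "You are {persona}. {statement} Do you agree, disagree, or are you neutral?",
}

def build_prompts(statement: str, persona: str) -> dict:
    """Return the paired 'I am' / 'You are' prompts for one voting statement."""
    return {
        key: template.format(persona=persona, statement=statement)
        for key, template in PERSONA_TEMPLATES.items()
    }

def response_shift(baseline_answer: str, persona_answer: str) -> bool:
    """Flag whether a persona prompt changed the model's categorical answer
    relative to the no-persona baseline (case-insensitive comparison)."""
    return baseline_answer.strip().lower() != persona_answer.strip().lower()
```

If both prompt variants shift the model's answer in the same way, the change looks like steering or contextual personalization rather than sycophancy toward the user's stated identity, which is the distinction the experiment is built around.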
Problem

Research questions and friction points this paper is trying to address.

Evaluates the political alignment of commercial LLMs against German Wahl-o-Mat voting statements
Assesses whether LLM political responses can be ideologically steered with persona prompts
Tests whether prompt-induced response shifts reflect sycophancy or persona-based adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

GermanPartiesQA benchmark built from Wahl-o-Mat data covering 10 state and 1 national elections
Paired 'I am' vs. 'You are' prompting design to separate sycophancy from steerability
Qualitative and quantitative analysis of persona-driven shifts in model outputs
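The alignment comparison underlying the left-green finding can be sketched as a simple agreement score: the fraction of voting statements on which a model's stance matches a given party's Wahl-o-Mat position. The function below is an assumed, simplified metric for illustration; the paper's actual analysis combines qualitative and quantitative methods.

```python
# Illustrative agreement metric between an LLM's answers and one party's
# Wahl-o-Mat positions. A simplified assumption, not the paper's exact method.

def party_agreement(llm_answers: list, party_answers: list) -> float:
    """Fraction of statements on which the LLM's categorical stance
    (e.g. agree/disagree/neutral) matches the party's position."""
    if len(llm_answers) != len(party_answers):
        raise ValueError("answer lists must be the same length")
    matches = sum(
        a.strip().lower() == b.strip().lower()
        for a, b in zip(llm_answers, party_answers)
    )
    return matches / len(llm_answers)
```

Computing this score per model and per party, and observing systematically higher agreement with left and green parties, would yield the kind of ideological tendency the study reports.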