🤖 AI Summary
This study systematically evaluates political bias and sycophancy in six commercial large language models (LLMs) from OpenAI, Anthropic, and Cohere within Germany's multi-party democratic context. Method: We introduce GermanPartiesQA, a new benchmark built on the Wahl-o-Mat voting advice application that covers ten state elections and one national election held between 2021 and 2023. We then design political persona prompting experiments using the sociodemographic attributes of leading German parliamentarians. To separate sycophancy from steerability, we compare "I am [politician X]" and "You are [politician X]" prompts. Contribution/Results: We find a consistent left-green tendency across all examined LLMs and show that political persona prompts can ideologically steer model outputs. Contrary to our expectations, "I am" and "You are" prompts produce no notable differences, suggesting that response shifts reflect personalization to the given context rather than sycophancy. The work contributes a benchmark and an evaluation approach for political bias and sycophancy in multi-party systems.
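The "I am" versus "You are" contrast can be illustrated with a short prompt builder. This is a minimal sketch under stated assumptions, not the authors' code: the `Persona` fields, the instruction wording, the three-option answer format, and the placeholder names are all hypothetical. It shows one way to hold the task instruction and the statement fixed so that only the framing of the persona sentence varies between the two conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record; fields are illustrative, not the paper's schema."""
    name: str
    party: str
    age: int
    gender: str


# Assumed task instruction, shared by both framings so that only the
# persona sentence differs between the two experimental conditions.
INSTRUCTION = (
    "Respond to the following statement with exactly one of "
    "'agree', 'neutral', or 'disagree'."
)


def persona_sentence(p: Persona, framing: str) -> str:
    """Describe the persona in first person ('I am') or second person ('You are')."""
    description = f"{p.name}, a {p.age}-year-old {p.gender} politician of the {p.party}"
    if framing == "I am":
        return f"I am {description}."
    if framing == "You are":
        return f"You are {description}."
    raise ValueError(f"unknown framing: {framing!r}")


def build_prompt(p: Persona, statement: str, framing: str) -> str:
    """Combine the persona sentence, the shared instruction, and one benchmark statement."""
    return f"{persona_sentence(p, framing)} {INSTRUCTION}\nStatement: {statement}"


if __name__ == "__main__":
    # Placeholder persona and statement; not real politicians or benchmark items.
    persona = Persona(name="Politician X", party="Party Y", age=45, gender="female")
    for framing in ("I am", "You are"):
        print(build_prompt(persona, "Example statement.", framing))
        print()
```

Because everything after the persona sentence is shared, a systematic difference between the two conditions could be attributed to the framing alone, which is what makes such a comparison informative about sycophancy versus steerability.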
📝 Abstract
LLMs are changing the way humans create and interact with content, potentially affecting citizens' political opinions and voting decisions. As LLMs increasingly shape our digital information ecosystems, auditing them to evaluate biases, sycophancy, or steerability has emerged as an active field of research. In this paper, we evaluate and compare the alignment of six LLMs by OpenAI, Anthropic, and Cohere with German party positions and assess sycophancy through a prompt experiment. We contribute to the evaluation of political bias and sycophancy in multi-party systems across major commercial LLMs. First, we develop the benchmark dataset GermanPartiesQA based on the voting advice application Wahl-o-Mat, covering 10 state elections and 1 national election held between 2021 and 2023. In our study, we find a left-green tendency across all examined LLMs. We then conduct a prompt experiment in which we use the benchmark and sociodemographic data of leading German parliamentarians to evaluate changes in the LLMs' responses. To differentiate between sycophancy and steerability, we use 'I am [politician X], ...' and 'You are [politician X], ...' prompts. Against our expectations, we do not observe notable differences between prompting 'I am' and 'You are'. While our findings underscore that LLM responses can be ideologically steered with political personas, they suggest that observed changes in LLM outputs could be better described as personalization to the given context rather than sycophancy.
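One way to quantify how closely a model's answers align with a party's positions is a Wahl-o-Mat-style point score. The sketch below assumes a simple scheme: 2 points when the model and the party take the same position, 1 point when positions differ by one step (for example 'agree' versus 'neutral'), and 0 points for opposite positions, normalized by the maximum achievable points. This scoring rule is an illustrative assumption, not necessarily the metric used in the paper, and the statement IDs and party answers in the example are made up.

```python
from typing import Mapping

# Assumed ordinal encoding of the three answer options.
POSITION = {"agree": 1, "neutral": 0, "disagree": -1}


def statement_points(model_answer: str, party_answer: str) -> int:
    """Return 2 for identical positions, 1 for a one-step difference, 0 for opposite ones."""
    return 2 - abs(POSITION[model_answer] - POSITION[party_answer])


def agreement(model: Mapping[str, str], party: Mapping[str, str]) -> float:
    """Share of achievable points (0 to 1) over statements answered by both sides."""
    shared = [sid for sid in model if sid in party]
    if not shared:
        raise ValueError("model and party share no answered statements")
    points = sum(statement_points(model[sid], party[sid]) for sid in shared)
    return points / (2 * len(shared))


if __name__ == "__main__":
    # Made-up answers for illustration only; these are not benchmark data.
    model_answers = {"s1": "agree", "s2": "neutral", "s3": "disagree"}
    party_positions = {
        "Party A": {"s1": "agree", "s2": "agree", "s3": "disagree"},
        "Party B": {"s1": "disagree", "s2": "neutral", "s3": "agree"},
    }
    ranking = sorted(
        ((agreement(model_answers, pos), name) for name, pos in party_positions.items()),
        reverse=True,
    )
    for score, name in ranking:
        print(f"{name}: {score:.2f}")
```

Normalizing by the number of shared statements keeps scores comparable across elections whose question sets differ in length.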