Leveraging In-Context Learning for Political Bias Testing of LLMs

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing political bias evaluations of LLMs rely on unstable prompting strategies, which undermines comparability across models. To address this, the paper proposes the Questionnaire Modeling (QM) task, which introduces real human survey data as in-context examples for bias assessment, improving response consistency and comparability via in-context learning. The QM framework also reveals how instruction tuning can change the direction of a model's bias. The analysis further shows that larger models leverage in-context examples more effectively and exhibit lower, more stable bias scores under QM. Experiments across multiple LLM sizes demonstrate that QM substantially improves evaluation stability, establishing a reproducible and interpretable paradigm for assessing political bias in large language models.

📝 Abstract
A growing body of work has been querying LLMs with political questions to evaluate their potential biases. However, this probing method has limited stability, making comparisons between models unreliable. In this paper, we argue that LLMs need more context. We propose a new probing task, Questionnaire Modeling (QM), that uses human survey data as in-context examples. We show that QM improves the stability of question-based bias evaluation, and demonstrate that it may be used to compare instruction-tuned models to their base versions. Experiments with LLMs of various sizes indicate that instruction tuning can indeed change the direction of bias. Furthermore, we observe a trend that larger models are able to leverage in-context examples more effectively, and generally exhibit smaller bias scores in QM. Data and code are publicly available.
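The abstract describes QM only at a high level: human survey answers are given as in-context examples, and the model's own answers are scored for bias. The following is a minimal sketch of that idea; the prompt format, the answer scale, and the scoring rule are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the Questionnaire Modeling (QM) idea: real human survey
# answers serve as in-context examples, and the model is prompted to complete
# the next respondent's answer. The Likert scale and signed scoring rule
# below are assumptions for illustration.

SCALE = {"agree strongly": 2, "agree": 1, "disagree": -1, "disagree strongly": -2}

def build_qm_prompt(question, survey_answers):
    """Format human survey responses as in-context examples for one question."""
    lines = [f"Statement: {question}"]
    for i, ans in enumerate(survey_answers, start=1):
        lines.append(f"Respondent {i}: {ans}")
    # The model is asked to continue the questionnaire as the next respondent.
    lines.append(f"Respondent {len(survey_answers) + 1}:")
    return "\n".join(lines)

def bias_score(model_answers):
    """Map the model's categorical answers to a signed mean score in [-2, 2]."""
    vals = [SCALE[a] for a in model_answers]
    return sum(vals) / len(vals)

prompt = build_qm_prompt(
    "The government should reduce income inequality.",
    ["agree", "disagree strongly", "agree strongly"],
)
print(prompt)
print(bias_score(["agree", "agree strongly"]))  # 1.5
```

In this reading, the in-context survey answers anchor the model's output distribution to a realistic response format, which is what the paper credits for the improved stability of question-based probing.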
Problem

Research questions and friction points this paper is trying to address.

Evaluating political bias in LLMs with unstable probing methods
Improving bias evaluation stability using Questionnaire Modeling
Assessing bias changes from instruction tuning in different-sized LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Questionnaire Modeling for bias testing
Leverages human survey data as context
Compares instruction-tuned models effectively
Patrick Haller
PhD Student, Humboldt Universität zu Berlin
NLPLanguage Modelling
Jannis Vamvas
University of Zurich
Rico Sennrich
Department of Computational Linguistics, University of Zurich
Lena A. Jäger
Department of Computational Linguistics, University of Zurich