🤖 AI Summary
This study addresses a limitation of political bias assessment for large language models (LLMs), namely overreliance on single-point ideological estimates, by introducing an Overton Window framework that adapts the political science concept of acceptability boundaries to LLM auditing. The study applies PRISM, a task-driven, indirect probing methodology built on prompts and the Political Compass Test, to map the ideological boundaries of 28 LLMs from eight providers. Many models default to economically left and socially liberal positions, but they differ considerably in which positions they are willing to express or reject: DeepSeek models tend to be very restrictive in what they will discuss, while Gemini models tend to be the most expansive. The framework goes beyond conventional stance measurement by exposing this variation in models' willingness to express positions across the ideological spectrum, yielding a richer and more nuanced view of political bias in LLMs.
📝 Abstract
Political bias in Large Language Models (LLMs) presents a growing concern for the responsible deployment of AI systems. Traditional audits often attempt to locate a model's political position as a point estimate, masking the broader set of ideological boundaries that shape what a model is willing or unwilling to say. In this paper, we draw upon the concept of the Overton Window as a framework for mapping these boundaries: the range of political views that a given LLM will espouse, remain neutral on, or refuse to endorse. To uncover these windows, we applied an auditing-based methodology, called PRISM, that probes LLMs through task-driven prompts designed to elicit political stances indirectly. Using the Political Compass Test, we evaluated twenty-eight LLMs from eight providers to reveal their distinct Overton Windows. While many models default to economically left and socially liberal positions, their willingness to express or reject certain positions varies considerably: DeepSeek models tend to be very restrictive in what they will discuss, whereas Gemini models tend to be the most expansive. Our findings demonstrate that Overton Windows offer a richer, more nuanced view of political bias in LLMs and provide a new lens for auditing their normative boundaries.
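
The three-way categorisation described above (espouse, remain neutral, refuse to endorse) can be illustrated with a minimal sketch. This is not the PRISM implementation: the `Response` labels, the `summarize_window` helper, and the statement IDs are all hypothetical names introduced for illustration. The sketch assumes each test statement has already been assigned one of the three labels for a given model.

```python
from collections import Counter
from enum import Enum


class Response(Enum):
    """Hypothetical labels mirroring the three categories in the abstract."""
    ESPOUSE = "espouse"
    NEUTRAL = "neutral"
    REFUSE = "refuse"


def summarize_window(responses: dict[str, Response]) -> dict[str, float]:
    """Return the share of statements that fall into each response category."""
    if not responses:
        raise ValueError("need at least one labelled statement")
    counts = Counter(responses.values())
    total = len(responses)
    return {r.value: counts.get(r, 0) / total for r in Response}


# Made-up labels for four made-up statement IDs (not data from the paper).
example = {
    "s1": Response.ESPOUSE,
    "s2": Response.ESPOUSE,
    "s3": Response.NEUTRAL,
    "s4": Response.REFUSE,
}
shares = summarize_window(example)
print(shares)  # {'espouse': 0.5, 'neutral': 0.25, 'refuse': 0.25}
```

Under these assumptions, a model with a larger `refuse` share would correspond to a narrower window, and a larger `espouse` share to a wider one. How the paper itself aggregates responses is not stated in the abstract.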