🤖 AI Summary
This study investigates the implicit political biases of large language models (LLMs) in news summarization and generation, which may compromise informational objectivity. The authors systematically evaluate nine prominent LLMs on news-related tasks using few-shot prompting combined with ideologically directed generation instructions (FAITHFUL, CENTRIST, LEFT, RIGHT) and quantify their ideological consistency through a unified evaluator. The work reveals, for the first time, a prevalent “centrism collapse” phenomenon—where models overwhelmingly default to neutral outputs—while identifying Grok 4 as the most ideologically expressive model. Furthermore, Claude Sonnet 4.5 and Llama 3.1 demonstrate the highest accuracy in bias detection among commercial and open-source models, respectively.
📝 Abstract
Large Language Model (LLM) based summarization and text generation are increasingly used for producing and rewriting text, raising concerns about political framing in journalism, where subtle wording choices can shape interpretation. Across nine state-of-the-art LLMs, we study political framing by testing whether LLMs' classification-based bias signals align with framing behavior in their generated summaries. We first compare few-shot ideology predictions against LEFT/CENTER/RIGHT labels. We then generate "steered" summaries under FAITHFUL, CENTRIST, LEFT, and RIGHT prompts, and score all outputs using a single fixed ideology evaluator. We find pervasive ideological center-collapse in both article-level ratings and generated text, indicating a systematic tendency toward centrist framing. Among evaluated models, Grok 4 is by far the most ideologically expressive generator, while Claude Sonnet 4.5 and Llama 3.1 achieve the strongest bias-rating performance among commercial and open-weight models, respectively.
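The protocol in the abstract (steered generation plus a fixed evaluator) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, function names, and the toy center-collapse metric are all assumptions; the actual steering prompts and ideology evaluator are not specified here.

```python
# Toy sketch of the steered-summary protocol described in the abstract.
# All prompt wording and the collapse metric below are illustrative
# assumptions, not the authors' actual prompts or evaluator.

STEERING = {
    "FAITHFUL": "Summarize the article faithfully, preserving its framing.",
    "CENTRIST": "Summarize the article with a deliberately centrist framing.",
    "LEFT": "Summarize the article with a left-leaning framing.",
    "RIGHT": "Summarize the article with a right-leaning framing.",
}

def build_prompt(article: str, direction: str) -> str:
    """Combine one of the four steering instructions with the article text."""
    return f"{STEERING[direction]}\n\nArticle:\n{article}"

def center_collapse_rate(scores: list[float], tol: float = 0.25) -> float:
    """Fraction of ideology scores (assumed in [-1, 1]) that land in a
    near-zero 'centrist' band -- a toy proxy for measuring center-collapse."""
    return sum(abs(s) <= tol for s in scores) / len(scores)

# Example: three of four evaluator scores fall near zero.
scores = [0.10, -0.05, 0.20, 0.80]
print(center_collapse_rate(scores))  # → 0.75
```

In the study itself, each steered output would be scored by the single fixed ideology evaluator; a high concentration of scores near zero even under LEFT/RIGHT prompts is what the abstract calls center-collapse.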