Bridging Human and Model Perspectives: A Comparative Analysis of Political Bias Detection in News Media Using Large Language Models

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the alignment between large language models (LLMs) and human annotators in detecting political bias in news articles. Addressing the lack of systematic evaluation of human–model judgment agreement, we construct a high-quality, manually annotated dataset and propose a human–model comparative evaluation framework that quantitatively assesses how well GPT, BERT, RoBERTa, and FLAN-T5 align with human annotations on bias polarity and intensity. Methodologically, we combine zero-shot and fine-tuned paradigms to uncover systematic discrepancies both across models and between models and humans. Results show that the fine-tuned RoBERTa model achieves the highest accuracy and the strongest human-label alignment, while generative models, particularly GPT, exhibit the strongest zero-shot agreement with human judgments. The study advocates a hybrid evaluation paradigm that combines *human interpretability* with *model scalability*, advancing rigorous, human-centered LLM assessment for politically sensitive tasks.
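The alignment measurement at the core of this framework can be illustrated with standard agreement metrics. Below is a minimal sketch in Python, assuming a categorical left/center/right polarity scheme and an ordinal 0–2 intensity scale; the toy labels and scales are illustrative assumptions, not the paper's data.

```python
# Sketch: quantifying human-model agreement with scikit-learn.
# All labels and data here are hypothetical placeholders.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Assumed 3-way bias-polarity labels per article.
human_polarity = ["left", "center", "right", "center", "left"]
model_polarity = ["left", "center", "right", "left", "left"]

print("polarity accuracy:", accuracy_score(human_polarity, model_polarity))
print("polarity kappa:   ", cohen_kappa_score(human_polarity, model_polarity))

# Assumed ordinal 0-2 intensity scores; quadratic weighting penalizes
# large disagreements more than adjacent-category ones.
human_intensity = [0, 1, 2, 1, 0]
model_intensity = [0, 2, 2, 1, 1]
print("intensity weighted kappa:",
      cohen_kappa_score(human_intensity, model_intensity, weights="quadratic"))
```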

📝 Abstract
Detecting political bias in news media is a complex task that requires interpreting subtle linguistic and contextual cues. Although recent advances in Natural Language Processing (NLP) have enabled automatic bias classification, the extent to which large language models (LLMs) align with human judgment remains underexplored. This study presents a comparative framework for evaluating political bias detection across human annotations and multiple LLMs, including GPT, BERT, RoBERTa, and FLAN-T5. We construct a manually annotated dataset of news articles and assess annotation consistency, bias polarity, and inter-model agreement to quantify divergence between human and model perceptions of bias. Experimental results show that among the transformer-based baselines, our fine-tuned RoBERTa model achieves the highest accuracy and the strongest alignment with human-annotated labels, whereas generative models such as GPT demonstrate the strongest agreement with human annotations in a zero-shot setting. Our findings highlight systematic differences in how humans and LLMs perceive political slant, underscoring the need for hybrid evaluation frameworks that combine human interpretability with model scalability in automated media bias detection.
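As a concrete illustration of the zero-shot setting described above, the sketch below prompts a GPT model for a single polarity label. The prompt wording, model name (`gpt-4o-mini`), and label set are assumptions made for illustration; the paper does not publish its exact prompt or GPT variant here.

```python
# Sketch: zero-shot bias-polarity labeling via the OpenAI chat API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_bias_label(article_text: str) -> str:
    """Ask the model for a single polarity label: left, center, or right."""
    prompt = (
        "Classify the political bias of the following news article as exactly "
        "one of: left, center, right. Reply with the label only.\n\n"
        f"Article:\n{article_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model; the paper says only "GPT"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,         # deterministic labels for evaluation
    )
    return resp.choices[0].message.content.strip().lower()
```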
Problem

Research questions and friction points this paper is trying to address.

Evaluating how closely LLM judgments of political bias align with human judgment
Comparing bias perception across multiple models, including GPT and RoBERTa
Quantifying the divergence in bias interpretation between human and model perspectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

A comparative framework for human–LLM bias detection
Fine-tuned RoBERTa achieving the best alignment with human labels (see the sketch below)
A proposed hybrid evaluation paradigm combining human interpretability with model scalability
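The fine-tuned RoBERTa baseline noted above can be approximated with the Hugging Face `transformers` Trainer. This is a minimal sketch assuming `roberta-base`, a 3-way polarity label set, and toy in-memory data; the paper's actual hyperparameters and preprocessing may differ.

```python
# Sketch: fine-tuning RoBERTa for 3-way bias-polarity classification.
from datasets import Dataset
from transformers import (
    RobertaTokenizerFast,
    RobertaForSequenceClassification,
    Trainer,
    TrainingArguments,
)

labels = ["left", "center", "right"]  # assumed polarity scheme
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(labels)
)

# Toy in-memory dataset standing in for the paper's annotated articles.
data = Dataset.from_dict({
    "text": ["Example article text ...", "Another article ..."],
    "label": [0, 2],  # indices into `labels`
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-bias",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()
```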
Shreya Adrita Banik
Department of Computer Science and Engineering, BRAC University
Niaz Nafi Rahman
Department of Computer Science and Engineering, BRAC University
Tahsina Moiukh
Department of Computer Science and Engineering, BRAC University
Farig Sadeque
Associate Professor, BRAC University
Natural Language Processing · Computational Social Science