Are Large Language Models Good at Detecting Propaganda?

📅 2025-05-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study systematically evaluates the capability of large language models (LLMs) to perform fine-grained identification of six propaganda techniques in news texts. We use prompt engineering to query GPT-4, GPT-3.5, and Claude 3 Opus, benchmarking them against domain-specialized supervised models: RoBERTa-CRF and a Multi-Granularity Network (MGN). Results reveal that general-purpose LLMs significantly underperform domain-adapted models on overall propaganda detection (GPT-4 F1 = 0.16 vs. RoBERTa-CRF F1 = 0.67), exposing critical limitations in reasoning about logical fallacies. Surprisingly, GPT-3.5 and GPT-4 outperform MGN on three affect-driven techniques (name-calling, appeal to fear, and flag-waving), suggesting stronger alignment with emotionally charged rhetorical patterns. This work establishes an empirical benchmark characterizing LLMs' capabilities and boundaries in propaganda-technique identification, informing the design of hybrid paradigms that integrate general-purpose and task-specific models for propaganda analysis.

📝 Abstract
Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recognizing these techniques is key to making informed decisions. Recent advances in Natural Language Processing (NLP) have enabled the development of systems capable of detecting manipulative content. In this study, we examine several large language models (LLMs) and their performance in detecting propaganda techniques in news articles, and compare their performance with transformer-based models. We find that, while GPT-4 achieves a higher F1 score (F1 = 0.16) than GPT-3.5 and Claude 3 Opus, it does not outperform a RoBERTa-CRF baseline (F1 = 0.67). Additionally, all three LLMs outperform a Multi-Granularity Network (MGN) baseline in detecting one of the six propaganda techniques (name-calling), with GPT-3.5 and GPT-4 also outperforming the MGN baseline on appeal to fear and flag-waving.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to detect propaganda techniques in news articles
Comparing LLM performance with transformer-based models for propaganda detection
Assessing effectiveness of GPT-4 versus baselines in identifying specific propaganda methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Large Language Models for propaganda detection
Comparing LLMs with transformer-based models
Evaluating performance via F1 scores and baselines
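The F1 comparison above can be illustrated with a minimal sketch of span-level scoring, the kind of metric typically used to compare propaganda detectors. This is a hypothetical example, not the paper's actual evaluation code; the span format and labels are assumptions for illustration.

```python
# Hypothetical sketch: micro precision/recall/F1 over exact-match
# propaganda spans. A span is (start_char, end_char, technique_label).

def span_f1(gold, predicted):
    """Score predicted spans against gold spans by exact match."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # spans found with the correct label
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: two gold spans; the model finds one and adds a spurious one.
gold = [(0, 12, "name-calling"), (40, 58, "flag-waving")]
pred = [(0, 12, "name-calling"), (70, 80, "appeal-to-fear")]
p, r, f1 = span_f1(gold, pred)  # p = 0.5, r = 0.5, f1 = 0.5
```

In practice, benchmarks in this area often also credit partial span overlap, which this exact-match sketch deliberately omits for brevity.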