Are Large Language Models Good at Detecting Propaganda?

📅 2025-05-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study systematically evaluates the capability of large language models (LLMs) to perform fine-grained identification of six propaganda techniques in news texts. We use prompt engineering to query GPT-4, GPT-3.5, and Claude 3 Opus, benchmarking them against domain-specialized supervised models: RoBERTa-CRF and a Multi-Granularity Network (MGN). Results reveal that general-purpose LLMs significantly underperform domain-adapted models on overall propaganda detection (GPT-4 F1 = 0.16 vs. RoBERTa-CRF F1 = 0.67), exposing critical limitations in reasoning about logical fallacies. Surprisingly, GPT-3.5 and GPT-4 outperform MGN on three affect-driven techniques (name-calling, appeal to fear, and flag-waving), suggesting stronger alignment with emotionally charged rhetorical patterns. This work establishes an empirical benchmark characterizing LLMs' capabilities and boundaries in propaganda-technique identification, informing the design of hybrid paradigms that integrate general-purpose and task-specific models for propaganda analysis.

📝 Abstract
Propagandists use rhetorical devices that rely on logical fallacies and emotional appeals to advance their agendas. Recognizing these techniques is key to making informed decisions. Recent advances in Natural Language Processing (NLP) have enabled the development of systems capable of detecting manipulative content. In this study, we examine several large language models (LLMs) and their performance in detecting propaganda techniques in news articles, and compare their performance with transformer-based models. We find that, while GPT-4 achieves a higher F1 score (F1 = 0.16) than GPT-3.5 and Claude 3 Opus, it does not outperform a RoBERTa-CRF baseline (F1 = 0.67). Additionally, all three LLMs outperform a Multi-Granularity Network (MGN) baseline in detecting one of the six propaganda techniques (name-calling), with GPT-3.5 and GPT-4 also outperforming the MGN baseline on appeal to fear and flag-waving.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to detect propaganda techniques in news articles
Comparing LLM performance with transformer-based models for propaganda detection
Assessing effectiveness of GPT-4 versus baselines in identifying specific propaganda methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Large Language Models for propaganda detection
Comparing LLMs with transformer-based models
Evaluating performance via F1 scores and baselines
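The F1 comparison above can be illustrated with a minimal sketch of span-level scoring, the kind of metric typically used to compare propaganda detectors. This is a hypothetical example, not the paper's actual evaluation code; the span format and labels are assumptions for illustration.

```python
# Hypothetical sketch: micro precision/recall/F1 over exact-match
# propaganda spans. A span is (start_char, end_char, technique_label).

def span_f1(gold, predicted):
    """Score predicted spans against gold spans by exact match."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # spans found with the correct label
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: two gold spans; the model finds one and adds a spurious one.
gold = [(0, 12, "name-calling"), (40, 58, "flag-waving")]
pred = [(0, 12, "name-calling"), (70, 80, "appeal-to-fear")]
p, r, f1 = span_f1(gold, pred)  # p = 0.5, r = 0.5, f1 = 0.5
```

In practice, benchmarks in this area often also credit partial span overlap, which this exact-match sketch deliberately omits for brevity.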