🤖 AI Summary
This study investigates whether large language models (LLMs) can detect fine-grained populist rhetoric in political discourse. To this end, we construct the first human-annotated, multi-label dataset designed specifically for fine-grained populism identification and propose a unified evaluation framework. Methodologically, we benchmark fine-tuned RoBERTa classifiers against both open- and closed-source instruction-tuned LLMs, under diverse prompting strategies, on corpora comprising Donald Trump’s speeches and European politicians’ statements. Our contributions are threefold: (1) we introduce the first annotated dataset and benchmark for fine-grained populism detection; (2) we demonstrate that fine-tuned models substantially outperform instruction-tuned LLMs in-domain but generalize less well across contexts, whereas instruction-tuned models transfer better; (3) we empirically characterize the distribution of populist rhetoric in Trump’s speeches and validate model efficacy in European political contexts, thereby delineating the technical boundaries and practical applicability of automated populism analysis.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of instruction-following tasks, yet their grasp of nuanced social science concepts remains underexplored. This paper examines whether LLMs can identify and classify fine-grained forms of populism, a complex and contested concept in both academic and media debates. To this end, we curate and release novel datasets specifically designed to capture populist discourse. We evaluate a range of pre-trained (large) language models, both open-weight and proprietary, across multiple prompting paradigms. Our analysis reveals notable variation in performance, highlighting the limitations of LLMs in detecting populist discourse. We find that a fine-tuned RoBERTa classifier vastly outperforms all recent instruction-tuned LLMs unless they themselves are fine-tuned. Additionally, we apply our best-performing model to analyze campaign speeches by Donald Trump, extracting insights into his strategic use of populist rhetoric. Finally, we assess the generalizability of these models by benchmarking them on campaign speeches by European politicians, offering a lens into cross-context transferability in political discourse analysis. In this setting, we find that instruction-tuned LLMs exhibit greater robustness on out-of-domain data.
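The multi-label setup described above, where each utterance can carry several populism labels at once, can be sketched with a deliberately simple stand-in baseline. The label names, example texts, and the TF-IDF plus logistic-regression model below are all illustrative assumptions, not the paper's method (which fine-tunes RoBERTa); the sketch only shows the shape of the classification task and a micro-F1 evaluation.

```python
# Toy sketch of a multi-label populism-detection pipeline.
# Labels, texts, and the model are hypothetical stand-ins for illustration;
# the study itself uses fine-tuned RoBERTa classifiers on annotated corpora.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical fine-grained populism facets (one column per label).
LABELS = ["anti-elitism", "people-centrism", "exclusion"]

texts = [
    "The corrupt elites have betrayed ordinary people.",
    "We, the people, will take our country back.",
    "They are letting outsiders destroy our nation.",
    "The weather forecast predicts rain tomorrow.",
]
# Multi-label targets: a text may activate several labels, or none.
y = np.array([
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

# Shared TF-IDF features feeding one binary classifier per label.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)

# Micro-averaged F1 is a common summary metric for multi-label tasks.
train_f1 = f1_score(y, clf.predict(texts), average="micro")
pred = clf.predict(["The elites ignore the will of the people."])
print(dict(zip(LABELS, pred[0].tolist())), round(train_f1, 3))
```

In practice a fine-tuned transformer replaces the TF-IDF model, but the data layout (one binary column per label) and the micro/macro-F1 evaluation carry over unchanged.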