A Generalizable Rhetorical Strategy Annotation Model Using LLM-based Debate Simulation and Labelling

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Rhetorical strategy annotation has traditionally relied on labor-intensive manual effort, suffering from high cost, low inter-annotator consistency, and poor generalizability; existing datasets are further constrained by narrow topical coverage and limited rhetorical categories. To address these limitations, we propose an LLM-based framework for automated debate simulation and rhetorical annotation, generating high-quality synthetic data spanning four core rhetorical dimensions: causal, empirical, emotional, and moral. We fine-tune a Transformer-based classifier on this data and evaluate it across multiple manually annotated, domain-diverse datasets and external corpora. The model achieves strong cross-domain generalization and state-of-the-art accuracy in rhetorical strategy classification, improving persuasive argument prediction. Applied to U.S. presidential debates (1960–2020), it provides the first quantitative evidence of a sustained historical rise in emotionally grounded arguments and a marked decline in cognitively oriented ones.

📝 Abstract
Rhetorical strategies are central to persuasive communication, from political discourse and marketing to legal argumentation. However, analysis of rhetorical strategies has been limited by its reliance on human annotation, which is costly, inconsistent, and difficult to scale. Existing datasets are often limited to specific topics and strategies, posing challenges for robust model development. We propose a novel framework that leverages large language models (LLMs) to automatically generate and label synthetic debate data based on a four-part rhetorical typology (causal, empirical, emotional, moral). We fine-tune transformer-based classifiers on this LLM-labeled dataset and validate their performance against human-labeled data, both on this dataset and on multiple external corpora. Our model achieves high performance and strong generalization across topical domains. We illustrate two applications of the fine-tuned model: (1) improved persuasiveness prediction from incorporating rhetorical strategy labels, and (2) analysis of temporal and partisan shifts in rhetorical strategies in U.S. Presidential debates (1960-2020), revealing increased use of affective over cognitive arguments.
Problem

Research questions and friction points this paper is trying to address.

Automating rhetorical strategy annotation to replace costly human labeling
Overcoming dataset limitations for robust rhetorical analysis models
Enabling cross-domain generalization of rhetorical strategy classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate synthetic debate data automatically
Transformer classifiers fine-tuned on LLM-labeled dataset
Model achieves cross-domain generalization and high performance
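The LLM-labelling step behind these contributions can be sketched as a small prompt-and-validate loop: each simulated debate argument is sent to an LLM with the fixed four-category typology, and the reply is checked against that typology before it enters the training set for the classifier. The function names, prompt wording, and JSON reply format below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of LLM-based rhetorical labelling (hypothetical interface).
import json

# The paper's four-part rhetorical typology.
RHETORICAL_TYPOLOGY = ("causal", "empirical", "emotional", "moral")

def build_label_prompt(argument: str) -> str:
    """Construct a single-label classification prompt over the typology."""
    return (
        "Classify the rhetorical strategy of the argument below as exactly one of: "
        + ", ".join(RHETORICAL_TYPOLOGY)
        + '.\nReply as JSON: {"label": "<category>"}\n\n'
        + f"Argument: {argument}"
    )

def parse_label(llm_reply: str) -> str:
    """Parse an LLM reply and reject any label outside the typology."""
    label = json.loads(llm_reply)["label"].strip().lower()
    if label not in RHETORICAL_TYPOLOGY:
        raise ValueError(f"label {label!r} is not in the typology")
    return label
```

In a full pipeline, the validated (argument, label) pairs would then be used to fine-tune a transformer-based classifier (e.g. via a standard sequence-classification fine-tuning loop); that choice of model and training setup is likewise an assumption here.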