🤖 AI Summary
This study investigates the causal effect of emotional intensity on argument persuasiveness and its heterogeneity across linguistic, domain, and topical dimensions, while evaluating large language models' (LLMs) capacity to model this effect. Methodologically, it introduces, for the first time in NLP argumentation research, a psychological manipulation-check paradigm, coupled with an LLM-based framework for emotion-controllable argument generation and discrimination. Human evaluations and behavioral analyses across 11 state-of-the-art LLMs are conducted on multilingual, multi-domain texts. Key contributions/results: (1) In over 50% of cases, emotional variation does not significantly alter persuasiveness judgments; (2) when emotion exerts an effect, it predominantly enhances, rather than diminishes, persuasiveness; (3) LLMs broadly capture aggregate population-level trends but fail to replicate fine-grained, individual-level effects. The work proposes the first dynamic, empirically verifiable causal framework for emotion-persuasiveness analysis, revealing the non-universality and enhancement-dominant nature of this relationship.
📄 Abstract
Emotions have been shown to play a role in argument convincingness, yet this aspect remains underexplored in the natural language processing (NLP) community. Unlike prior studies that rely on static analyses, focus on a single text domain or language, or treat emotion as just one of many factors, we introduce a dynamic framework inspired by manipulation checks commonly used in psychology and social science; leveraging LLM-based manipulation checks, this framework examines the extent to which perceived emotional intensity influences perceived convincingness. Through human evaluation of arguments across different languages, text domains, and topics, we find that in over half of cases, judgments of convincingness remain unchanged despite variations in perceived emotional intensity; when emotions do have an impact, they more often enhance rather than weaken convincingness. We further analyze how 11 LLMs behave in the same scenario, finding that while LLMs generally mirror human patterns, they struggle to capture nuanced emotional effects in individual judgments.