Counterfactual LLM-based Framework for Measuring Rhetorical Style

📅 2025-12-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the growing issue of “rhetorical hype” in machine learning papers, a concern increasingly undermining scientific credibility. Method: We propose the first quantitative framework for rhetorical style analysis, grounded in counterfactual text generation and LLM-based adjudication. Using multi-role prompting, multiple LLMs generate diverse rhetorical variants of identical technical content; pairwise LLM comparisons, aggregated via the Bradley–Terry model, disentangle rhetorical intensity from substantive content. Applied to 8,485 ICLR submissions (2017–2025), the framework produces >250,000 counterfactual texts, validated against human annotations. Contribution/Results: Rhetorical intensity, particularly “visionary framing,” independently predicts citation counts and media coverage; its sharp rise after 2023 is largely attributable to the widespread adoption of LLM-assisted writing. The framework is robust across personas, with LLM judgments strongly aligning with human assessments (Spearman’s ρ > 0.85).

📝 Abstract
The rise of AI has fueled growing concerns about “hype” in machine learning papers, yet a reliable way to quantify rhetorical style independently of substantive content has remained elusive. Because bold language can stem from either strong empirical results or mere rhetorical style, it is often difficult to distinguish between the two. To disentangle rhetorical style from substantive content, we introduce a counterfactual, LLM-based framework: multiple LLM rhetorical personas generate counterfactual writings from the same substantive content, an LLM judge compares them through pairwise evaluations, and the outcomes are aggregated using a Bradley–Terry model. Applying this method to 8,485 ICLR submissions sampled from 2017 to 2025, we generate more than 250,000 counterfactual writings and provide a large-scale quantification of rhetorical style in ML papers. We find that visionary framing significantly predicts downstream attention, including citations and media attention, even after controlling for peer-review evaluations. We also observe a sharp rise in rhetorical strength after 2023, and provide empirical evidence showing that this increase is largely driven by the adoption of LLM-based writing assistance. The reliability of our framework is validated by its robustness to the choice of personas and the high correlation between LLM judgments and human annotations. Our work demonstrates that LLMs can serve as instruments to measure and improve scientific evaluation.
Problem

Research questions and friction points this paper is trying to address.

Quantify rhetorical style separate from content
Distinguish empirical strength from bold language
Measure hype in machine learning papers
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM personas generate counterfactual writings from content
LLM judge compares writings via pairwise evaluations
Aggregate outcomes using Bradley–Terry model for quantification
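The paper does not give implementation details for the aggregation step; a minimal sketch of how pairwise LLM judgments can be turned into Bradley–Terry strength scores, using the standard MM update (Hunter, 2004) and a hypothetical toy win matrix of three rhetorical variants:

```python
import numpy as np

def bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from a pairwise win matrix.

    wins[i, j] = number of times variant i was preferred over variant j
    by the LLM judge. Returns strengths normalized to sum to 1, using
    the MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j).
    """
    n = wins.shape[0]
    p = np.ones(n)                  # uniform initial strengths
    total = wins + wins.T           # n_ij: comparisons between i and j
    w = wins.sum(axis=1)            # total wins of each variant
    for _ in range(n_iter):
        denom = total / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = w / denom.sum(axis=1)
        p_new /= p_new.sum()        # fix the scale (BT is scale-invariant)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Hypothetical judge outcomes: variant 0 beats 1 in 8/10 comparisons,
# beats 2 in 9/10, and variant 1 beats 2 in 7/10.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
strengths = bradley_terry(wins)
```

The resulting `strengths` vector orders the variants by rhetorical intensity as judged pairwise, which is what lets the framework score each persona's rewrite on a single scale.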
Jingyi Qiu, School of Information, University of Michigan, Ann Arbor
Hong Chen, School of Information, University of Michigan, Ann Arbor
Zongyi Li, MIT
Machine learning · Scientific computing · Neural operator