Benchmarking Political Persuasion Risks Across Frontier Large Language Models

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the persuasive influence of state-of-the-art large language models (LLMs) on political opinions relative to traditional political advertising. Through two large-scale online survey experiments (N = 19,145), the research systematically compares the persuasive efficacy of seven leading LLMs across bipartisan political issues and stances, and applies a data-driven, LLM-assisted analysis of the persuasion conversations to identify both general and model-specific persuasion strategies. The work presents the first multi-model benchmarking framework for political persuasion and introduces a data-driven, strategy-agnostic conversation-analysis approach to identifying persuasive strategies, revealing that the effectiveness of information-based prompts varies significantly across models. Findings indicate that LLMs collectively outperform conventional political ads in persuasion, with Claude models demonstrating the strongest effects and Grok the weakest, underscoring both the necessity and the feasibility of systematic risk assessment of LLMs' political influence.

📝 Abstract
Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier models warrants further study. In two survey experiments (N=19,145) across bipartisan issues and stances, we evaluate seven state-of-the-art LLMs developed by Anthropic, OpenAI, Google, and xAI. We find that LLMs outperform standard campaign advertisements, with heterogeneity in performance across models. Specifically, Claude models exhibit the highest persuasiveness, while Grok exhibits the lowest. The results are robust across issues and stances. Moreover, in contrast to the findings in Hackenburg et al. (2025b) and Lin et al. (2025) that information-based prompts boost persuasiveness, we find that the effectiveness of information-based prompts is model-dependent: they increase the persuasiveness of Claude and Grok while substantially reducing that of GPT. We introduce a data-driven and strategy-agnostic LLM-assisted conversation analysis approach to identify and assess underlying persuasive strategies. Our work benchmarks the persuasive risks of frontier models and provides a framework for cross-model comparative risk assessment.
Problem

Research questions and friction points this paper is trying to address.

political persuasion
large language models
persuasive risk
frontier models
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

political persuasion
large language models
benchmarking
conversation analysis
model-dependent effects