🤖 AI Summary
This study presents the first systematic evaluation of large language models’ (LLMs’) capacity to track shifts in public support for U.S. presidential candidates during the 2024 election cycle. By querying nine distinct LLM configurations daily for candidate favorability predictions and comparing these outputs against high-quality time-series polling data from five authoritative sources—including Reuters and Gallup—the analysis reveals that all models consistently overestimated Kamala Harris’s support by 10–40%, while exhibiting smaller, source-dependent biases for Donald Trump (5–10%). These findings demonstrate that current LLMs cannot reliably capture real-world public opinion dynamics. Notably, the observed biases persist irrespective of retrieval-augmented generation, highlighting a fundamental limitation in LLMs’ ability to perform dynamic social sensing tasks.
📝 Abstract
We investigate whether Large Language Models (LLMs) can track public opinion as measured by favorability polls during the 2024 U.S. presidential election cycle. Our analysis focuses on headline favorability (e.g., "Favorable" vs. "Unfavorable") of presidential candidates across multiple LLMs queried daily throughout the election season. Using the publicly available llm-election-data-2024 dataset, we evaluate predictions from nine LLM configurations against a curated set of five high-quality polls from major organizations: Reuters, CNN, Gallup, Quinnipiac, and ABC. We find systematic directional miscalibration. For Kamala Harris, all models overpredict favorability by 10–40% relative to polls. For Donald Trump, biases are smaller (5–10%) and poll-dependent, with substantially lower cross-model variation. These deviations persist under temporal smoothing and are not corrected by internet-augmented retrieval. We conclude that off-the-shelf LLMs do not reliably track polls when queried in a straightforward manner, and we discuss implications for election forecasting.
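The core comparison described above can be sketched as a mean signed deviation between each model's daily favorability predictions and a matched poll series, where a positive value indicates overprediction. This is a minimal illustration, not the paper's actual pipeline; the model names and all numeric values below are hypothetical.

```python
# Sketch of the bias computation: for each model configuration, compare
# daily LLM favorability predictions (%) against matched poll values (%)
# and report the mean signed deviation. All data here is hypothetical.
from statistics import mean

# Hypothetical daily favorability predictions (%) from two model configs.
llm_predictions = {
    "model_a": [58, 57, 59, 60],
    "model_b": [52, 53, 51, 52],
}
# Hypothetical matched daily poll favorability (%) for the same candidate/dates.
poll_series = [46, 45, 47, 46]

def mean_signed_bias(preds, polls):
    """Mean of (prediction - poll) over matched days; positive = overprediction."""
    return mean(p - q for p, q in zip(preds, polls))

for model, preds in llm_predictions.items():
    print(model, round(mean_signed_bias(preds, poll_series), 1))
```

A signed (rather than absolute) deviation preserves the direction of miscalibration, which is what distinguishes the consistent pro-Harris overprediction from the smaller, poll-dependent Trump biases reported in the abstract.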