Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012

📅 2025-03-28
🤖 AI Summary
This study addresses the lack of large-scale, high-quality digital datasets of U.S. presidential campaign television advertisements. The authors build a parallelized, AI-based analysis pipeline that prepares, transcribes, and summarizes videos, combining automatic speech recognition (ASR) with large language model (LLM)-driven summarization. They apply it to the 9,707 presidential ads in the Julian P. Kanter Political Commercial Archive, covering 1952–2012. The resulting dataset is presented as the largest and most comprehensive of its kind; it spans seven decades of presidential elections and supports longitudinal analysis of how campaign issues evolve. Human evaluations indicate that the generated transcripts and summaries match the quality of manually produced alternatives. The pipeline and codebase are also offered as a reusable method for obtaining high-quality summaries of other video datasets.

📝 Abstract
This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation led many to rely on smaller subsets. We design a large-scale parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
Problem

Research questions and friction points this paper is trying to address.

Automates transcription and summarization of US campaign ads
Creates largest dataset of presidential ad videos and transcripts
Tracks issue evolution in elections over seven decades
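
One simple way to picture the issue-tracking application is to measure, for each election year, the share of ad transcripts that mention an issue. The sketch below is only an illustration under stated assumptions: the function name `issue_share_by_year`, the keyword-matching rule, and the toy transcripts are all hypothetical, and the source does not say how the paper itself identifies issues.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def issue_share_by_year(
    ads: Iterable[Tuple[int, str]], keywords: List[str]
) -> Dict[int, float]:
    """Return, per election year, the fraction of ads mentioning any keyword.

    Assumption: plain case-insensitive substring matching stands in for
    whatever issue-detection method the paper actually uses.
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    lowered = [k.lower() for k in keywords]
    for year, transcript in ads:
        totals[year] += 1
        text = transcript.lower()
        if any(k in text for k in lowered):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented placeholder transcripts, NOT taken from the archive.
toy_ads = [
    (1952, "We must lower taxes for working families."),
    (1952, "Peace abroad and prosperity at home."),
    (2012, "My plan cuts taxes and creates jobs."),
]
shares = issue_share_by_year(toy_ads, ["taxes"])
```

With real data, the `(year, transcript)` pairs would come from the machine-searchable transcripts, and a more careful matching rule would likely be needed to avoid false positives.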
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-based pipeline automates video processing
Large-scale dataset with machine-searchable transcripts
LLM tools generate high-quality video summaries
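
The contributions above describe a parallelized pipeline that transcribes and then summarizes each video. A minimal sketch of that shape follows, assuming a thread pool and two pluggable stage functions. The names `AdRecord`, `process_ad`, and `run_pipeline`, and the placeholder stages, are hypothetical and do not come from the authors' codebase; a real run would replace the placeholders with an ASR model and an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AdRecord:
    ad_id: str
    transcript: str
    summary: str


def process_ad(
    ad_id: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> AdRecord:
    """Run the two stages for one ad: transcribe, then summarize the transcript."""
    transcript = transcribe(ad_id)
    summary = summarize(transcript)
    return AdRecord(ad_id, transcript, summary)


def run_pipeline(
    ad_ids: Sequence[str],
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> List[AdRecord]:
    """Process ads concurrently; results keep the order of ad_ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda a: process_ad(a, transcribe, summarize), ad_ids)
        )


# Placeholder stages (assumptions): stand-ins for an ASR model and an LLM.
def fake_transcribe(ad_id: str) -> str:
    return f"transcript of {ad_id}"


def fake_summarize(transcript: str) -> str:
    return " ".join(transcript.split()[:2])


records = run_pipeline(["ad-001", "ad-002"], fake_transcribe, fake_summarize)
```

Threads suit stages that mostly wait on I/O or remote model calls; CPU-bound local models would more likely call for processes or a job scheduler.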
Authors

Adam Breuer
Harvard
Political Science, Computer Science

Bryce J. Dietrich
Purdue University, Department of Political Science, West Lafayette, IN, USA

Michael H. Crespin
University of Oklahoma, Carl Albert Congressional Research and Studies Center, Norman, OK, USA; University of Oklahoma, Department of Political Science, Norman, OK, USA

Matthew Butler
Monash University
accessibility, inclusive technology, accessible graphics

J. A. Pyrse
University of Oklahoma, Carl Albert Congressional Research and Studies Center, Norman, OK, USA

Kosuke Imai
Professor of Government and of Statistics, Harvard University
applied statistics, causal inference, computational social science, quantitative social science, political methodology