Using AI to Summarize US Presidential Campaign TV Advertisement Videos, 1952-2012

📅 2025-03-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the longstanding absence of large-scale, high-quality digital datasets of U.S. presidential campaign television advertisements. We propose the first end-to-end, parallelized AI analytics pipeline, which integrates automatic speech recognition (ASR), large language model (LLM)-driven video understanding and abstractive summarization, distributed preprocessing, and human-in-the-loop quality verification, to automatically construct a digital dataset of 9,707 ads spanning 1952–2012. To date, this is the most comprehensive resource of its kind; it covers 60 years and seven decades of presidential elections, enabling longitudinal analysis of how campaign issues evolve. Human evaluation indicates that the generated transcripts and summaries match the quality of manual annotations in accuracy and informativeness. The entire pipeline (code, models, and data) is fully open-sourced, establishing both a benchmark dataset and a reproducible methodological framework for political communication research and fine-grained video semantic analysis.

📝 Abstract

This paper introduces the largest and most comprehensive dataset of US presidential campaign television advertisements, available in digital format. The dataset also includes machine-searchable transcripts and high-quality summaries designed to facilitate a variety of academic research. To date, there has been great interest in collecting and analyzing US presidential campaign advertisements, but the need for manual procurement and annotation led many to rely on smaller subsets. We design a large-scale parallelized, AI-based analysis pipeline that automates the laborious process of preparing, transcribing, and summarizing videos. We then apply this methodology to the 9,707 presidential ads from the Julian P. Kanter Political Commercial Archive. We conduct extensive human evaluations to show that these transcripts and summaries match the quality of manually generated alternatives. We illustrate the value of this data by including an application that tracks the genesis and evolution of current focal issue areas over seven decades of presidential elections. Our analysis pipeline and codebase also show how to use LLM-based tools to obtain high-quality summaries for other video datasets.
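To make the shape of such a pipeline concrete, here is a minimal sketch of a parallelized transcribe-then-summarize flow. It is not the authors' code: `transcribe` and `summarize` are hypothetical placeholders standing in for an ASR model and an LLM call, the file names are invented, and a thread pool is just one assumed way to parallelize the per-video work.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class AdRecord:
    ad_id: str
    transcript: str
    summary: str


def transcribe(video_path: str) -> str:
    """Hypothetical placeholder for an ASR step; a real pipeline would run a speech model."""
    return f"transcript of {video_path}"


def summarize(transcript: str) -> str:
    """Hypothetical placeholder for an LLM summarization step; here it only truncates."""
    return transcript[:40]


def process_ad(video_path: str) -> AdRecord:
    # Each ad is handled independently, which is what makes the work easy to parallelize.
    transcript = transcribe(video_path)
    return AdRecord(ad_id=video_path, transcript=transcript, summary=summarize(transcript))


def run_pipeline(video_paths: list[str], workers: int = 4) -> list[AdRecord]:
    # pool.map returns results in the same order as the input paths.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_ad, video_paths))


# Invented file names, used only to exercise the sketch.
records = run_pipeline(["ad_1952_001.mp4", "ad_2012_417.mp4"])
```

Threads suit this sketch because real ASR and LLM steps are often remote or I/O-bound calls; a CPU-bound local model might be better served by a process pool or a batch scheduler.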

Problem

Research questions and friction points this paper is trying to address.

Automates transcription and summarization of US campaign ads

Creates largest dataset of presidential ad videos and transcripts

Tracks issue evolution in elections over seven decades
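One simple way to track an issue across elections, once summaries exist, is to compute the share of ads per election year whose summary mentions a set of keywords. The sketch below assumes that approach; the paper's actual method may differ, and the function name, keyword list, and the four example summaries are invented for illustration rather than taken from the dataset.

```python
from collections import defaultdict


def issue_share_by_year(ads, keywords):
    """Return {year: fraction of ads whose summary mentions any keyword}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, summary in ads:
        totals[year] += 1
        text = summary.lower()
        if any(keyword in text for keyword in keywords):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy (year, summary) pairs, invented for this example.
ads = [
    (1952, "The ad criticizes high taxes and war spending."),
    (1952, "The ad praises the candidate's military record."),
    (2012, "The ad attacks the opponent's health care plan."),
    (2012, "The ad promises lower health care costs and more jobs."),
]
shares = issue_share_by_year(ads, ["health care"])
```

Plain substring matching is crude (it misses synonyms and paraphrases), so a real analysis would likely use a curated issue taxonomy or a classifier over the summaries.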

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-based pipeline automates video processing

Large-scale dataset with machine-searchable transcripts

LLM tools generate high-quality video summaries
