An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings

📅 2025-04-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Political communication research needs scalable, interpretable image analysis methods that can also model covariates, yet existing vision models do not combine semantic richness with the inferential requirements of social science. This paper introduces the visual Structural Topic Model (vSTM), a framework that integrates pretrained image embeddings, such as those from CLIP, into the structural topic model (STM). Using variational inference, vSTM represents each image as a mixture of multiple topics while enabling explicit analysis of topic-covariate associations. Because the pretrained embeddings preserve fine-grained visual semantics, vSTM yields both interpretable topic distributions and statistically grounded estimates of covariate effects. In an empirical application to online political communication data, vSTM identifies topics that are interpretable, coherent, and substantively relevant. It thus offers a principled approach to large-scale visual content analysis in political science.

📝 Abstract
Political scientists are increasingly interested in analyzing visual content at scale. However, the existing computational toolbox is still in need of methods and models attuned to the specific challenges and goals of social and political inquiry. In this article, we introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model. This has important advantages compared to existing approaches. First, pretrained embeddings allow the model to capture the semantic complexity of images relevant to political contexts. Second, the structural topic model provides the ability to analyze how topics and covariates are related, while maintaining a nuanced representation of images as a mixture of multiple topics. In our empirical application, we show that the vSTM is able to identify topics that are interpretable, coherent, and substantively relevant to the study of online political communication.
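As a point of reference, the topic-prevalence component of the original text-based structural topic model can be written as below. This is the standard STM formulation, not a specification taken from this paper: the symbols $x_d$, $\Gamma$, $\Sigma$, $\eta_d$, and $\theta_d$ are notation assumed here, and the abstract does not describe how vSTM links $\theta_d$ to an image embedding.

```latex
\begin{align}
\eta_d \mid x_d &\sim \mathcal{N}_{K-1}\!\left(\Gamma^{\top} x_d,\ \Sigma\right), \\
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})},
\qquad \eta_{d,K} \equiv 0 .
\end{align}
```

Here $\theta_d$ is the $K$-dimensional topic mixture of item $d$, $x_d$ is its covariate vector, and the entries of $\Gamma$ describe how each covariate shifts the prevalence of each topic.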
Problem

Research questions and friction points this paper is trying to address.

Analyzing visual content for political science research
Combining pretrained image embeddings with topic modeling
Improving interpretability of visual political communication topics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines pretrained image embeddings with a structural topic model
Captures semantic complexity of political images
Analyzes topic-covariate relations with nuanced representations
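The ideas listed above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model or inference procedure: topic proportions follow a logistic-normal distribution whose mean depends on one binary covariate, and each embedding is assumed to be a noisy mixture of hypothetical topic vectors. The function names, the Gaussian mixture likelihood, and all parameter values are illustrative choices. For simplicity the sketch uses all $K$ dimensions in the softmax rather than fixing one for identification.

```python
import numpy as np


def softmax(eta):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = eta - eta.max(axis=1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / exp_shifted.sum(axis=1, keepdims=True)


def simulate(n_images=2000, n_topics=3, embed_dim=16, seed=0):
    """Simulate covariates, topic mixtures, and toy image embeddings."""
    rng = np.random.default_rng(seed)

    # One binary covariate per image (hypothetical, e.g. account type).
    x = rng.integers(0, 2, size=n_images)
    design = np.column_stack([np.ones(n_images), x])  # intercept + covariate

    # Assumed coefficients: the covariate raises the prevalence of topic 0.
    gamma = np.zeros((2, n_topics))
    gamma[1, 0] = 2.0

    # Logistic-normal topic proportions: each image is a mixture of topics.
    eta = design @ gamma + rng.normal(scale=0.5, size=(n_images, n_topics))
    theta = softmax(eta)

    # Assumed observation model: embedding = mixture of topic vectors + noise.
    topic_vectors = rng.normal(size=(n_topics, embed_dim))
    noise = rng.normal(scale=0.1, size=(n_images, embed_dim))
    embeddings = theta @ topic_vectors + noise
    return x, theta, embeddings


def prevalence_difference(x, theta):
    """Difference in mean topic proportions between covariate groups."""
    return theta[x == 1].mean(axis=0) - theta[x == 0].mean(axis=0)


x, theta, embeddings = simulate()
effect = prevalence_difference(x, theta)
print("Change in mean topic prevalence when covariate = 1:", effect.round(3))
```

In this toy setting the true topic proportions are known, so the covariate effect can be read off directly as a difference in group means. In a real analysis the proportions are latent and would have to be inferred from the embeddings, which is the part that a fitted topic model provides.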
Authors
Matías Piqueras
InfoLab, Department of Information Technology, Uppsala University, Sweden
Alexandra Segerberg
Department of Government, Uppsala University, Sweden
Matteo Magnani
Professor, Uppsala University
Social Data Science, Social Networks, Social Media, Visual Communication
Måns Magnusson
Department of Statistics, Uppsala University, Sweden
Bayesian Statistics, Probabilistic Machine Learning, Text-as-Data, Computational Social Science
Nataša Sladoje
Centre for Image Analysis, Department of Information Technology, Uppsala University, Sweden