An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings

📅 2025-04-14

🤖 AI Summary

Political communication research needs scalable, interpretable image analysis methods that support covariate modeling, yet existing vision models do not combine semantic richness with the inferential requirements of social science. This paper introduces the visual Structural Topic Model (vSTM), a framework that integrates pretrained image embeddings, such as those from CLIP, into the structural topic model (STM). Using variational inference, vSTM represents each image as a mixture of multiple topics and supports explicit analysis of topic-covariate associations. Because it preserves fine-grained visual semantics, vSTM yields interpretable topic distributions together with estimates of covariate effects. In an empirical application to online political communication data, vSTM identifies topics that are interpretable, coherent, and substantively relevant.

📝 Abstract

Political scientists are increasingly interested in analyzing visual content at scale. However, the existing computational toolbox is still in need of methods and models attuned to the specific challenges and goals of social and political inquiry. In this article, we introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model. This has important advantages compared to existing approaches. First, pretrained embeddings allow the model to capture the semantic complexity of images relevant to political contexts. Second, the structural topic model provides the ability to analyze how topics and covariates are related, while maintaining a nuanced representation of images as a mixture of multiple topics. In our empirical application, we show that the vSTM is able to identify topics that are interpretable, coherent, and substantively relevant to the study of online political communication.
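The two ingredients named in the abstract, topic proportions that depend on covariates and images represented as mixtures of topics, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the logistic-normal prevalence prior follows the general structural topic model idea, while the way topic vectors are mixed into an embedding, along with every name and dimension below (`simulate_topic_proportions`, `gamma`, `topic_vectors`, the sizes), is hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def simulate_topic_proportions(X, gamma, sigma, rng):
    """Draw logistic-normal topic proportions whose mean depends on covariates.

    X:     (n_images, n_covariates) design matrix
    gamma: (n_covariates, n_topics) hypothetical prevalence coefficients
    sigma: scalar noise scale (a simplification of a full covariance matrix)
    """
    mean = X @ gamma
    eta = mean + sigma * rng.standard_normal(mean.shape)
    return softmax(eta)


rng = np.random.default_rng(0)
n_images, n_covariates, n_topics, embed_dim = 6, 2, 3, 8

# Design matrix: an intercept plus one binary covariate (for example, account type).
X = np.column_stack([np.ones(n_images), rng.integers(0, 2, n_images)])
gamma = rng.normal(size=(n_covariates, n_topics))

# Each row of theta is one image's mixture over topics.
theta = simulate_topic_proportions(X, gamma, sigma=0.5, rng=rng)

# Assumption for illustration only: each topic has a vector in embedding space,
# and an image embedding is the theta-weighted combination of those vectors.
topic_vectors = rng.normal(size=(n_topics, embed_dim))
embeddings = theta @ topic_vectors
```

Each row of `theta` sums to one, so an image can load on several topics at once, and shifting the covariate column of `X` shifts the expected topic proportions through `gamma`. A fitted model would estimate these quantities from real pretrained embeddings rather than simulate them.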

Problem

Research questions and friction points this paper is trying to address.

Analyzing visual content for political science research

Combining pretrained image embeddings with topic modeling

Improving interpretability of visual political communication topics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines pretrained image embeddings with a structural topic model

Captures semantic complexity of political images

Relates topics to covariates while representing each image as a mixture of multiple topics
