Seeing Candidates at Scale: Multimodal LLMs for Visual Political Communication on Instagram

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study evaluates how accurately and efficiently prominent political candidates can be identified, and individuals counted, in Instagram visual content from the 2021 German federal election campaign. It compares the multimodal large language model GPT-4o with established computer vision tools (FaceNet512, RetinaFace, and Google Cloud Vision). On Instagram Stories, GPT-4o achieved a macro F1 score of 0.89 for face recognition and 0.86 for person counting, outperforming the other models tested. The results suggest that multimodal large language models can help scale visual content analysis in political communication research, while the authors also point to methodological considerations for future work.

📝 Abstract

This paper presents a computational case study that evaluates the capabilities of specialized machine learning models and emerging multimodal large language models for Visual Political Communication (VPC) analysis. Focusing on concentrated visibility in Instagram stories and posts during the 2021 German federal election campaign, we compare the performance of traditional computer vision models (FaceNet512, RetinaFace, Google Cloud Vision) with a multimodal large language model (GPT-4o) in identifying front-runner politicians and counting individuals in images. GPT-4o outperformed the other models, achieving a macro F1-score of 0.89 for face recognition and 0.86 for person counting in stories. These findings demonstrate the potential of advanced AI systems to scale and refine visual content analysis in political communication while highlighting methodological considerations for future research.
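The macro F1-score reported above is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The class labels in the usage note are hypothetical placeholders, and how the paper defines classes for face recognition or bins person counts is not stated in the abstract.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 over all observed classes.

    Classes are taken from the union of true and predicted labels.
    A class with no true positives, false positives, or false negatives
    cannot occur here, but a zero denominator is still guarded against.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only (not data from the paper).
example_true = ["candidate_a", "candidate_a", "none", "none"]
example_pred = ["candidate_a", "none", "none", "none"]
example_score = macro_f1(example_true, example_pred)
```

Under the same assumptions, person counting could be scored by passing count classes (for example "0", "1", "2") as labels instead of candidate names.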

Problem

Research questions and friction points this paper is trying to address.

Visual Political Communication

multimodal LLMs

face recognition

person counting

Instagram

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language models

visual political communication

GPT-4o