From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media

2026-04-23

AI Summary
This study addresses the need to analyse climate change–related images on social media efficiently, so that public communication strategies can be identified at scale. It develops an application-oriented classification framework and systematically evaluates how well various vision-language models (VLMs) suit climate visual discourse analysis. The work benchmarks six promptable VLMs and fifteen zero-shot CLIP variants across multiple annotation dimensions on two X (formerly Twitter) datasets. It also introduces a distribution-level evaluation paradigm, showing that VLMs can reliably recover aggregate trends even when single-image accuracy is moderate. Task-specific prompts outperform generic ones, while chain-of-thought reasoning degrades performance. Among the models tested, Gemini-3.1-flash-lite achieves the best results. The project publicly releases tweet IDs, labels, and code as a starting toolkit for large-scale climate visual discourse research.
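For readers unfamiliar with zero-shot CLIP-style classification, the general idea is to embed the image and one text prompt per candidate label in a shared space and pick the label whose prompt is most similar to the image. The sketch below illustrates only that selection step. The embeddings are made-up toy vectors, the labels `indoor` and `outdoor` are hypothetical examples for the image-setting dimension, and nothing here is taken from the paper's code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_label(image_emb, prompt_embs):
    """Return the label whose prompt embedding is closest to the image."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))

# Toy embeddings (assumption): a real system would obtain these from a
# CLIP-like image encoder and text encoder.
prompt_embs = {
    "indoor": [1.0, 0.0, 0.2],
    "outdoor": [0.1, 1.0, 0.3],
}
image_emb = [0.2, 0.9, 0.4]

predicted = zero_shot_label(image_emb, prompt_embs)
print(predicted)
```

No training on the target labels is involved, which is why the label set can be swapped by changing only the prompts.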

Abstract
Social media platforms have become primary arenas for climate communication, generating millions of images and posts that, if systematically analysed, can reveal which communication strategies mobilise public concern and which fall flat. We aim to facilitate such research by analysing how computer vision methods can be used for social media discourse analysis. This analysis includes application-based taxonomy design, model selection, prompt engineering, and validation. We benchmark six promptable vision-language models and 15 zero-shot CLIP-like models on two datasets from X (formerly Twitter): a 1,038-image expert-annotated set and a larger corpus of over 1.2 million images, with 50,000 labels manually validated. The benchmark spans five annotation dimensions: animal content, climate change consequences, climate action, image setting, and image type. Among the models benchmarked, Gemini-3.1-flash-lite outperforms all others across all super-categories and both datasets, while the gap to open-weight models of moderate size remains relatively small. Beyond instance-level metrics, we advocate for distributional evaluation: VLM predictions can reliably recover population-level trends even when per-image accuracy is moderate, making them a viable starting point for discourse analysis at scale. We find that chain-of-thought reasoning reduces rather than improves performance, and that prompts designed specifically for each annotation dimension improve it. We release tweet IDs and labels along with our code at https://github.com/KathPra/Codebooks2VLMs.git.
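The contrast between instance-level and distributional evaluation can be made concrete with a small sketch. It compares per-image accuracy with the total variation distance between the gold and predicted label distributions. The class names, the toy labels, and the choice of total variation distance are all assumptions for illustration; the abstract does not say which distributional metric the authors use.

```python
from collections import Counter

def label_distribution(labels, classes):
    """Share of each class among the given labels."""
    counts = Counter(labels)
    return {c: counts.get(c, 0) / len(labels) for c in classes}

def total_variation(p, q):
    """Total variation distance between two distributions over the same classes."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)

def accuracy(gold, pred):
    """Fraction of images whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Hypothetical classes and labels for an "image type" style dimension.
classes = ["photo", "illustration", "chart"]
gold = ["photo", "photo", "illustration", "chart",
        "photo", "illustration", "chart", "photo"]
pred = ["photo", "illustration", "photo", "chart",
        "photo", "chart", "illustration", "photo"]

acc = accuracy(gold, pred)
tv = total_variation(label_distribution(gold, classes),
                     label_distribution(pred, classes))
print(f"per-image accuracy: {acc:.2f}")       # per-image accuracy: 0.50
print(f"total variation distance: {tv:.2f}")  # total variation distance: 0.00
```

In this toy case half of the individual predictions are wrong, yet the errors cancel and the predicted class shares match the gold shares exactly. Real model errors are rarely this symmetric, so the sketch shows only why the two views can diverge, not how large the effect is in practice.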
Problem

Research questions and friction points this paper is trying to address.

visual discourse analysis
climate change
social media
computer vision
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models
automated visual discourse analysis
distributional evaluation
prompt engineering
climate change communication