AI Summary
This study addresses the challenge of disentangling institutional climate advocacy from organic public discourse, a distinction often obscured by prior work that analyzes paid advertisements and social media content in isolation. To this end, we propose the first interpretable, end-to-end topic modeling framework designed for heterogeneous communication environments. Integrating semantic clustering, large language model (LLM)-based topic labeling, dual-track evaluation, stance prediction, and topic-guided retrieval, our approach enables cross-platform comparison of climate narratives in Meta's paid ads and Bluesky public posts. The framework systematically reveals how platform-specific incentive structures shape the thematic distribution and stance orientation of climate discourse, successfully identifying structural divergences between institutional and grassroots narratives. It further demonstrates dynamic responsiveness during major political events and outperforms conventional topic models in both semantic coherence and downstream task performance.
Abstract
Climate discourse online plays a crucial role in shaping public understanding of climate change and influencing political and policy outcomes. However, climate communication unfolds across structurally distinct platforms with fundamentally different incentive structures: paid advertising ecosystems incentivize targeted, strategic persuasion, while public social media platforms host largely organic, user-driven discourse. Existing computational studies typically analyze these environments in isolation, limiting our ability to distinguish institutional messaging from public expression. In this work, we present a comparative analysis of climate discourse across paid advertisements on Meta (previously known as Facebook) and public posts on Bluesky from July 2024 to September 2025. We introduce an interpretable, end-to-end thematic discovery and assignment framework that clusters texts by semantic similarity and leverages large language models (LLMs) to generate concise, human-interpretable theme labels. We evaluate the quality of the induced themes against traditional topic modeling baselines using both human judgments and an LLM-based evaluator, and further validate their semantic coherence through downstream stance prediction and theme-guided retrieval tasks. Applying the resulting themes, we characterize systematic differences between paid climate messaging and public climate discourse and examine how thematic prevalence shifts around major political events. Our findings show that platform-level incentives are reflected in the thematic structure, stance alignment, and temporal responsiveness of climate narratives. While our empirical analysis focuses on climate communication, the proposed framework is designed to support comparative narrative analysis across heterogeneous communication environments.
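To make the cluster-then-label idea concrete, here is a minimal sketch. It is not the authors' implementation, and every name in it (`embed`, `cosine`, `cluster`, `label_theme`, the `threshold` value) is hypothetical. A bag-of-words vector stands in for a real sentence-embedding model, a greedy single-pass grouping stands in for whatever clustering algorithm the framework uses, and a frequent-terms heuristic stands in for the LLM call that would produce a theme label.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "and", "are", "from", "in", "is", "of", "the", "to"}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering by similarity to cluster centroids.

    Each text joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [int]}
    for i, text in enumerate(texts):
        vec = embed(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(i)
    return [c["members"] for c in clusters]


def label_theme(texts, top_k=3):
    """Placeholder for LLM labeling: join the most frequent content terms."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in embed(text).elements() if t not in STOPWORDS)
    return " / ".join(term for term, _ in counts.most_common(top_k))


texts = [
    "Solar energy jobs are growing fast",
    "Invest in solar energy and create jobs",
    "Wildfire smoke is harming public health",
    "Smoke from wildfire seasons threatens health",
]
groups = cluster(texts)
labels = [label_theme([texts[i] for i in g]) for g in groups]
```

In a fuller pipeline, `embed` would presumably call a pretrained embedding model and `label_theme` would prompt an LLM with sample texts from each cluster; the resulting labels could then be assigned to posts from either platform for the comparisons described above.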