🤖 AI Summary
This study addresses the lack of large-scale, multimodal datasets for systematically analyzing diverse public perspectives and interaction dynamics surrounding climate change. The authors introduce ClimateChat-300K, a dataset of 299,329 public Facebook posts from more than 26,000 pages worldwide, collected through CrowdTangle between May 2020 and May 2024, along with their multimodal content and metadata. Combining topic modeling, sentiment analysis, and engagement metrics, the research identifies ten core themes across five domains (policy, activism, cooperation, science, and conservation) and finds that visually rich and emotionally charged content receives the highest levels of interaction. As the first long-term, multilingual, geographically and institutionally diverse dataset on climate communication, ClimateChat-300K is openly released to support reproducible research on polarization, misinformation, and digital discourse dynamics.
📝 Abstract
We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.