🤖 AI Summary
Existing research that uses textual data to assess the socio-economic impacts of climate hazards often suffers from ambiguous impact definitions, inadequate handling of temporal and spatial biases, and inconsistent modeling and post-processing strategies, all of which undermine transparency and comparability across studies. This study addresses these limitations by synthesizing common practices for working with large-scale textual sources (news articles, social media posts, and reports), describing the key challenges specific to this task, and proposing recommendations to address them. The recommendations cover how to define what constitutes an impact, how to handle temporal and spatial biases, and how to select modeling and post-processing strategies. Building on natural language processing and large language models within a “text-as-data” paradigm, the work offers best-practice guidance for constructing robust text-derived impact datasets that can more accurately inform disaster risk management and attribution studies.
📝 Abstract
Recent advances in natural language processing (NLP) and large language models (LLMs) have enabled the systematic use of large-scale textual data from news, social media, and reports to build datasets of the socio-economic impacts of climate hazards such as floods, droughts, storms, and multi-hazard events. As the field of text-as-data for impact assessment expands, so does its methodological complexity. Yet research remains fragmented, with no clear guidelines for defining what constitutes an impact, handling temporal and spatial biases, or selecting appropriate modeling and post-processing strategies. This lack of coherence limits transparency and comparability across studies. Here, we address this gap by synthesizing common practices, describing key challenges specific to the use of text-as-data methods for analyzing socio-economic impact data, and proposing recommendations to address them. By providing guidance on best practices, we aim to support the construction of robust text-derived socio-economic impact datasets that can more accurately inform disaster risk management and attribution studies.