ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics

📅 2025-04-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Fine-grained data for localized extreme weather impact assessment is scarce, which hinders vulnerability analysis and decision support. To address this, the paper proposes Extreme Weather Reasoning-Aware Alignment (EWRA), a framework that distills structured reasoning paths from large language models (LLMs) into lightweight small language models (SLMs). It also introduces ExtremeWeatherNews, a large-scale dataset of extreme weather news articles, and ExtremeAlign, a derived dataset used specifically for SLM alignment. The approach is described as combining news structure parsing, domain-adaptive alignment training, and multi-task joint modeling. The resulting domain-aligned small model is reported to outperform task-specific baselines across three core tasks: categorization of tangible vulnerabilities/impacts, topic labeling, and emotion analysis. The authors report responses that are better grounded and more domain-specific, which improves real-world applicability for extreme weather analytics.

📝 Abstract
Accurate assessments of extreme weather events are vital for research and policy, yet localized and granular data remain scarce in many parts of the world. This data gap limits our ability to analyze potential outcomes and implications of extreme weather events, hindering effective decision-making. Large Language Models (LLMs) can process vast amounts of unstructured text data, extract meaningful insights, and generate detailed assessments by synthesizing information from multiple sources. Furthermore, LLMs can transfer their general language understanding to smaller models, enabling these models to retain key knowledge while being fine-tuned for specific tasks. In this paper, we propose Extreme Weather Reasoning-Aware Alignment (EWRA), a method that enhances small language models (SLMs) by incorporating structured reasoning paths derived from LLMs, and ExtremeWeatherNews, a large dataset of extreme weather event-related news articles. Together, EWRA and ExtremeWeatherNews form the overall framework, ClimaEmpact, which addresses three critical extreme-weather tasks: categorization of tangible vulnerabilities/impacts, topic labeling, and emotion analysis. By aligning SLMs with advanced reasoning strategies on ExtremeWeatherNews (and its derived dataset ExtremeAlign, used specifically for SLM alignment), EWRA improves the SLMs' ability to generate well-grounded and domain-specific responses for extreme weather analytics. Our results show that the proposed approach guides SLMs to output domain-aligned responses, surpassing the performance of task-specific models and offering enhanced real-world applicability for extreme weather analytics.
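
To make the idea of aligning an SLM on LLM-derived reasoning paths more concrete, the sketch below shows one way a single training record could be turned into a prompt/target pair for supervised fine-tuning. It is only an illustration under stated assumptions, not the authors' pipeline: the three task names follow the abstract, but the record fields, the prompt template, the `AlignmentExample` and `to_sft_pair` names, the sample article, and the "infrastructure" label are all hypothetical.

```python
from dataclasses import dataclass

# The three tasks are named in the abstract; these identifiers are hypothetical.
TASKS = ("vulnerability_impact", "topic_labeling", "emotion_analysis")


@dataclass
class AlignmentExample:
    """One hypothetical training record for reasoning-aware alignment."""
    article: str          # news text about an extreme weather event
    task: str             # one of TASKS
    reasoning: list[str]  # reasoning steps, assumed to come from a larger LLM
    answer: str           # final label for the task


def to_sft_pair(ex: AlignmentExample) -> dict[str, str]:
    """Build a prompt/target pair for supervised fine-tuning.

    The target places the reasoning steps before the answer, so a small
    model trained on it is asked to produce both.
    """
    if ex.task not in TASKS:
        raise ValueError(f"unknown task: {ex.task}")
    prompt = f"Task: {ex.task}\nArticle: {ex.article}\nResponse:"
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(ex.reasoning, start=1))
    target = f"Reasoning:\n{steps}\nAnswer: {ex.answer}"
    return {"prompt": prompt, "target": target}


# Invented sample record, for illustration only (not taken from the dataset).
example = AlignmentExample(
    article="Flooding closed the main bridge and cut power to the district.",
    task="vulnerability_impact",
    reasoning=[
        "The article reports a closed bridge and a power outage.",
        "Both are damage to physical infrastructure.",
    ],
    answer="infrastructure",
)
pair = to_sft_pair(example)
print(pair["prompt"])
print(pair["target"])
```

Keeping the reasoning steps inside the target, rather than training on the label alone, is the design choice this sketch is meant to highlight; how the paper actually formats or weights the reasoning paths is not stated in the abstract.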

Problem

Research questions and friction points this paper is trying to address.

Addresses the scarcity of localized, granular extreme weather data in many parts of the world
Enhances small language models (SLMs) for extreme weather analytics
Targets three tasks: categorization of tangible vulnerabilities/impacts, topic labeling, and emotion analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

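Enhances SLMs with structured reasoning paths derived from LLMs (EWRA)
Introduces ExtremeWeatherNews, a large dataset of extreme weather news articles, and its derived dataset ExtremeAlign
Aligns SLMs to produce well-grounded, domain-specific responses for extreme weather tasks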