🤖 AI Summary
This study addresses three key limitations of large language models (LLMs) in weather report generation: poor interpretability, weak multi-scale temporal consistency, and insufficient factual accuracy. To this end, we propose a hierarchical LLM-agent meteorological system that integrates hourly, 6-hourly, and daily time-series analysis to establish a multi-granularity reasoning architecture. We further introduce a keyword-guided semantic anchoring verification mechanism to explicitly enforce temporal coherence and factual alignment. The system combines structured meteorological data processing (from OpenWeather and Meteostat), multi-scale temporal modeling, and controllable natural language generation. Experimental results demonstrate significant improvements in factual accuracy (+12.7%) and cross-scale consistency. Moreover, we present the first reproducible, semantics-aware evaluation framework tailored to meteorological reporting, laying the groundwork for trustworthy, automated weather forecasting and dissemination.
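To make the multi-granularity idea concrete, the following is a minimal sketch of how an hourly series could be rolled up into 6-hourly and daily views before being passed to a reasoning agent. It is an illustration under stated assumptions, not the system's actual pipeline: the record layout `(timestamp, temperature_c, precipitation_mm)`, the `aggregate` helper, the choice of mean temperature and total precipitation as summaries, and the synthetic data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate(hourly, block_hours):
    """Group hourly records into blocks of `block_hours` hours aligned to midnight.

    Each record is (timestamp, temperature_c, precipitation_mm); this layout is
    an assumption made for illustration only.
    """
    buckets = defaultdict(list)
    for ts, temp, precip in hourly:
        # Snap the timestamp down to the start of its block (e.g. 00, 06, 12, 18).
        start = ts.replace(
            hour=(ts.hour // block_hours) * block_hours,
            minute=0, second=0, microsecond=0,
        )
        buckets[start].append((temp, precip))
    return [
        {
            "start": start,
            "mean_temp_c": round(mean(t for t, _ in rows), 2),
            "total_precip_mm": round(sum(p for _, p in rows), 2),
            "n_hours": len(rows),
        }
        for start, rows in sorted(buckets.items())
    ]


# Synthetic 48-hour series: temperature rises through each day,
# with 1 mm of rain at 14:00 and at 15:00.
t0 = datetime(2024, 1, 1)
hourly = [
    (t0 + timedelta(hours=h), 10 + 0.5 * (h % 24), 1.0 if h % 24 in (14, 15) else 0.0)
    for h in range(48)
]

# The same data at three temporal scales, coarsest last.
hierarchy = {
    "hourly": aggregate(hourly, 1),
    "6-hourly": aggregate(hourly, 6),
    "daily": aggregate(hourly, 24),
}

for scale, blocks in hierarchy.items():
    print(scale, len(blocks), "blocks")
```

Keeping all three views side by side lets a downstream check confirm that a claim made at one scale (for example, a wet afternoon in the 6-hourly view) agrees with the daily totals.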
📝 Abstract
We present the Hierarchical AI-Meteorologist, an LLM-agent system that generates explainable weather reports using hierarchical forecast reasoning and weather keyword generation. Unlike standard approaches that treat forecasts as flat time series, our framework performs multi-scale reasoning across hourly, 6-hourly, and daily aggregations to capture both short-term dynamics and long-term trends. Its core reasoning agent converts structured meteorological inputs into coherent narratives while simultaneously extracting a small set of keywords that summarize the dominant meteorological events. These keywords serve as semantic anchors for validating the consistency, temporal coherence, and factual alignment of the generated reports. Using OpenWeather and Meteostat data, we demonstrate that hierarchical context and keyword-based validation substantially improve the interpretability and robustness of LLM-generated weather narratives, offering a reproducible framework for semantic evaluation of automated meteorological reporting and advancing agent-based scientific reasoning.
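One way to picture keywords acting as semantic anchors is a check that each extracted keyword is both mentioned in the narrative and supported by the underlying data. The sketch below illustrates that idea only; the abstract does not specify the validation procedure, so the `RULES` table, its thresholds, the field names in `day`, and the `validate` function are all hypothetical.

```python
import re

# Hypothetical keyword -> data rule table. The thresholds are illustrative
# assumptions, not values taken from the paper.
RULES = {
    "rain": lambda day: day["total_precip_mm"] >= 1.0,
    "frost": lambda day: day["min_temp_c"] <= 0.0,
    "strong wind": lambda day: day["max_wind_ms"] >= 10.0,
}


def validate(report, keywords, day):
    """Check each keyword against the report text and the daily aggregates.

    `mentioned` is True when the keyword appears as a whole word or phrase in
    the report; `supported` is the rule outcome, or None if no rule is defined.
    """
    results = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        mentioned = re.search(pattern, report, re.IGNORECASE) is not None
        rule = RULES.get(kw)
        supported = rule(day) if rule is not None else None
        results[kw] = {"mentioned": mentioned, "supported": supported}
    return results


# Hypothetical daily aggregates and a generated report to validate.
day = {"total_precip_mm": 4.2, "min_temp_c": 3.1, "max_wind_ms": 6.0}
report = "Rain develops in the afternoon, with frost possible overnight."
keywords = ["rain", "frost"]

checks = validate(report, keywords, day)

# A keyword is flagged unless it is both mentioned and supported by the data.
flagged = [kw for kw, r in checks.items() if not (r["mentioned"] and r["supported"])]
print(flagged)
```

In this example "frost" is flagged because the report mentions it while the minimum temperature stays above freezing, which is the kind of factual misalignment a keyword anchor is meant to surface.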