Climate Knowledge in Large Language Models

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study systematically evaluates how well large language models (LLMs) recall climatological normals from parametric knowledge alone, without external retrieval, using mean July 2-m air temperature over global land for 1991–2020 as the test case. The authors construct a standardized query benchmark at 1° resolution, validated against ERA5 reanalysis, in which each query supplies geographic coordinates and natural-language location descriptors. Prediction errors are quantified across elevation and latitude bands for multiple LLMs. The models capture broad latitudinal and topographic temperature patterns, with RMSEs of 3–6 °C; adding geographic context (country, city, region) reduces errors by 27% on average. However, errors grow sharply above 1500 m, where RMSE reaches 5–13 °C, and although the models capture the global mean magnitude of observed warming, they fail to reproduce its spatial pattern. The work provides a reproducible framework for quantifying parametric climate knowledge in LLMs. It indicates that LLMs encode non-trivial present-day climate structure but struggle to represent the regional and local expression of long-term temperature change, which marks a capability boundary for climate-related applications.

📝 Abstract

Large language models (LLMs) are increasingly deployed for climate-related applications, where understanding internal climatological knowledge is crucial for reliability and misinformation risk assessment. Despite growing adoption, the capacity of LLMs to recall climate normals from parametric knowledge remains largely uncharacterized. We investigate the capacity of contemporary LLMs to recall climate normals without external retrieval, focusing on a prototypical query: mean July 2-m air temperature 1991-2020 at specified locations. We construct a global grid of queries over land points at 1° resolution, providing coordinates and location descriptors, and validate responses against ERA5 reanalysis. Results show that LLMs encode non-trivial climate structure, capturing latitudinal and topographic patterns, with root-mean-square errors of 3-6 °C and biases of ±1 °C. However, spatially coherent errors remain, particularly in mountains and high latitudes. Performance degrades sharply above 1500 m, where RMSE reaches 5-13 °C compared to 2-4 °C at lower elevations. We find that including geographic context (country, city, region) reduces errors by 27% on average, with larger models being most sensitive to location descriptors. While models capture the global mean magnitude of observed warming between 1950-1974 and 2000-2024, they fail to reproduce spatial patterns of temperature change, which directly relate to assessing climate change. This limitation highlights that while LLMs may capture present-day climate distributions, they struggle to represent the regional and local expression of long-term shifts in temperature essential for understanding climate dynamics. Our evaluation framework provides a reproducible benchmark for quantifying parametric climate knowledge in LLMs and complements existing climate communication assessments.
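
The error metrics quoted in the abstract (RMSE, bias, and the split at 1500 m) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the record layout, the unweighted averaging over points, and the toy temperatures are all assumptions made for illustration only.

```python
import math


def rmse_and_bias(predicted, reference):
    """Return (RMSE, mean bias) of predicted minus reference, in the input units."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - r for p, r in zip(predicted, reference)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    bias = sum(errors) / len(errors)
    return rmse, bias


def metrics_by_elevation(records, threshold_m=1500.0):
    """Score points at or below and above an elevation threshold separately.

    Each record is (elevation_m, llm_temp_c, reference_temp_c).
    The 1500 m default mirrors the threshold named in the abstract;
    the unweighted mean over points is an assumption of this sketch.
    """
    groups = {"low": ([], []), "high": ([], [])}
    for elevation_m, llm_c, ref_c in records:
        key = "high" if elevation_m > threshold_m else "low"
        groups[key][0].append(llm_c)
        groups[key][1].append(ref_c)
    # Skip groups with no points so rmse_and_bias never sees empty input.
    return {k: rmse_and_bias(p, r) for k, (p, r) in groups.items() if p}


# Toy records (made-up numbers, not results from the paper).
toy_records = [
    (50.0, 21.0, 20.0),
    (300.0, 17.0, 18.0),
    (2000.0, 14.0, 8.0),
    (3500.0, 2.0, 4.0),
]
toy_metrics = metrics_by_elevation(toy_records)
```

Reporting RMSE and bias together is useful because a near-zero bias can hide large errors of opposite sign, as in the "low" group of the toy data.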

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to recall climate normals without external retrieval

Assessing spatial accuracy of LLM climate knowledge across global locations

Identifying limitations in reproducing spatial patterns of temperature change

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLMs' parametric climate knowledge recall

Using global grid queries with ERA5 validation

Assessing model performance with geographic context enhancement
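
As a rough illustration of the global grid queries and the geographic context enhancement listed above, the sketch below generates 1° grid points and builds a prompt with or without a location descriptor. The prompt wording, the use of cell centers, and the function names are hypothetical; the paper's actual prompts and land mask are not given on this page.

```python
def grid_points(step_deg=1.0):
    """Yield (lat, lon) cell centers of a regular global grid.

    Using cell centers rather than grid nodes is an assumption of this sketch.
    No land mask is applied here; the paper restricts queries to land points.
    """
    n_lat = int(round(180 / step_deg))
    n_lon = int(round(360 / step_deg))
    for i in range(n_lat):
        lat = -90 + step_deg * (i + 0.5)
        for j in range(n_lon):
            lon = -180 + step_deg * (j + 0.5)
            yield lat, lon


def build_query(lat, lon, descriptor=None):
    """Build a hypothetical prompt; descriptor is optional geographic context."""
    hemi_ns = "N" if lat >= 0 else "S"
    hemi_ew = "E" if lon >= 0 else "W"
    where = f"{abs(lat):.1f}°{hemi_ns}, {abs(lon):.1f}°{hemi_ew}"
    if descriptor:
        where += f" ({descriptor})"
    return (
        "What is the mean July 2-m air temperature for 1991-2020 at "
        f"{where}? Answer with a single number in °C."
    )


# Same location queried with coordinates only and with added context.
plain_query = build_query(46.5, 8.5)
context_query = build_query(46.5, 8.5, descriptor="Switzerland")
```

Comparing model answers to `plain_query` and `context_query` for the same point is one way to measure how much location descriptors change the error.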

🔎 Similar Papers

CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting