Subnational Geocoding of Global Disasters Using Large Language Models

2025-11-13
AI Summary
Unstructured, heterogeneous, and inconsistently spelled location descriptions in disaster databases such as EM-DAT impede subnational geocoding. Method: A fully automated workflow uses GPT-4o to clean and parse textual location information, then assigns geometries by cross-checking three geographic repositories (GADM, OpenStreetMap, and Wikidata) and attaches a reliability score to each location based on the agreement and availability of these sources. Contribution/Results: The workflow requires no manual intervention, covers all hazard types, supports remapping to preferred administrative frameworks, and shows how LLMs can extract structured geographic information from unstructured text. Applied to EM-DAT records from 2000 to 2024, it geocoded 14,215 disaster events across 17,948 unique locations at subnational resolution, improving the spatial comparability, interoperability, and analytical utility of disaster data.
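The summary says GPT-4o performs text cleaning and semantic parsing, but it does not give the output format. A minimal sketch, assuming the model is asked to return a JSON array of objects with hypothetical fields `name`, `admin_level`, and `country_iso3`, of how such a response could be validated before geographic matching. The schema and the names `ParsedLocation` and `parse_llm_locations` are illustrative assumptions, and the GPT-4o call itself is omitted.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedLocation:
    """One cleaned location mention (hypothetical schema, not the paper's)."""
    name: str
    admin_level: int   # GADM-style level: 1 = state/province, 2 = district, ...
    country_iso3: str  # ISO 3166-1 alpha-3 country code


def parse_llm_locations(raw_json: str) -> list[ParsedLocation]:
    """Validate a JSON array returned by an LLM; drop malformed or duplicate entries."""
    try:
        items = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # unparseable model output is treated as "no locations"
    if not isinstance(items, list):
        return []
    parsed: list[ParsedLocation] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = str(item.get("name", "")).strip()
        level = item.get("admin_level")
        iso3 = str(item.get("country_iso3", "")).strip().upper()
        # Require a name, an integer subnational level (1-5) and a 3-letter country code;
        # type() is used so that JSON booleans are not accepted as integers.
        if not name or type(level) is not int or not 1 <= level <= 5 or len(iso3) != 3:
            continue
        loc = ParsedLocation(name, level, iso3)
        if loc not in parsed:  # keep only the first occurrence of repeated mentions
            parsed.append(loc)
    return parsed


raw = (
    '[{"name": "Sindh", "admin_level": 1, "country_iso3": "pak"},'
    ' {"name": "Sindh", "admin_level": 1, "country_iso3": "PAK"},'
    ' {"name": "", "admin_level": 2, "country_iso3": "PAK"}]'
)
print(parse_llm_locations(raw))
# [ParsedLocation(name='Sindh', admin_level=1, country_iso3='PAK')]
```

Validating the model's output in plain code keeps malformed or duplicated mentions from reaching the later matching stage, whatever prompt is actually used.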

Abstract
Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, which makes them difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types, enables cross-verification across multiple sources, and allows flexible remapping to preferred frameworks. Beyond the dataset, we demonstrate the potential of LLMs to extract and structure geographic information from unstructured text, offering a scalable and reliable method for related analyses.
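The abstract notes that the output can be remapped to preferred frameworks but does not say how. One common approach, shown here purely as an assumption, is to take a representative point of each geocoded location (for example a centroid) and find which unit of the target boundary set contains it. The sketch uses a plain even-odd ray-casting test; the unit ids and square polygons are hypothetical, and real boundaries with holes or multiple parts would normally be handled with a GIS library such as Shapely or GeoPandas.

```python
from typing import Optional

Point = tuple[float, float]  # (lon, lat) in degrees


def point_in_ring(lon: float, lat: float, ring: list[Point]) -> bool:
    """Even-odd ray-casting test against a single polygon ring (no holes)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def remap_point(lon: float, lat: float, units: dict[str, list[Point]]) -> Optional[str]:
    """Return the id of the first target unit whose ring contains the point, else None."""
    for unit_id, ring in units.items():
        if point_in_ring(lon, lat, ring):
            return unit_id
    return None


# Toy target framework: two adjacent 1x1 degree squares (hypothetical unit ids).
units = {
    "UNIT_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "UNIT_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(remap_point(0.5, 0.5, units))  # UNIT_A
print(remap_point(1.5, 0.5, units))  # UNIT_B
print(remap_point(3.0, 3.0, units))  # None
```

Remapping whole polygons rather than points would instead call for an area-weighted overlay, which this point-based sketch does not attempt.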
Problem

Research questions and friction points this paper is trying to address.

Automating the geocoding of unstructured location descriptions in disaster databases
Resolving inconsistent location granularity and spelling in disaster records
Generating reliable subnational geometries for disaster risk assessment
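The paper handles inconsistent spelling with GPT-4o. The sketch below swaps in a much simpler, classical technique (Unicode normalization plus `difflib` string similarity from the Python standard library) only to make the spelling problem concrete. The toy gazetteer, the 0.8 similarity cutoff, and the function names are assumptions, not part of the described workflow.

```python
import difflib
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, turn punctuation into spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def match_admin_name(raw: str, gazetteer: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the gazetteer entry most similar to raw, or None below the cutoff."""
    index = {normalize_name(entry): entry for entry in gazetteer}
    hits = difflib.get_close_matches(normalize_name(raw), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


# Toy gazetteer of province names (illustrative only).
provinces = ["Balochistan", "Khyber Pakhtunkhwa", "Punjab", "Sindh"]
print(match_admin_name("Baluchistan", provinces))         # Balochistan
print(match_admin_name("Khyber-Pakhtunkhwa", provinces))  # Khyber Pakhtunkhwa
print(match_admin_name("Atlantis", provinces))            # None
```

String similarity catches transliteration variants like these but cannot resolve ambiguous or descriptive mentions, which is the kind of case an LLM-based cleaning step is meant to address.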
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GPT-4o to process unstructured location text
Cross-checks three geoinformation repositories (GADM, OpenStreetMap, Wikidata) to assign geometries
Assigns reliability scores based on source agreement and availability
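The abstract states that each location's reliability score depends on the agreement and availability of GADM, OpenStreetMap, and Wikidata, but not which geometric test or thresholds are used. A minimal sketch, assuming each source yields a centroid (or nothing) and that "agreement" means centroids within a hypothetical 25 km of one another; the high/medium/low scale and the `reliability` function are likewise illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = tuple[float, float]  # (lat, lon) in degrees


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the argument of asin slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def reliability(candidates: dict[str, Optional[LatLon]], max_km: float = 25.0) -> str:
    """Hypothetical rule combining availability and agreement of source centroids.

    high   : all three sources return a point and every pair lies within max_km
    medium : at least one pair of available sources agrees
    low    : a single source, no sources, or no agreeing pair
    """
    found = [pt for pt in candidates.values() if pt is not None]
    agreeing = sum(haversine_km(p, q) <= max_km for p, q in combinations(found, 2))
    if len(found) == 3 and agreeing == 3:
        return "high"
    if agreeing >= 1:
        return "medium"
    return "low"


# Toy centroids for one location from the three repositories (made-up coordinates).
print(reliability({"gadm": (24.86, 67.00), "osm": (24.90, 67.05), "wikidata": (24.85, 67.01)}))  # high
print(reliability({"gadm": (24.86, 67.00), "osm": (24.90, 67.05), "wikidata": None}))  # medium
print(reliability({"gadm": (24.86, 67.00), "osm": (31.55, 74.34), "wikidata": None}))  # low
```

Comparing centroids keeps the sketch short; comparing the polygons themselves, for example by their overlap, would be a natural alternative when full geometries are available from every source.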
Michele Ronco
Joint Research Centre - European Commission
Artificial Intelligence, Disaster Risk Management, Food Insecurity
Damien Delforge
Institute of Health and Society (IRSS), University of Louvain (UCLouvain), Clos Chapelle-aux-champs 30, Woluwe St Lambert, 1200, Brussels, Belgium
Wiebke S. Jager
Institute for Environmental Studies, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
Christina Corbane
European Commission, Joint Research Centre, Ispra, 21027, Italy