Subnational Geocoding of Global Disasters Using Large Language Models

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Location descriptions in disaster databases such as EM-DAT are unstructured, heterogeneous, and inconsistently spelled, which impedes subnational geocoding. Method: We propose a fully automated, GPT-4o–driven geocoding workflow. The LLM cleans and semantically parses the location text, and the parsed names are matched against GADM, OpenStreetMap, and Wikidata; cross-checking the three sources yields subnational geometries with a reliability score for each location. Contribution/Results: The workflow covers all disaster types, requires no manual intervention, and allows flexible remapping to other administrative frameworks. Applied to EM-DAT records from 2000–2024, it geocoded 14,215 disaster events across 17,948 unique locations at subnational resolution, which improves the spatial comparability and interoperability of disaster data.

📝 Abstract

Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, which makes them difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types, enables cross-verification across multiple sources, and allows flexible remapping to preferred frameworks. Beyond the dataset, we demonstrate the potential of LLMs to extract and structure geographic information from unstructured text, offering a scalable and reliable method for related analyses.
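
To make the text-cleaning step more concrete, the sketch below shows one way a free-text location field could be turned into structured place names. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `ParsedLocation` type, the `admin_level` field, and the functions `build_prompt` and `parse_reply` are all hypothetical, and the actual call to GPT-4o is left out so that the example stays self-contained.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedLocation:
    name: str                   # cleaned place name
    admin_level: Optional[int]  # hypothetical field: 1 = province/state, 2 = district


def build_prompt(country: str, raw_location: str) -> str:
    """Build an instruction asking the model for a JSON array of places."""
    return (
        "Split the location description below into individual subnational places.\n"
        "Correct obvious spelling errors and reply with a JSON array of objects "
        'with the keys "name" and "admin_level" only.\n'
        f"Country: {country}\n"
        f"Location text: {raw_location}"
    )


def parse_reply(reply: str) -> List[ParsedLocation]:
    """Validate the model's reply, dropping malformed and duplicate entries."""
    items = json.loads(reply)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    seen = set()
    parsed = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = item.get("name")
        if not isinstance(name, str):
            continue
        name = name.strip()
        # Skip empty names and case-insensitive repeats.
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        level = item.get("admin_level")
        parsed.append(ParsedLocation(name, level if isinstance(level, int) else None))
    return parsed
```

In this sketch the prompt from `build_prompt` would be sent to the model, and the reply string would be passed to `parse_reply` before any geographic matching. Validating the reply separately means a malformed answer is caught before it reaches the matching step.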

Problem

Research questions and friction points this paper is trying to address.

Automating geocoding of unstructured disaster location data from databases

Resolving inconsistent location granularity and spelling in disaster records

Generating reliable subnational geometries for disaster risk assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GPT-4o to process unstructured location text

Cross-checks three geoinformation repositories for geometries

Assigns reliability scores based on source agreement
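
The idea of scoring a location by how many sources are available and agree can be sketched as follows. This is a minimal illustration, not the paper's scoring rule: the 0 to 3 scale, the 50 km tolerance, and the use of centroid distance as a stand-in for comparing full geometries are all assumptions, and `reliability_score` and `haversine_km` are hypothetical helper names.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def reliability_score(matches: Dict[str, Optional[Point]],
                      tolerance_km: float = 50.0) -> int:
    """Assumed scale: 0 = no source matched, 1 = matched but unconfirmed,
    2 = at least two sources agree, 3 = all three sources agree."""
    found = {src: pt for src, pt in matches.items() if pt is not None}
    if not found:
        return 0
    # Count source pairs whose centroids lie within the tolerance.
    agreeing_pairs = sum(
        1 for p, q in combinations(found.values(), 2)
        if haversine_km(p, q) <= tolerance_km
    )
    if len(found) == 3 and agreeing_pairs == 3:
        return 3
    if agreeing_pairs >= 1:
        return 2
    return 1
```

For example, `reliability_score({"GADM": (25.0, 68.0), "OSM": (25.1, 68.1), "Wikidata": None})` falls in the "two sources agree" tier under these assumptions. A real pipeline would more likely compare polygon overlap than centroid distance, since administrative units vary widely in size.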
