🤖 AI Summary
Large language models (LLMs) frequently exhibit geospatial hallucinations, i.e., factually incorrect or geometrically inconsistent geographic assertions. Method: This paper introduces the first systematic evaluation framework designed specifically for geospatial hallucinations, leveraging a structured geospatial knowledge graph to enable multi-dimensional, quantitative assessment. It proposes a dynamic factuality-alignment method based on Kahneman-Tversky Optimization (KTO) that calibrates model outputs against ground-truth geographic facts. Contribution/Results: Evaluated across 20 mainstream LLMs, the approach yields a performance improvement of over 29.6% on the proposed benchmark, enhancing model fidelity and robustness in location reasoning and spatial-relation judgment. This work addresses critical gaps in both the systematic evaluation and the controllable mitigation of geospatial hallucinations.
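The knowledge-graph-based evaluation can be pictured with a minimal sketch: model assertions are extracted as (subject, relation, object) triples and checked against ground-truth triples from the geospatial knowledge graph. The function name and triple format below are hypothetical illustrations, not the paper's actual interface; the real framework assesses multiple dimensions beyond simple triple matching.

```python
def triple_accuracy(claims, kg_triples):
    """Fraction of a model's asserted geospatial triples that match
    ground-truth knowledge-graph triples (a rough proxy for the
    model's hallucination rate on this slice of knowledge).

    claims     -- iterable of (subject, relation, object) tuples
                  extracted from model outputs
    kg_triples -- set of ground-truth (subject, relation, object) tuples
    """
    claims = list(claims)
    if not claims:
        return 0.0
    hits = sum(1 for triple in claims if triple in kg_triples)
    return hits / len(claims)


# Toy example: one correct assertion, one hallucinated one.
kg = {("Paris", "capital_of", "France")}
claims = [
    ("Paris", "capital_of", "France"),   # correct
    ("Berlin", "capital_of", "France"),  # hallucination
]
print(triple_accuracy(claims, kg))  # 0.5
```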
📝 Abstract
Large language models (LLMs) possess extensive world knowledge, including geospatial knowledge, which has been successfully applied to various geospatial tasks such as mobility prediction and social indicator prediction. However, LLMs often generate inaccurate geospatial knowledge, leading to geospatial hallucinations (incorrect or inconsistent representations of geospatial information) that compromise their reliability. While the phenomenon of general knowledge hallucination in LLMs has been widely studied, the systematic evaluation and mitigation of geospatial hallucinations remain largely unexplored. To address this gap, we propose a comprehensive evaluation framework for geospatial hallucinations, leveraging structured geospatial knowledge graphs for controlled assessment. Through extensive evaluation across 20 advanced LLMs, we uncover and characterize hallucinations in their geospatial knowledge. Building on these insights, we introduce a dynamic factuality-alignment method based on Kahneman-Tversky Optimization (KTO) to mitigate geospatial hallucinations in LLMs, leading to a performance improvement of over 29.6% on the proposed benchmark. Extensive experimental results demonstrate the effectiveness of our benchmark and learning algorithm in enhancing the trustworthiness of LLMs in geospatial knowledge and reasoning tasks.
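The KTO-based alignment in the abstract can be sketched at the loss level. KTO (Ethayarajh et al., 2024) needs only binary desirable/undesirable labels, which fits the setting here: knowledge-graph-verified outputs are desirable, hallucinated ones undesirable. The sketch below follows the published KTO objective; the paper's "dynamic" variant presumably adapts these quantities during training, and the hyperparameter values shown are illustrative defaults, not the paper's settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def kto_loss(logp_policy, logp_ref, z_ref, desirable,
             beta=0.1, lam_d=1.0, lam_u=1.0):
    """Per-example KTO loss.

    r = beta * (log pi_theta(y|x) - log pi_ref(y|x)) is the implied
    reward; z_ref approximates the KL-based reference point. Minimizing
    the loss pulls the value of desirable (KG-verified) completions
    above the reference point and pushes undesirable (hallucinated)
    completions below it, with prospect-theoretic loss aversion
    controlled by lam_d and lam_u.
    """
    r = beta * (logp_policy - logp_ref)
    if desirable:
        value = lam_d * sigmoid(r - z_ref)
        return lam_d - value
    value = lam_u * sigmoid(z_ref - r)
    return lam_u - value


# A correct geospatial completion the policy already prefers over the
# reference model incurs a lower loss than one it disprefers.
print(kto_loss(-5.0, -6.0, 0.0, desirable=True))   # lower
print(kto_loss(-6.0, -5.0, 0.0, desirable=True))   # higher
```

Because labels are unpaired, knowledge-graph fact checks can feed the loss directly, without constructing preference pairs as DPO-style methods require.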