🤖 AI Summary
Existing OSM query tools (e.g., Overpass Turbo) rely on complex domain-specific query languages, which makes geospatial verification hard for non-technical users such as investigative journalists. To address this, we propose what is, to our knowledge, the first natural-language-to-geospatial query system for OpenStreetMap to reach this level of accuracy. Our method combines LLM fine-tuning, an OSM-specific synthetic data generation pipeline, a semantic bundling mechanism, and a structured geospatial query parser. Together, these components mitigate LLM hallucination, OSM tag heterogeneity, and user input noise. The system retrieves spatial object configurations directly from scene-based natural language descriptions and displays them in an interactive map. Experiments demonstrate significant improvements in both accuracy and accessibility for geospatial verification, and real-world deployment in investigative journalism confirms practical efficacy. The code and models are open-sourced and actively used in production.
📝 Abstract
OpenStreetMap (OSM) is a vital resource for investigative journalists performing geolocation verification. However, existing tools for querying OSM data, such as Overpass Turbo, require familiarity with complex query languages, creating barriers for non-technical users. We present SPOT, an open-source natural language interface that makes OSM's rich, tag-based geographic data more accessible through intuitive scene descriptions. SPOT uses fine-tuned Large Language Models (LLMs) to interpret user inputs as structured representations of geospatial object configurations and displays the results in an interactive map interface. While more general geospatial search tasks are conceivable, SPOT is specifically designed for use in investigative journalism, addressing real-world challenges such as hallucinations in model output, inconsistencies in OSM tagging, and the noisy nature of user input. It combines a novel synthetic data pipeline with a semantic bundling system to enable robust, accurate query generation. To our knowledge, SPOT is the first system to achieve reliable natural language access to OSM data at this level of accuracy. By lowering the technical barrier to geolocation verification, SPOT contributes a practical tool to broader efforts to support fact-checking and combat disinformation.
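To make the idea of a "structured representation of a geospatial object configuration" more concrete, the following minimal Python sketch shows how a scene such as "a cafe within 100 m of a pharmacy in Berlin" might be encoded and turned into an Overpass QL query. It is an illustration only: the abstract does not describe SPOT's actual schema or parser, so `Entity`, `Relation`, `tag_union`, and `to_overpass` are hypothetical names, and the list of alternative tags per entity is only a rough stand-in for what a semantic bundle could look like.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entity:
    """One object in the described scene (hypothetical schema, not SPOT's)."""
    name: str                    # also used as the Overpass set name
    tags: list[tuple[str, str]]  # alternative OSM key/value pairs, OR-combined


@dataclass
class Relation:
    """Constraint: `target` lies within `max_m` metres of `source`."""
    source: str
    target: str
    max_m: int


def tag_union(entity: Entity, spatial_filter: str) -> str:
    # One statement per alternative tag, wrapped in an Overpass union
    # and stored in a named set.
    lines = [f'  nwr["{k}"="{v}"]{spatial_filter};' for k, v in entity.tags]
    return "(\n" + "\n".join(lines) + f"\n)->.{entity.name};"


def to_overpass(area: str, entities: list[Entity], relations: list[Relation]) -> str:
    by_name = {e.name: e for e in entities}
    parts = ["[out:json][timeout:25];", f'area["name"="{area}"]->.searchArea;']
    placed: list[str] = []
    for rel in relations:
        if rel.source not in placed:
            parts.append(tag_union(by_name[rel.source], "(area.searchArea)"))
            placed.append(rel.source)
        near = f"(around.{rel.source}:{rel.max_m})(area.searchArea)"
        parts.append(tag_union(by_name[rel.target], near))
        if rel.target not in placed:
            placed.append(rel.target)
    # Simplification: source sets are returned unfiltered, i.e. including
    # sources that have no matching target nearby.
    parts.append("(" + " ".join(f".{n};" for n in placed) + ");")
    parts.append("out center;")
    return "\n".join(parts)


if __name__ == "__main__":
    scene_entities = [
        Entity("cafe", [("amenity", "cafe")]),
        Entity("pharmacy", [("amenity", "pharmacy"), ("healthcare", "pharmacy")]),
    ]
    print(to_overpass("Berlin", scene_entities, [Relation("cafe", "pharmacy", 100)]))
```

In this sketch the language model would only need to produce the small structured object, and a deterministic step would emit the query text. That separation is one plausible way to limit the effect of hallucinated query syntax, though whether SPOT works this way is an assumption here.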