🤖 AI Summary
This study addresses the limitations of existing geographic information systems (GIS) and knowledge graphs in handling the many geographic web search queries that are not centred on locating a place. It analyzes the full MS MARCO corpus of 1.01 million real Bing queries without first filtering for toponyms or spatial keywords, and identifies 18.0% (181,827) of them as geospatial, nearly three times the 6.17% labelled as Location in the original annotations. The authors combine dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to characterize query intent, producing a fine-grained taxonomy of 88 categories. Transactional and practical lookups dominate: cost- and price-related queries alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. The authors publicly release the labelled dataset, classifier, and taxonomy as a resource for work on hybrid retrieval and on benchmarks of geographic reasoning in large language models.
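The three-stage method summarized above (embed, classify, cluster) can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation, and several pieces are swapped in: hand-made 3-d vectors stand in for dense sentence embeddings; a nearest-centroid rule stands in for the SetFit classifier (SetFit itself contrastively fine-tunes a sentence-transformer and then trains a classification head on top); and plain DBSCAN with cosine distance stands in for whichever density-based algorithm the study used. `run_pipeline`, `eps`, `min_pts`, and all example queries and vectors are hypothetical.

```python
"""Toy filter-then-cluster sketch (not the authors' code).

Assumptions: hand-made 3-d vectors stand in for dense sentence embeddings,
a nearest-centroid rule stands in for the SetFit classifier, and plain DBSCAN
with cosine distance stands in for the density-based clustering step.
"""
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
UNVISITED, NOISE = -2, -1


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def classify(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Stand-in for SetFit: pick the label whose centroid is most similar."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def dbscan(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN on cosine distance; returns a cluster id per point, -1 = noise."""
    labels = [UNVISITED] * len(points)

    def neighbours(i: int) -> List[int]:
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if 1 - cosine(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(more)
    return labels


def run_pipeline(queries: Dict[str, Vector], examples: Dict[str, List[Vector]],
                 eps: float = 0.05, min_pts: int = 2) -> Dict[str, int]:
    """Keep queries classified as 'geospatial', then cluster them by density."""
    centroids = {label: centroid(vecs) for label, vecs in examples.items()}
    geo = [q for q, vec in queries.items() if classify(vec, centroids) == "geospatial"]
    labels = dbscan([queries[q] for q in geo], eps, min_pts)
    return dict(zip(geo, labels))


# Hypothetical labelled examples and toy query embeddings.
EXAMPLES = {"geospatial": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "other": [[0.0, 0.0, 1.0]]}
QUERIES = {
    "cost of living in denver": [0.9, 0.1, 0.1],
    "average rent in boston": [0.95, 0.05, 0.1],
    "price of gas in ohio": [0.85, 0.15, 0.05],
    "weather in paris in may": [0.1, 0.9, 0.1],
    "rainfall in seattle": [0.05, 0.95, 0.1],
    "is lima humid in june": [0.15, 0.85, 0.05],
    "elevation of mount fuji": [0.5, 0.5, 0.0],
    "how to boil an egg": [0.05, 0.05, 1.0],
}

if __name__ == "__main__":
    print(run_pipeline(QUERIES, EXAMPLES))
```

With these toy vectors, the three cost and price queries form cluster 0, the three weather queries form cluster 1, "elevation of mount fuji" is left as noise (-1), and "how to boil an egg" is dropped at the classification stage. A density-based method suits this step because it does not require the number of categories to be fixed in advance and can leave outliers unassigned.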
📝 Abstract
Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries (what people ask of place, and how often) remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries, without prior filtering for toponyms or spatial keywords, and identify 181,827 geospatial queries (18.0%), nearly three times the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity (costs, opening hours, contact details, weather, travel recommendations) falls outside the scope that traditional GIS and knowledge graphs are built to serve. The categories vary substantially in the kind of answer they admit, from deterministic lookups answerable from spatial databases or knowledge graphs to evaluative or temporally volatile queries that require generative or real-time systems. We discuss implications for hybrid retrieval architectures and for benchmarks of geographic reasoning in large language models. We openly release the labelled dataset, classifier, and taxonomy.
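The headline proportions can be cross-checked with a few lines of arithmetic. This sketch uses only figures quoted in the abstract; because the corpus size is given only as "1.01 million", `TOTAL_QUERIES` is a rounded assumption, and every derived value is approximate.

```python
# Figures quoted in the abstract. TOTAL_QUERIES is the rounded "1.01 million",
# so everything derived from it is approximate.
TOTAL_QUERIES = 1_010_000
GEO_QUERIES = 181_827
LOCATION_SHARE = 0.0617      # share labelled Location in the original annotations
COST_SHARE_OF_GEO = 0.153    # costs and prices, as a share of geospatial queries

geo_share = GEO_QUERIES / TOTAL_QUERIES                        # fraction of all queries that are geospatial
ratio_to_location = geo_share / LOCATION_SHARE                 # how many times the Location share
approx_cost_queries = round(COST_SHARE_OF_GEO * GEO_QUERIES)   # approximate count of cost/price queries
cost_share_of_all = COST_SHARE_OF_GEO * geo_share              # cost/price queries as a share of all queries

print(f"geospatial share: {geo_share:.1%}")                        # -> geospatial share: 18.0%
print(f"ratio to Location label: {ratio_to_location:.2f}")         # -> ratio to Location label: 2.92
print(f"cost/price queries: ~{approx_cost_queries:,}")             # -> cost/price queries: ~27,820
print(f"cost/price share of all queries: {cost_share_of_all:.1%}") # -> cost/price share of all queries: 2.8%
```

A ratio of about 2.92 underlies the abstract's comparison with the 6.17% Location label, and 15.3% of geospatial queries corresponds to roughly 2.8% of the whole corpus.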